Reproducible, scalable, and shareable analysis workflows with Nextflow


Pre-Requisites

  • This lesson assumes a working understanding of the command line and familiarity with Conda, Docker/Singularity, and running jobs on an HPC cluster.

  • This lesson also assumes some familiarity with biological concepts, including the structure of DNA, nucleotide abbreviations, and the concept of genomic variation within a population.


Schedule & Learning Objectives

Chapters and Learning Objectives
Session 1 (3 hours)
0. Setup
  Gitpod Link & Setup
  How to install Nextflow?
  Data & Environment Setup
1. Nextflow Introduction
  What is a workflow and what are workflow management systems?
  Why should I use a workflow management system?
  What is Nextflow?
  What are the main features of Nextflow?
  What are the main components of a Nextflow script?
  How do I run a Nextflow script?
  How can I use the nextflow logs?
2. NF-Core
  Where can I find existing bioinformatics pipelines?
  What is nf-core tools?
  How do you run nf-core pipelines?
  How do you configure nf-core pipelines?
  How do you use nf-core pipelines offline?
3. NF-Core @ HPC
  How do I configure nf-core pipelines to submit jobs to an HPC cluster?
Session 2 (3 hours)
4. Nextflow-Scripting
  What language are Nextflow scripts written in?
  How do I store values in a Nextflow script?
  How do I write comments in a Nextflow script?
  How can I store and retrieve multiple values?
  How are strings evaluated in Nextflow?
  How can I create simple reusable code blocks?
5. Nextflow-Channels
  How do I get data into Nextflow?
  How do I handle different types of input, e.g. files and parameters?
  How do I create a Nextflow Channel?
  How can I use pattern matching to select input files?
  How do I change the way inputs are handled?
6. Nextflow-Processes
  How do I run tasks/processes in Nextflow?
  How do I pass parameters to a Nextflow script on the command line?
  How do I get data, files and values, into and out of processes?
  How can I control when a process is executed?
  How do I control resources, such as number of CPUs and memory, available to processes?
  How do I save output/results from a process?
7. Nextflow-Workflow
  How do I connect channels and processes to create a workflow?
  How do I invoke a process inside a workflow?
8. Nextflow-Operators
  How do I perform operations, such as filtering, on channels?
  What are the different kinds of operations I can perform on channels?
  How do I combine operations?
  How can I use a CSV file to process data into a Channel?
Session 3 (3 hours)
9. Simple Variant-Calling pipeline
  How can I create a variant calling pipeline?
  How do I print all the pipeline parameters by using a single command?
  How can I use conda with my pipeline?
  How do I know when my pipeline has finished?
  How do I see runtime metrics and execution information?
10. Nextflow configuration
  What is the difference between the workflow implementation and the workflow configuration?
  How do I configure a Nextflow workflow?
  How do I assign different resources to different processes?
  How do I separate and provide configuration for different computational systems?
  How do I change configuration settings from the default settings provided by the workflow?
11. Nextflow-Modules
  How can I reuse a Nextflow process in different workflows?
  How do I use parameters in a module?
12. Nextflow-Sub-workflows
  How do I reuse a workflow as part of a larger workflow?
  How do I run only a part of a workflow?
13. Nextflow-Reporting
  How do I get information about my pipeline run?
  How can I see what commands I ran?
  How can I create a report from my run?
14. Workflow caching and checkpointing
  How can I restart a Nextflow workflow after an error?
  How can I add new data to a workflow?
  Where can I find intermediate data and results?
15. nf-core/variantcall
  Assembling the variant-calling workflow nf-core style
  How can I add nf-core modules?
16. Contributing to nf-core/modules
  How can I add a new module to nf-core?
Nextflow Useful Links  

Editors

Credits

Graeme R. Grimes, Evan Floden, Paolo Di Tommaso, Phil Ewels, and Maxime Garcia: “Introduction to Workflows with Nextflow and nf-core.” 2021, https://github.com/carpentries-incubator/workflows-nextflow

Josh Herr, Ming Tang, Lex Nederbragt, Fotis Psomopoulos (eds): “Data Carpentry: Wrangling Genomics Lesson.” Version 2017.11.0, November 2017, http://www.datacarpentry.org/wrangling-genomics/, doi: 10.5281/zenodo.1064254

Lee, (2019). Happy Belly Bioinformatics: an open-source resource dedicated to helping biologists utilize bioinformatics. Journal of Open Source Education, 4(41), 53, https://doi.org/10.21105/jose.00053