nf-core module creation
Overview
- Initial environment setup (conda)
- Initialize a new nf-core pipeline
- Install or create nf-core modules needed for the pipeline
- Create subworkflows for different pipeline steps/phases
- Create a primary workflow
- Modify config files for the pipeline components
- Test the pipeline
- Deployment, distribution, and support
1. Initial environment setup (optional; recommended for local setup)
(Requires conda/mamba to be installed before you begin.)
Create a Conda environment for Nextflow:
# Create a new empty environment
mamba create -n nextflow
# Install the pinned tools into that environment (-n targets it explicitly, since it is not active yet)
mamba install -n nextflow -c conda-forge -c bioconda nextflow=21.10.6 nf-core=2.2 graphviz openjdk=8.0.312 git=2.35.0
# Activate the environment
conda activate nextflow
# Create a clean, sharable copy of this conda environment
conda env export | grep -v "prefix" > env.nextflow.yml
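Before moving on, it can help to confirm that the tools installed above are actually visible on your PATH. The following is a minimal sketch, assuming a POSIX-style shell; check_tools is a hypothetical helper written for this guide, not part of conda or nf-core.

```shell
# check_tools TOOL...
# Hypothetical helper: report which of the given commands are on PATH.
# Returns 0 when all are found, 1 when at least one is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Commands expected from the 'nextflow' environment (java comes from openjdk).
check_tools nextflow nf-core git java || echo "Try: conda activate nextflow"
```

If anything is reported as missing, the environment is most likely not active in the current shell.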
Creating a New nf-core Pipeline from Scratch
This uses templates standardized by nf-core tools, which are available from the nf-core package in our conda environment.
# This will activate an interactive prompt to initialize the pipeline name, author, and git repo
nf-core create
# $ Workflow Name: quaisar
# $ Description: nf-core demo workflow
# $ Author: hseabolt
cd nf-core-quaisar
# This is the new repo for this pipeline -- many files and template files
tree .
.
├── assets
│   ├── email_template.html
│   ├── email_template.txt
│   ├── multiqc_config.yaml
│   ├── nf-core-quaisar_logo_light.png
│   ├── samplesheet.csv
│   ├── schema_input.json
│   └── sendmail_template.txt
├── bin
│   └── check_samplesheet.py
├── CHANGELOG.md
├── CITATIONS.md
├── CODE_OF_CONDUCT.md
├── conf
│   ├── base.config
│   ├── igenomes.config
│   ├── modules.config
│   ├── test.config
│   └── test_full.config
├── docs
│   ├── images
│   │   ├── mqc_fastqc_adapter.png
│   │   ├── mqc_fastqc_counts.png
│   │   ├── mqc_fastqc_quality.png
│   │   ├── nf-core-quaisar_logo_dark.png
│   │   └── nf-core-quaisar_logo_light.png
│   ├── output.md
│   ├── README.md
│   └── usage.md
├── lib
│   ├── nfcore_external_java_deps.jar
│   ├── NfcoreSchema.groovy
│   ├── NfcoreTemplate.groovy
│   ├── Utils.groovy
│   ├── WorkflowMain.groovy
│   └── WorkflowQuaisar.groovy
├── LICENSE
├── main.nf
├── modules
│   ├── local
│   │   └── samplesheet_check.nf
│   └── nf-core
│       └── modules
│           ├── custom
│           │   └── dumpsoftwareversions
│           │       ├── main.nf
│           │       ├── meta.yml
│           │       └── templates
│           │           └── dumpsoftwareversions.py
│           ├── fastqc
│           │   ├── main.nf
│           │   └── meta.yml
│           └── multiqc
│               ├── main.nf
│               └── meta.yml
├── modules.json
├── nextflow.config
├── nextflow_schema.json
├── README.md
├── subworkflows
│   └── local
│       └── input_check.nf
└── workflows
    └── quaisar.nf
Creating a New nf-core Pipeline from Scratch
Here are the steps to create a new nf-core pipeline from scratch:
- Initial Environment Setup (Optional; Recommended for Local Setup)
  Before you begin, you need to set up your environment. This requires conda or mamba to be installed. First, create a Conda environment for Nextflow:
  # Create a new empty environment
  mamba create -n nextflow
  # Install the pinned tools into that environment (-n targets it explicitly, since it is not active yet)
  mamba install -n nextflow -c conda-forge -c bioconda nextflow=21.10.6 nf-core=2.2 graphviz openjdk=8.0.312 git=2.35.0
  # Activate the environment
  conda activate nextflow
  # Create a clean, sharable copy of this conda environment
  conda env export | grep -v "prefix" > env.nextflow.yml
- Initialize a New nf-core Pipeline
  Next, initialize your new pipeline. This uses templates standardized by nf-core tools, which are available from the nf-core package in your Conda environment:
  # This will activate an interactive prompt to initialize the pipeline name, author, and git repo
  nf-core create
  # Workflow Name: quaisar
  # Description: nf-core demo workflow
  # Author: hseabolt
  Then, navigate to your new pipeline’s directory:
  cd nf-core-quaisar
- Install or Create nf-core Modules Needed for the Pipeline
  nf-core has many modules already available for a variety of bioinformatics software. You can download and install these directly into your new pipeline using nf-core tools. Use the following command to check which modules are already created and peer-reviewed:
  # Scroll through this on the terminal or pipe to grep to check a specific module
  nf-core modules list remote
  # Grep example
  nf-core modules list remote | grep "bwa/mem"
- Create Subworkflows for Different Pipeline Steps/Phases
  This is where you define the series of steps that form the pipeline. Each subworkflow can contain one or more modules; in the template, local subworkflows live under subworkflows/local/ (for example, input_check.nf).
- Create a Primary Workflow
  The primary workflow (workflows/quaisar.nf in this example) ties together all the subworkflows into a cohesive pipeline.
- Modify Config Files for the Pipeline Components
  You can adjust settings for your pipeline by modifying the appropriate configuration files: nextflow.config and the files under conf/ (for example, conf/modules.config).
- Test the Pipeline
  It’s good practice to thoroughly test your pipeline before deploying it.
- Deployment, Distribution, and Support
  Once your pipeline is tested and ready, you can deploy and distribute it. You may also want to provide support for users of your pipeline.
Remember to track your changes with Git:
git add --all
git commit -m "Initial commit of quaisar nf-core pipeline"
# Prior to pushing to an online repo, make sure to create the repo online and don't add any files to it
git remote add origin <github repo URL>
git push origin master
Installing an Available Module Directly from nf-core
nf-core provides a simple interface for module installations. Most software consists of a single command plus command-line arguments; these modules can be installed directly by name. However, some software has multiple sub-commands, such as bwa index, bwa aln, bwa mem, etc. In these cases, nf-core requires each sub-command to be created as a standalone module within the parent module, and each one is installed by its full tool/subtool name. Be aware that installing the parent module (i.e., bwa in this case) through nf-core does not give you direct access to its sub-modules.
# Install a single-command module from the nf-core remote (master) repo
nf-core modules install fastp
# Install modules that have specific sub-modules (tool/subtool form)
nf-core modules install samtools/stats
nf-core modules install bwa/index
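To confirm that an install landed where the pipeline expects it, a quick file check is usually enough. This sketch assumes the layout shown in the pipeline tree earlier in this guide (modules/nf-core/modules/<name>/main.nf); module_installed is a hypothetical helper, not an nf-core command.

```shell
# module_installed NAME [PIPELINE_DIR]
# Hypothetical helper: succeeds when NAME's main.nf exists under the layout
# assumed from the template tree (modules/nf-core/modules/NAME/main.nf).
module_installed() {
    local name="$1"
    local pipeline_dir="${2:-.}"
    [ -f "$pipeline_dir/modules/nf-core/modules/$name/main.nf" ]
}

# Run from the pipeline root, e.g. nf-core-quaisar
for module in fastqc bwa/index; do
    if module_installed "$module"; then
        echo "installed: $module"
    else
        echo "not found: $module"
    fi
done
```

Sub-modules are checked with the same tool/subtool name that was passed to the install command.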
Creating New Modules for nf-core
If you just want to start writing code, here’s a brief rundown of the steps:
- Join nf-core Slack: Go to https://nf-co.re/join to join the community.
- Fork and Clone the nf-core/modules Repo: Fork the nf-core/modules repo (https://github.com/nf-core/modules) to your own GitHub account, then clone it locally from your fork.
- Create a New Branch: Create a new branch for the new module within your local clone of nf-core/modules.
- Raise an Issue on the Main nf-core GitHub: Alert others that you are working on this module by raising an issue on the main nf-core GitHub (https://github.com/nf-core). Go to Issues -> New Module. Add yourself to the Assignees.
- Check for Existing Conda Recipe, Docker, or Singularity Image: Ensure a Conda recipe, Docker, or Singularity image already exists in Bioconda, Biocontainers, quay.io, or similar for the software you want to create a module for.
- Use nf-core Tools to Create, Edit, and Test the Module: The nf-core tools package provides useful commands for module creation and testing.
- Push Your Module’s Branch to Your nf-core/modules Fork: After you’re done editing and testing, push your module’s branch to your fork of nf-core/modules.
- Open a Pull Request on the Main nf-core/modules Repo: Request to merge your changes into the main nf-core/modules repository.
- Request a Review in the nf-core Slack: In the request-review channel, ask for someone to review the pull request and approve the merge.
Forking the nf-core Modules GitHub Repo
Start by navigating to https://github.com/nf-core/modules and click Fork in the top right corner of the web browser. This will create a forked copy of the entire nf-core/modules repo under your own GitHub account.
Next, we should let other nf-core developers know that we are working on creating new modules. This way, we avoid duplicating efforts or interfering with the work of others. In the main nf-core/modules GitHub repo, open an Issue (https://github.com/nf-core/modules/issues) with New Issue -> New Module.
(Remember to search the existing modules and open issues first to see if someone is already working on the module you intend to add!)
Read over the markdown text in the dialogue box for new modules and fill out/edit appropriately. Assign yourself under Assignees in the top right corner. When you are finished, click Submit New Issue.
Now, we need to clone the repo locally so that we can branch/modify it.
# Run this from the directory where you want to place the local clone of your modules fork.
# DO NOT do this inside the workflow that you are creating
git clone https://github.com/<your-username>/modules.git
cd modules
# The modules directory should look something like this
ls
# You should see: docs, LICENSE, main.nf, modules...
# Create a new branch for the module you want to create
# Here we are creating "nonpareil", a software used for metagenomics analyses.
git checkout -b nonpareil
# IMPORTANT: Before modifying or creating ANY new files, always make sure that you are not working in the master branch!!!
nf-core modules create nonpareil --author @<your-username> --label process_low --meta
# The new directories for the new module are created in ./modules and ./tests
# ./modules/nonpareil/main.nf
# ./modules/nonpareil/meta.yml
# ./tests/modules/nonpareil/main.nf
# ./tests/modules/nonpareil/test.yml
# ./tests/modules/nonpareil/nextflow.config
# ./tests/config/pytest_modules.yml
Now you can start editing these files to add your new module!
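The warning about the master branch can be turned into a small guard that runs before any file is created. This is a minimal sketch, assuming git is on your PATH; require_feature_branch is a hypothetical helper, not part of git or nf-core tools.

```shell
# require_feature_branch
# Hypothetical helper: succeeds only when HEAD points at a branch other than
# master/main. Returns 1 on a protected branch, 2 when no branch can be read.
require_feature_branch() {
    local branch
    branch=$(git symbolic-ref --short HEAD 2>/dev/null) || {
        echo "Not on a branch (or not inside a git repository)." >&2
        return 2
    }
    case "$branch" in
        master|main)
            echo "Refusing to continue on '$branch'; run: git checkout -b <module-name>" >&2
            return 1
            ;;
        *)
            echo "On branch '$branch'"
            ;;
    esac
}
```

One way to use it is to chain it in front of the create command, for example: require_feature_branch && nf-core modules create nonpareil --author @<your-username> --label process_low --meta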
Where Do I Write My Actual Code?
At this point, you are ready to write the actual module code. This involves modifying the main.nf file, which includes a number of TODO statements that must all be addressed before the module is ready to be merged into nf-core.
Your main code for running the software goes in the ./modules/nonpareil/main.nf file. This includes references to the Conda package and the Singularity/Docker containers, inputs, outputs, and any other details. It’s critical to include the code block at the bottom that captures the version number of the module software (only the version number, not a whole line of info).
If there are any special parameter arguments (e.g., -t 1 -p 2, etc.), these should be included in the ./tests/modules/nonpareil/nextflow.config and NOT in the module code in main.nf.
Here’s an example main.nf to run the module:
process NONPAREIL {
    tag "$meta.id"
    label 'process_low'
    // Software environment: Conda package, or a Singularity/Docker image pinned to the same build
    conda (params.enable_conda ? "bioconda::nonpareil=3.4.1" : null)
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/nonpareil%3A3.4.1--r41h9f5acd7_1' :
        'quay.io/biocontainers/nonpareil:3.4.1--r41h9f5acd7_1' }"
    input:
    tuple val(meta), path(reads)
    output:
    tuple val(meta), path("*.fastq.gz"), emit: reads
    path "versions.yml"                , emit: versions
    when:
    task.ext.when == null || task.ext.when
    script:
    // Optional arguments and the output prefix come from the config (ext.args, ext.prefix)
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    nonpareil \\
        -s $reads \\
        $args \\
        -o ${prefix}
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        nonpareil: \$(nonpareil -V | cut -d ' ' -f2 | sed 's/v//')
    END_VERSIONS
    """
}
In this script, nonpareil is the command to run the software, -s $reads refers to the input, $args contains optional arguments, and -o ${prefix} specifies the output prefix. The version of nonpareil is then captured and written into a versions.yml file.
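The version-capture pipeline is easier to reason about when run on its own, outside the Nextflow script (where the leading \$ is only an escape for Nextflow). The sketch below assumes the tool prints a banner of the form "Nonpareil v3.4.1"; that banner format is an assumption, so check the real output of nonpareil -V before relying on it. extract_version is a hypothetical name.

```shell
# extract_version
# Reads a version banner on stdin and keeps only the bare version number:
# cut takes the second space-separated field, sed drops the first "v".
extract_version() {
    cut -d ' ' -f2 | sed 's/v//'
}

# Assumed banner format; replace the echo with `nonpareil -V` on a real install.
echo "Nonpareil v3.4.1" | extract_version
# prints: 3.4.1
```

If the real banner has a different shape, only the cut/sed stage needs to change; the versions.yml heredoc around it stays the same.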
The Accompanying meta.yml File Describes the New Module in Detail
The ./modules/nonpareil/meta.yml file acts as a companion to the main.nf file, providing documentation that describes the module’s purpose, authors, sources, and details about its inputs and outputs. It is important that the input and output specifications in the meta.yml file match those defined in the main.nf file.
Typically, you can find the information required for the meta.yml file from the original GitHub repository or other resources where the source code for the software is hosted.
Here’s an example of what a meta.yml file might look like for this module (./modules/nonpareil/meta.yml):
name: nonpareil
description: Estimate metagenomic coverage and sequence diversity
keywords:
  - diversity
  - metagenomics
  - nonpareil
  - kmer
tools:
  - nonpareil:
      description: Nonpareil uses the redundancy of the reads in metagenomic datasets to estimate the average coverage and predict the amount of sequences that will be required to achieve 'nearly complete coverage'.
      homepage: https://github.com/lmrodriguezr/nonpareil
      documentation: https://nonpareil.readthedocs.io/en/latest/index.html
      tool_dev_url: https://github.com/lmrodriguezr/nonpareil
      licence: ["Artistic Licence 2.0"]
input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - reads:
      type: file
      description: List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively.
      pattern: "*.{fastq.gz}"
output:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - versions:
      type: file
      description: File containing software versions
      pattern: "versions.yml"
  - reads:
      type: file
      description: Subsampled FastQ files, 1 for single-end data or 2 for paired-end data.
      pattern: "*.{fastq.gz}"
authors:
  - "@hseabolt"
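Because the outputs in meta.yml are expected to match the emit names in main.nf, a rough text-level comparison can catch a forgotten entry before linting does. This is only a sketch under stated assumptions: it treats the files as plain text, assumes meta.yml uses the usual indentation with top-level keys at column 0, and all three function names are hypothetical helpers rather than nf-core commands.

```shell
# list_emits MAIN_NF: print the names that follow "emit:" in a module script.
list_emits() {
    grep -o 'emit: *[A-Za-z_][A-Za-z0-9_]*' "$1" | sed 's/emit: *//' | sort -u
}

# list_meta_outputs META_YML: print the "- name:" entries found between the
# top-level "output:" key and the next top-level key.
list_meta_outputs() {
    awk '
        /^output:/     { in_out = 1; next }
        /^[A-Za-z_]+:/ { in_out = 0 }
        in_out && /^[ \t]*- [A-Za-z_][A-Za-z0-9_]*:/ {
            name = $2; sub(/:$/, "", name); print name
        }
    ' "$1" | sort -u
}

# check_emits_documented MAIN_NF META_YML: fail if any emit name has no
# matching entry in the output section of meta.yml.
check_emits_documented() {
    local main_nf="$1" meta_yml="$2" status=0 name
    for name in $(list_emits "$main_nf"); do
        if ! list_meta_outputs "$meta_yml" | grep -qx -- "$name"; then
            echo "emit '$name' is not documented in $meta_yml"
            status=1
        fi
    done
    return "$status"
}
```

From the root of the modules clone, this could be called as check_emits_documented modules/nonpareil/main.nf modules/nonpareil/meta.yml. It does not replace the module linting described later in this guide.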
How do I locally test my module?
After you’ve written the module code (main.nf and meta.yml) and you’re satisfied with it, it’s time to test the module locally. Here’s how you can go about it.
Step 1: Identify suitable testing data for your module
nf-core provides a large set of test data in various common bioinformatics formats. These datasets are available in the nf-core/test-datasets repository on their main GitHub. You can use this data for nf-core module testing by referencing entries of params.test_data in your test workflow, without the need to download any data directly.
You can check the list of available data in the ./tests/config/test_data.config file (relative to the root of your nf-core/modules clone).
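Searching that config file from the command line is usually faster than scrolling through it. A minimal sketch, assuming you run it from the root of your nf-core/modules clone; find_test_data is a hypothetical helper, and the default path is the one named above.

```shell
# find_test_data PATTERN [CONFIG]
# Hypothetical helper: print the numbered lines of the test-data config
# that match PATTERN. CONFIG defaults to the path used in this guide.
find_test_data() {
    local pattern="$1"
    local config="${2:-tests/config/test_data.config}"
    grep -n -- "$pattern" "$config"
}
```

For example, find_test_data fastq_gz lists every entry whose key or path mentions fastq_gz, which is how a key such as test_1_fastq_gz can be located.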
Step 2: Edit the test module
Next, you need to modify the ./tests/modules/nonpareil/main.nf file to include the test data you identified for your module. Ensure that the data and arguments match what the main code in ./modules/nonpareil/main.nf expects.
For instance, suppose you want to include a gzipped FASTQ file from the available nf-core datasets (e.g., test_1_fastq_gz). Your ./tests/modules/nonpareil/main.nf file might look like this:
#!/usr/bin/env nextflow
nextflow.enable.dsl = 2
// Path is relative to tests/modules/nonpareil/, so three levels up reaches the repo root
include { NONPAREIL } from '../../../modules/nonpareil/main.nf'
// Test with single-end data
workflow test_nonpareil_single_end {
    input = [ [ id:'test', single_end:true ], // meta map
              file(params.test_data['sarscov2']['illumina']['test_1_fastq_gz'], checkIfExists: true) ]
    NONPAREIL ( input )
}
// Test with paired-end data
workflow test_nonpareil_paired_end {
    input = [ [ id:'test', single_end:false ], // meta map
              [ file(params.test_data['sarscov2']['illumina']['test_1_fastq_gz'], checkIfExists: true),
                file(params.test_data['sarscov2']['illumina']['test_2_fastq_gz'], checkIfExists: true) ]
            ]
    NONPAREIL ( input )
}
In this example, we’re testing our NONPAREIL module with both single-end and paired-end data. The input for each workflow is a tuple containing a meta map and one or two input files, depending on the type of data. The checkIfExists: true option makes Nextflow raise an error if the specified input file does not exist. Finally, NONPAREIL ( input ) runs our NONPAREIL module with that input; the call takes a single argument because the process declares a single input tuple.
Incorporating optional parameters for the module
If you want to incorporate or test any optional parameters for your module, you can specify them in the ./tests/modules/nonpareil/nextflow.config file. An example of this, using an argument -T kmer and setting the prefix parameter that is fed into the main code, might look like this:
process {
    publishDir = { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" }
    withName: NONPAREIL {
        ext.args = '-T kmer' // extend this string to include all args, do not use separate strings
        ext.prefix = { "${meta.id}.np" }
    }
}
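To see how these two settings end up in the command that actually runs, it can help to fill in the script template by hand. The sketch below is an illustration only: it mimics the substitution in plain shell for a sample whose meta.id is 'test', and the reads file name is a made-up placeholder.

```shell
# Illustration only: mimic how the module's script template is filled in.
reads='test_1.fastq.gz'   # placeholder input file name (assumption)
args='-T kmer'            # value of ext.args from the config
meta_id='test'            # meta.id of the sample
prefix="${meta_id}.np"    # result of ext.prefix = { "${meta.id}.np" }

# render_command is a hypothetical helper that prints the resulting command line.
render_command() {
    printf 'nonpareil -s %s %s -o %s\n' "$reads" "$args" "$prefix"
}

render_command
# prints: nonpareil -s test_1.fastq.gz -T kmer -o test.np
```

This is why all extra arguments belong in one ext.args string: the whole string is dropped into the command at the position of $args.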
Testing the module
Now you’re ready to test your nf-core module. In the root folder of nf-core/modules, you can run the following commands:
# Start to test the module
nf-core modules create-test-yml nonpareil
# This will create an interactive prompt on the terminal.
# For most options, just press ENTER / leave blank.
# For testing profile: use Singularity profile
If the testing is successful, the results are written to tests/modules/nonpareil/test.yml, replacing the stub that nf-core modules create generated.
You can then run local linting, which will give you more specific details about the tests that were run:
# Now run local linting
nf-core modules lint nonpareil --dir .
If all tests and linting are successful (or in some cases, safely ignorable), then you’re ready to commit the code to git and initiate merge requests:
# Check the status of the repository
git status
# Stage changes for commit
git add --all # or instead of --all, add the files listed by git status
# Commit the changes
git commit -m "Created new nf-core module Nonpareil"
# Push the changes to your remote repository
git push -u origin nonpareil
If you want to manually check the testing output files for correctness (e.g., FASTA files created by the module), you can cd into the work directory specified in the terminal output from create-test-yml and verify that the output files are what you expect.
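For compressed outputs, a manual check can be as simple as testing the archive and counting records. This is a minimal sketch, assuming the output is a gzipped FASTQ file with four lines per record (the pattern used in the example module above); count_fastq_records is a hypothetical helper.

```shell
# count_fastq_records FILE
# Hypothetical helper: verify gzip integrity, then print the number of
# FASTQ records, assuming exactly four lines per record.
count_fastq_records() {
    local fq="$1" lines
    gzip -t "$fq" || return 1
    lines=$(gzip -dc "$fq" | wc -l)
    echo $(( lines / 4 ))
}
```

Inside the work directory, count_fastq_records test.np.fastq.gz would print the record count for that file (the file name here is only an example), or fail if the file is not valid gzip.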
Merging your local module code with the main nf-core repository
To share your module with others, you’ll need to create a Pull Request (PR) across forks, from your fork of nf-core/modules to the main repository.
- On the nf-core/modules GitHub page, navigate to the Pull requests tab, then choose New pull request.
- Fill out the dialogue box for the new pull request. Make sure to get the Issue Number from the original New Module Issue that you opened previously, and include this to let others know which Issue you are closing out.
- Double check that you are merging yourusername/modules -> nonpareil into nf-core/modules -> master.
When a new PR is created, the nf-core GitHub will automatically run some integration tests and checks. You must wait for all of these checks to complete and pass prior to proceeding. If any tests do not pass, you need to address these issues before your new code can be merged. These are often small things like whitespace or minor formatting issues.
As you address these issues, re-run git add, git commit, and git push to update the branch on your fork; this automatically updates the open PR with nf-core and re-triggers the integration tests. You do not need to create a new PR.
Once all module tests pass, add Labels (New Module, Ready_For_Review) on the right side of the PR dashboard in GitHub.
Requesting peer review from other nf-core developers
After the tests pass, you should request a review from other nf-core developers:
- Copy the PR’s URL and paste it in the nf-core Slack channel #request-review and politely ask for someone to review your new module.
- Once a reviewer agrees to review, they will be able to make comments or ask questions through GitHub that you must address satisfactorily. This may require updates to the code or simple responses to questions.
- Once your reviewer(s) are satisfied, they will approve the pull request. Congratulations! Your module is officially included in nf-core/modules!
Anyone wanting to use your module now can install it directly with:
nf-core modules install nonpareil
Congratulations, you are officially a contributor to nf-core!
Documentation and Links
- Join nf-core
- nf-core Documentation
- nf-core Github
- List of Singularity containers available in Galaxy
- nf-core Troubleshooting