nf-core/ampliseq
Amplicon sequencing analysis workflow using DADA2 and QIIME2
Path to tab-separated sample sheet
string
^\S+\.(tsv|csv|yml|yaml|txt)$
Path to ASV/OTU fasta file
string
^\S+\.(fasta|fas|fna|fa|ffn)$
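The two file-name patterns above can be tried out before launching a run. The following is a minimal sketch, assuming Python's `re` module behaves like the pipeline's own validator for these simple patterns; the helper name and the file names are hypothetical.

```python
import re

# Patterns copied from the parameter definitions above.
SAMPLE_SHEET_PATTERN = re.compile(r"^\S+\.(tsv|csv|yml|yaml|txt)$")
FASTA_PATTERN = re.compile(r"^\S+\.(fasta|fas|fna|fa|ffn)$")


def matches(pattern: re.Pattern, path: str) -> bool:
    """Return True if the whole path satisfies the pattern (hypothetical helper)."""
    return pattern.match(path) is not None


# Hypothetical file names, for illustration only.
print(matches(SAMPLE_SHEET_PATTERN, "samplesheet.tsv"))     # True
print(matches(SAMPLE_SHEET_PATTERN, "my samplesheet.tsv"))  # False: whitespace is not allowed
print(matches(FASTA_PATTERN, "asvs.fasta.gz"))              # False: compressed names do not match
```

Note that both patterns reject paths containing whitespace, because `\S+` must cover everything before the extension.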
Path to folder containing zipped FastQ files
string
Forward primer sequence
string
Reverse primer sequence
string
Path to metadata sheet. When missing, most downstream analyses are skipped (barplots, PCoA plots, …).
string
Path to multi-region definition sheet, for multi-region analysis with Sidle
string
^\S+\.(tsv|csv|yml|yaml|txt)$
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
string
Save intermediate results such as QIIME2’s qza and qzv files
boolean
Email address for completion summary.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
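The email pattern above is stricter than it may look. A small sketch, assuming Python's `re` module as a stand-in for the pipeline's validator; the function name and the addresses are hypothetical.

```python
import re

# Pattern copied from the parameter definition above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_valid_email(address: str) -> bool:
    """Return True if the address satisfies the pattern (hypothetical helper)."""
    return EMAIL_PATTERN.match(address) is not None


print(is_valid_email("user@example.com"))      # True
print(is_valid_email("user+tag@example.com"))  # False: '+' is not in the allowed set
print(is_valid_email("user@example.museum"))   # False: final label is longer than 5 letters
```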
If data has binned quality scores such as Illumina NovaSeq
boolean
If data is single-ended PacBio reads instead of Illumina
boolean
If data is single-ended IonTorrent reads instead of Illumina
boolean
If data is single-ended Illumina reads instead of paired-end
boolean
If analysing ITS amplicons or any other region with large length variability with Illumina paired end reads
boolean
Type of quality scores in raw read data
string
If using --input_folder: samples were sequenced in multiple sequencing runs
boolean
If using --input_folder: naming of sequencing files
string
/*_R{1,2}_001.fastq.gz
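To see which file names the default naming pattern would pick up, the brace alternative can be expanded by hand. This is an illustrative sketch only: Python's `fnmatch` has no brace support, so the hypothetical helper `expand_braces` handles a single, non-nested level of `{a,b}`, and the file names are made up.

```python
import fnmatch
import itertools
import re


def expand_braces(pattern: str) -> list[str]:
    """Expand non-nested {a,b} alternatives into plain glob patterns."""
    # re.split with a capture group yields: literal, alternatives, literal, ...
    parts = re.split(r"\{([^{}]*)\}", pattern)
    choices = [p.split(",") if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


def matches_naming_pattern(filename: str, pattern: str = "/*_R{1,2}_001.fastq.gz") -> bool:
    """Check a bare file name against the pattern (leading slash removed)."""
    globs = expand_braces(pattern.lstrip("/"))
    return any(fnmatch.fnmatchcase(filename, g) for g in globs)


print(expand_braces("*_R{1,2}_001.fastq.gz"))
print(matches_naming_pattern("sampleA_S1_L001_R1_001.fastq.gz"))  # True
print(matches_naming_pattern("sampleA_1.fastq.gz"))               # False
```

Files that follow a different convention, such as `sampleA_1.fastq.gz`, would need a different pattern.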
Set read count threshold for failed samples.
integer
1
Ignore input files with too few reads.
boolean
Spurious sequences sometimes lack primer sequences, and primers introduce errors that can be removed in this step.
Cutadapt will retain untrimmed reads. Choose this only if input reads are not expected to contain primer sequences.
boolean
Sets the minimum overlap for valid matches of primer sequences with reads for cutadapt (-O).
integer
3
Sets the maximum error rate for valid matches of primer sequences with reads for cutadapt (-e).
number
0.1
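The two cutadapt settings above interact: a primer match must be at least the minimum overlap long, and the number of tolerated errors scales with the matched length. The sketch below assumes cutadapt's documented rule that allowed errors are the matched length times the error rate, rounded down; the function names are hypothetical and this is not cutadapt's code.

```python
import math


def allowed_errors(match_length: int, error_rate: float = 0.1) -> int:
    """Errors tolerated for a match of the given length (rounded down)."""
    return math.floor(match_length * error_rate)


def is_valid_match(match_length: int, errors: int,
                   min_overlap: int = 3, error_rate: float = 0.1) -> bool:
    """A match counts only if it is long enough and has few enough errors."""
    if match_length < min_overlap:
        return False
    return errors <= allowed_errors(match_length, error_rate)


print(allowed_errors(20))    # 2
print(is_valid_match(2, 0))  # False: shorter than the minimum overlap
print(is_valid_match(9, 1))  # False: 9 bases at rate 0.1 allow 0 errors
```

With the defaults, a full 20-base primer tolerates two errors, while matches shorter than 10 bases must be exact.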
Cutadapt will be run twice to ensure removal of potential double primers
boolean
Ignore files with too few reads after trimming.
boolean
Read trimming and quality filtering is supposed to reduce spurious results and aid error correction
DADA2 read truncation value for forward strand, set this to 0 for no truncation
integer
DADA2 read truncation value for reverse strand, set this to 0 for no truncation
integer
If --trunclenf and --trunclenr are not set, these values will be automatically determined using this median quality score
integer
25
Ensures that values chosen with --trunc_qmin will retain a fraction of reads.
number
0.75
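One plausible reading of how the median quality threshold (25) and the retained fraction (0.75) work together is sketched below. This is an assumption made for illustration, not the pipeline's implementation: the function name, its inputs, and the exact tie-breaking are hypothetical.

```python
def truncation_length(median_quality: list[float], read_lengths: list[int],
                      qmin: float = 25, rmin: float = 0.75) -> int:
    """Pick a truncation length from per-position median quality scores.

    Assumed logic: truncate where the median quality first drops below
    qmin, then shorten further until at least rmin of the reads are long
    enough to survive truncation.
    """
    # Step 1: count leading positions with median quality >= qmin.
    trunc = len(median_quality)
    for position, quality in enumerate(median_quality):
        if quality < qmin:
            trunc = position
            break
    # Step 2: shorten until the retained fraction reaches rmin.
    total = len(read_lengths)
    while trunc > 0 and sum(length >= trunc for length in read_lengths) / total < rmin:
        trunc -= 1
    return trunc


quality = [38, 38, 37, 36, 30, 24, 20]           # hypothetical medians per position
print(truncation_length(quality, [7, 7, 7, 7]))  # 5
print(truncation_length(quality, [7, 7, 4, 4]))  # 4
```

In the second call, truncating at 5 would keep only half of the reads, so the length is reduced to 4.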
DADA2 read filtering option
integer
2
DADA2 read filtering option
integer
50
DADA2 read filtering option
integer
Ignore files with too few reads after quality filtering.
boolean
Mode of sample inference: “independent”, “pooled” or “pseudo”
string
Strategy to merge paired-end reads. When paired-end reads do not overlap sufficiently for merging, you can use “concatenate” (not recommended). When you have a mix of overlapping and non-overlapping reads, use “consensus”.
string
The score assigned for each matching base pair during sequence alignment.
integer
1
The penalty score assigned for each mismatched base pair during sequence alignment.
integer
-2
The penalty score assigned for each gap introduced during sequence alignment.
integer
-4
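The three scoring values above (match 1, mismatch -2, gap -4) can be made concrete by scoring an alignment that is already given. This is a minimal sketch with a hypothetical helper; it scores a fixed alignment and does not search for the best one, which the aligner itself would do.

```python
def alignment_score(aligned_a: str, aligned_b: str,
                    match: int = 1, mismatch: int = -2, gap: int = -4) -> int:
    """Score two equal-length aligned sequences; '-' marks a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


print(alignment_score("ACGT", "ACGT"))  # 4
print(alignment_score("ACGT", "ACTT"))  # 1
print(alignment_score("ACGT", "AC-T"))  # -1
```

With these values a gap costs twice as much as a mismatch, so alignments that substitute a base are preferred over those that open a gap.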
The minimum number of overlapping base pairs required to merge forward and reverse reads.
integer
12
The maximum number of mismatches allowed within the overlapping region for merging reads.
integer
The percentile used to determine a stringent cutoff which will correspond to the minimum observed overlap in the dataset. This ensures that only read pairs with high overlap are merged into consensus sequences. Those with insufficient overlap are concatenated.
number
0.001
ASV post-processing takes place after ASV computation but before taxonomic assignment; it affects all downstream processes
Post-cluster ASVs with VSEARCH
boolean
Pairwise identity value used when post-clustering ASVs if the --vsearch_cluster option is used (default: 0.97).
number
0.97
Raise stack size when filtering VSEARCH clusters
boolean
true
Enable SSU filtering. Comma-separated list of kingdoms (domains) in Barrnap, a combination (or one) of “bac”, “arc”, “mito”, and “euk”. ASVs that have their lowest e-value in one of those kingdoms are kept.
string
Minimal ASV length
integer
Maximum ASV length
integer
Filter ASVs based on codon usage
boolean
Starting position of codon triplets
integer
1
Ending position of codon triplets
integer
Define stop codons
string
TAA,TAG
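The codon filter settings above (start position, end position, stop codons) can be illustrated with a small check for in-frame stop codons. This is a sketch under assumptions: positions are treated as 1-based, a missing end position means the end of the sequence, and any additional checks the pipeline may apply (for example on region length) are not shown. The function name and sequences are hypothetical.

```python
from typing import Optional


def has_stop_codon(sequence: str, start: int = 1, end: Optional[int] = None,
                   stop_codons: str = "TAA,TAG") -> bool:
    """Return True if any complete in-frame triplet is a listed stop codon."""
    stops = set(stop_codons.upper().split(","))
    end = len(sequence) if end is None else end
    region = sequence.upper()[start - 1:end]
    # Walk the region in steps of three, ignoring an incomplete final triplet.
    codons = [region[i:i + 3] for i in range(0, len(region) - 2, 3)]
    return any(codon in stops for codon in codons)


print(has_stop_codon("ATGTAAGGG"))           # True: ATG TAA GGG
print(has_stop_codon("ATGTAAGGG", start=2))  # False: TGT AAG (GG)
print(has_stop_codon("ATGTGACCC"))           # False: TGA is not in the default list
```

The last call shows why the stop codon list matters: TGA is only treated as a stop codon if it is added to the list.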
Choose a method and database for taxonomic assignments to single-region amplicons
Name of supported database, and optionally also version number
string
Path to a custom DADA2 reference taxonomy database
string
Path to a custom DADA2 reference taxonomy database for species assignment
string
Comma separated list of taxonomic levels used in DADA2’s assignTaxonomy function
string
If the expected amplified sequences are extracted from the DADA2 reference taxonomy database
boolean
If multiple exact matches against different species are returned
boolean
If the reverse complement of each sequence will also be tested for classification
boolean
ASV fasta will be subset into chunks of this size for classification
integer
10000
Newick file with reference phylogenetic tree. Also requires --pplace_aln and --pplace_model.
string
File with reference sequences. Also requires --pplace_tree and --pplace_model.
string
Phylogenetic model to use in placement, e.g. ‘LG+F’ or ‘GTR+I+F’. Also requires --pplace_tree and --pplace_aln.
string
Method used for alignment, “hmmer” or “mafft”
string
Tab-separated file with taxonomy assignments of reference sequences.
string
A name for the run
string
Name of supported database, and optionally also version number
string
Path to files of a custom QIIME2 reference taxonomy database (tarball, or two comma-separated files)
string
Path to QIIME2 trained classifier file (typically *-classifier.qza)
string
Name of supported database, and optionally also version number
string
Path to a custom Kraken2 reference taxonomy database (.tar.gz|.tgz archive or folder)
string
Comma separated list of taxonomic levels used in Kraken2. Will overwrite default values.
string
Confidence score threshold for taxonomic classification.
number
Name of supported database, and optionally also version number
string
If ASVs should be assigned to UNITE species hypotheses (SHs). Only relevant for ITS data.
boolean
Part of ITS region to use for taxonomy assignment: “full”, “its1”, or “its2”
string
Cutoff for partial ITS sequences. Only full sequences by default.
integer
Choose database for taxonomic assignments with multi-region amplicons using SIDLE
Name of supported database, and optionally also version number
string
Path to reference taxonomy strings (headerless, *.txt)
string
^.*\.txt$
Path to reference taxonomy sequences in fasta format
string
^.*\.(fasta|fas|fna|fa|ffn)$
Path to multiple sequence alignment of reference taxonomy sequences in fasta format
string
^.*\.(fasta|fas|fna|fa|ffn)$
Path to SIDLE reference taxonomy tree (*.qza)
string
^.*\.qza$
Exclude reference sequences with more than this many degenerate bases
integer
5
Arguments for qiime sidle reconstruct-taxonomy regarding ad-hoc cleaning
string
Filtering by taxonomy or abundance will affect all downstream analyses
Comma separated list of unwanted taxa, to skip taxa filtering use “none”
string
mitochondria,chloroplast
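The effect of the unwanted-taxa list above can be sketched as a filter over taxonomy strings. The matching rule used here (case-insensitive substring match) is an assumption, and the function name, ASV identifiers, and lineages are hypothetical.

```python
def remove_unwanted_taxa(taxonomy: dict[str, str],
                         exclude: str = "mitochondria,chloroplast") -> dict[str, str]:
    """Drop ASVs whose lineage contains any unwanted term; 'none' keeps all."""
    if exclude.strip().lower() == "none":
        return dict(taxonomy)
    unwanted = [term.strip().lower() for term in exclude.split(",") if term.strip()]
    return {
        asv: lineage
        for asv, lineage in taxonomy.items()
        if not any(term in lineage.lower() for term in unwanted)
    }


# Hypothetical assignments, for illustration only.
assignments = {
    "ASV1": "Bacteria;Proteobacteria;Alphaproteobacteria;Rickettsiales;Mitochondria",
    "ASV2": "Bacteria;Cyanobacteria;Cyanobacteriia;Chloroplast",
    "ASV3": "Bacteria;Firmicutes;Bacilli;Lactobacillales",
}
print(sorted(remove_unwanted_taxa(assignments)))          # ['ASV3']
print(sorted(remove_unwanted_taxa(assignments, "none")))  # ['ASV1', 'ASV2', 'ASV3']
```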
Abundance filtering
integer
1
Prevalence filtering
integer
1
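The abundance and prevalence filters above can be pictured as two thresholds on a feature table. The sketch assumes that abundance means the total count of an ASV across all samples and that prevalence means the number of samples in which it occurs; the function name, parameter names, and the table are hypothetical.

```python
def filter_features(table: dict[str, dict[str, int]],
                    min_frequency: int = 1, min_samples: int = 1) -> dict[str, dict[str, int]]:
    """Keep features with enough total counts and enough samples with counts > 0."""
    kept = {}
    for feature, counts in table.items():
        total = sum(counts.values())
        prevalence = sum(1 for count in counts.values() if count > 0)
        if total >= min_frequency and prevalence >= min_samples:
            kept[feature] = counts
    return kept


# Hypothetical feature table: ASV -> sample -> read count.
table = {
    "ASV1": {"s1": 120, "s2": 80, "s3": 0},
    "ASV2": {"s1": 3, "s2": 0, "s3": 0},
    "ASV3": {"s1": 0, "s2": 0, "s3": 0},
}
print(sorted(filter_features(table)))                                   # ['ASV1', 'ASV2']
print(sorted(filter_features(table, min_frequency=10, min_samples=2)))  # ['ASV1']
```

Under these assumptions, the defaults of 1 and 1 only remove features that have no counts at all.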
Metadata is used here to visualize data, either for quality control or for publication-ready figures
Comma separated list of metadata column headers for statistics.
string
Comma separated list of metadata column headers for plotting average relative abundance barplots.
string
Formula for QIIME2 ADONIS metadata feature importance test for beta diversity distances
string
If the functional potential of the bacterial community is predicted.
boolean
If data should be exported in SBDI (Swedish biodiversity infrastructure) Excel format.
boolean
Minimum rarefaction depth for diversity analysis. Any sample below that threshold will be removed.
integer
500
Minimum taxonomy agglomeration level for taxonomic classifications
integer
2
Maximum taxonomy agglomeration level for taxonomic classifications
integer
6
Differential abundance analysis relies on provided metadata
Minimum sample counts to retain a sample for ANCOM analysis. Any sample below that threshold will be removed.
integer
1
Perform differential abundance analysis with ANCOM
boolean
Perform differential abundance analysis with ANCOMBC
boolean
Formula to perform differential abundance analysis with ANCOMBC
string
Reference level for --ancombc_formula
string
Effect size threshold for differential abundance barplot for --ancombc and --ancombc_formula
number
1
Significance threshold for differential abundance barplot for --ancombc and --ancombc_formula
number
0.05
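The two barplot thresholds above (effect size 1, significance 0.05) can be read as a joint filter on the differential abundance results. The sketch assumes that effect size is compared as an absolute log-fold change and significance as an adjusted p-value, with both comparisons inclusive; the record layout (`feature`, `lfc`, `q`) and the function name are hypothetical.

```python
def select_for_barplot(results: list[dict],
                       effect_size_threshold: float = 1.0,
                       significance_threshold: float = 0.05) -> list[str]:
    """Return features that pass both the effect size and significance cutoffs."""
    return [
        row["feature"]
        for row in results
        if abs(row["lfc"]) >= effect_size_threshold
        and row["q"] <= significance_threshold
    ]


# Hypothetical results, for illustration only.
results = [
    {"feature": "ASV1", "lfc": 2.3, "q": 0.001},
    {"feature": "ASV2", "lfc": -1.5, "q": 0.04},
    {"feature": "ASV3", "lfc": 0.4, "q": 0.001},  # effect too small
    {"feature": "ASV4", "lfc": 3.0, "q": 0.20},   # not significant
]
print(select_for_barplot(results))  # ['ASV1', 'ASV2']
```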
Customization of the pipeline report
Path to Markdown file (Rmd)
string
${projectDir}/assets/report_template.Rmd
Path to style file (css)
string
${projectDir}/assets/nf-core_style.css
Path to logo file (png)
string
${projectDir}/assets/nf-core-ampliseq_logo_light_long.png
String used as report title
string
Summary of analysis results
Path to Markdown file (md) that replaces the ‘Abstract’ section
string
Skip FastQC
boolean
Skip primer trimming with cutadapt. This is not recommended! Use only in case primer sequences were removed before and the data does not contain any primer sequences.
boolean
Skip quality check with DADA2. Can only be skipped when --trunclenf and --trunclenr are set.
boolean
Skip annotating SSU matches.
boolean
Skip all steps that are executed by QIIME2, including QIIME2 software download, taxonomy assignment by QIIME2, barplots, relative abundance tables, diversity analysis, differential abundance testing.
boolean
Skip steps that are executed by QIIME2 except for taxonomic classification. Skip steps including barplots, relative abundance tables, diversity analysis, differential abundance testing.
boolean
Skip taxonomic classification. Incompatible with --sbdiexport
boolean
Skip taxonomic classification with DADA2
boolean
Skip species level when using DADA2 for taxonomic classification. This reduces the required memory dramatically under certain conditions. Incompatible with --sbdiexport
boolean
Skip producing barplot
boolean
Skip producing any relative abundance tables
boolean
Skip alpha rarefaction
boolean
Skip alpha and beta diversity analysis
boolean
Skip exporting phyloseq rds object(s)
boolean
Skip exporting TreeSummarizedExperiment rds object(s)
boolean
Skip MultiQC reporting
boolean
Skip Markdown summary report
boolean
Less common options for the pipeline, typically set in a config file.
Specifies the random seed.
integer
100
Display version and exit.
boolean
Method used to save pipeline results to output directory.
string
Email address for completion summary, only when pipeline fails.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Send plain-text email instead of HTML.
boolean
File size limit when attaching MultiQC reports to summary emails.
string
25.MB
^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
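The size pattern above accepts several spellings of the same limit, including the default `25.MB`. A brief sketch, assuming Python's `re` module as a stand-in for the pipeline's validator; the function name and the test values other than `25.MB` are hypothetical.

```python
import re

# Pattern copied from the parameter definition above.
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def is_valid_size(value: str) -> bool:
    """Return True if the value satisfies the size pattern (hypothetical helper)."""
    return SIZE_PATTERN.match(value) is not None


for value in ["25.MB", "25 MB", "1.5GB", "10B", "25", "25mb"]:
    print(value, is_valid_size(value))
```

The unit is case-sensitive and the trailing `B` is mandatory, so `25` and `25mb` are rejected.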
Do not use coloured log outputs.
boolean
Incoming hook URL for messaging service
string
Custom config file to supply to MultiQC.
string
Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file
string
Custom MultiQC yaml file containing HTML including a methods description.
string
Boolean whether to validate parameters against the schema at runtime
boolean
true
Base URL or local path to location of pipeline test dataset files
string
https://raw.githubusercontent.com/nf-core/test-datasets/
Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.
string
Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
string
master
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
Institutional config name.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
string