nf-core/phageannotator
Pipeline for identifying, annotating, and quantifying phage sequences in (meta)genomic data.
Define where the pipeline should find input data and save output data.
Path to comma-separated file containing information about the samples in the experiment.
string
^\S+\.csv$
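The pattern above accepts any whitespace-free path that ends in lowercase .csv. The following Python sketch only mirrors the regex (the pipeline itself relies on its schema validator, not on this code) and shows which paths pass:

```python
import re

# Pattern copied from the samplesheet parameter's schema entry.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the path matches the schema's samplesheet pattern."""
    return SAMPLESHEET_PATTERN.fullmatch(path) is not None


for candidate in ["samplesheet.csv", "/data/run 1/samples.csv", "samples.CSV", "samples.tsv"]:
    # Spaces, an upper-case extension, or a non-CSV extension all fail.
    print(candidate, is_valid_samplesheet_path(candidate))
```

Paths containing spaces and files named with an upper-case .CSV extension are rejected, so renaming or moving the samplesheet may be necessary.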
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
string
Email address for completion summary.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
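The email pattern is stricter than general email syntax. This Python sketch applies the same regex to a few example addresses to show its limits:

```python
import re

# Email pattern as written in the schema entry above.
EMAIL_PATTERN = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$")


def is_valid_email(address: str) -> bool:
    """Return True if the address matches the schema's email pattern."""
    return EMAIL_PATTERN.fullmatch(address) is not None


for address in ["jane.doe@example.org", "jane+tag@example.org", "admin@lab.museum"]:
    print(address, is_valid_email(address))
```

Plus-addressing (a "+" in the local part) and top-level domains longer than five letters are rejected by this pattern.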
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
string
Filter assemblies at the beginning of the workflow
Minimum assembly length
integer
1000
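A minimal sketch of what a minimum-assembly-length filter does, assuming contigs whose length is at least the threshold are kept (whether the pipeline uses an inclusive or exclusive comparison is an assumption). The helper names read_fasta and filter_by_length are hypothetical:

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Tuple[str, str]], min_length: int = 1000) -> List[Tuple[str, str]]:
    """Keep records whose sequence is at least min_length bp (assumed inclusive)."""
    return [(h, s) for h, s in records if len(s) >= min_length]


# Toy input: contig_1 is 1,200 bp (split over two lines), contig_2 is 999 bp.
fasta = [">contig_1", "A" * 600, "C" * 600, ">contig_2", "G" * 999]
kept = filter_by_length(read_fasta(fasta), min_length=1000)
print([h for h, _ in kept])  # ['contig_1']
```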
Run ViromeQC to estimate viral enrichment
boolean
Identify reference viruses contained in reads
Run Mash screen to identify reference viruses contained in reads
boolean
Path to FASTA file containing reference virus sequences
string
Path to mash sketch file for reference virus sequences
string
Save reference virus sketch, if it was created.
boolean
Minimum Mash screen score for a reference genome to be considered contained
number
0.95
Assign hashes present in multiple references only to the top-scoring reference (winner-takes-all)
boolean
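Mash screen writes one tab-separated row per reference with the columns identity, shared-hashes, median-multiplicity, p-value, query-ID and query-comment. Assuming the minimum score above is compared against the identity column, the containment threshold can be sketched as follows (the function name and the toy rows are hypothetical):

```python
import csv
import io
from typing import List


def contained_references(screen_tsv: str, min_identity: float = 0.95) -> List[str]:
    """Return reference IDs whose Mash screen identity is >= min_identity.

    Assumes standard mash screen output columns:
    identity, shared-hashes, median-multiplicity, p-value, query-ID, query-comment.
    """
    hits = []
    for row in csv.reader(io.StringIO(screen_tsv), delimiter="\t"):
        if not row:
            continue
        identity, reference_id = float(row[0]), row[4]
        if identity >= min_identity:
            hits.append(reference_id)
    return hits


# Toy rows, not real output.
example = (
    "0.99\t950/1000\t12\t0\tphage_A\tdescription A\n"
    "0.91\t400/1000\t3\t1e-200\tphage_B\tdescription B\n"
)
print(contained_references(example))  # ['phage_A']
```

The winner-takes-all option affects which identities Mash reports when hashes are shared between references; the threshold itself is applied afterwards.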
Classify viral sequences using geNomad
Skip running geNomad to classify viral/non-viral sequences
boolean
Path to directory containing geNomad’s database
string
Save geNomad’s database, if it was downloaded.
boolean
Minimum virus score for a sequence to be considered viral
number
0.7
Maximum FDR for a sequence to be considered viral (adds --enable-score-calibration to the geNomad command)
number
0.1
Number of splits for running geNomad (more splits reduce memory requirements)
integer
5
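Taken together, the two geNomad thresholds keep a sequence only if its virus score is high enough and, when calibration is enabled, its FDR is low enough. A sketch under the assumption that each sequence is a row from geNomad's virus summary table with virus_score and fdr columns (the column names and the helper are assumptions, not taken from the pipeline):

```python
from typing import Dict, Optional


def passes_genomad_filters(record: Dict[str, str],
                           min_score: float = 0.7,
                           max_fdr: Optional[float] = 0.1) -> bool:
    """Apply the minimum virus score and the optional maximum FDR cut-off."""
    if float(record["virus_score"]) < min_score:
        return False
    # The FDR column is only meaningful when score calibration is enabled.
    if max_fdr is not None and float(record["fdr"]) > max_fdr:
        return False
    return True


# Toy rows, not real geNomad output.
rows = [
    {"seq_name": "contig_1", "virus_score": "0.95", "fdr": "0.01"},
    {"seq_name": "contig_2", "virus_score": "0.80", "fdr": "0.25"},
    {"seq_name": "contig_3", "virus_score": "0.55", "fdr": "0.05"},
]
print([r["seq_name"] for r in rows if passes_genomad_filters(r)])  # ['contig_1']
```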
Extend viral contigs
Run COBRA to extend viral contigs
boolean
The assembler that was used to assemble viral contigs
string
Minimum kmer value used during assembly
string
Maximum kmer value used during assembly
string
Assess virus quality and filter
Skip running CheckV to assess virus quality and filter sequences
boolean
Path to directory containing CheckV database
string
Save CheckV’s database, if it was downloaded
boolean
Minimum virus length to pass filtering
integer
3000
Minimum CheckV completeness to pass filtering
integer
50
Remove viruses labeled as provirus by geNomad or CheckV
boolean
Remove viruses with CheckV warnings
boolean
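The CheckV filtering options combine into a single keep/drop decision per virus. This sketch assumes rows from CheckV's quality_summary.tsv (contig_id, contig_length, provirus, completeness, warnings) plus an optional set of IDs flagged as proviruses by geNomad; dropping sequences with an NA completeness is an assumption, and the helper name is hypothetical:

```python
from typing import Dict, FrozenSet


def passes_checkv_filters(row: Dict[str, str],
                          min_length: int = 3000,
                          min_completeness: float = 50.0,
                          remove_proviruses: bool = False,
                          remove_warnings: bool = False,
                          genomad_proviruses: FrozenSet[str] = frozenset()) -> bool:
    if int(row["contig_length"]) < min_length:
        return False
    completeness = row["completeness"]
    # Assumption: sequences whose completeness could not be estimated are dropped.
    if completeness in ("", "NA") or float(completeness) < min_completeness:
        return False
    if remove_proviruses and (row["provirus"] == "Yes" or row["contig_id"] in genomad_proviruses):
        return False
    if remove_warnings and row["warnings"].strip():
        return False
    return True


# Toy rows, not real CheckV output.
rows = [
    {"contig_id": "v1", "contig_length": "45000", "provirus": "No", "completeness": "98.2", "warnings": ""},
    {"contig_id": "v2", "contig_length": "2500", "provirus": "No", "completeness": "100.0", "warnings": ""},
    {"contig_id": "v3", "contig_length": "12000", "provirus": "No", "completeness": "NA", "warnings": ""},
    {"contig_id": "v4", "contig_length": "30000", "provirus": "Yes", "completeness": "75.0", "warnings": ""},
    {"contig_id": "v5", "contig_length": "20000", "provirus": "No", "completeness": "60.0", "warnings": "toy warning"},
]
print([r["contig_id"] for r in rows if passes_checkv_filters(r)])  # ['v1', 'v4', 'v5']
strict = [r["contig_id"] for r in rows
          if passes_checkv_filters(r, remove_proviruses=True, remove_warnings=True)]
print(strict)  # ['v1']
```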
Cluster virus genomes based on nucleotide/protein similarity
Skip ANI-based virus clustering
boolean
Minimum percent identity for BLAST hits
integer
90
Maximum number of BLAST hits to record for each sequence
integer
25000
Minimum average nucleotide identity (ANI) for sequences to be clustered together
integer
95
Minimum query coverage for sequences to be clustered together
integer
Minimum target coverage for sequences to be clustered together
integer
85
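ANI-based clustering with these thresholds is commonly done greedily: sequences are sorted by length, the longest unassigned sequence becomes a cluster centroid, and every unassigned sequence aligned to it with sufficient ANI and coverage joins its cluster. The sketch below follows that approach in the style of CheckV's aniclust.py; whether the pipeline uses exactly this procedure is an assumption, and the query-coverage default of 0 is hypothetical because the schema gives none:

```python
from typing import Dict, List, Tuple

# (query, target, ani, query_coverage, target_coverage); values in percent.
Alignment = Tuple[str, str, float, float, float]


def greedy_ani_clusters(lengths: Dict[str, int], alignments: List[Alignment],
                        min_ani: float = 95.0, min_qcov: float = 0.0,
                        min_tcov: float = 85.0) -> Dict[str, List[str]]:
    """Return {centroid: [members]} using length-sorted greedy clustering."""
    neighbours: Dict[str, List[str]] = {seq: [] for seq in lengths}
    for query, target, ani, qcov, tcov in alignments:
        if query == target or query not in neighbours or target not in neighbours:
            continue
        if ani >= min_ani and qcov >= min_qcov and tcov >= min_tcov:
            neighbours[query].append(target)

    clusters: Dict[str, List[str]] = {}
    assigned = set()
    for seq in sorted(lengths, key=lengths.get, reverse=True):
        if seq in assigned:
            continue
        # Longest unassigned sequence becomes a new centroid.
        clusters[seq] = [seq]
        assigned.add(seq)
        for member in neighbours[seq]:
            if member not in assigned:
                clusters[seq].append(member)
                assigned.add(member)
    return clusters


lengths = {"vA": 50000, "vB": 48000, "vC": 20000, "vD": 30000}
alignments = [("vA", "vB", 97.5, 96.0, 98.0),
              ("vA", "vC", 99.0, 40.0, 100.0),
              ("vD", "vC", 96.0, 70.0, 90.0)]
print(greedy_ani_clusters(lengths, alignments))
# {'vA': ['vA', 'vB', 'vC'], 'vD': ['vD']}
```

Because centroids are chosen longest-first, vC joins vA's cluster even though it also aligns well to vD.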
Align reads to virus database
Skip read alignment to viral sequences
boolean
Minimum length of reads aligned to references
integer
Minimum percent identity of aligned reads
integer
Minimum percent of read aligned to references
integer
Metric used to calculate abundance
string
mean
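The read-alignment filters and the default "mean" metric can be illustrated as follows. This is a sketch under stated assumptions: the minimum length is applied to the aligned length, identity is matches divided by aligned length, and "mean" is the plain average per-base depth across the genome. The threshold values are hypothetical, since the schema lists no defaults for them:

```python
from typing import Sequence


def read_passes(aligned_length: int, read_length: int, matches: int,
                min_length: int = 50, min_identity: float = 95.0,
                min_aligned_percent: float = 75.0) -> bool:
    """Apply hypothetical length, identity and aligned-fraction cut-offs."""
    identity = 100.0 * matches / aligned_length
    aligned_percent = 100.0 * aligned_length / read_length
    return (aligned_length >= min_length
            and identity >= min_identity
            and aligned_percent >= min_aligned_percent)


def mean_coverage(depths: Sequence[int]) -> float:
    """Average per-base depth over all positions of a genome."""
    return sum(depths) / len(depths) if depths else 0.0


print(read_passes(140, 150, 137))  # True
print(read_passes(60, 150, 60))    # False: only 40% of the read aligned
print(read_passes(150, 150, 135))  # False: 90% identity
print(mean_coverage([0, 2, 4, 6]))  # 3.0
```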
Assign taxonomy to virus sequences
boolean
Predict host genus for phage sequences
Run iPHoP to predict phage hosts
boolean
Path to local iPHoP database
string
Save downloaded iPHoP database
boolean
Minimum confidence score to report a host prediction
integer
90
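A minimum confidence score discards weak host predictions; a common follow-up is keeping the single best-scoring genus per phage. The sketch below assumes predictions are available as (virus, host genus, confidence) tuples; the helper name and the toy values are hypothetical and do not reflect iPHoP's exact output format:

```python
from typing import Dict, Iterable, Tuple


def best_host_predictions(predictions: Iterable[Tuple[str, str, float]],
                          min_confidence: float = 90.0) -> Dict[str, Tuple[str, float]]:
    """Keep the highest-confidence genus per virus among predictions above the cut-off."""
    best: Dict[str, Tuple[str, float]] = {}
    for virus, genus, confidence in predictions:
        if confidence < min_confidence:
            continue
        if virus not in best or confidence > best[virus][1]:
            best[virus] = (genus, confidence)
    return best


toy_predictions = [
    ("phage_1", "g__Bacteroides", 95.2),
    ("phage_1", "g__Parabacteroides", 91.0),
    ("phage_2", "g__Escherichia", 72.5),
]
print(best_host_predictions(toy_predictions))
# {'phage_1': ('g__Bacteroides', 95.2)}
```

phage_2 receives no prediction because its only candidate falls below the threshold.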
Predict the lifestyle of viral sequences
Run BACPHLIP to predict virus lifestyle
boolean
Functionally annotate viral genomes using a variety of approaches
Run pharokka to predict and annotate phage ORFs
boolean
Path to pre-downloaded pharokka database
string
Analyze virus diversity at the strain level
Bypass microdiversity analysis with inStrain
boolean
Minimum identity for read alignment to be considered
number
Minimum MAPQ for a read to be considered
integer
Minimum coverage for a variant to be considered
integer
Minimum allele frequency for an SNP to be considered
number
Maximum FDR for a SNP to be considered
integer
Minimum number of reads mapping to a genome to consider profiling
number
Minimum identity for genomes to be considered in the same strain
number
Minimum percentage of the genome compared for a comparison to be considered
number
Minimum breadth of coverage for a genome to be considered present
number
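The minimum coverage and minimum allele frequency cut-offs decide whether a position can carry an SNP. The sketch below is deliberately simplified: inStrain also applies a null model controlled by the FDR setting, which is omitted here. The cut-off values are hypothetical because the schema lists no defaults:

```python
from typing import Dict, List


def call_alternative_alleles(base_counts: Dict[str, int],
                             min_cov: int = 5, min_freq: float = 0.05) -> List[str]:
    """Return non-major alleles passing simple coverage and frequency cut-offs.

    Simplification: no FDR-based null model is applied.
    """
    coverage = sum(base_counts.values())
    if coverage < min_cov:
        return []
    ranked = sorted(base_counts.items(), key=lambda item: item[1], reverse=True)
    # The first entry is the major allele; the rest are candidate SNP alleles.
    return [base for base, count in ranked[1:] if count / coverage >= min_freq]


print(call_alternative_alleles({"A": 18, "G": 2, "C": 0, "T": 0}))  # ['G']
print(call_alternative_alleles({"A": 3, "G": 1}))                    # [] (coverage too low)
print(call_alternative_alleles({"A": 99, "T": 1}))                   # [] (frequency too low)
```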
Arguments for running pipeline tests with custom arguments/databases.
boolean
number
boolean
Download the test database instead of the full database
boolean
boolean
Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
string
master
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
Institutional config name.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string
Set the top limit for requested resources for any single job.
Maximum number of CPUs that can be requested for any single job.
integer
16
Maximum amount of memory that can be requested for any single job.
string
128.GB
^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Maximum amount of time that can be requested for any single job.
string
240.h
^(\d+\.?\s*(s|m|h|d|day)\s*)+$
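The memory and time limits must match the patterns above, and a resource request larger than the limit is capped at the limit. A Python sketch of both checks, assuming binary (1024-based) memory units as Nextflow uses; the helper names are hypothetical and this is not the pipeline's own capping code:

```python
import re

# Patterns copied from the schema entries above.
MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")
TIME_PATTERN = re.compile(r"^(\d+\.?\s*(s|m|h|d|day)\s*)+$")

# Assumption: binary multiples for memory units.
UNIT_BYTES = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def memory_to_bytes(value: str) -> float:
    """Convert a memory string such as '128.GB' or '1.5 TB' to bytes."""
    if not MEMORY_PATTERN.fullmatch(value):
        raise ValueError(f"invalid memory string: {value!r}")
    number, unit = re.fullmatch(r"(\d+(?:\.\d+)?)\.?\s*([KMGT]?)B", value).groups()
    return float(number) * UNIT_BYTES[unit]


def clamp_memory(requested: str, max_memory: str = "128.GB") -> str:
    """Return the request, or the limit if the request exceeds it."""
    if memory_to_bytes(requested) <= memory_to_bytes(max_memory):
        return requested
    return max_memory


print(clamp_memory("200.GB"))  # 128.GB
print(clamp_memory("36.GB"))   # 36.GB
for value in ["240.h", "1d 12h", "2 hours"]:
    print(value, TIME_PATTERN.fullmatch(value) is not None)
```

Strings such as "16 GiB" or "2 hours" fail validation, so limits should use the short unit forms shown in the defaults.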
Less common options for the pipeline, typically set in a config file.
Display help text.
boolean
Display version and exit.
boolean
Method used to save pipeline results to output directory.
string
Email address for completion summary, only when pipeline fails.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Send plain-text email instead of HTML.
boolean
File size limit when attaching MultiQC reports to summary emails.
string
25.MB
^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Do not use coloured log outputs.
boolean
Incoming hook URL for messaging service
string
Custom config file to supply to MultiQC.
string
Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file
string
Custom MultiQC yaml file containing HTML including a methods description.
string
Whether to validate parameters against the schema at runtime
boolean
true
Use logo in initialise subworkflow
boolean
true
Show all params when using --help
boolean
Validation of parameters fails when an unrecognised parameter is found.
boolean
Validate parameters in lenient mode.
boolean