nf-core/pangenome
Renders a collection of sequences into a pangenome graph. https://doi.org/10.1093/bioinformatics/btae609.
Define where the pipeline should find input data and save output data.
Path to BGZIPPED input FASTA to build the pangenome graph from.
string
^\S+\.fn?a(sta)?(\.gz)?$
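As a rough illustration of what the pattern above accepts, the following Python sketch applies it to a few file names. The file names and the helper `is_valid_input_name` are made up for illustration. The pattern only inspects the name, so it cannot tell whether a file is actually bgzipped.

```python
import re

# Pattern copied verbatim from the input parameter above.
INPUT_PATTERN = re.compile(r"^\S+\.fn?a(sta)?(\.gz)?$")


def is_valid_input_name(path: str) -> bool:
    """Return True if the path matches the schema pattern for the input FASTA."""
    return INPUT_PATTERN.match(path) is not None


# Hypothetical file names, for illustration only.
candidates = [
    "genomes.fa.gz",
    "genomes.fna.gz",
    "genomes.fasta",
    "genomes.fastq",   # wrong extension
    "my genomes.fa",   # whitespace is not allowed by \S+
]
results = {name: is_valid_input_name(name) for name in candidates}
```

Uncompressed names such as `genomes.fasta` pass the pattern even though the description asks for a bgzipped file, so a name check like this is only a first sanity check.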
The number of haplotypes in the input FASTA.
number
The output directory where the results will be saved. On cloud infrastructure, you must use an absolute path to storage.
string
Email address for completion summary.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
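The email pattern above is fairly strict. The sketch below, using made-up addresses and a hypothetical helper `is_valid_email`, shows which kinds of addresses it would accept or reject.

```python
import re

# Pattern copied verbatim from the email parameter above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_valid_email(address: str) -> bool:
    """Return True if the address matches the schema pattern."""
    return EMAIL_PATTERN.match(address) is not None


# Hypothetical addresses, for illustration only.
checks = {
    "jane.doe@example.org": is_valid_email("jane.doe@example.org"),
    "jane.doe@example": is_valid_email("jane.doe@example"),          # no top-level domain
    "jane@example.museum": is_valid_email("jane@example.museum"),    # top-level domain longer than 5 letters
    "jane+tag@example.org": is_valid_email("jane+tag@example.org"),  # '+' is not in the allowed character set
}
```

Addresses with a `+` in the local part or a top-level domain longer than five letters are rejected by this pattern, even though they can be valid in practice.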
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
string
Options for the all versus all alignment phase.
Percent identity in the wfmash mashmap step.
number
90
Segment length for mapping.
string
5000
Minimum block length filter for mapping.
string
Kmer size for mashmap.
integer
19
Ignore the top % most-frequent kmers.
number
0.001
Keep this fraction of mappings (‘auto’ for giant component heuristic).
string
1.0
(auto|[01]\.\d+)
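The pattern above has no `^` or `$` anchors, and JSON Schema patterns are not implicitly anchored. The sketch below compares an unanchored search with a full match on a few example values. Which of the two behaviours the pipeline's validator applies is an assumption to verify; the helper names are hypothetical.

```python
import re

# Pattern copied verbatim from the parameter above.
SPARSIFY_PATTERN = r"(auto|[01]\.\d+)"


def accepted_by_search(value: str) -> bool:
    """Unanchored check: the pattern may match anywhere in the value."""
    return re.search(SPARSIFY_PATTERN, value) is not None


def accepted_by_fullmatch(value: str) -> bool:
    """Anchored check: the whole value must match the pattern."""
    return re.fullmatch(SPARSIFY_PATTERN, value) is not None


# Example values, for illustration only.
values = ["auto", "1.0", "0.25", "1", "11.5"]
search_results = {v: accepted_by_search(v) for v in values}
fullmatch_results = {v: accepted_by_fullmatch(v) for v in values}
```

A bare `1` is rejected either way because the pattern requires a decimal point, which is consistent with the default being written as `1.0`. A value such as `11.5` passes the unanchored search but fails the full match.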
Merge successive mappings.
boolean
Disable splitting of input sequences during mapping.
boolean
Skip mappings between sequences with the same name prefix before the given delimiter character. This can be helpful if several sequences originate from the same chromosome. It is recommended that sequence names follow the PanSN specification (https://github.com/pangenome/PanSN-spec). Future versions of the pipeline will require sequence names to follow this specification.
string
Set the directory where temporary files should be stored. Since everything runs in containers, we don’t usually set this argument.
string
The number of files to generate from the approximate wfmash mappings to scale across a whole cluster. It is recommended to set this to the number of available nodes. If only one machine is available, leave it at 1.
integer
1
If this parameter is set, only the wfmash alignment step of the pipeline is executed. This option is offered for users who want to run wfmash on a cluster.
boolean
Filter out mappings that are unlikely to be within this Average Nucleotide Identity (ANI) difference of the best mapping.
integer
30
Number of mappings for each segment. [default: n_haplotypes - 1].
integer
Ignores exact matches below this length.
integer
23
Number of base pairs to use for transitive closure batch.
string
10000000
Keep this randomly selected fraction of input matches.
number
Set the directory where temporary files should be stored. Since everything runs in containers, we don’t usually set this argument.
string
Input PAF file. The wfmash alignment step is skipped.
string
Options for graph smoothing phase.
Skip the graph smoothing step of the pipeline.
boolean
Maximum path jump to include in the block.
integer
Maximum edge jump before a block is broken.
integer
Maximum sequence length to put into POA, given as a comma-separated list of integers. SMOOTHXG is executed once for each integer.
string
700,900,1100
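To make the comma-separated format concrete, here is a minimal Python sketch that splits the value into integers; the number of entries is the number of SMOOTHXG executions described above. The function name `parse_poa_lengths` is hypothetical and not part of the pipeline.

```python
from typing import List


def parse_poa_lengths(spec: str) -> List[int]:
    """Split a comma-separated list such as '700,900,1100' into integers."""
    lengths = [int(part) for part in spec.split(",") if part.strip()]
    if not lengths or any(n <= 0 for n in lengths):
        raise ValueError(f"expected positive integers, got {spec!r}")
    return lengths


# The default value from the schema yields three target lengths,
# i.e. three SMOOTHXG executions.
default_lengths = parse_poa_lengths("700,900,1100")
n_smoothxg_runs = len(default_lengths)
```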
Minimum edit-based identity to cluster sequences.
string
Minimum ‘smallest / largest’ sequence length ratio to cluster in a block.
integer
Path depth at which we don’t pad the POA problem.
integer
100
Pad each end of each sequence in POA with ‘smoothxg_poa_padding * longest_poa_seq’ base pairs.
number
0.001
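The padding formula above can be worked through with a small Python sketch. The function name and the example sequence length are assumptions for illustration. How SMOOTHXG rounds the result is not stated here, so the value is returned unrounded.

```python
def poa_padding_bp(longest_poa_seq: int, padding_fraction: float = 0.001) -> float:
    """Padding per sequence end: smoothxg_poa_padding * longest_poa_seq."""
    return padding_fraction * longest_poa_seq


# Hypothetical example: with the default fraction of 0.001, a longest POA
# sequence of 10,000 bp gives about 10 bp of padding at each end.
example_padding = poa_padding_bp(10_000)
```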
Score parameters for POA in the form of ‘match,mismatch,gap1,ext1,gap2,ext2’. It may also be given as presets: ‘asm5’, ‘asm10’, ‘asm15’, ‘asm20’. [default: 1,19,39,3,81,1 = asm5].
string
1,19,39,3,81,1
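The score string above can be read as six named integers. The following sketch parses it, assuming the field order given in the description. Only the ‘asm5’ preset is expanded because it is the only one whose values are spelled out here; the names `PoaScores` and `parse_poa_params` are hypothetical.

```python
from typing import NamedTuple


class PoaScores(NamedTuple):
    """Field order follows 'match,mismatch,gap1,ext1,gap2,ext2'."""
    match: int
    mismatch: int
    gap1: int
    ext1: int
    gap2: int
    ext2: int


# Only asm5 is defined in the description above; the values behind the
# other presets are not given there, so they are deliberately left out.
PRESETS = {"asm5": "1,19,39,3,81,1"}


def parse_poa_params(value: str) -> PoaScores:
    """Parse an explicit score string or the asm5 preset into named scores."""
    spec = PRESETS.get(value, value)
    parts = spec.split(",")
    if len(parts) != 6:
        raise ValueError(f"expected six comma-separated integers, got {value!r}")
    return PoaScores(*(int(p) for p in parts))


default_scores = parse_poa_params("1,19,39,3,81,1")
```

With this sketch, `parse_poa_params("asm5")` returns the same scores as the default string, while the other preset names raise a `ValueError` because their values are not known here.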
Write MAF output representing merged POA blocks.
boolean
Use this prefix for consensus path names.
string
Consensus_
Set the directory where temporary files should be stored. Since everything runs in containers, we don’t usually set this argument.
string
Keep intermediate graphs during SMOOTHXG.
boolean
Run abPOA. [default: SPOA].
boolean
Run the POA in global mode. [default: local mode].
boolean
Number of CPUs for the potentially very memory expensive POA phase of SMOOTHXG. Default is ‘task.cpus’.
integer
Options for calling variants against reference(s).
Specify a set of VCFs to produce with --vcf_spec "REF[:LEN][,REF[:LEN]]*".
string
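The `REF[:LEN][,REF[:LEN]]*` grammar above can be unpacked with a short Python sketch. It assumes that LEN is a non-negative integer and that a trailing `:digits` suffix is always the length; the reference names and the function `parse_vcf_spec` are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_vcf_spec(spec: str) -> List[Tuple[str, Optional[int]]]:
    """Parse 'REF[:LEN][,REF[:LEN]]*' into (reference, length-or-None) pairs."""
    entries: List[Tuple[str, Optional[int]]] = []
    for item in spec.split(","):
        # Split on the last colon so reference names may contain colons.
        ref, sep, length = item.rpartition(":")
        if sep and length.isdigit():
            entries.append((ref, int(length)))
        else:
            # No colon, or the suffix is not a number: no LEN was given.
            entries.append((item, None))
    return entries


# Hypothetical reference names, for illustration only.
parsed = parse_vcf_spec("chm13:1000,grch38")
```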
Options to run the partition algorithm for community detection.
Enable community detection.
boolean
Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
string
master
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
Institutional config name.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string
Less common options for the pipeline, typically set in a config file.
Display version and exit.
boolean
Method used to save pipeline results to output directory.
string
Email address for completion summary, only when pipeline fails.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Send plain-text email instead of HTML.
boolean
File size limit when attaching MultiQC reports to summary emails.
string
25.MB
^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
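To show which size strings the pattern above accepts, here is a small Python sketch with a hypothetical helper `is_valid_size` and a few example values.

```python
import re

# Pattern copied verbatim from the parameter above.
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def is_valid_size(value: str) -> bool:
    """Return True if the value matches the schema's file-size pattern."""
    return SIZE_PATTERN.match(value) is not None


# Example values, for illustration only.
size_checks = {
    "25.MB": is_valid_size("25.MB"),   # the default, written in Nextflow style
    "25 MB": is_valid_size("25 MB"),
    "1.5GB": is_valid_size("1.5GB"),
    "100B": is_valid_size("100B"),
    "25MiB": is_valid_size("25MiB"),   # binary unit names are not accepted
    "MB": is_valid_size("MB"),         # a number is required
}
```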
Do not use coloured log outputs.
boolean
Incoming hook URL for messaging service
string
Custom config file to supply to MultiQC.
string
Custom logo file to supply to MultiQC. The file name must also be set in the MultiQC config file.
string
Custom MultiQC yaml file containing HTML including a methods description.
string
Whether to validate parameters against the schema at runtime.
boolean
true
Base URL or local path to location of pipeline test dataset files
string
https://raw.githubusercontent.com/nf-core/test-datasets/
Do we want to display hidden parameters?
boolean
Do we want to display hidden parameters?
string
igenomes_base
Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.
string