Command Line Interface
To run any of the five Snekmer operation modes, simply call:
snekmer {mode}
Each mode has its own options and parameters, which are specified on the command line or in the config.yaml file, respectively.
For an overview of Snekmer usage, reference the help command (snekmer --help):
$ snekmer --help
usage: snekmer [-h] [-v] {cluster,model,search,learn,apply} ...
Snekmer: A tool for kmer-based sequence analysis using amino acid reduction (AAR)
options:
-h, --help show this help message and exit
-v, --version print version and exit
mode:
Snekmer mode
{cluster,model,search,learn,apply}
Tailored references for the individual operation modes can be accessed via snekmer {mode} --help.
Configuration
To run Snekmer, create a config.yaml file containing the desired parameters. A template is included in the repository. Note that the config file must be placed in the same directory as the input directory for Snekmer to operate.
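As a sketch only, a minimal config.yaml might look like the fragment below. The key names and values shown here (k for the kmer length, alphabet for the amino acid reduction scheme) are illustrative assumptions, not the authoritative schema; consult the template in the repository for the exact, complete set of parameters.

```yaml
# Illustrative sketch only -- key names are assumptions; see the repository template.
k: 8            # kmer length (assumed key name)
alphabet: 0     # amino acid reduction (AAR) alphabet selector (assumed key name)
```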
Snekmer assumes that input files are stored in the input directory, and it automatically creates an output directory to save all output files. Snekmer also assumes that background files, if any, are stored in input/background. An example of the assumed directory structure is shown below:
Snekmer cluster, model, and search input:
├── config.yaml
├── input/
│ ├── background/
│ │ ├── X.fasta
│ │ ├── Y.fasta
│ │ └── etc.
│ ├── A.fasta
│ ├── B.fasta
│ └── etc.
├── output/
│ ├── ...
│ └── ...
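The layout above can be prepared with a few shell commands. This is a sketch only: the .fasta source paths are placeholders, and the snekmer invocation is left commented out so the snippet only sets up directories:

```shell
# Create the directory layout Snekmer expects (source paths are placeholders).
mkdir -p input/background output
# cp my-sequences/*.fasta input/                 # sequences to analyze
# cp my-backgrounds/*.fasta input/background/    # optional background sets
# cp path/to/template-config.yaml config.yaml    # then edit parameters
# snekmer model                                  # or: snekmer cluster / snekmer search
```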
Snekmer learn input:
├── config.yaml
├── input/
│ ├── A.fasta # known sequences to "learn" kmer counts matrix from
│ ├── B.fasta # known sequences to "learn" kmer counts matrix from
│ ├── etc.
│ └── base/ # optional
│     └── base-kmer-counts.csv # optional file to additively merge kmer counts with
├── annotations/
│ └── annotations.ann # annotation files used for predicting future sequences
├── output/
│ ├── ...
│ └── ...
Snekmer apply input:
├── config.yaml
├── input/
│ ├── A.fasta # unknown sequences to "apply" kmer counts matrix on
│ ├── B.fasta # unknown sequences to "apply" kmer counts matrix on
│ └── etc.
├── counts/
│ └── kmer-counts-total.csv # kmer counts matrix generated in ``learn``
├── confidence/
│ └── global-confidence-scores.csv # global confidence distribution generated in ``learn``
├── output/
│ ├── ...
│ └── ...
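Similarly, an apply working directory can be sketched as below. The counts and confidence files must come from a prior learn run; the destination paths are the ones assumed in the tree above, while the learn-run source paths are placeholders:

```shell
# Prepare a working directory for `snekmer apply` (paths follow the tree above).
mkdir -p input counts confidence output
# cp unknown-sequences/*.fasta input/
# cp /path/to/learn-run/kmer-counts-total.csv counts/
# cp /path/to/learn-run/global-confidence-scores.csv confidence/
# snekmer apply
```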
Partial Workflow
To execute only a part of the workflow, the --until option can be invoked.
For instance, to execute the workflow only through the kmer vector generation
step, run:
snekmer {mode} --until vectorize
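For example, pairing --until with --dry-run previews which jobs would run up to the vectorization step before committing to them. The sketch below is guarded so it degrades to a message when snekmer is not on the PATH:

```shell
# Preview, then execute, the workflow up to the kmer vector generation step.
if command -v snekmer >/dev/null 2>&1; then
    snekmer model --until vectorize --dry-run   # show the planned jobs only
    snekmer model --until vectorize             # actually run through vectorize
    status="ran"
else
    echo "snekmer not found on PATH"
    status="skipped"
fi
```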
All Options
Snekmer: A tool for kmer-based sequence analysis using amino acid reduction (AAR).
usage: snekmer [-h] [-v] {cluster,model,search,learn,apply} ...
Named Arguments
- -v, --version
Print version and exit.
mode
Snekmer mode (cluster, model, search, learn, or apply).
- mode
Possible choices: cluster, model, search, learn, apply
Sub-commands
cluster
Apply unsupervised clustering via Snekmer.
snekmer cluster [-h] [--dry-run] [--configfile PATH [PATH ...]]
[-C [KEY=VALUE ...]] [--unlock] [--until TARGET [TARGET ...]]
[--keepgoing] [--latency SECONDS] [--touch] [--cores N]
[--count N] [--countstart IDX] [--verbose]
[--quiet [{progress,rules,all} ...]] [--directory DIR]
[--forcerun [TARGET ...]] [--list-code-changes]
[--list-params-changes] [--clust PATH [PATH ...]] [-j N]
Named Arguments
- --dry-run, --dryrun, -n
Do not execute anything, and display what would be done. If you have a very large workflow, use --dry-run --quiet to just print a summary of the DAG of jobs.
Default: False
- --configfile
Specify or overwrite the config file of the workflow (see the docs). Values specified in JSON or YAML format are available in the global config dictionary inside the workflow. Multiple files overwrite each other in the given order; keys missing from earlier config files are supplied by later ones. Note that this order also includes a config file defined in the workflow definition itself (which comes first).
- -C, --config
Set or overwrite values in the workflow config object. The workflow config object is accessible as variable config inside the workflow. Default values can be set by providing a JSON file.
- --unlock
Unlock the working directory.
Default: False
- --until, -U
Runs the pipeline until it reaches the specified rules or files. Only runs jobs that are dependencies of the specified rules or files; does not run sibling DAGs.
- --keepgoing, --keep-going, -k
Go on with independent jobs if a job fails.
Default: False
- --latency, -w, --output-wait, --latency-wait
Wait given seconds if an output file of a job is not present after the job finished. This helps if your filesystem suffers from latency (default 30).
Default: 30
- --touch, -t
Touch output files (mark them up to date without really changing them) instead of running their commands. This is used to pretend that the rules were executed, in order to fool future invocations of snakemake. Fails if a file does not yet exist. Note that this will only touch files that would otherwise be recreated by Snakemake (e.g. because their input files are newer). To enforce a touch, combine this with --force, --forceall, or --forcerun. Note, however, that you lose the provenance information when the files have been created in reality. Hence, this should be used only as a last resort.
Default: False
- --cores, -c
Use at most N CPU cores/jobs in parallel. If N is omitted or ‘all’, the limit is set to the number of available CPU cores. In case of cluster/cloud execution, this argument sets the maximum number of cores requested from the cluster or cloud scheduler. (See https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources-remote-execution for more info.) This number is available to rules via workflow.cores.
Default: 2
- --count
Number of files to process (limits DAG size).
- --countstart
Starting file index (for use with --count).
Default: 0
- --verbose
Show additional debug output (default False)
Default: False
- --quiet, -q
Possible choices: progress, rules, all
Do not output certain information. If used without arguments, do not output any progress or rule information. Defining ‘all’ results in no information being printed at all.
- --directory, -d
Specify working directory (relative paths in the snakefile will use this as their origin).
- --forcerun, -R
Force the re-execution or creation of the given rules or files. Use this option if you changed a rule and want to have all its output in your workflow updated.
- --list-code-changes, --lc
List all output files for which the rule body (run or shell) has changed in the Snakefile.
Default: False
- --list-params-changes, --lp
List all output files for which the defined params have changed in the Snakefile.
Default: False
Cluster Execution Arguments
- --clust
Path to cluster execution yaml configuration file.
- -j, --jobs
Number of simultaneous jobs to submit to a slurm queue.
Default: 1000
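Putting a few of these flags together, a cautious cluster run might first preview the DAG and then execute with extra cores, a longer filesystem latency wait, and --keepgoing. All flags used here are documented above; the guard keeps this sketch harmless when snekmer is absent:

```shell
# Preview, then run, `snekmer cluster` with a few of the documented flags.
if command -v snekmer >/dev/null 2>&1; then
    snekmer cluster --dry-run                           # preview the planned jobs
    snekmer cluster --cores 4 --latency 60 --keepgoing  # run with 4 cores, 60 s latency wait
    status="ran"
else
    echo "snekmer not found on PATH"
    status="skipped"
fi
```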
model
Train supervised models via Snekmer.
snekmer model [-h] [--dry-run] [--configfile PATH [PATH ...]]
[-C [KEY=VALUE ...]] [--unlock] [--until TARGET [TARGET ...]]
[--keepgoing] [--latency SECONDS] [--touch] [--cores N]
[--count N] [--countstart IDX] [--verbose]
[--quiet [{progress,rules,all} ...]] [--directory DIR]
[--forcerun [TARGET ...]] [--list-code-changes]
[--list-params-changes] [--clust PATH [PATH ...]] [-j N]
Named Arguments
- --dry-run, --dryrun, -n
Do not execute anything, and display what would be done. If you have a very large workflow, use --dry-run --quiet to just print a summary of the DAG of jobs.
Default: False
- --configfile
Specify or overwrite the config file of the workflow (see the docs). Values specified in JSON or YAML format are available in the global config dictionary inside the workflow. Multiple files overwrite each other in the given order; keys missing from earlier config files are supplied by later ones. Note that this order also includes a config file defined in the workflow definition itself (which comes first).
- -C, --config
Set or overwrite values in the workflow config object. The workflow config object is accessible as variable config inside the workflow. Default values can be set by providing a JSON file.
- --unlock
Unlock the working directory.
Default: False
- --until, -U
Runs the pipeline until it reaches the specified rules or files. Only runs jobs that are dependencies of the specified rules or files; does not run sibling DAGs.
- --keepgoing, --keep-going, -k
Go on with independent jobs if a job fails.
Default: False
- --latency, -w, --output-wait, --latency-wait
Wait given seconds if an output file of a job is not present after the job finished. This helps if your filesystem suffers from latency (default 30).
Default: 30
- --touch, -t
Touch output files (mark them up to date without really changing them) instead of running their commands. This is used to pretend that the rules were executed, in order to fool future invocations of snakemake. Fails if a file does not yet exist. Note that this will only touch files that would otherwise be recreated by Snakemake (e.g. because their input files are newer). To enforce a touch, combine this with --force, --forceall, or --forcerun. Note, however, that you lose the provenance information when the files have been created in reality. Hence, this should be used only as a last resort.
Default: False
- --cores, -c
Use at most N CPU cores/jobs in parallel. If N is omitted or ‘all’, the limit is set to the number of available CPU cores. In case of cluster/cloud execution, this argument sets the maximum number of cores requested from the cluster or cloud scheduler. (See https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources-remote-execution for more info.) This number is available to rules via workflow.cores.
Default: 2
- --count
Number of files to process (limits DAG size).
- --countstart
Starting file index (for use with --count).
Default: 0
- --verbose
Show additional debug output (default False)
Default: False
- --quiet, -q
Possible choices: progress, rules, all
Do not output certain information. If used without arguments, do not output any progress or rule information. Defining ‘all’ results in no information being printed at all.
- --directory, -d
Specify working directory (relative paths in the snakefile will use this as their origin).
- --forcerun, -R
Force the re-execution or creation of the given rules or files. Use this option if you changed a rule and want to have all its output in your workflow updated.
- --list-code-changes, --lc
List all output files for which the rule body (run or shell) has changed in the Snakefile.
Default: False
- --list-params-changes, --lp
List all output files for which the defined params have changed in the Snakefile.
Default: False
Cluster Execution Arguments
- --clust
Path to cluster execution yaml configuration file.
- -j, --jobs
Number of simultaneous jobs to submit to a slurm queue.
Default: 1000
search
Search sequences against pre-existing models via Snekmer.
snekmer search [-h] [--dry-run] [--configfile PATH [PATH ...]]
[-C [KEY=VALUE ...]] [--unlock] [--until TARGET [TARGET ...]]
[--keepgoing] [--latency SECONDS] [--touch] [--cores N]
[--count N] [--countstart IDX] [--verbose]
[--quiet [{progress,rules,all} ...]] [--directory DIR]
[--forcerun [TARGET ...]] [--list-code-changes]
[--list-params-changes] [--clust PATH [PATH ...]] [-j N]
Named Arguments
- --dry-run, --dryrun, -n
Do not execute anything, and display what would be done. If you have a very large workflow, use --dry-run --quiet to just print a summary of the DAG of jobs.
Default: False
- --configfile
Specify or overwrite the config file of the workflow (see the docs). Values specified in JSON or YAML format are available in the global config dictionary inside the workflow. Multiple files overwrite each other in the given order; keys missing from earlier config files are supplied by later ones. Note that this order also includes a config file defined in the workflow definition itself (which comes first).
- -C, --config
Set or overwrite values in the workflow config object. The workflow config object is accessible as variable config inside the workflow. Default values can be set by providing a JSON file.
- --unlock
Unlock the working directory.
Default: False
- --until, -U
Runs the pipeline until it reaches the specified rules or files. Only runs jobs that are dependencies of the specified rules or files; does not run sibling DAGs.
- --keepgoing, --keep-going, -k
Go on with independent jobs if a job fails.
Default: False
- --latency, -w, --output-wait, --latency-wait
Wait given seconds if an output file of a job is not present after the job finished. This helps if your filesystem suffers from latency (default 30).
Default: 30
- --touch, -t
Touch output files (mark them up to date without really changing them) instead of running their commands. This is used to pretend that the rules were executed, in order to fool future invocations of snakemake. Fails if a file does not yet exist. Note that this will only touch files that would otherwise be recreated by Snakemake (e.g. because their input files are newer). To enforce a touch, combine this with --force, --forceall, or --forcerun. Note, however, that you lose the provenance information when the files have been created in reality. Hence, this should be used only as a last resort.
Default: False
- --cores, -c
Use at most N CPU cores/jobs in parallel. If N is omitted or ‘all’, the limit is set to the number of available CPU cores. In case of cluster/cloud execution, this argument sets the maximum number of cores requested from the cluster or cloud scheduler. (See https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources-remote-execution for more info.) This number is available to rules via workflow.cores.
Default: 2
- --count
Number of files to process (limits DAG size).
- --countstart
Starting file index (for use with --count).
Default: 0
- --verbose
Show additional debug output (default False)
Default: False
- --quiet, -q
Possible choices: progress, rules, all
Do not output certain information. If used without arguments, do not output any progress or rule information. Defining ‘all’ results in no information being printed at all.
- --directory, -d
Specify working directory (relative paths in the snakefile will use this as their origin).
- --forcerun, -R
Force the re-execution or creation of the given rules or files. Use this option if you changed a rule and want to have all its output in your workflow updated.
- --list-code-changes, --lc
List all output files for which the rule body (run or shell) has changed in the Snakefile.
Default: False
- --list-params-changes, --lp
List all output files for which the defined params have changed in the Snakefile.
Default: False
Cluster Execution Arguments
- --clust
Path to cluster execution yaml configuration file.
- -j, --jobs
Number of simultaneous jobs to submit to a slurm queue.
Default: 1000
learn
Learn kmer-annotation associations via Snekmer
snekmer learn [-h] [--dry-run] [--configfile PATH [PATH ...]]
[-C [KEY=VALUE ...]] [--unlock] [--until TARGET [TARGET ...]]
[--keepgoing] [--latency SECONDS] [--touch] [--cores N]
[--count N] [--countstart IDX] [--verbose]
[--quiet [{progress,rules,all} ...]] [--directory DIR]
[--forcerun [TARGET ...]] [--list-code-changes]
[--list-params-changes] [--clust PATH [PATH ...]] [-j N]
Named Arguments
- --dry-run, --dryrun, -n
Do not execute anything, and display what would be done. If you have a very large workflow, use --dry-run --quiet to just print a summary of the DAG of jobs.
Default: False
- --configfile
Specify or overwrite the config file of the workflow (see the docs). Values specified in JSON or YAML format are available in the global config dictionary inside the workflow. Multiple files overwrite each other in the given order; keys missing from earlier config files are supplied by later ones. Note that this order also includes a config file defined in the workflow definition itself (which comes first).
- -C, --config
Set or overwrite values in the workflow config object. The workflow config object is accessible as variable config inside the workflow. Default values can be set by providing a JSON file.
- --unlock
Unlock the working directory.
Default: False
- --until, -U
Runs the pipeline until it reaches the specified rules or files. Only runs jobs that are dependencies of the specified rules or files; does not run sibling DAGs.
- --keepgoing, --keep-going, -k
Go on with independent jobs if a job fails.
Default: False
- --latency, -w, --output-wait, --latency-wait
Wait given seconds if an output file of a job is not present after the job finished. This helps if your filesystem suffers from latency (default 30).
Default: 30
- --touch, -t
Touch output files (mark them up to date without really changing them) instead of running their commands. This is used to pretend that the rules were executed, in order to fool future invocations of snakemake. Fails if a file does not yet exist. Note that this will only touch files that would otherwise be recreated by Snakemake (e.g. because their input files are newer). To enforce a touch, combine this with --force, --forceall, or --forcerun. Note, however, that you lose the provenance information when the files have been created in reality. Hence, this should be used only as a last resort.
Default: False
- --cores, -c
Use at most N CPU cores/jobs in parallel. If N is omitted or ‘all’, the limit is set to the number of available CPU cores. In case of cluster/cloud execution, this argument sets the maximum number of cores requested from the cluster or cloud scheduler. (See https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources-remote-execution for more info.) This number is available to rules via workflow.cores.
Default: 2
- --count
Number of files to process (limits DAG size).
- --countstart
Starting file index (for use with --count).
Default: 0
- --verbose
Show additional debug output (default False)
Default: False
- --quiet, -q
Possible choices: progress, rules, all
Do not output certain information. If used without arguments, do not output any progress or rule information. Defining ‘all’ results in no information being printed at all.
- --directory, -d
Specify working directory (relative paths in the snakefile will use this as their origin).
- --forcerun, -R
Force the re-execution or creation of the given rules or files. Use this option if you changed a rule and want to have all its output in your workflow updated.
- --list-code-changes, --lc
List all output files for which the rule body (run or shell) has changed in the Snakefile.
Default: False
- --list-params-changes, --lp
List all output files for which the defined params have changed in the Snakefile.
Default: False
Cluster Execution Arguments
- --clust
Path to cluster execution yaml configuration file.
- -j, --jobs
Number of simultaneous jobs to submit to a slurm queue.
Default: 1000
apply
Apply kmer-annotation associations via Snekmer
snekmer apply [-h] [--dry-run] [--configfile PATH [PATH ...]]
[-C [KEY=VALUE ...]] [--unlock] [--until TARGET [TARGET ...]]
[--keepgoing] [--latency SECONDS] [--touch] [--cores N]
[--count N] [--countstart IDX] [--verbose]
[--quiet [{progress,rules,all} ...]] [--directory DIR]
[--forcerun [TARGET ...]] [--list-code-changes]
[--list-params-changes] [--clust PATH [PATH ...]] [-j N]
Named Arguments
- --dry-run, --dryrun, -n
Do not execute anything, and display what would be done. If you have a very large workflow, use --dry-run --quiet to just print a summary of the DAG of jobs.
Default: False
- --configfile
Specify or overwrite the config file of the workflow (see the docs). Values specified in JSON or YAML format are available in the global config dictionary inside the workflow. Multiple files overwrite each other in the given order; keys missing from earlier config files are supplied by later ones. Note that this order also includes a config file defined in the workflow definition itself (which comes first).
- -C, --config
Set or overwrite values in the workflow config object. The workflow config object is accessible as variable config inside the workflow. Default values can be set by providing a JSON file.
- --unlock
Unlock the working directory.
Default: False
- --until, -U
Runs the pipeline until it reaches the specified rules or files. Only runs jobs that are dependencies of the specified rules or files; does not run sibling DAGs.
- --keepgoing, --keep-going, -k
Go on with independent jobs if a job fails.
Default: False
- --latency, -w, --output-wait, --latency-wait
Wait given seconds if an output file of a job is not present after the job finished. This helps if your filesystem suffers from latency (default 30).
Default: 30
- --touch, -t
Touch output files (mark them up to date without really changing them) instead of running their commands. This is used to pretend that the rules were executed, in order to fool future invocations of snakemake. Fails if a file does not yet exist. Note that this will only touch files that would otherwise be recreated by Snakemake (e.g. because their input files are newer). To enforce a touch, combine this with --force, --forceall, or --forcerun. Note, however, that you lose the provenance information when the files have been created in reality. Hence, this should be used only as a last resort.
Default: False
- --cores, -c
Use at most N CPU cores/jobs in parallel. If N is omitted or ‘all’, the limit is set to the number of available CPU cores. In case of cluster/cloud execution, this argument sets the maximum number of cores requested from the cluster or cloud scheduler. (See https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources-remote-execution for more info.) This number is available to rules via workflow.cores.
Default: 2
- --count
Number of files to process (limits DAG size).
- --countstart
Starting file index (for use with --count).
Default: 0
- --verbose
Show additional debug output (default False)
Default: False
- --quiet, -q
Possible choices: progress, rules, all
Do not output certain information. If used without arguments, do not output any progress or rule information. Defining ‘all’ results in no information being printed at all.
- --directory, -d
Specify working directory (relative paths in the snakefile will use this as their origin).
- --forcerun, -R
Force the re-execution or creation of the given rules or files. Use this option if you changed a rule and want to have all its output in your workflow updated.
- --list-code-changes, --lc
List all output files for which the rule body (run or shell) has changed in the Snakefile.
Default: False
- --list-params-changes, --lp
List all output files for which the defined params have changed in the Snakefile.
Default: False
Cluster Execution Arguments
- --clust
Path to cluster execution yaml configuration file.
- -j, --jobs
Number of simultaneous jobs to submit to a slurm queue.
Default: 1000