Going further
You can work on improving the workflow or on answering a biological question.
Workflow improvement
Use the Snakemake documentation webpage for help!
Logging
Add logging to one or more rules, to capture stdout and stderr in files.
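To make the goal concrete, here is a minimal plain-Python sketch of what "capturing stdout and stderr in a file" means for a single command. It is not Snakemake syntax; the function name `run_with_log` and the stand-in command are hypothetical, chosen only for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_log(cmd, log_path):
    """Run cmd, writing both stdout and stderr to log_path; return the exit code."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "w") as log:
        # stderr=subprocess.STDOUT sends stderr to the same file as stdout
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


if __name__ == "__main__":
    # Stand-in command that writes one line to each stream
    cmd = [sys.executable, "-c",
           "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]
    log_file = Path(tempfile.mkdtemp()) / "logs" / "example.log"
    print("exit code:", run_with_log(cmd, log_file))
    print(log_file.read_text())
```

Keeping one log file per rule (and per sample) makes it much easier to see which job produced an error message.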
Benchmarking
Snakemake can measure CPU/wall clock time and RAM use of each rule.
Find out how, and try it out on a rule.
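As a rough illustration of the three quantities involved (wall clock time, CPU time and peak memory), the sketch below measures them for one child process in plain Python. It is not how Snakemake does it; the function name `benchmark` is hypothetical, and the standard-library `resource` module used here is only available on Unix-like systems.

```python
import resource
import subprocess
import sys
import time


def benchmark(cmd):
    """Run cmd and return (wall_seconds, cpu_seconds, max_rss) for child processes."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # CPU time = user + system time consumed by the child
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    # ru_maxrss is the peak over all waited-for children so far;
    # it is reported in kilobytes on Linux and in bytes on macOS
    return wall, cpu, after.ru_maxrss


if __name__ == "__main__":
    wall, cpu, rss = benchmark([sys.executable, "-c", "sum(range(10**6))"])
    print(f"wall={wall:.3f}s cpu={cpu:.3f}s max_rss={rss}")
```

Wall clock time and CPU time can differ a lot: a job waiting on disk has a low CPU time, while a multi-threaded job can use more CPU time than wall clock time.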
Restarts
What does Snakemake rely on to know where in the DAG to restart after a failed run?
Tip
Search the doc for timestamps.
Try modifying a file yourself so that Snakemake re-runs the workflow from rule mergeVcfs.
Find out the command-line option to re-run the workflow from any user-specified rule (see the relevant doc).
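The timestamp idea from the tip can be sketched in a few lines of plain Python: an output is considered out of date if it is missing or older than any of its inputs. This is a simplified illustration, not Snakemake's implementation, which may take more than modification times into account; the function name `needs_rerun` and the file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def needs_rerun(inputs, output):
    """Return True if output is missing or older than any of its inputs."""
    output = Path(output)
    if not output.exists():
        return True
    out_mtime = output.stat().st_mtime
    return any(Path(i).stat().st_mtime > out_mtime for i in inputs)


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    inp, out = workdir / "sample.vcf", workdir / "merged.vcf"
    inp.write_text("input")
    out.write_text("output")
    # Set explicit modification times: input older than output
    os.utime(inp, (1000, 1000))
    os.utime(out, (2000, 2000))
    print(needs_rerun([inp], out))  # output is newer than its input
    # "Touching" the input makes it newer than the output
    os.utime(inp, (3000, 3000))
    print(needs_rerun([inp], out))  # output is now stale
```

Under this model, updating the modification time of a file is enough to make everything that depends on it out of date.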
Cluster
Figure out how to submit the workflow to the cluster. Note that cluster parameters should not go in the workflow itself; otherwise the workflow is no longer independent of where it is run (see the relevant doc).
For testing, ask us whether cluster access is available, or whether we can run the workflow for you on the EBI cluster (which uses LSF).
DAG
Find out how to produce the DAG representing the workflow (snakemake can do this).
What is an alternative representation if the DAG is too crowded?
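If the notion of a DAG is unfamiliar, the sketch below builds a tiny one in plain Python, derives a valid execution order from it, and writes it out in Graphviz DOT format. Only `mergeVcfs` comes from this workflow; the other rule names and both function names are hypothetical, and the example requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Each rule maps to the set of rules it depends on.
# mapReads and callVariants are made-up names for illustration.
dependencies = {
    "mapReads": set(),
    "callVariants": {"mapReads"},
    "mergeVcfs": {"callVariants"},
}


def execution_order(deps):
    """Return the rules in an order that respects all dependencies."""
    return list(TopologicalSorter(deps).static_order())


def to_dot(deps):
    """Render the dependency graph as Graphviz DOT text."""
    lines = ["digraph workflow {"]
    for node in sorted(deps):
        for parent in sorted(deps[node]):
            lines.append(f'    "{parent}" -> "{node}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(execution_order(dependencies))
    print(to_dot(dependencies))
```

The DOT text can be rendered to an image with the Graphviz `dot` program, which is also a common way to view graphs exported by workflow tools.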
Analysis improvement: Drug resistance prediction
The dataset contains at least one sample which is resistant to a drug against tuberculosis (TB). Can you find which samples are resistant to which known TB drugs?
You can use the mykrobe program to do this.
Check that the drug resistance predictions made by mykrobe are present in the VCFs you produced using the workflow.
Running mykrobe and making a report can be added to the workflow.
Tip
Use the -f option in mykrobe; otherwise it will not predict anything. This is because we are working with a slice of the genome.