Going further

You can work on improving the workflow or on answering a biological question.

Workflow improvement

Use the Snakemake documentation webpage for help!

Logging

Add logging to one or more rules, to capture stdout and stderr in files.
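
In Snakemake, a rule declares its log file with the `log:` directive, and the rule's command is responsible for writing to that file. As a rough illustration of the redirection involved, the Python sketch below runs a command and sends both streams to a single file. The function name `run_with_log` and the path `logs/demo.log` are made up for this example, and the sketch does not reflect how Snakemake is implemented internally.

```python
import subprocess
import sys
from pathlib import Path


def run_with_log(cmd, log_path):
    """Run cmd, writing both stdout and stderr to log_path; return the exit code."""
    log_path = Path(log_path)
    # Create the log directory if it does not exist yet.
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("w") as log:
        # stderr=subprocess.STDOUT merges the two streams,
        # similar to `> file 2>&1` in a shell command.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


if __name__ == "__main__":
    # Hypothetical command standing in for the tool a rule would call.
    demo = [
        sys.executable,
        "-c",
        "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
    ]
    exit_code = run_with_log(demo, "logs/demo.log")
    print(exit_code)
    print(Path("logs/demo.log").read_text())
```

Keeping one log file per rule (and per sample, if the rule uses wildcards) makes it easier to find the output of a failed job.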

Benchmarking

Snakemake can measure the CPU time, wall-clock time, and RAM usage of each rule.

Find out how, and try it out on a rule.
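
To make these quantities concrete, here is a minimal Python sketch that measures them for a single child process. It is only an illustration of what is being measured, not Snakemake's own mechanism. The function name `measure` is hypothetical, and the `resource` module is available on Unix-like systems only.

```python
import resource  # Unix only
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd and return (wall_seconds, cpu_seconds, peak_rss) for child processes."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # CPU time is user + system time added by the child we just ran.
    cpu = (after.ru_utime + after.ru_stime) - (before.ru_utime + before.ru_stime)
    # ru_maxrss is the largest peak over all children waited for so far,
    # reported in kilobytes on Linux and in bytes on macOS.
    return wall, cpu, after.ru_maxrss


if __name__ == "__main__":
    # Hypothetical workload standing in for a rule's command.
    wall, cpu, rss = measure([sys.executable, "-c", "sum(range(10**6))"])
    print(f"wall={wall:.3f}s cpu={cpu:.3f}s peak_rss={rss}")
```

Wall-clock time can be much larger than CPU time when a job mostly waits on disk or network, which is why it is useful to record both.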

Restarts

What does Snakemake rely on to know where in the DAG to restart after a failed run?

Tip

Search the documentation for "timestamps".

  • Try modifying a file yourself so that Snakemake re-runs the workflow from the rule mergeVcfs.

  • Find out the command-line option to re-run the workflow from any user-specified rule. See the relevant doc.
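
The timestamp idea from the tip can be sketched in a few lines of Python. This is a simplified model in the spirit of make-like tools: an output needs rebuilding if it is missing or older than any of its inputs. The function name `is_stale` and the file names are hypothetical, and Snakemake's real logic is more involved (recent versions can also take other kinds of change into account).

```python
import os
import tempfile


def is_stale(output, inputs):
    """Return True if output is missing or older than at least one input."""
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > out_mtime for path in inputs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical input and output files of one rule.
        src = os.path.join(tmp, "input.txt")
        out = os.path.join(tmp, "output.txt")
        open(src, "w").close()
        print(is_stale(out, [src]))  # output does not exist yet

        open(out, "w").close()
        os.utime(src, (1000, 1000))  # input last modified at t=1000
        os.utime(out, (2000, 2000))  # output produced later, at t=2000
        print(is_stale(out, [src]))

        os.utime(src, (3000, 3000))  # input modified after the output
        print(is_stale(out, [src]))
```

Under this model, updating the modification time of an input file is enough to mark everything downstream of it as out of date.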

Cluster

Figure out how to submit the workflow to the cluster. Note that cluster parameters should not go in the workflow itself; otherwise, the workflow is no longer independent of where it is run. See the relevant doc.

For testing, ask us whether there is cluster access, or whether we can run it for you on the EBI cluster (which uses LSF).

DAG

Find out how to produce the DAG representing the workflow (Snakemake can do this).

What is an alternative representation if the DAG is too crowded?

Analysis improvement: Drug resistance prediction

The dataset contains at least one sample which is resistant to a drug against tuberculosis (TB). Can you find which samples are resistant to which known TB drugs?

You can use the mykrobe program to do this.

Check that the variants behind mykrobe's drug resistance predictions are present in the VCFs you produced using the workflow.

You can also add running mykrobe and making a report to the workflow.

Tip

Use the -f option in mykrobe; otherwise it will not predict anything. This is because we are working with only a slice of the genome.