As any bioinformatician knows, there are few things more frustrating than trying to understand how to use someone else's program. I struggled with this myself while working on this package. However, in the realm of scientific research, we must learn to appreciate the stringency of our frequently used tools. In this vignette, I will not tell you to ignore the various warnings and errors produced by VariTAS, because they are essential to ensuring that the pipeline produces statistically robust, reproducible results.
That being said, I empathise with the frustration of trying to use a new tool only to be met with a barrage of errors and incompatible data. So, to minimise the amount of time you have to spend interpreting laconic error messages and resubmitting processes, I have written this guide. I hope that it helps to explain why these errors are thrown and, more importantly, how to make them go away.
These are errors thrown when the pipeline is verifying the various options and parameters submitted to it through the config file. This includes a number of 'file ____ does not exist'-type errors that I have omitted for what I hope are obvious reasons.
An incompatible stage has been submitted to the main pipeline function. The only supported stages are 'alignment', 'qc', 'calling', 'annotation', and 'merging'.
Ensure that the start.stage parameter is set to one of the allowed stages.
varitas.options must be a list of options or a string giving the path to the config YAML file
Whatever you have tried to use as the VariTAS options file is incorrect. You shouldn't see this error if you're following the template in the Introduction vignette.
Ensure that you are pointing to the correct file when submitting it to overwrite.varitas.options. It should be based on the config.yaml file contained in the inst directory of this package.
reference_build
There must be a reference_build parameter set somewhere in the config file so that the script knows which version of the genome you are using. This setting is present in the config.yaml file found in the inst directory of this package.
Add a parameter to the config file called reference_build and make sure it's set to either 'grch37' or 'grch38' (anything else will cause you to run into the next error).
reference_build must be either grch37 or grch38
The reference_build parameter in the config file can only be set to either 'grch37' or 'grch38', which are the two versions of the human genome supported by the pipeline. See also the previous error.
Ensure that reference_build is set to your version of the genome, in the form of either 'grch37' or 'grch38'.
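If you want to check a config file before launching the pipeline, something like the following sketch might help. The function name check_reference_build is hypothetical (it is not part of VariTAS), and it assumes the setting sits on its own line in the form "reference_build: value", which may not match every config layout.

```shell
# Hypothetical helper, not part of VariTAS: report whether a config file
# sets reference_build to one of the two supported values.
# Assumption: the setting appears on its own line as "reference_build: <value>".
check_reference_build() {
    local config="$1"
    if ! grep -Eq '^[[:space:]]*reference_build[[:space:]]*:' "$config"; then
        # No reference_build line at all (the "previous error" case)
        echo "missing"
    elif grep -Eq "^[[:space:]]*reference_build[[:space:]]*:[[:space:]]*[\"']?(grch37|grch38)[\"']?[[:space:]]*(#.*)?\$" "$config"; then
        echo "ok"
    else
        # Present, but set to something other than grch37 or grch38
        echo "unsupported"
    fi
}
```

Calling check_reference_build with the path to your config file prints "missing", "unsupported", or "ok", which correspond to the two errors described above and to a valid setting.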
Only reference genomes in the FASTA format are supported by the various tools used in this pipeline. Of course, your genome might already be in FASTA format with a different file extension, but it's better to be sure.
Use a reference in FASTA format with the .fa or .fasta file extension.
target_panel must be provided for alignment and variant calling stages
As VariTAS is meant to be run on data from amplicon sequencing experiments, some of the stages require a file detailing the target panel. This should be in the form of a BED file, the format of which is described here.
Ensure that you have a properly formatted BED file supplied as the target_panel parameter in the config file.
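As a rough sanity check of the BED file itself, the sketch below flags lines that don't have at least three tab-separated columns, or whose start and end coordinates look wrong. The function name check_bed is hypothetical, and this is not the validation VariTAS performs. In particular, treating start equal to end as an error is my own assumption for amplicon panels.

```shell
# Hypothetical helper, not part of VariTAS: print suspicious lines in a BED file.
# Exits non-zero if any problem was found.
check_bed() {
    awk -F'\t' '
        /^(#|track|browser)/ { next }   # skip comment and header lines
        NF < 3 {
            print "line " NR ": fewer than 3 tab-separated columns"
            bad = 1
            next
        }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            print "line " NR ": start/end are not non-negative integers"
            bad = 1
            next
        }
        # Assumption: a zero-length or inverted interval is a mistake in a target panel
        $2 + 0 >= $3 + 0 {
            print "line " NR ": start is not less than end"
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

Running check_bed on your panel file prints nothing and returns success when every line passes these basic checks.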
Followed by “Reference genome chromosomes: ____ Target panel chromosomes: ____”. This error probably looks familiar if you've ever had the great privilege of working with GATK. Essentially, the chromosomes listed in your target panel don't match up with those in the reference genome. In practice, it means you have one or more chromosomes in the target panel that are not in the reference.
This issue can arise from a few different places, so be sure to check that it's not something very simple first.
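To see exactly which chromosomes are the problem, you can compare the names yourself. The sketch below lists every chromosome in the target panel that has no matching sequence name in the reference FASTA. The function name is hypothetical and this is only an approximation of whatever comparison the pipeline does internally. A typical thing to look for in the output is a naming mismatch such as 'chr1' in the panel versus '1' in the reference.

```shell
# Hypothetical helper, not part of VariTAS: list chromosomes named in a BED
# target panel that have no matching sequence name in a reference FASTA.
panel_chromosomes_missing_from_reference() {
    local reference="$1" panel="$2"
    local ref_names panel_names
    ref_names="$(mktemp)"
    panel_names="$(mktemp)"
    # Sequence name = text after ">" up to the first whitespace
    grep '^>' "$reference" | sed -e 's/^>//' -e 's/[[:space:]].*$//' | sort -u > "$ref_names"
    # First BED column, skipping comment and header lines
    grep -Ev '^(#|track|browser)' "$panel" | cut -f1 | sort -u > "$panel_names"
    # comm -13 prints lines that appear only in the second (panel) file
    comm -13 "$ref_names" "$panel_names"
    rm -f "$ref_names" "$panel_names"
}
```

Call it with the reference FASTA first and the BED file second. An empty result means every panel chromosome was found in the reference.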
This issue and the next two are related to preparing the reference genome file. Various tools require that large FASTA files are indexed and have sequence dictionaries so that they can be parsed quickly. Once you fix these issues, they shouldn't come up again as long as the index files are in the same directory as the reference.
Run bwa index on the indicated file.
See above
Run gatk CreateSequenceDictionary on the indicated file.
See above (x2)
Run samtools faidx on the indicated file.
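To deal with all three of these reference preparation errors in one pass, a sketch like the following might be useful. It only prints the commands for index files that appear to be missing and runs nothing itself. The function name is hypothetical, and the file names it looks for are my assumptions based on the default outputs of bwa, GATK, and samtools, not necessarily the exact files VariTAS checks for.

```shell
# Hypothetical helper, not part of VariTAS: for a reference FASTA, print the
# command for each index that appears to be missing. Nothing is executed.
suggest_index_commands() {
    local reference="$1"
    local ext
    # bwa index writes <reference>.amb, .ann, .bwt, .pac and .sa
    for ext in amb ann bwt pac sa; do
        if [ ! -e "$reference.$ext" ]; then
            echo "bwa index $reference"
            break
        fi
    done
    # By default GATK writes the dictionary with the FASTA extension replaced
    # by .dict; <reference>.dict is also accepted here as an assumption.
    if [ ! -e "${reference%.*}.dict" ] && [ ! -e "$reference.dict" ]; then
        echo "gatk CreateSequenceDictionary -R $reference"
    fi
    # samtools faidx writes <reference>.fai
    if [ ! -e "$reference.fai" ]; then
        echo "samtools faidx $reference"
    fi
}
```

Pass it the path to your reference FASTA and run whichever commands it prints. Once the index files sit alongside the reference, the function prints nothing.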