Please read the following documentation prior to the discussion:
- VariantRecalibrator overview for GATK v4.1.2.0
- Variant Quality Score Recalibration (VQSR) and how it works (includes minimum number of samples required)
- How to filter variants either with VQSR or by hard-filtering
- Variant Quality Score Recalibration (VQSR)
- The variant calling section of the Samtools workflows documentation
If you are unfamiliar with scikit-allel, please spend a few minutes familiarizing yourself with the tool. Documentation is available here: https://scikit-allel.readthedocs.io/en/stable/. Specifically, the following tutorials are relevant:
- Tutorial on filtering variants: http://alimanfoo.github.io/2018/04/09/selecting-variants.html
- Tutorial on exploratory analyses for VCF files: http://alimanfoo.github.io/2016/06/10/scikit-allel-tour.html
NOTE: During the discussion we will look at Jupyter notebooks from the Morrell Lab that use scikit-allel to explore VCF files and identify a set of VCF filtering criteria. We will also take a brief look at plots of bcftools stats output.
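As a rough preview of the filtering the tutorials cover, here is a minimal sketch. It applies hard-filter thresholds to per-site INFO annotations held in NumPy arrays, in a dict keyed the way scikit-allel's read_vcf names fields (for example "variants/QD"). The thresholds are modeled on GATK's generic SNP hard-filtering recommendations and are placeholders, not the Morrell Lab's criteria. The function name hard_filter_pass, the synthetic values, and the file name "calls.vcf.gz" are hypothetical.

```python
import numpy as np

# Placeholder thresholds modeled on GATK's generic SNP hard-filtering
# recommendations; real cutoffs should come from exploring your own data.
FAIL_IF_BELOW = {"variants/QD": 2.0, "variants/MQ": 40.0}
FAIL_IF_ABOVE = {"variants/FS": 60.0, "variants/SOR": 3.0}


def hard_filter_pass(variants):
    """Return a boolean mask of sites that pass every threshold (hypothetical helper).

    `variants` is a dict of equal-length numeric arrays keyed like the output
    of allel.read_vcf. Missing values should be NaN; because comparisons with
    NaN are False, a site lacking an annotation is not failed by that test.
    """
    n = len(next(iter(variants.values())))
    fail = np.zeros(n, dtype=bool)
    for key, cutoff in FAIL_IF_BELOW.items():
        fail |= np.asarray(variants[key], dtype=float) < cutoff
    for key, cutoff in FAIL_IF_ABOVE.items():
        fail |= np.asarray(variants[key], dtype=float) > cutoff
    return ~fail


# With scikit-allel installed, a dict of this shape could be loaded with e.g.
# allel.read_vcf("calls.vcf.gz",
#                fields=["variants/QD", "variants/MQ", "variants/FS", "variants/SOR"])
# Here, synthetic values for five sites stand in for real data.
demo = {
    "variants/QD": np.array([12.0, 1.5, 20.0, 8.0, 8.0]),
    "variants/MQ": np.array([60.0, 60.0, 35.0, 59.0, 59.0]),
    "variants/FS": np.array([0.5, 2.0, 1.0, 70.0, 3.0]),
    "variants/SOR": np.array([0.7, 0.9, 1.1, 1.0, np.nan]),
}
mask = hard_filter_pass(demo)
print(mask.tolist())  # [True, False, False, False, True]
print(int(mask.sum()), "of", mask.size, "sites pass")
```

Site 1 fails on QD, site 2 on MQ and site 3 on FS. Site 4 passes even though its SOR is missing. Whether missing annotations should pass or fail is a policy choice worth discussing alongside the plots.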