The Estonian Genome Centre at the University of Tartu has done considerable work in their article "Translating genotype data of 44,000 biobank participants into clinical pharmacogenetic recommendations".
|High density microarrays
||HumanOmniExpress BeadChip (OMNI, 8132 patients) and Global Screening Array (GSA, Illumina, 33157 patients), GenomeStudio (Illumina, genotyping, filtering for GSA), PLINK (filtering for all), zCall (genotyping rare variants for GSA), Eagle2 (phasing), Beagle (imputation, with a population-specific imputation panel built from WGS)
||1308 of these patients were also whole-genome sequenced
|Whole genome sequencing
||TruSeq PCR-free prep, Illumina HiSeq X (150bp paired-end, 30x mean coverage), BWA-MEM (GRCh37 reference genome), Picard (mark PCR duplicates), GATK 3.4, bcftools (normalization and decomposition), Genome STRiP (CNV calls for CYP2D6, 2269 patients), Astrolabe (allele matching for CYP2D6, for comparison)
||Quality filtering parameters are given in the article. The WGS samples (with some modifications) were also merged into a reference panel used for imputation (total 2279 Estonians and 1856 Finns). Cf. Mitt et al.
|Whole exome sequencing
||Agilent SureSelect Human All Exon V5+UTRs target capture kit, HiSeq2500 (67x mean coverage), BWA-MEM (GRCh37 reference genome), Picard (mark PCR duplicates), GATK 3.4, bcftools (normalization and decomposition)
Challenges and solutions
||Pruning of allele definitions: removing variants from allele definitions (i.e. keeping only variants that destroy the protein) and removing alleles with unknown function
||The allele pruning also makes it more likely that patients are assigned a normal phenotype (instead of an unknown phenotype), because it removes most sources of alleles with unknown function
||SNP2HLA tool (WGS only)
||A review of HLA-typing methods by Bauer et al. does not mention this tool, but SNP2HLA is provided by the Broad Institute, so it is presumably reliable.
|Multiple allele matches
||Made a hierarchy of alleles based on biochemical function (No function > Decreased function > Other functional statuses)
||This can probably be seen as a variant of the best solution to the unknown function problem: look for the most serious consequence, and if no allele with a serious consequence is found, assume Normal function. When there was more than one star allele match per haplotype, they enumerated all possible star allele diplotypes and picked the diplotype with the most serious clinical consequence
||Haplotype estimation for WGS was performed, but it is unclear which method was used. The methodology is probably similar to that used in Mitt et al., in which case they used SHAPEIT2. Otherwise they may have used Eagle2 (as for the microarray data), which is 6 times faster.
||In general, the difference between haplotyping and PGx allele matching is not clear (it may be right to say that PGx allele matching is a subset of general haplotyping).
||Combination of Genome STRiP and normal allele matching (favorable comparison to Astrolabe used by PharmCAT)
||It is unclear exactly how they did it (the reference by Gaedigk et al. may explain it)
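The handling of multiple allele matches described above (enumerate all candidate diplotypes, pick the most serious one) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `rank` and `pick_diplotype`, the fallback to `*1` when nothing matches, and the way two alleles are combined into one diplotype score (comparing sorted severity ranks) are all assumptions. The allele-to-function table in the usage example is illustrative only.

```python
from itertools import product

# Lower rank = more serious consequence, following the hierarchy
# No function > Decreased function > other functional statuses.
SEVERITY = {"No function": 0, "Decreased function": 1}
OTHER_RANK = 2  # assumption: every other status shares the lowest priority


def rank(function):
    """Map a functional status to its position in the hierarchy."""
    return SEVERITY.get(function, OTHER_RANK)


def pick_diplotype(hap1_matches, hap2_matches, allele_function):
    """Pick the candidate diplotype with the most serious consequence.

    hap1_matches / hap2_matches: star alleles matched on each haplotype.
    allele_function: dict mapping star allele -> functional status.
    """
    # Assumption: a haplotype with no match is treated as the reference allele.
    hap1 = list(hap1_matches) or ["*1"]
    hap2 = list(hap2_matches) or ["*1"]

    def diplotype_key(diplotype):
        # Assumption: compare diplotypes by their sorted allele ranks,
        # so the worst allele decides first and the other breaks ties.
        return sorted(
            rank(allele_function.get(allele, "Normal function"))
            for allele in diplotype
        )

    # Enumerate all possible diplotypes and keep the most serious one.
    return min(product(hap1, hap2), key=diplotype_key)


# Illustrative allele-to-function table (hypothetical input data).
functions = {
    "*1": "Normal function",
    "*2": "No function",
    "*9": "Decreased function",
    "*17": "Increased function",
}
best = pick_diplotype(["*17", "*2"], ["*9"], functions)
```

Here `best` is `("*2", "*9")`, because the no-function allele outranks the other match on the first haplotype.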
Take home messages
- Haplotype calling essential
- Prefiltering (pruning) of the allele definition tables provided by PharmGKB
- Rare variants (< 1% minor allele frequency) account for 89% of all (different kinds of) deleterious mutations (affect 30-40% of patients with non-normal allele function according to Lauschke et al.)
- Rare variants should only be used for research
- For some genes, multiple star alleles are expected on the same haplotype. Suggestion: look for the functional effect of variants within star alleles instead of looking for star alleles, and build decision trees that prioritize variants
- WES is not good enough for PGx, unless adding customized probes (which is generally more expensive than a pure microarray approach)
- Microarrays with imputation of unknown variants are a cost-effective approach to PGx
- WGS gives similar quality to microarrays. In addition, WGS allows for HLA calling and finds additional variants that are not yet actionable
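
The prefiltering (pruning) of allele definition tables mentioned in the take-home messages could look roughly like the sketch below. It is an assumption-laden illustration rather than the published procedure: the function name `prune_definitions`, the dictionary-based table layout, the set of function-altering variants, and the variant identifiers `v1` to `v4` are all hypothetical.

```python
def prune_definitions(definitions, allele_function, damaging_variants):
    """Return a pruned copy of an allele definition table.

    definitions: dict mapping star allele -> set of defining variant IDs.
    allele_function: dict mapping star allele -> functional status.
    damaging_variants: set of variant IDs considered function-altering.
    """
    pruned = {}
    for allele, variants in definitions.items():
        # Drop alleles whose function is unknown (or not listed at all).
        status = allele_function.get(allele, "Unknown function")
        if status == "Unknown function":
            continue
        # Keep only the variants that are considered function-altering.
        kept = frozenset(variants) & frozenset(damaging_variants)
        # Assumption: an allele left without any defining variant is dropped,
        # so a haplotype matching nothing falls back to normal function.
        if kept:
            pruned[allele] = kept
    return pruned


# Hypothetical input data for illustration.
definitions = {"*2": {"v1", "v2"}, "*3": {"v3"}, "*4": {"v4"}}
allele_function = {
    "*2": "No function",
    "*3": "Unknown function",
    "*4": "Normal function",
}
pruned = prune_definitions(definitions, allele_function, {"v1", "v3"})
```

In this example only `*2` survives, defined by `v1` alone: `*3` is removed for unknown function and `*4` has no function-altering variant left.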