Difference between revisions of "Allele definition"


Revision as of 15:00, 14 August 2018

How to obtain PGx allele definitions from literature

PGx alleles are collected and distributed through several channels:

  • PGx alleles in JSON-LD format from the PharmGKB API (https://api.pharmgkb.org)
  • PGx alleles in Excel-style formats from the PGRN Translational Pharmacogenetics Project (TPP), published through the PharmGKB API
  • PGx alleles as VCF files from PharmVar (https://www.pharmvar.org)

PGx allele definitions are given in either GRCh37 or GRCh38 reference coordinates. For PharmGKB, the migration from GRCh37 to GRCh38 is only partially complete. We suspect that there are serious discrepancies between the PharmVar and PharmGKB sources, which is surprising given that PharmGKB gets its data from PharmVar.
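Because definitions may arrive in either reference build, it can help to tag every variant with its build and refuse to compare definitions that mix them. The following is a minimal sketch of such a guard; the `Variant` class, the `check_single_build` helper, and the coordinates are hypothetical placeholders for illustration and are not taken from PharmGKB or PharmVar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One variant of an allele definition (hypothetical structure)."""
    build: str   # reference build, e.g. "GRCh37" or "GRCh38"
    chrom: str
    pos: int
    ref: str
    alt: str


def check_single_build(variants):
    """Return the shared reference build, or raise if builds are mixed."""
    builds = {v.build for v in variants}
    if len(builds) != 1:
        raise ValueError(f"expected one reference build, found: {sorted(builds)}")
    return builds.pop()


# Placeholder coordinates, not real PGx positions.
consistent = [
    Variant("GRCh38", "1", 100, "A", "G"),
    Variant("GRCh38", "1", 250, "C", "T"),
]
mixed = consistent + [Variant("GRCh37", "1", 90, "A", "G")]

print(check_single_build(consistent))
```

Calling `check_single_build(mixed)` raises a `ValueError`, which is the intended behaviour: positions from different builds should be lifted over before any matching is attempted.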

How to define PGx alleles for next generation sequencing

PGx alleles are defined as collections of one or more SNPs, indels, or structural variants. When a patient is sequenced by next generation sequencing (NGS), we typically observe more variants than are included in any individual PGx allele definition.

Figure: The 16 possible haplotypes for a four-locus gene with decomposed variant calls

This means that:

  • Patients may match a large number of PGx alleles, which makes the assignment ambiguous
  • Patients may have additional variants that modify the effect of a known PGx allele

We illustrate some of the problems that we encountered when trying to match patient haplotypes to the PGx allele definitions, using a PGx gene with four loci.
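The four-locus example can be enumerated directly. The sketch below assumes that each locus is biallelic after decomposition (0 = reference, 1 = variant), which gives 2^4 = 16 haplotypes; the locus names are hypothetical.

```python
from itertools import product

# Hypothetical locus names; 0 = reference allele, 1 = variant allele.
LOCI = ("locus1", "locus2", "locus3", "locus4")


def all_haplotypes(n_loci):
    """Return every possible haplotype as a tuple of 0/1 flags."""
    return list(product((0, 1), repeat=n_loci))


haplotypes = all_haplotypes(len(LOCI))
print(len(haplotypes))
```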

The SNP array method

Figure: PGx alleles defined as collections of variants, with no requirement on loci that are not part of the definition, assign the same PGx allele to several different haplotypes

This definition only requires matches for variants explicitly included in PGx allele definitions.

This means that:

  • Several PGx alleles may match the patient
  • But the presence of additional variants has no effect on the reported PGx alleles
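A minimal sketch of this matching rule, assuming an allele definition is simply the set of loci that must carry the variant: an allele matches whenever its set is a subset of the patient's observed variants. The allele names and definitions are hypothetical, and the reference allele is left out because an empty definition would match every haplotype under this rule.

```python
# Hypothetical allele definitions: the loci that must carry the variant.
ALLELE_DEFINITIONS = {
    "*2": frozenset({"locus1"}),
    "*3": frozenset({"locus2"}),
    "*4": frozenset({"locus1", "locus3"}),
}


def match_snp_array(observed, definitions):
    """Report every allele whose defining variants are all observed.

    Variants outside a definition are ignored, so extra variants never
    change the result.
    """
    observed = frozenset(observed)
    return sorted(
        name for name, required in definitions.items() if required <= observed
    )


without_extra = match_snp_array({"locus1", "locus3"}, ALLELE_DEFINITIONS)
with_extra = match_snp_array({"locus1", "locus3", "locus4"}, ALLELE_DEFINITIONS)
print(without_extra, with_extra)
```

Both calls report the same two alleles: the haplotype matches more than one definition, and the additional variant at `locus4` goes unnoticed.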

The PharmCAT method

This definition also requires matches for variants not explicitly included in PGx allele definitions.

Figure: PGx alleles defined as complete haplotypes classify the patient uniquely

This means that:

  • At most one PGx allele can match the same patient
  • But whenever the patient has additional variants, no PGx allele is reported

(Note that in practice PharmCAT lets the user decide which allele definitions to use in its NamedAlleleMatcher: https://github.com/PharmGKB/PharmCAT/wiki/NamedAlleleMatcher-101)
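The complete-haplotype rule described in this section can be sketched as an exact set comparison: an allele matches only if the patient's observed variants are exactly the variants in the definition. This is an illustration of the rule as stated here, not PharmCAT's actual NamedAlleleMatcher implementation, and the allele names and definitions are hypothetical.

```python
# Hypothetical allele definitions: the loci that must carry the variant.
ALLELE_DEFINITIONS = {
    "*2": frozenset({"locus1"}),
    "*3": frozenset({"locus2"}),
    "*4": frozenset({"locus1", "locus3"}),
}


def match_complete_haplotype(observed, definitions):
    """Report the allele whose definition equals the observed variants.

    Loci outside a definition must be reference, so any extra variant
    leaves the haplotype unmatched.
    """
    observed = frozenset(observed)
    return [name for name, required in definitions.items() if required == observed]


exact = match_complete_haplotype({"locus1", "locus3"}, ALLELE_DEFINITIONS)
with_extra = match_complete_haplotype(
    {"locus1", "locus3", "locus4"}, ALLELE_DEFINITIONS
)
print(exact, with_extra)
```

The first haplotype is assigned a single allele, while the second, which differs only by an extra variant at `locus4`, is assigned none.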

Which definition should we stick to?

Method | Advantages | Disadvantages
SNP array method | Compatible with previous SNP array methods. Assigns PGx alleles to the maximum number of patients | Multiple PGx alleles are possible
PharmCAT method | One PGx allele per patient | Less compatible with previous SNP array methods. Some patients are no longer assigned to a known PGx allele
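To make the trade-off in the table concrete, the sketch below applies both matching rules to all 16 haplotypes of a four-locus gene and counts how many haplotypes receive no allele, exactly one, or several. The loci and allele definitions are hypothetical, so the counts only illustrate the pattern and say nothing about any real gene.

```python
from collections import Counter
from itertools import product

# Hypothetical loci and allele definitions, for illustration only.
LOCI = ("locus1", "locus2", "locus3", "locus4")
ALLELE_DEFINITIONS = {
    "*2": frozenset({"locus1"}),
    "*3": frozenset({"locus2"}),
    "*4": frozenset({"locus1", "locus3"}),
}


def count_matches(observed, exact):
    """Count matching alleles under the exact or the subset rule."""
    if exact:
        return sum(1 for req in ALLELE_DEFINITIONS.values() if req == observed)
    return sum(1 for req in ALLELE_DEFINITIONS.values() if req <= observed)


def tally(exact):
    """Classify all 2**4 haplotypes by how many alleles they match."""
    counts = Counter()
    for flags in product((0, 1), repeat=len(LOCI)):
        observed = frozenset(locus for locus, flag in zip(LOCI, flags) if flag)
        n = count_matches(observed, exact)
        counts["none" if n == 0 else "one" if n == 1 else "several"] += 1
    return dict(counts)


snp_array_tally = tally(exact=False)
complete_haplotype_tally = tally(exact=True)
print("SNP array method:", snp_array_tally)
print("Complete haplotype method:", complete_haplotype_tally)
```

With these toy definitions the subset rule assigns at least one allele to most haplotypes but often several, whereas the exact rule never assigns more than one and leaves most haplotypes unassigned.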