[GET-dev] Re: Getting to the bottom of which variant one is looking at ?

Tue Apr 10 17:35:29 EDT 2012

The dbSNP annotations are less specific than GET-Evidence: as I understand
it, they're not even specifying a *genotype* -- just a *location*.

GET-Evidence does alias some dbSNP IDs to amino acid changes -- when it
does this, it's assuming that the dbSNP ID refers to the predicted
nonreference amino acid change.
For example:
http://evidence.personalgenomes.org/rs429358
redirects to:
http://evidence.personalgenomes.org/APOE-C130R

I'm not sure what your question about zygosity is asking.

If you're looking at a SNP in a gene that runs in "reverse" on the
genome, you'll often find inconsistency in the literature, depending on
the perspective taken when reporting the variant. For example, a variant
is A or G on the reference genome, but from the perspective of the gene
transcript, it's T or C (reverse complemented). You can see how this would
be almost impossibly confusing if the variant happens to be C/G or A/T,
since each of those pairs is its own complement and reads the same from
either strand.
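
To make the strand issue concrete, here is a minimal Python sketch (my own illustration, not anything GET-Evidence runs). The helper names other_strand and is_strand_ambiguous are hypothetical. It complements an allele pair and flags the pairs that look identical from both strands:

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def other_strand(alleles):
    """Return the same set of alleles as read from the opposite strand."""
    return frozenset(COMPLEMENT[base] for base in alleles)


def is_strand_ambiguous(alleles):
    """True if the allele pair looks identical from either strand."""
    return other_strand(alleles) == frozenset(alleles)


print(sorted(other_strand({"A", "G"})))   # ['C', 'T']
print(is_strand_ambiguous({"A", "G"}))    # False
print(is_strand_ambiguous({"C", "G"}))    # True
print(is_strand_ambiguous({"A", "T"}))    # True
```

For an A/G variant, a report of T/C gives away that the other strand was used; for C/G or A/T there is no such clue, so the strand has to be stated explicitly.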

With amino acid changes it is a lot less ambiguous. Sometimes you'll find a
change reported the other way in a paper (say, "APOE R130C" instead of
"APOE C130R") -- in this case, the paper's attitude is that the "R" is
"reference" and the "C" is variant. This can happen if they aren't
interested in what the reference genome says about the issue (the paper has
its own opinion about what is "wildtype" -- because sometimes the
reference genome actually carries the rarer variant).

The last letter in an ID like "APOE-C130R" is the actual variant allele;
the first letter is just a reference point describing the original
"reference" allele. (The "C" part isn't necessary to identify the
variant; think of it as an error check.) We've occasionally made
GET-E entries for "reference" variants using the nomenclature "APOE-C130C"
... but I should note that we haven't been using these in automatic
interpretation: the automatic system currently only examines differences
from the reference genome.
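
As a rough illustration of reading these IDs, here is a small Python sketch. It is an assumption-laden example, not GET-Evidence's actual parser: the names AAChange, parse_aa_change, is_swapped_report and is_reference_entry are hypothetical, and the pattern only handles simple single-letter substitutions written like "APOE-C130R" or "APOE R130C".

```python
import re
from typing import NamedTuple


class AAChange(NamedTuple):
    gene: str
    ref: str   # first letter: the "reference" amino acid (the error check)
    pos: int   # amino acid position
    var: str   # last letter: the amino acid actually being described


# Assumed format: GENE, a hyphen or space, then <ref><position><variant>.
_PATTERN = re.compile(r"^([A-Za-z0-9]+)[- ]([A-Z])(\d+)([A-Z])$")


def parse_aa_change(text):
    """Split an ID such as 'APOE-C130R' into its parts."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple amino acid change: {text!r}")
    gene, ref, pos, var = match.groups()
    return AAChange(gene, ref, int(pos), var)


def is_swapped_report(a, b):
    """True if b describes the same site as a with ref and variant swapped."""
    return (a.gene, a.pos, a.ref, a.var) == (b.gene, b.pos, b.var, b.ref)


def is_reference_entry(change):
    """True for 'reference' entries such as APOE-C130C."""
    return change.ref == change.var


ours = parse_aa_change("APOE-C130R")
paper = parse_aa_change("APOE R130C")
print(ours)                                                # AAChange(gene='APOE', ref='C', pos=130, var='R')
print(is_swapped_report(ours, paper))                      # True
print(is_reference_entry(parse_aa_change("APOE-C130C")))   # True
```

A swapped report is a hint that the paper treats the other amino acid as "wildtype"; it still takes a human to check which one the reference genome carries.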

Hope this helps.

-- Madeleine

On Tue, Apr 10, 2012 at 4:13 PM, Leon Peshkin <peshkin at gmail.com> wrote:

> Hi everyone,
>    after doing a whole bunch of curation at GET-e, I realized I am
> confused at a very fundamental level - which
> variant exactly is present in a given genome - I would appreciate some
> feedback and help. Let's say there is
> a variant
>     Notch3   P34R  also known as rs1234
> which corresponds to nucleotide C changing to G.
>   the definition of rs...... often (but not always) has all possible
> variants (C;C) (C;G) (G;G) like this one
> rs4646    http://www.snpedia.com/index.php/Rs4646
>   and your genome has allele frequency of 31%.
> What exactly does it mean ? Where does zygosity annotation come from and
> why is it missing in many variants?
> Literature sometimes talks about rs1234 sometimes specifies nucleotide,
> sometimes amino acid, plus sometimes there is confusion /inconsistency
> among papers which strand the variant is on (if non-coding variants).
>
>   Is there a way to facilitate the annotation by specifying the variant in
> more than one way in GET-e ?
>
> -Leon
>