Hi Madeleine, <br>   Thanks for getting back to me on this. My main issue remains: I do not know exactly what it means when I see that a given genome has "Allele frequency of 18%" for rs4646, and in turn for "WNT11 R666K". <br>

In the dbSNP case, it is not clear at all which of the possibilities is found in the genome, and in what fraction.<br>In the case of the GET-e encoding, it is clear that the variant in question is "K" instead of the "R" in the reference genome, but can I conclude that 18% of reads support "K" and the remaining 82% support "R", or am I completely lost? <br>

So, at least in the dbSNP case, it is not clear which of all the possible variants the individual is carrying, or whether a given publication about rs4646 implies that the phenotype is more likely or less likely... <br>

<br> -Leon <br><br>Finally, is there a way to include all the info you describe below (which strand, which nucleotide, etc.) somewhere on the gene page as "auxiliary" information? It would be very helpful. <br><br><div class="gmail_quote">

On Tue, Apr 10, 2012 at 5:35 PM, Madeleine Ball <span dir="ltr">&lt;<a href="mailto:mpball@gmail.com" target="_blank">mpball@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">


The dbSNP annotations are less specific than GET-Evidence: as I understand it, they're not even specifying a *genotype* -- just a *location*.<div><br></div><div>GET-Evidence does alias some dbSNP IDs to amino acid changes -- when it does this, it's assuming that the dbSNP ID refers to the nonreference amino acid change predicted.</div>


<div>For example: </div><div><a href="http://evidence.personalgenomes.org/rs429358" target="_blank">http://evidence.personalgenomes.org/rs429358</a></div><div>redirects to:</div><div><a href="http://evidence.personalgenomes.org/APOE-C130R" target="_blank">http://evidence.personalgenomes.org/APOE-C130R</a><br>


<br>I'm not sure what your question about zygosity is asking.</div><div><br></div><div>If you're looking at a SNP which is on a gene which runs in "reverse" on the genome, you'll often find inconsistency in literature based on the perspective taken when reporting the variant. For example, a variant is A or G on the reference genome, but from the perspective of the gene transcript, it's T or C (reverse complemented). You can see how this would be almost impossibly confusing if the variant happens to be C/G or A/T...</div>
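<div>As a rough illustration of the strand issue described above, here is a minimal Python sketch. The helper names (<code>flip_strand</code>, <code>is_strand_ambiguous</code>) are made up for this example and are not part of GET-Evidence or dbSNP; it assumes single-nucleotide alleles written as A/C/G/T.</div>

```python
# Complement of each DNA base. For a single-nucleotide allele, the reverse
# complement is simply the complement.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def flip_strand(alleles):
    """Return the same set of alleles as read from the opposite strand."""
    return {COMPLEMENT[a] for a in alleles}


def is_strand_ambiguous(alleles):
    """True when the allele set looks identical from either strand (A/T or C/G)."""
    return set(alleles) == flip_strand(alleles)


# An A/G SNP on the reference strand is T/C from the transcript's perspective.
print(sorted(flip_strand({"A", "G"})))    # ['C', 'T']
print(is_strand_ambiguous({"A", "G"}))    # False
print(is_strand_ambiguous({"C", "G"}))    # True
```

<div>For an A/G SNP the two perspectives can be told apart (A/G versus T/C), whereas a C/G or A/T SNP reads the same from both strands, so the alleles alone cannot reveal which strand a paper used.</div>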


<div><br></div><div>With amino acid changes it is a lot less ambiguous. Sometimes you'll find a change reported the other way in a paper (say, "APOE R130C" instead of "APOE C130R") -- in this case, the paper's attitude is that the "R" is "reference" and the "C" is variant. This can happen if they aren't interested in what the reference genome says about the issue (the paper has its own opinion about what is "wildtype" -- because sometimes the reference genome is actually the rarer variant). </div>


<div><br></div><div>The last letter in an ID like "APOE-C130R" is the actual variant allele; the first letter is just a reference point describing the original "reference" allele. (Think of it as error checking: the "C" part of the variant isn't necessary, it's just an error check.) We've occasionally made GET-E entries for "reference" variants using the nomenclature "APOE-C130C" ... but I should note that we haven't been using these in automatic interpretation; the automatic system currently only examines differences from the reference genome.</div>
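<div>To make the naming convention concrete, here is a small Python sketch that splits an ID like "APOE-C130R" into its parts. This is an illustration only, not GET-Evidence's own code: the names <code>AAChange</code>, <code>parse_variant_id</code>, <code>is_reference_entry</code> and <code>swapped</code> are hypothetical, and the pattern assumes the simple form gene, one-letter reference amino acid, position, one-letter variant amino acid (it does not try to handle other kinds of IDs).</div>

```python
import re
from typing import NamedTuple


class AAChange(NamedTuple):
    gene: str
    ref_aa: str    # reference amino acid (the "error check" letter)
    position: int  # amino acid position in the protein
    var_aa: str    # the allele the ID actually describes


# Assumed simple form: GENE, then "-" or whitespace, then e.g. C130R.
_PATTERN = re.compile(r"^([A-Za-z0-9]+)[-\s]+([A-Z])(\d+)([A-Z])$")


def parse_variant_id(variant_id):
    """Split an ID such as 'APOE-C130R' into gene, reference, position, variant."""
    match = _PATTERN.match(variant_id.strip())
    if match is None:
        raise ValueError(f"unrecognized variant ID: {variant_id!r}")
    gene, ref_aa, position, var_aa = match.groups()
    return AAChange(gene, ref_aa, int(position), var_aa)


def is_reference_entry(change):
    """True for entries like 'APOE-C130C' that describe the reference allele."""
    return change.ref_aa == change.var_aa


def swapped(change):
    """The same site as reported by a paper that treats the other allele as reference."""
    return change._replace(ref_aa=change.var_aa, var_aa=change.ref_aa)


change = parse_variant_id("APOE-C130R")
print(change.gene, change.ref_aa, change.position, change.var_aa)  # APOE C 130 R
print(is_reference_entry(parse_variant_id("APOE-C130C")))          # True
print(swapped(change).var_aa)                                       # C
```

<div>Keeping the reference letter as a separate field lets a curator notice when a paper's "R130C" and an entry's "C130R" describe the same site from opposite perspectives.</div>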


<div><br></div><div>Hope this helps.</div><div><br></div><font color="#888888"><div>-- Madeleine</div></font><div><div></div><div><div><br></div><div><div class="gmail_quote">On Tue, Apr 10, 2012 at 4:13 PM, Leon Peshkin <span dir="ltr">&lt;<a href="mailto:peshkin@gmail.com" target="_blank">peshkin@gmail.com</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi everyone,<br>   after doing a whole bunch of curation at GET-e, I realized I am confused at a very fundamental level - which<br>


variant exactly is present in a given genome - I would appreciate some feedback and help. Let's say there is <br>

a variant <br>    Notch3   P34R  also known as rs1234<br>which corresponds to nucleotide C changing to G.  <br>  the definition of rs...... often (but not always) has all possible variants (C;C) (C;G) (G;G) like this one <br>


rs4646    <a href="http://www.snpedia.com/index.php/Rs4646" target="_blank">http://www.snpedia.com/index.php/Rs4646</a> <br>  and your genome has allele frequency of 31%.  <br>What exactly does it mean? Where does the zygosity annotation come from, and why is it missing in many variants?<br>


The literature sometimes talks about rs1234, sometimes specifies the nucleotide, and sometimes the amino acid; on top of that, there is confusion/inconsistency among papers about which strand the variant is on (for non-coding variants). <br> <br>


  Is there a way to facilitate the annotation by specifying the variant in more than one way in GET-e? <br><span><font color="#888888"><br>-Leon <br>

</font></span></blockquote></div><br></div>

</div></div></blockquote></div><br>