[GET-dev] Mapping build 37 SNPs to build 36 genomes?

Kimberly Robasky krobasky at gmail.com
Fri Jun 18 14:07:09 EDT 2010


I'm trying to understand how GET is mapping build 37 SNPs to build 36
genomes, and to that end, here's a specific example:

How does GET know that NA19240 has variant rs77023418?
http://evidence.personalgenomes.org/AGAP7-Thr362Asn
The NA19240 entry at the bottom of that page has no chromosome or
coordinate, and the page doesn't cross-reference the dbSNP id.

I can't find the corresponding SNP in the GFF file either, but the
GFF coordinates are build 36.3, right?  The AGAP7 gene has different
coordinates in build 37, and dbSNP doesn't have a 36.3 mapping for
this SNP, so I'm guessing that means the SNP was reported after build
37 came out.  So how does GET know NA19240 has this SNP?
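
For reference, the kind of GFF lookup described above can be sketched as
follows. This is only an illustration, not GET's code: it assumes the
standard tab-separated GFF layout (sequence name in column 1, start and
end in columns 4 and 5), and the function name and the sample records
are made up for the example.

```python
def features_at(gff_lines, chrom, pos):
    """Return GFF records on `chrom` whose [start, end] span covers `pos`."""
    hits = []
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not a well-formed record
        seqname, start, end = fields[0], int(fields[3]), int(fields[4])
        if seqname == chrom and start <= pos <= end:
            hits.append(fields)
    return hits


# Made-up records, only to show the call; real input would be the
# lines of the genome's GFF file.
sample = [
    "##gff-version 2",
    "chr10\texample\tSNP\t100\t100\t.\t+\t.\talleles A/C",
    "chr10\texample\tSNP\t250\t250\t.\t+\t.\talleles G/T",
]
hits = features_at(sample, "chr10", 250)
```

An empty result for both the build 37 and the candidate build 36.3
position is what "not in the GFF" means here.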

Looking into it further via NCBI, I see that the exons in both builds
are the same relative to the start of the gene, so perhaps the build
37 coordinate for this SNP (51465371) sits at the same offset in build
36.3, which would put it at 51135378.  I don't find that position in
the GFF either, though.  So where is GET finding it?
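
The same-offset guess can be written down as a small sketch. It assumes
there are no insertions or deletions between the gene start and the SNP
from one build to the other; the function name is hypothetical, and the
gene start coordinates are left as parameters because they are not
quoted in this message.

```python
def map_by_gene_offset(pos_src, gene_start_src, gene_start_dst):
    """Carry a position to another build by keeping its distance
    from the gene start unchanged."""
    offset = pos_src - gene_start_src
    return gene_start_dst + offset


# The two coordinates quoted above.
B37_POS = 51465371    # build 37 coordinate of the SNP
B36_GUESS = 51135378  # same-offset candidate in build 36.3

# Shift between the builds implied by those two numbers.
IMPLIED_SHIFT = B37_POS - B36_GUESS
```

If the gene start moved by `IMPLIED_SHIFT` between the builds, then
`map_by_gene_offset` applied to `B37_POS` returns `B36_GUESS`.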

I've been slogging through source trying to figure out how genome id
gets mapped to variant in the edits/snap_latest/snap_release tables,
but to no avail.  It doesn't seem to have been mapped in the makefile
or install.php, either.  Could this be an artifact from some previous
source code base?

More broadly, I found this because I'm trying to map all variants to
coordinates that I use to compute conservation, but I have no
coordinates for around 15% of the variants, including this AGAP7
variant.
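
To make the "no coordinates" bookkeeping concrete, here is a rough
sketch of how the variants can be split. The dictionary layout (variant
name mapped to a (chromosome, position) tuple, or None when nothing was
found) and the function names are assumptions made for the example, not
the GET schema.

```python
def split_by_coordinate(variant_coords):
    """Separate variants that have a coordinate from those that don't."""
    mapped = {v: c for v, c in variant_coords.items() if c is not None}
    missing = sorted(v for v, c in variant_coords.items() if c is None)
    return mapped, missing


def missing_fraction(variant_coords):
    """Fraction of variants with no coordinate (0.0 for empty input)."""
    if not variant_coords:
        return 0.0
    _, missing = split_by_coordinate(variant_coords)
    return len(missing) / len(variant_coords)
```

The `missing` list is the set of variants that still needs a
coordinate before conservation can be computed for it.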

-Kim



