[GET-dev] More speed, genomes for PGP12 are up

Madeleine Price Ball meprice at gmail.com
Thu Sep 30 21:24:58 EDT 2010


As with other steps in the genome processing, nonsynonymous calling was
insanely slow. I was worried that Xiaodi's reference sequence extraction
was the problem despite all the C code optimization, but it turns out to
be just another slow-SQL-query issue. A version that uses a sorted copy
of refflat.txt runs through all the calls in 3 minutes; the SQL-querying
version took 112 minutes. That's a 37-fold improvement.
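
For anyone curious why sorting helps so much: when both the calls and the
transcript records are sorted by position, every call can be annotated in one
forward pass instead of one query per call. Below is a rough sketch of that
idea, not the actual code. The function name, the tuple layouts, and the
0-based half-open coordinates are all assumptions made for illustration, and
it works on in-memory lists rather than parsing refflat.txt.

```python
def annotate_sorted(calls, transcripts):
    """Match each call to the transcripts that overlap it in a single sweep.

    calls:       list of (chrom, pos), sorted by (chrom, pos)
    transcripts: list of (chrom, start, end, name), sorted by (chrom, start),
                 using 0-based half-open intervals [start, end)
    Both lists must use the same chromosome ordering (plain string order here).
    Returns a list of (chrom, pos, [names of overlapping transcripts]).
    """
    results = []
    active = []  # transcripts already started that may still cover later calls
    i = 0        # index of the next transcript not yet pulled in
    for chrom, pos in calls:
        # Pull in every transcript that starts at or before this call.
        while i < len(transcripts) and (transcripts[i][0], transcripts[i][1]) <= (chrom, pos):
            active.append(transcripts[i])
            i += 1
        # Drop transcripts from earlier chromosomes or that end before this call.
        active = [t for t in active if t[0] == chrom and t[2] > pos]
        results.append((chrom, pos, [t[3] for t in active]))
    return results


if __name__ == "__main__":
    # Made-up toy data, only to show the call shape.
    toy_transcripts = [
        ("chr1", 0, 10, "A"),
        ("chr1", 40, 60, "B"),
        ("chr1", 45, 55, "C"),
        ("chr2", 5, 20, "D"),
    ]
    toy_calls = [("chr1", 5), ("chr1", 50), ("chr2", 10)]
    for row in annotate_sorted(toy_calls, toy_transcripts):
        print(row)
```

Neither list is ever rescanned from the start, so the cost is roughly
proportional to the number of calls plus the number of transcripts, rather
than one lookup round trip per call.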

So after removing everything but the ref allele calling, dbSNP calling,
nonsynonymous calling, and matching against GET-Evidence, processing a
genome takes 40 minutes on my server. I think most of that is in the last
step now, which still uses SQL queries.

I've loaded up the first 12 PGP genomes onto:
http://mball.freelogy.org/genomes.php

You can check out what the new reports look like there. The tables are ugly
as heck 'cause I haven't learned CSS yet, but we might want to think about
some color coding. I know Joe wants to color code allele frequency. Cells
could also be colored according to impact: maybe more vivid for more
evidence, with the color reflecting clinical importance & impact (red = high
pathogenic, yellow = low pathogenic, green = pharm, blue = benign)...
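
To make the color idea concrete, here is a small sketch of one way it could
work: the hue comes from the impact category and the saturation from the
amount of evidence. Everything in it is hypothetical. The category strings,
the 0-to-1 evidence score, and the specific hue and lightness values are
placeholders, not anything GET-Evidence defines today.

```python
# Hypothetical mapping from impact category to an HSL hue in degrees.
IMPACT_HUES = {
    "high pathogenic": 0,    # red
    "low pathogenic": 60,    # yellow
    "pharm": 120,            # green
    "benign": 240,           # blue
}


def cell_color(impact, evidence):
    """Return a CSS hsl() color string for a report table cell.

    impact:   one of the keys in IMPACT_HUES (assumed category names)
    evidence: assumed score from 0.0 (no evidence) to 1.0 (strong evidence);
              values outside that range are clamped
    More evidence gives a more saturated (more vivid) color.
    """
    hue = IMPACT_HUES[impact]
    evidence = max(0.0, min(1.0, evidence))
    saturation = round(20 + 80 * evidence)  # 20% (washed out) up to 100% (vivid)
    return f"hsl({hue}, {saturation}%, 75%)"


if __name__ == "__main__":
    print(cell_color("high pathogenic", 1.0))
    print(cell_color("benign", 0.0))
```

The returned string could go straight into a cell's inline style (for
example as its background-color) until there is a real stylesheet.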

  -- Madeleine