[GET-dev] Autoscores + Counsyl variants
Madeleine Price Ball
meprice at gmail.com
Fri May 21 08:53:08 EDT 2010
I've uploaded a new copy of the Counsyl variants list. There were some
^M's (carriage returns) in there, invisible to us when making the Google
spreadsheet.
http://mad.printf.net/counsyl_variants.csv
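A minimal sketch of how stray carriage returns (the ^M's) could be stripped from CSV text before parsing. The function name and the sample rows are hypothetical, made up for illustration; this is not the process actually used to clean the file above.

```python
import csv
import io

def strip_carriage_returns(text):
    """Remove every carriage return (displayed as ^M by some editors)."""
    return text.replace("\r", "")

# Invented sample data: Windows line endings plus one extra stray CR.
raw = "LAMB3,R635X\r\nGENE2,A10V\r\r\n"
clean = strip_carriage_returns(raw)

# Parse the cleaned text; newline="" lets the csv module handle line endings.
rows = list(csv.reader(io.StringIO(clean, newline="")))
```

Note that this also removes carriage returns inside quoted fields, which is what we want here but may not suit every file.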
I guess LAMB3-R635X should have at least 5 points?
2 for nonsense mutation
2 for being in OMIM
1 for being a GeneTests testable gene
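The arithmetic above can be sketched as a simple sum over evidence flags. This is only an illustration of the three criteria listed in this message; the dictionary keys and the function name are hypothetical and the real autoscore code may weigh other evidence too.

```python
# Point values as listed above; key names are hypothetical.
POINTS = {
    "nonsense_mutation": 2,
    "in_omim": 2,
    "genetests_testable": 1,
}

def autoscore(evidence):
    """Sum the points for every evidence flag that is true."""
    return sum(POINTS[name] for name, present in evidence.items() if present)

# LAMB3-R635X, assuming all three criteria apply.
lamb3_r635x = {
    "nonsense_mutation": True,
    "in_omim": True,
    "genetests_testable": True,
}
print(autoscore(lamb3_r635x))  # 5
```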
I don't know if we should worry about how "lumpy" the database data
is. Since much of the database is imported from OMIM we expect it to
have a lot of 2's -- the profile for an individual will look
different. Here's the list for PGP1:
http://mad.printf.net/PGP1_nsSNPs.csv
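One way to put numbers on "lumpy" is to tally the scores and look at the share in each bin. A rough sketch, using the Counsyl counts from Tom's table quoted below; the function name is hypothetical, and reading scores out of the CSV files is left out because the column layout isn't stated here.

```python
from collections import Counter

def score_distribution(scores):
    """Map each score to (count, fraction of all variants)."""
    counts = Counter(scores)
    total = sum(counts.values())
    return {s: (n, n / total) for s, n in sorted(counts.items())}

# Counsyl autoscore counts, rebuilt from Tom's table.
counsyl_scores = [0] * 33 + [1] * 4 + [2] * 119 + [3] * 4 + [4] * 129
distribution = score_distribution(counsyl_scores)

for score, (n, frac) in distribution.items():
    print(f"{score}: {n:4d} ({frac:.1%})")
```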
It's strange for the Counsyl list to have any 0's. Sasha pointed out
yesterday that almost by definition these are genes that "have testing
available". Maybe the names aren't matching, or maybe Counsyl tests
them but they don't have per-gene testing available as listed on
GeneTests.
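A quick way to check the "names aren't matching" possibility would be to list the Counsyl gene names that have no counterpart in the GeneTests list after normalizing case and whitespace. A minimal sketch; the function names and the sample gene names are placeholders, not real data from either list.

```python
def normalize(name):
    """Trim whitespace and upper-case a gene symbol before comparing."""
    return name.strip().upper()

def unmatched_genes(counsyl_genes, genetests_genes):
    """Return Counsyl gene names with no match in the GeneTests list."""
    known = {normalize(g) for g in genetests_genes}
    return sorted({normalize(g) for g in counsyl_genes} - known)

# Placeholder names, only to show the behavior.
counsyl = ["GeneA", "geneb ", "GENEC"]
genetests = ["GENEA", "GENEB"]
missing = unmatched_genes(counsyl, genetests)
print(missing)  # ['GENEC']
```

Anything left in the result is either a genuine gap in GeneTests or a naming difference (an alias, for example) that this simple normalization doesn't catch.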
Should we worry about how "lumpy" the Counsyl list looks? Sasha--any
luck on getting an HCM list from Heidi?
- Madeleine
On Fri, May 21, 2010 at 12:41 AM, Tom Clegg
<tom at scalablecomputingexperts.com> wrote:
> Autoscores for all of the Counsyl variants are attached.
>
> There were a few lines that look like they were corrupted by some
> translation process (I ignored them):
>
> ",nsSNP8S
> ",nsSNP58Q
> ",nsSNP52W
>
> Distribution of autoscores for counsyl variants: (select
> autoscore,count(variant_id) from counsyl_autoscore group by autoscore)
>
> +-----------+-------------------+
> | autoscore | count(variant_id) |
> +-----------+-------------------+
> |         0 |                33 |
> |         1 |                 4 |
> |         2 |               119 |
> |         3 |                 4 |
> |         4 |               129 |
> +-----------+-------------------+
>
> Distribution of autoscores for all variants: (cut -f40 latest-flat.tsv |
> tail -n +2 | sort -n | uniq -c)
> 62304 0
> 3993 1
> 10473 2
> 1512 3
> 3378 4
> Presumably *some* variants should be getting scores >4 -- I'll have to look
> at this tomorrow (examples welcome).
> The "in genetests?" contribution to the above autoscores is based on whether
> the gene is *listed* in genetests, not whether its record indicates "test
> available"... contrary to what I told Madeleine today. I've fixed that just
> now, and the scores are being recalculated. (64 of the 836 genes in
> genetests are "no test available")
> Tom