[GET-dev] Autoscores + Counsyl variants

Kimberly Robasky krobasky at gmail.com
Fri May 21 09:53:58 EDT 2010


I think if your scoring values were more graded, your autoscore data
would be less "lumpy".

For example, I think it's too conservative to require your BLOSUM
scores to be >3. I would suggest breaking it down into 1
point for >= -3 and 2 points for > 3, and maybe even boosting your
other scores to compensate.  Here's why:

Look at mutations for Serine; mutating away from that S kills any
phosphorylation motif that might be there.  This will almost certainly
cause some kind of phenotype, but reading down the column of serine,
you won't find any >3's at all.  However, you find 16 that are >= -3.
That's more than any other residue. Tyrosine, the other important
phosphorylation residue, comes in second place with 15 scores
that are >-3.
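
To make the suggestion concrete, here is a minimal sketch of the graded
rule in Python. The function name blosum_points is my own placeholder,
not anything from the current autoscore code, and the thresholds are
just the ones proposed above.

```python
def blosum_points(score):
    """Graded points from a BLOSUM substitution score.

    Hypothetical helper: 2 points for > 3, 1 point for >= -3, else 0.
    """
    if score > 3:
        return 2
    if score >= -3:
        return 1
    return 0
```

With this rule a score of 3 earns 1 point rather than 0, so it no
longer falls through the current all-or-nothing >3 cutoff.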

I think it's particularly worth emphasizing BLOSUM scores, given what
we learned from Shamil about how PolyPhen works, and that BLOSUM is
the only indicator we have for conservation (even if it's only a rough
one).

-Kim

On Fri, May 21, 2010 at 9:02 AM, Abraham Rosenbaum <rosenbaum4 at gmail.com> wrote:
> To help in troubleshooting:
> CFTR Ser1255Stop should have 6 stars (stop codon, OMIM, GeneReviews).
> Some of the ACADM genes have 0 stars; this gene has a Genetests entry,
> is available for testing and is present in OMIM.
> According to the latest download we have >80,000 nsSNPs in our
> database (we should emphasize this point) but the variant_flat does
> not produce a list of splice variants or all synonymous entries. I
> think that it would be a good idea to get this data so that we can
> further de-emphasize our reliance on exons.
> -Abraham
>
> On Fri, May 21, 2010 at 8:53 AM, Madeleine Price Ball <meprice at gmail.com> wrote:
>> I've uploaded a new copy of the Counsyl variants list; there were some
>> ^M's in there, invisible to us when making the Google spreadsheet.
>> http://mad.printf.net/counsyl_variants.csv
>>
>> I guess LAMB3-R635X should have at least 5 points?
>> 2 for nonsense mutation
>> 2 for being in OMIM
>> 1 for being a GeneTests testable gene
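
As a sanity check on that arithmetic, a minimal sketch in Python. The
names POINTS and autoscore are made up for illustration and are not
taken from the real scoring code; the point values are only the three
listed above.

```python
# Hypothetical point values, copied from the breakdown above.
POINTS = {
    "nonsense": 2,            # nonsense (stop) mutation
    "in_omim": 2,             # variant is in OMIM
    "genetests_testable": 1,  # gene is testable per GeneTests
}

def autoscore(evidence):
    """Sum the points for each kind of evidence present for a variant."""
    return sum(POINTS[name] for name in evidence)

lamb3_r635x = {"nonsense", "in_omim", "genetests_testable"}
print(autoscore(lamb3_r635x))  # prints 5
```
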
>>
>> I don't know if we should worry about how "lumpy" the database data
>> is. Since much of the database is imported from OMIM we expect it to
>> have a lot of 2's -- the profile for an individual will look
>> different. Here's the list for PGP1:
>> http://mad.printf.net/PGP1_nsSNPs.csv
>>
>> It's strange for the Counsyl list to have any 0's; Sasha pointed out
>> yesterday that almost by definition these are genes that "have testing
>> available". Maybe the names aren't matching, or maybe Counsyl tests
>> them but they don't have per-gene testing available as listed on
>> GeneTests.
>>
>> Should we worry about how "lumpy" the Counsyl list looks? Sasha--any
>> luck on getting an HCM list from Heidi?
>>
>>     - Madeleine
>>
>> On Fri, May 21, 2010 at 12:41 AM, Tom Clegg
>> <tom at scalablecomputingexperts.com> wrote:
>>> Autoscores for all of the Counsyl variants are attached.
>>>
>>> There were a few lines that look like they were corrupted by some
>>> translation process (I ignored them):
>>>
>>> ",nsSNP8S
>>> ",nsSNP58Q
>>> ",nsSNP52W
>>>
>>> Distribution of autoscores for counsyl variants:  (select
>>> autoscore,count(variant_id) from counsyl_autoscore group by autoscore)
>>>
>>> +-----------+-------------------+
>>> | autoscore | count(variant_id) |
>>> +-----------+-------------------+
>>> | 0         |                33 |
>>> | 1         |                 4 |
>>> | 2         |               119 |
>>> | 3         |                 4 |
>>> | 4         |               129 |
>>> +-----------+-------------------+
>>>
>>> Distribution of autoscores for all variants:  (cut -f40 latest-flat.tsv |
>>> tail -n +2 | sort -n | uniq -c)
>>>   62304 0
>>>    3993 1
>>>   10473 2
>>>    1512 3
>>>    3378 4
>>> Presumably *some* variants should be getting scores >4 -- I'll have to look
>>> at this tomorrow (examples welcome).
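
For anyone without the shell tools handy, the same tally can be sketched
in Python. The function name column_counts is an assumption for
illustration; the field number 40 and the file name latest-flat.tsv come
from the pipeline above. Unlike cut, this sketch skips rows that have
fewer fields than requested.

```python
from collections import Counter

def column_counts(lines, field=40):
    """Tally the values in one tab-separated field (1-based, like cut -f).

    The first line is treated as a header and skipped, like `tail -n +2`.
    """
    lines = iter(lines)
    next(lines, None)  # drop the header row
    counts = Counter()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= field:
            counts[cols[field - 1]] += 1
    return counts

# Example use (not run here):
# with open("latest-flat.tsv") as f:
#     for value, n in sorted(column_counts(f).items()):
#         print(n, value)
```
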
>>> The "in genetests?" contribution to the above autoscores is based on whether
>>> the gene is *listed* in genetests, not whether its record indicates "test
>>> available"... contrary to what I told Madeleine today.  I've fixed that just
>>> now, and the scores are being recalculated.  (64 of the 836 genes in
>>> genetests are "no test available")
>>> Tom
>>>
>>> _______________________________________________
>>> GET-dev mailing list
>>> GET-dev at lists.freelogy.org
>>> http://lists.freelogy.org/mailman/listinfo/get-dev
>>>
>>>
>>
>



