[GET-dev] Autoscores + Counsyl variants

Kimberly Robasky krobasky at gmail.com
Fri May 21 10:22:06 EDT 2010


Another suggestion - add an autoscore point to anything that creates a
stop codon in a coding region?
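
A minimal sketch of how that extra point could be applied, assuming
variants are named by amino-acid change the way they appear further down
in this thread (Ser1255Stop, R635X). The function names, the
in_coding_region flag and the set of stop markers (Stop, Ter, X, *) are
hypothetical and not taken from the actual autoscore code:

```python
import re

# Hypothetical pattern: reference residue (one- or three-letter code),
# position, then a stop marker. Treating "Stop", "Ter", "X" and "*" as
# stop markers is an assumption about how variants are named.
_NONSENSE = re.compile(r"^(?:[A-Z][a-z]{2}|[A-Z])\d+(?:Stop|Ter|X|\*)$")


def creates_stop(aa_change):
    """Return True if the amino-acid change string ends in a stop marker."""
    return _NONSENSE.match(aa_change) is not None


def add_stop_point(autoscore, aa_change, in_coding_region=True):
    """Add one point when the variant creates a stop codon in a coding region."""
    if in_coding_region and creates_stop(aa_change):
        return autoscore + 1
    return autoscore
```

The point is added on top of whatever the existing rules already gave the
variant, so the other score contributions are left untouched.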

On Fri, May 21, 2010 at 9:53 AM, Kimberly Robasky <krobasky at gmail.com> wrote:
> I think if your scoring values were more graded, your autoscore data
> would be less "lumpy".
>
> For example, I think it's too conservative to require your BLOSUM
> scores to be >3, so I would suggest breaking it down to 1
> point for >= -3 and 2 points for > 3, and maybe even boosting your
> other scores to compensate.  Here's why:
>
> Look at mutations for Serine; mutating away from that S kills any
> phosphorylation motif that might be there.  This will almost certainly
> cause some kind of phenotype, but reading down the column of serine,
> you won't find any >3's at all.  However, you find 16 that are >= -3.
> That's more than any other residue; Tyrosine, the other important
> phosphorylation residue, comes in second place with 15 scores
> that are >-3.
>
> I think it's particularly worth emphasizing BLOSUM scores, given what
> we learned from Shamil about how PolyPhen works, and that BLOSUM is
> the only indicator we have for conservation (even if it's only a rough
> one).
>
> -Kim
>
> On Fri, May 21, 2010 at 9:02 AM, Abraham Rosenbaum <rosenbaum4 at gmail.com> wrote:
>> To help in troubleshooting:
>> CFTR Ser1255Stop should have 6 stars (stop codon, OMIM, GeneReviews).
>> Some of the ACADM variants have 0 stars; this gene has a GeneTests entry,
>> is available for testing and is present in OMIM.
>> According to the latest download we have >80,000 nsSNPs in our
>> database (we should emphasize this point) but the variant_flat does
>> not produce a list of splice variants or all synonymous entries. I
>> think that it would be a good idea to get this data so that we can
>> further de-emphasize our reliance on exons.
>> -Abraham
>>
>> On Fri, May 21, 2010 at 8:53 AM, Madeleine Price Ball <meprice at gmail.com> wrote:
>>> I've uploaded a new copy of the Counsyl variants list; there were some
>>> ^M's in there, invisible to us when making the Google spreadsheet.
>>> http://mad.printf.net/counsyl_variants.csv
>>>
>>> I guess LAMB3-R635X should have at least 5 points?
>>> 2 for nonsense mutation
>>> 2 for being in OMIM
>>> 1 for being a GeneTests testable gene
>>>
>>> I don't know if we should worry about how "lumpy" the database data
>>> is. Since much of the database is imported from OMIM we expect it to
>>> have a lot of 2's -- the profile for an individual will look
>>> different. Here's the list for PGP1:
>>> http://mad.printf.net/PGP1_nsSNPs.csv
>>>
>>> It's strange for the Counsyl list to have any 0's; Sasha pointed out
>>> yesterday that almost by definition these are genes that "have testing
>>> available". Maybe the names aren't matching, or maybe Counsyl tests
>>> them but they don't have per-gene testing available as listed on
>>> GeneTests.
>>>
>>> Should we worry about how "lumpy" the Counsyl list looks? Sasha--any
>>> luck on getting an HCM list from Heidi?
>>>
>>>     - Madeleine
>>>
>>> On Fri, May 21, 2010 at 12:41 AM, Tom Clegg
>>> <tom at scalablecomputingexperts.com> wrote:
>>>> Autoscores for all of the Counsyl variants are attached.
>>>>
>>>> There were a few lines that look like they were corrupted by some
>>>> translation process (I ignored them):
>>>>
>>>> ",nsSNP8S
>>>> ",nsSNP58Q
>>>> ",nsSNP52W
>>>>
>>>> Distribution of autoscores for counsyl variants:  (select
>>>> autoscore,count(variant_id) from counsyl_autoscore group by autoscore)
>>>>
>>>> +-----------+-------------------+
>>>> | autoscore | count(variant_id) |
>>>> +-----------+-------------------+
>>>> | 0         |                33 |
>>>> | 1         |                 4 |
>>>> | 2         |               119 |
>>>> | 3         |                 4 |
>>>> | 4         |               129 |
>>>> +-----------+-------------------+
>>>>
>>>> Distribution of autoscores for all variants:  (cut -f40 latest-flat.tsv |
>>>> tail -n +2 | sort -n | uniq -c)
>>>>   62304 0
>>>>    3993 1
>>>>   10473 2
>>>>    1512 3
>>>>    3378 4
>>>> Presumably *some* variants should be getting scores >4 -- I'll have to look
>>>> at this tomorrow (examples welcome).
>>>> The "in genetests?" contribution to the above autoscores is based on whether
>>>> the gene is *listed* in genetests, not whether its record indicates "test
>>>> available"... contrary to what I told Madeleine today.  I've fixed that just
>>>> now, and the scores are being recalculated.  (64 of the 836 genes in
>>>> genetests are "no test available")
>>>> Tom
>>>>
>>>> _______________________________________________
>>>> GET-dev mailing list
>>>> GET-dev at lists.freelogy.org
>>>> http://lists.freelogy.org/mailman/listinfo/get-dev
>>>>
>>>>
>
