[GET-dev] Autoscores + Counsyl variants

Kimberly Robasky krobasky at gmail.com
Fri May 21 10:34:08 EDT 2010


I'm actually working on that with Mike and am having a very hard time
finding any significant correlation between variants and
phosphorylation motifs.  I'm convinced it's there, but we're only
looking at OMIM data, and I believe there are so many false positives
that I can't cut through the noise.  Hence, this paper's assertion
that there are a lot of false positives is very timely.  This
additionally motivates me to further improve the autoscore algorithm,
and to that end, I think if you make it more fine-grained, your output
data would be less lumpy.
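
To make "more fine-grained" concrete, here is a rough sketch of the
graded BLOSUM contribution suggested in the 9:53 AM message quoted
below (1 point for >= -3, 2 points for > 3).  The function name
blosum_points and its parameters are made up for illustration; this is
not the actual autoscore code, and it assumes the value being compared
is the same one the current ">3" rule tests.

```python
def blosum_points(score, low=-3, high=3):
    """Graded points for a BLOSUM-derived score (hypothetical helper).

    Assumption: `score` is the same value the current ">3" rule tests.
    """
    if score > high:
        return 2  # passes the existing, stricter threshold
    if score >= low:
        return 1  # passes only the proposed, looser threshold
    return 0


print([blosum_points(s) for s in (-4, -3, 0, 3, 4, 10)])
```

With these thresholds the example scores map to 0, 1, 1, 1, 2, 2, so a
stop codon designated as 10 would still land in the top bucket.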


On Fri, May 21, 2010 at 10:27 AM, Abraham Rosenbaum
<rosenbaum4 at gmail.com> wrote:
> We designate NBLOSUM=10 for stop codons, so I think we have that built
> in already. In terms of a more graded system, Mike Chou has expressed
> an interest in a grading system based upon predicted
> protein-modification motif changes -- we can turn this into a real
> effort and say we are working on it.
> -Abraham
>
> On Fri, May 21, 2010 at 10:22 AM, Kimberly Robasky <krobasky at gmail.com> wrote:
>> Another suggestion - add an autoscore point to anything that creates a
>> stop codon in a coding region?
>>
>> On Fri, May 21, 2010 at 9:53 AM, Kimberly Robasky <krobasky at gmail.com> wrote:
>>> I think if your scoring values were more graded, your autoscore data
>>> would be less "lumpy".
>>>
>>> For example, I think it's too conservative to require your BLOSUM
>>> scores to be >3. I would suggest breaking it down to 1
>>> point for >= -3 and 2 points for > 3, and maybe even boosting your
>>> other scores to compensate.  Here's why:
>>>
>>> Look at mutations for Serine; mutating away from that S kills any
>>> phosphorylation motif that might be there.  This will almost certainly
>>> cause some kind of phenotype, but reading down the column of serine,
>>> you won't find any >3's at all.  However, you find 16 that are >= -3.
>>> That's more than any other residue, with Tyrosine, the other
>>> important phosphorylation residue, coming in second place with 15
>>> scores that are >-3.
>>>
>>> I think it's particularly worth emphasizing BLOSUM scores, given what
>>> we learned from Shamil about how PolyPhen works, and that BLOSUM is
>>> the only indicator we have for conservation (even if it's only a rough
>>> one).
>>>
>>> -Kim
>>>
>>> On Fri, May 21, 2010 at 9:02 AM, Abraham Rosenbaum <rosenbaum4 at gmail.com> wrote:
>>>> To help in troubleshooting:
>>>> CFTR Ser1255Stop should have 6 stars (stop codon, OMIM, GeneReviews).
>>>> Some of the ACADM genes have 0 stars; this gene has a Genetests entry,
>>>> is available for testing and is present in OMIM.
>>>> According to the latest download we have >80,000 nsSNPs in our
>>>> database (we should emphasize this point) but the variant_flat does
>>>> not produce a list of splice variants or all synonymous entries. I
>>>> think that it would be a good idea to get this data so that we can
>>>> further de-emphasize our reliance on exons.
>>>> -Abraham
>>>>
>>>> On Fri, May 21, 2010 at 8:53 AM, Madeleine Price Ball <meprice at gmail.com> wrote:
>>>>> I've uploaded a new copy of the counsyl variants list; there were some
>>>>> ^M's (carriage returns) in there, invisible to us when making the
>>>>> google spreadsheet.
>>>>> http://mad.printf.net/counsyl_variants.csv
>>>>>
>>>>> I guess LAMB3-R635X should have at least 5 points?
>>>>> 2 for nonsense mutation
>>>>> 2 for being in OMIM
>>>>> 1 for being a GeneTests testable gene
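
The tally above can be sketched as a simple sum.  The point values are
the ones listed in this message; the key names and the tally function
are hypothetical and are not taken from the real autoscore code.

```python
# Point values from the message above; the key names are hypothetical.
POINTS = {"nonsense": 2, "in_omim": 2, "genetests_testable": 1}


def tally(evidence):
    """Sum the points for each piece of evidence that applies to a variant."""
    return sum(POINTS[name] for name in evidence)


lamb3_r635x = tally(["nonsense", "in_omim", "genetests_testable"])
print(lamb3_r635x)  # 5
```

Dropping the GeneTests item gives 4, which is the highest score that
shows up in the distributions further down the thread.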
>>>>>
>>>>> I don't know if we should worry about how "lumpy" the database data
>>>>> is. Since much of the database is imported from OMIM we expect it to
>>>>> have a lot of 2's -- the profile for an individual will look
>>>>> different. Here's the list for PGP1:
>>>>> http://mad.printf.net/PGP1_nsSNPs.csv
>>>>>
>>>>> It's strange for the counsyl list to have any 0's; Sasha pointed out
>>>>> yesterday that almost by definition these are genes that "have testing
>>>>> available". Maybe the names aren't matching, or maybe Counsyl tests
>>>>> them but they don't have per-gene testing available as listed on
>>>>> GeneTests.
>>>>>
>>>>> Should we worry about how "lumpy" the Counsyl list looks? Sasha--any
>>>>> luck on getting an HCM list from Heidi?
>>>>>
>>>>>     - Madeleine
>>>>>
>>>>> On Fri, May 21, 2010 at 12:41 AM, Tom Clegg
>>>>> <tom at scalablecomputingexperts.com> wrote:
>>>>>> Autoscores for all of the Counsyl variants are attached.
>>>>>>
>>>>>> There were a few lines that look like they were corrupted by some
>>>>>> translation process (I ignored them):
>>>>>>
>>>>>> ",nsSNP8S
>>>>>> ",nsSNP58Q
>>>>>> ",nsSNP52W
>>>>>>
>>>>>> Distribution of autoscores for counsyl variants:  (select
>>>>>> autoscore,count(variant_id) from counsyl_autoscore group by autoscore)
>>>>>>
>>>>>> +-----------+-------------------+
>>>>>> | autoscore | count(variant_id) |
>>>>>> +-----------+-------------------+
>>>>>> | 0         |                33 |
>>>>>> | 1         |                 4 |
>>>>>> | 2         |               119 |
>>>>>> | 3         |                 4 |
>>>>>> | 4         |               129 |
>>>>>> +-----------+-------------------+
>>>>>>
>>>>>> Distribution of autoscores for all variants:  (cut -f40 latest-flat.tsv |
>>>>>> tail -n +2 | sort -n | uniq -c)
>>>>>>   62304 0
>>>>>>    3993 1
>>>>>>   10473 2
>>>>>>    1512 3
>>>>>>    3378 4
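
For anyone who would rather not use the shell, the cut/tail/sort/uniq
pipeline above can be approximated in Python.  This is only a sketch:
the function name column_distribution is made up, and it assumes the
file has a single header row and that the autoscore sits in
tab-separated column 40, as the pipeline implies.

```python
from collections import Counter


def _as_number(text):
    """Non-numeric values sort as 0, roughly like `sort -n`."""
    try:
        return float(text)
    except ValueError:
        return 0.0


def column_distribution(lines, column=40):
    """Count values in a 1-based tab-separated column, skipping the header."""
    rows = iter(lines)
    next(rows, None)  # skip the header row, like `tail -n +2`
    counts = Counter()
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        # Short rows count as an empty value instead of raising an error.
        value = fields[column - 1] if len(fields) >= column else ""
        counts[value] += 1
    return sorted(counts.items(), key=lambda item: _as_number(item[0]))


sample = ["id\tautoscore\n", "v1\t0\n", "v2\t2\n", "v3\t0\n"]
print(column_distribution(sample, column=2))  # [('0', 2), ('2', 1)]
```

Passing an open file handle for latest-flat.tsv with the default
column=40 should give the same kind of listing.  The five counts listed
above add up to 81,660 rows.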
>>>>>> Presumably *some* variants should be getting scores >4 -- I'll have to look
>>>>>> at this tomorrow (examples welcome).
>>>>>> The "in genetests?" contribution to the above autoscores is based on whether
>>>>>> the gene is *listed* in genetests, not whether its record indicates "test
>>>>>> available"... contrary to what I told Madeleine today.  I've fixed that just
>>>>>> now, and the scores are being recalculated.  (64 of the 836 genes in
>>>>>> genetests are "no test available")
>>>>>> Tom
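
The difference between "listed in genetests" and "test available" can
be sketched like this.  The dictionary layout, the test_available
field, and the gene names are all hypothetical placeholders; the real
genetests import is not shown in this thread.

```python
def genetests_point(gene, genetests):
    """Return 1 only if the gene is listed AND its record says a test exists."""
    record = genetests.get(gene)
    if record is None:
        return 0  # gene is not listed in genetests at all
    return 1 if record.get("test_available") else 0


# Hypothetical records, for illustration only.
genetests = {
    "GENE_A": {"test_available": True},
    "GENE_B": {"test_available": False},  # listed, but "no test available"
}

print(genetests_point("GENE_A", genetests))  # 1
print(genetests_point("GENE_B", genetests))  # 0
print(genetests_point("GENE_C", genetests))  # 0
```

Under the earlier behaviour GENE_B would also have scored a point just
for being listed.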
>>>>>>
>>>>>> _______________________________________________
>>>>>> GET-dev mailing list
>>>>>> GET-dev at lists.freelogy.org
>>>>>> http://lists.freelogy.org/mailman/listinfo/get-dev
>>>>>>
>>>>>>
>>>>>
>>>
>>
>



