[GET-dev] autoscore take II

Fri May 21 10:52:09 EDT 2010

Just suggestions:

I've suggested refining the computational autoscoring points, and I
personally think +2 just because a variant is in OMIM (which we assert
in our paper has many false positives) is too high, so I'd back that
down to +1.  What do you guys think of the following proposed changes
to the autoscoring algorithm?

Autoscoring is split into three sections: computational,
variant-specific, and gene-specific, each 2 points for a max of 6
points.
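
To make the capping concrete, here is a minimal Python sketch of how the
three section scores might combine. The function and argument names are
made up for illustration and are not taken from the Trait-o-matic code.

```python
SECTION_MAX = 2.0  # each section is capped at 2 points


def total_autoscore(computational, variant_specific, gene_specific):
    """Sum the three section scores, clamping each to [0, SECTION_MAX].

    With three sections the total can never exceed 6 points.
    """
    sections = (computational, variant_specific, gene_specific)
    return sum(min(max(s, 0.0), SECTION_MAX) for s in sections)
```

For example, a variant that racks up 3 raw points in one section still
only contributes 2 from that section.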

Computational (max of 2 points):
If a non-synonymous substitution:
+0.5 for NBLOSUM >= -3,
+1 for NBLOSUM >= 3,
another +0.5 (+1.5 total) if NBLOSUM = 10 (a nonsense mutation),
and another +0.5 (+2.0 total) if it causes a stop codon

For all variants:
+1 if within 1 base of a splice site
(Trait-o-matic has splicing data, presumably it could spot these?)
Indels:
+1 for indel in coding region
another +1 (+2 total) if it causes a frameshift (indel length not a
multiple of 3 in coding region)
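
A rough Python sketch of the computational section as I read it. Several
things here are assumptions rather than settled: the argument names are
hypothetical, the NBLOSUM tiers are treated as cumulative totals (0.5,
1.0, 1.5), the stop-codon bonus only stacks on the NBLOSUM = 10 tier, and
substitution, splice-site, and indel points are simply added and then
capped at 2.

```python
def computational_score(nblosum=None, causes_stop=False,
                        near_splice_site=False, coding_indel_length=None):
    """Computational section of the proposed autoscore, capped at 2 points.

    nblosum: NBLOSUM value for a non-synonymous substitution, else None.
    causes_stop: True if the substitution creates a stop codon.
    near_splice_site: True if the variant is within 1 base of a splice site.
    coding_indel_length: net length change of a coding-region indel, else None.
    """
    score = 0.0

    # Substitution tiers: each tier replaces the previous total.
    if nblosum is not None:
        if nblosum >= -3:
            score = 0.5
        if nblosum >= 3:
            score = 1.0
        if nblosum == 10:
            score = 1.5
            if causes_stop:
                score = 2.0

    # Splice-site proximity applies to all variant types.
    if near_splice_site:
        score += 1.0

    # Coding indels: +1, or +2 if the length is not a multiple of 3.
    if coding_indel_length is not None:
        score += 2.0 if coding_indel_length % 3 != 0 else 1.0

    return min(score, 2.0)
```

So computational_score(nblosum=3, near_splice_site=True) hits the cap,
while an in-frame 3-base coding indel on its own scores 1.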

Variant-specific lists (max of 2 points):
+1 if in OMIM
+1 if in PharmGKB
+1 if in HuGENet
another +1 (+2 total) if HuGENet's OR >= 1.5
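
The variant-specific section could be sketched the same way. The boolean
flags and the hugenet_or argument are hypothetical names; how list
membership is actually looked up is left out, and I'm assuming the odds
ratio bonus only applies when the variant is in HuGENet at all.

```python
def variant_list_score(in_omim=False, in_pharmgkb=False,
                       in_hugenet=False, hugenet_or=None):
    """Variant-specific section of the proposed autoscore, capped at 2."""
    score = 0.0
    if in_omim:
        score += 1.0
    if in_pharmgkb:
        score += 1.0
    if in_hugenet:
        score += 1.0
        # Extra point when HuGENet reports an odds ratio of at least 1.5.
        if hugenet_or is not None and hugenet_or >= 1.5:
            score += 1.0
    return min(score, 2.0)
```

Under this reading an OMIM-only variant gets 1 point, and a variant on
all three lists still tops out at 2.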

Gene-specific lists (max of 2 points):
+1 if in GeneTests
another +1 (+2 total) if in a GeneReviews gene
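
And a sketch of the gene-specific section, with hypothetical argument
names. I'm assuming the GeneReviews point only stacks on top of the
GeneTests point, which is how I read "+2 total".

```python
def gene_list_score(in_genetests=False, in_genereviews=False):
    """Gene-specific section of the proposed autoscore, capped at 2."""
    score = 0.0
    if in_genetests:
        score += 1.0
        # GeneReviews bonus is counted only for genes already in GeneTests.
        if in_genereviews:
            score += 1.0
    return min(score, 2.0)
```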

--Kim
PS -- My hope is to take our best guess at "most likely relevant"
variants and look for any phosphorylation motifs that break; I'll let
you know if I find anything of additional value.  The unfiltered OMIM
data are just too noisy to be of use at this point, as Shamil also
intimated.