[GET-dev] To-do list

Madeleine Price Ball meprice at gmail.com
Mon Jun 7 14:16:34 EDT 2010


I'd like to start a discussion of what we need to do next. While a lot
of these are already in tickets
(https://trac.scalablecomputingexperts.com/wiki/GET-Evidence), we need
to prioritize. Of all the items below, the one bothering me most at
the moment is "Duplicate or related entries for the same variant".

Things we say we have in the paper:
(1) Updated BLOSUM matrix
(2) Merged Trait-o-matic and GET-Evidence

Things that are starting to drive me nuts as an editor:
(1) Duplicate or related entries for the same variant:
  * alternate splicing (these are duplicates for one genome position)
  * trimmed peptides produce different OMIM variant positions
  * reference and variant alleles swapped in OMIM
  * Solve with redirects?
  * Should there be a shared ID for all splice variants?
  * I'm increasingly concerned that the database is filling up with
    information spread across redundant pages.
(2) Ghost data:
  * Old genome processing shows up on GET-Evidence
    (currently intended)
  * Multiple genomes from same individual NOT showing up
    (not the intended behavior)
  * Broken links to snp.med
  * Assuming we do want the multiple sets, the page should tell us
    they exist (e.g. "1 of 3 sets associated with this individual")
  * In general, the whole process of "associating multiple sets with
    an individual" needs to be cleaned up; until then, it may be best
    to simply remove "ghost data".

Wish list:
  * Process John West and Jason Flatley genomes
  * Autoscore at splice sites
  * Why is Trait-o-matic so slow at getting amino acid changes?
  * Indels

Broken things:
  * MYBPC3-Q850X appears to be a broken (frameshifted) amino
    acid call: the UCSC sequence has no "Q" at this position.
    https://trac.scalablecomputingexperts.com/ticket/43
  * Many other bugs have been reported
