[GET-dev] Re: Latest draft

krobasky at gmail.com krobasky at gmail.com
Mon May 31 17:42:19 EDT 2010

Our BLOSUM100 matrix may merely be out of date, since these matrices
are computed from alignment databases (unlike PAM). I would guess the
BioPerl guys have updated theirs accordingly?

The real question is which matrix we should use. I vote for the NCBI
matrix used by BLAST, as it is recent, seems to have community
consensus, and we have a literature reference for it.
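
To see how far a local copy has drifted from whichever reference matrix we settle on, the two can be diffed cell by cell. What follows is only a rough sketch: it assumes both matrices are available as NCBI-style whitespace-delimited text (a header row of residues, then one row per residue), and the names parse_matrix and diff_matrices, along with the three-residue tables, are made up for illustration. The scores are NOT real BLOSUM100 values.

```python
# Hypothetical three-residue excerpts; the scores are invented for this sketch.
LOCAL_TEXT = """
# local copy (made-up values)
   A  R  N
A  5 -2 -2
R -2  7 -1
N -2 -1  7
"""

REFERENCE_TEXT = """
# reference copy (made-up values)
   A  R  N
A  5 -2 -2
R -2  7  0
N -2  0  7
"""


def parse_matrix(text):
    """Parse an NCBI-style matrix into a {(row, col): score} dictionary."""
    scores = {}
    columns = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if columns is None:
            columns = fields  # first data line is the column header
            continue
        row, values = fields[0], fields[1:]
        for col, value in zip(columns, values):
            scores[(row, col)] = int(value)
    return scores


def diff_matrices(ours, theirs):
    """Return sorted (pair, our_score, their_score) for each differing cell."""
    return sorted(
        (pair, ours.get(pair), theirs.get(pair))
        for pair in set(ours) | set(theirs)
        if ours.get(pair) != theirs.get(pair)
    )


for pair, mine, reference in diff_matrices(
    parse_matrix(LOCAL_TEXT), parse_matrix(REFERENCE_TEXT)
):
    print(pair, "local:", mine, "reference:", reference)
```

A cell present in only one matrix shows up with None on the other side, so a missing row or column is reported rather than silently skipped.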

Sent from my iPhone

On May 31, 2010, at 4:39 PM, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:

> Bug 39 has been addressed; I've committed the change to my own
> production branch at GitHub (the only one that doesn't have merge
> issues given how much has changed since I last edited) and sent a pull
> request to Tom, so he should be able to work that into his branch now.
> Might be useful to share this info with the BioPython team, since the
> original matrices were derived from their source, and it stands to
> reason that they've been using the wrong matrix values for BLOSUM100
> all along and still are.
> Re: bug 22, this is a problem beyond our control. dbSNP (via the link
> you sent me) agrees with Trait-o-matic and labels rs3798220 as I1891M.
> This is based on NP_005568.2. As noted in that file,
> "Depending on the individual, the encoded protein contains 2-43 copies
> of kringle-type domains. The allele represented here contains 15
> copies of the kringle-type repeats and corresponds to that found in
> the reference genome sequence. [provided by RefSeq]. Sequence Note:
> This gene is highly polymorphic in length and number of exons due to
> variation in the number of kringle IV-2 repeats which vary from 2-43
> copies among individuals. This RefSeq record was created from the
> reference genome assembly based on the exon representation found in
> DQ452068.1 whose sequence is consistent with the reference genome
> sequence, and includes 15 copies of the kringle IV-2 repeats."
> So the traditional designation of I4399M is based on a sequence with a
> different number of kringle repeats, which is no longer used. There's
> nothing we can do about this except to note that, in literature prior
> to 2006 (apparently, that's when this record superseded the previous
> one; I'm not sure how many kringles are in that one), I4399M refers to
> what is now designated I1891M.
> On Sun, May 30, 2010 at 1:09 PM, Madeleine Price Ball
> <meprice at fas.harvard.edu> wrote:
>> On Sun, May 30, 2010 at 1:28 PM, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
>>> Hi Madeleine,
>>> I can commit the change for bug 39 (BLOSUM100) if you update me on
>>> what the new workflow is in terms of the git repository that's the
>>> current master, etc. It takes only a few minutes and I may as well
>>> do it, since I've worked on the original.
>> I believe this one is the master:
>> http://github.com/tomclegg/trait-o-matic/
>>> Re: your comment in bug 22 (amino acid positions), refFlat does have
>>> splice variants. Your best bet for locating the problem is to
>>> double-check whether the SNP has been re-mapped. With every version
>>> of dbSNP (and, obviously, every major release of the reference
>>> genome, but that's not what we're interested in here), some of the
>>> SNP rs number--genome position correlations are changed to reflect
>>> better data. This might be one of those cases. Otherwise, it's hard
>>> to believe how amino acid 1891 and amino acid 4399 could be confused;
>>> the alternative is that they are not and really do point back to the
>>> same genome position, in which case Trait-o-matic is designed to
>>> detect and look up both.
>> Well, there aren't any splice variants for this produced by
>> trait-o-matic; we checked that. I looked up the SNP on dbSNP:
>> http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs3798220
>> I don't know how dbSNP would cause the problem, but I don't think
>> that's the issue. I find the same position for the 36.3 reference
>> genome build as in the P0 trait-o-matic output, chr6 160881127.
>> Maybe someone needs to sit down with the refFlat file and figure out
>> whether it matches UCSC annotation and whether those match the 4399
>> position published.
> _______________________________________________
> GET-dev mailing list
> GET-dev at lists.freelogy.org
> http://lists.freelogy.org/mailman/listinfo/get-dev
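
As a quick sanity check on the I4399M / I1891M explanation quoted above, the gap between the two positions can be tested against the length of one kringle IV-2 repeat. This is only a sketch: the 114-residue repeat length is an assumption that does not come from this thread, and repeat_offset is a made-up helper name.

```python
OLD_POSITION = 4399      # traditional designation, I4399M
CURRENT_POSITION = 1891  # designation on NP_005568.2, I1891M
CURRENT_REPEATS = 15     # kringle IV-2 copies stated in the RefSeq note
REPEAT_LENGTH_AA = 114   # ASSUMPTION: residues per kringle IV-2 repeat


def repeat_offset(old_pos, new_pos, repeat_len):
    """Return (whole_repeats, leftover_residues) separating two positions."""
    return divmod(old_pos - new_pos, repeat_len)


extra_repeats, leftover = repeat_offset(
    OLD_POSITION, CURRENT_POSITION, REPEAT_LENGTH_AA
)
print("extra repeats:", extra_repeats, "leftover residues:", leftover)
# prints: extra repeats: 22 leftover residues: 0
```

If the assumed repeat length holds, the 2508-residue gap is exactly 22 repeats, which would be consistent with the older numbering coming from a sequence with more kringle IV-2 copies than the 15 in the current record. It does not confirm what the superseded record actually contained.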
