[GET-Editors] Editing questions

Madeleine Price Ball meprice at fas.harvard.edu
Wed Oct 13 16:27:17 EDT 2010


On Wed, Oct 13, 2010 at 4:10 PM, Kimberly Robasky <krobasky at gmail.com> wrote:
>> > On the same example, how do I calculate the case/controls
>> > 'significance'?
>> We've been using a two-tailed Fisher's Exact test. You can find some
>> calculators online, for example:
>> http://www.graphpad.com/quickcalcs/contingency1.cfm
>> The four boxes are case+, case-, control+, control-. According to
>> your data (28, 338, 50, 2320) the p-value is < 0.0001.
>>
>> To get odds ratio (OR): (28 / 338) / (50 / 2320) = 3.84
>>
>> That puts SLC45A2-E272K at four stars based on the data from Graf et
>> al you recorded, according to the current criteria.
>
> Is there anything to keep us from building that calculation into
> GET-Evidence?  It seems to me that asking the user to enter it by hand
> only invites human error.  Would it make sense to supersede human entries
> for this number later, when we implement the significance and OR
> calculations?

Yes, I think it should get built in. The equation for it is pretty
straightforward:
http://en.wikipedia.org/wiki/Fisher%27s_exact_test
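
As a rough sketch of what a built-in calculation might look like (this
is not existing GET-Evidence code, and the function name is made up for
illustration), here is a two-tailed Fisher's Exact test in plain Python.
It assumes the common two-tailed convention of summing the probability
of every table that is no more likely than the observed one; an online
calculator may use a slightly different convention.

```python
from math import comb


def fisher_exact_two_tailed(case_pos, case_neg, control_pos, control_neg):
    """Two-tailed Fisher's Exact p-value for a 2x2 case/control table.

    Hypothetical helper for illustration only.
    """
    cases = case_pos + case_neg            # row total: all cases
    controls = control_pos + control_neg   # row total: all controls
    positives = case_pos + control_pos     # column total: all carriers
    total = cases + controls
    denom = comb(total, positives)

    def prob(x):
        # Hypergeometric probability of x carriers among the cases,
        # with all margins held fixed.
        return comb(cases, x) * comb(controls, positives - x) / denom

    p_observed = prob(case_pos)
    lowest = max(0, positives - controls)
    highest = min(cases, positives)
    # Sum every table at least as extreme (no more probable) than the
    # observed one; the small tolerance guards against float round-off.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Numbers from the example in this thread: 28, 338, 50, 2320
print(fisher_exact_two_tailed(28, 338, 50, 2320) < 0.0001)  # True
```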

We should be able to implement an automatic calculation alongside the
automatic OR calculation pretty readily. (Aside: currently the OR
calculation substitutes a false "1" for "0" to avoid divide-by-zero
errors. I think it should be fixed to either report "infinite" or use
numbers that produce the same significance score. To explain: put 1 in
place of 0, but then also increment the other pool until Fisher's Exact
gives the same significance score, and calculate the OR from that.)
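
A minimal sketch of the first option (report "infinite" instead of
substituting a false "1"), assuming the same case+, case-, control+,
control- ordering as above. The function name is hypothetical and this
is not the current GET-Evidence implementation:

```python
import math


def odds_ratio(case_pos, case_neg, control_pos, control_neg):
    """Odds ratio (case+ / case-) / (control+ / control-).

    Hypothetical helper: returns infinity rather than faking a "1"
    when a zero cell would cause a divide-by-zero.
    """
    numerator = case_pos * control_neg
    denominator = case_neg * control_pos
    if denominator == 0:
        # 0/0 is undefined; anything else over zero is reported as infinite.
        return math.nan if numerator == 0 else math.inf
    return numerator / denominator


print(round(odds_ratio(28, 338, 50, 2320), 2))  # 3.84
print(odds_ratio(5, 0, 3, 10))                  # inf
```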

In this specific example it's complicated by a more general problem
with recording case/control numbers: right now you can only record
this data in the context of diseases (which are offered automatically
based on GeneTests). You could fill it in, but the disease offered is
albinism, which is obviously incorrect. Should we dissociate
case/control numbers from disease information? Offer the option of
"other"? Get better phenotype labeling (a long-standing issue)? I'm
thinking "other" might be a useful patch for now.

>>
>> >  How do I calculate familial LOD score?
>>
>> This is complicated -- first you'll need a pedigree. I mean to write
>> up a guide on it, although I'm still not totally confident that I
>> understand it correctly. Sometimes a paper mentions LOD, in which case
>> you can use that. If there aren't pedigrees (for example, everything
>> is case/control data) then familial evidence is zero stars.
>
> Again - wonder how much can be automated.

It would be cool if we could automate drawing pedigrees and use this
to create formatted versions we can store with the associated papers.
LOD calculation could be implemented alongside it. I'm not aware of a
way to do this, though. Maybe drawing isn't the best way to input
pedigrees? How do physicians do it?

  -- Madeleine



