[GET-dev] where do I get raw data ?

Madeleine Price Ball meprice at gmail.com
Wed Dec 22 21:47:32 EST 2010


Which four did you try? The title should include "with indel and
coverage". I just checked the source data from this PGP10 link, and it
seems to have indels and coverage:
http://evidence.personalgenomes.org/genomes.php?display_genome_id=d5aa2c5db8d0aef211ebd4318065179677ff33ba

Unlike the PGP10 pilot data, the PGP11 and PGP12 data donated to us
was produced using a different platform and hasn't been processed to
include indel or coverage data. From a brief glance, getting coverage
from these looks like it will be a PITA, although I could be missing
some file that has it nicely formatted. It's possible that, for your
purposes, you'd be best off not trying to use this stuff anyway -- if
you're comparing genomes to each other, I think it's best to stick to
data produced through the same platform.

I think there's a desire to make more "raw" data files available
somewhere, but I'm not really the best person to answer that question
(which is why I didn't answer Erik's question).

   - Madeleine

PS -  Sorry, I've misplaced the admin password and so couldn't approve
the email to the get-dev list, but I can reply to it. There are other
admins, though.
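
For the gzipped download described in the quoted message below, here is
a minimal shell sketch of the rename, assuming the only problem is the
missing .gz suffix. The helper name fix_gz_name is made up for
illustration and is not part of GET-Evidence. It looks at the first two
bytes of the file (gzip streams start with 0x1f 0x8b) and renames the
file only when they match, so an already-plain GFF is left alone.

```shell
# Hypothetical helper (not part of GET-Evidence): append ".gz" to a
# download whose contents are gzip data. Prints the resulting file name.
fix_gz_name() {
    f="$1"
    # gzip streams start with the two magic bytes 0x1f 0x8b
    magic=$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        mv -- "$f" "$f.gz"
        printf '%s\n' "$f.gz"
    else
        # not gzip data: leave the file untouched
        printf '%s\n' "$f"
    fi
}
```

Usage would look like
fix_gz_name 'PGP1_(George_Church)_with_indel_and_coverage.gff'
after which gunzip on the printed name should give the plain GFF.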

On Wed, Dec 22, 2010 at 5:27 PM, Leon Peshkin <peshkin at gmail.com> wrote:
> Hi Madeleine,
>     thanks a lot for explaining the format. I downloaded some GFF files;
> unfortunately, out of the four I tried, only PGP1 has "INDEL". I am mostly
> interested in PGP10, 11, and 12. Could you point me to the complete data
> for these three, please?
>    best regards
>
> -Leon
>
>
> On Wed, Dec 22, 2010 at 2:55 PM, Madeleine Price Ball <meprice at gmail.com>
> wrote:
>>
>> Sorry, it's been changing as we update the genome analysis methods.
>>
>> If you click on any of the PGP genomes that say "with indel and
>> coverage" it should have indel and coverage information in the
>> downloadable gff data. For example:
>>
>>
>> http://evidence.personalgenomes.org/genomes.php?display_genome_id=65711e3d6829f08c2f8aeeaf06b67b4d2c744e38
>>
>> If you click on "Source data: download GFF (115 MB)" you'll get a file
>> called "PGP1_\(George_Church\)_with_indel_and_coverage.gff" but ...
>> it's actually gzipped. Sorry. You'll probably want to do:
>>
>> mv PGP1_\(George_Church\)_with_indel_and_coverage.gff \
>>    PGP1_\(George_Church\)_with_indel_and_coverage.gff.gz
>>
>> FWIW - there's a fix for this bug here; maybe Tom can pull it to the
>> main site:
>>
>> https://github.com/madprime/get-evidence/commit/129e510318bd5381d86cd6ab1e9aca5976bd1c46
>>
>> I've written up how indels and coverage are marked here:
>> http://evidence.personalgenomes.org/guide_upload_and_source_file_formats
>>
>> On Tue, Dec 21, 2010 at 8:43 AM, Leon Peshkin <peshkin at gmail.com> wrote:
>> > Hello!
>> >
>> > Could someone help me with a pointer to the PGP-10 raw data files,
>> > that is, more than a list of SNPs?
>> > I am interested in getting fairly short chunks (a few thousand
>> > nucleotides) to compare across individuals, but they might contain
>> > deletions in some.
>> > Sasha mentioned that the data is available from
>> > http://evidence.personalgenomes.org/genomes
>> > but I do not see any mention of "coverage and indels" on that page.
>> > There is a link to http://evidence.personalgenomes.org/download
>> > which links to the SQL dump and a flat TSV file, but not BAM or SAM,
>> > so I am somewhat confused.
>> >
>> > -Leon
>> >
>> >
>> > _______________________________________________
>> > GET-dev mailing list
>> > GET-dev at lists.freelogy.org
>> > http://lists.freelogy.org/mailman/listinfo/get-dev
>> >
>> >
>
>



