[GET-dev] PGP info

Madeleine Ball mpball at gmail.com
Mon May 9 16:50:46 EDT 2011


> 1. I have not received a GET-dev digest since March, have I missed some?

No, traffic has just been low lately. I admit I've neglected posting
updates; I'll try to post more now that I know someone is paying
attention.

> 2. What is the difference between PGP 1-10 (with indel and coverage) and the
> other three? Are there scripts within get-evidence that check through
> different complete genomics files to compile this 'with indels and coverage'
> file?

The other three came from Illumina data (or maybe Knome for 11 & 12?)
that was donated a while ago and lacked both indel data and coverage.

The with-indel-and-coverage version just uses the CGI var file. In
this case we're using the word "coverage" to mean "confidently called
as matching reference" (rather than "read depth"). There's a
translator that's run automatically on an uploaded CGI var file:
https://github.com/madprime/get-evidence/blob/master/server/conversion/cgivar_to_gff.py
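To make that sense of "coverage" concrete, here's a rough sketch (not
the logic in cgivar_to_gff.py, just an illustration). It assumes the
var file is tab-separated with metadata lines starting with "#", a
column header line starting with ">", and the columns locus, ploidy,
allele, chromosome, begin, end, varType in that order, with varType
"ref" marking a region confidently called as matching reference.
Those layout details and the function name are my assumptions, so
check them against your own files. It also counts every "ref" row
once and doesn't try to reconcile the two alleles at a locus.

```python
def confidently_called_ref_bases(var_lines):
    """Sum the lengths of 'ref' intervals per chromosome.

    var_lines: iterable of text lines from a CGI var file.
    Assumed (not verified) column order:
    locus, ploidy, allele, chromosome, begin, end, varType, ...
    """
    totals = {}
    for line in var_lines:
        # Skip blank lines, '#' metadata lines, and the '>' header line.
        if not line.strip() or line.startswith(("#", ">")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # not a data row in the assumed layout
        chrom = fields[3]
        begin, end = int(fields[4]), int(fields[5])
        var_type = fields[6]
        # "Coverage" here means called as matching reference,
        # not read depth, so only 'ref' rows are counted.
        if var_type == "ref":
            totals[chrom] = totals.get(chrom, 0) + (end - begin)
    return totals
```

You'd call it with an open file handle, e.g.
confidently_called_ref_bases(open(path)), where path points at an
uncompressed var file.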

You likely already know about this, but there's also a bunch of
genomes available from Complete Genomics' website:
http://www.completegenomics.com/sequence-data/download-data/

> 3. Is CNV and SD evidence available for the PGP genomes? It seems to me that
> only 'short' indels (100bp or so) are available, from the look of the
> entries in the right-most column of these .gff files.

You're right, the data for that other stuff is in different files,
which weren't uploaded to GET-Evidence. At the moment you only have
the source file that GET-Evidence used. I think Sasha has been meaning
to get the full data onto an FTP server somewhere. We've also been
trying to get a redone build 37 version for the ten genomes (you might
want to wait for that improved data).

- Madeleine



