[GET-dev] PGP info
Madeleine Ball
mpball at gmail.com
Mon May 9 16:50:46 EDT 2011
> 1. I have not received a GET-dev digest since March, have I missed some?
No, traffic has just been low lately. I admit I've neglected posting
updates; I'll try to post more now that I know someone is paying
attention.
> 2. What is the difference between PGP 1-10 (with indel and coverage) and the
> other three? Are there scripts within get-evidence that check through
> different complete genomics files to compile this 'with indels and coverage'
> file?
The other three were from Illumina data (or maybe Knome for 11 & 12?)
donated a while ago that lacked both indel data and coverage.
The with-indel-and-coverage file just uses the CGI var file. In this
case we're using the word "coverage" to mean "confidently called as
matching reference" (rather than "read depth"). There's a translator
that's automatically run on an uploaded CGI var file:
https://github.com/madprime/get-evidence/blob/master/server/conversion/cgivar_to_gff.py
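To make that sense of "coverage" concrete, here is a minimal sketch that
totals the bases a var file reports as matching reference. It is not the
logic of cgivar_to_gff.py, just an illustration. The column positions
(chromosome, begin, end, varType in tab-separated fields 4 to 7), the
"ref" varType label, and the half-open begin/end coordinates are all
assumptions about the var file layout, and count_ref_called_bases is a
hypothetical helper name; check them against your file's header line.

```python
from typing import Dict, Iterable

# Assumed 0-based field positions in a tab-separated CGI var file row
# (hypothetical layout: locus, ploidy, allele, chromosome, begin, end, varType, ...).
CHROM, BEGIN, END, VARTYPE = 3, 4, 5, 6


def count_ref_called_bases(lines: Iterable[str]) -> Dict[str, int]:
    """Sum, per chromosome, the bases in rows whose varType is 'ref'.

    This counts "coverage" in the sense used above: positions confidently
    called as matching reference, not read depth.
    """
    covered: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, '#' metadata lines, and the '>' column header line.
        if not line or line[0] in "#>":
            continue
        fields = line.split("\t")
        # Ignore short rows and anything not called as reference
        # (variants, no-calls, and so on).
        if len(fields) <= VARTYPE or fields[VARTYPE] != "ref":
            continue
        # Assumes begin/end are half-open, so the span length is end - begin.
        span = int(fields[END]) - int(fields[BEGIN])
        covered[fields[CHROM]] = covered.get(fields[CHROM], 0) + span
    return covered
```

For a real file you would pass an open file handle, e.g.
count_ref_called_bases(open(path)), where path points at an uncompressed
var file.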
You likely already know about this, but there's also a bunch of
genomes available from Complete Genomics' website:
http://www.completegenomics.com/sequence-data/download-data/
> 3. Is CNV and SD evidence available for the PGP genomes? It seems to me that
> only 'short' indels (100bp or so) are available, from the look of the
> entries in the right-most column of these .gff files.
You're right: the CNV and SD data is in different files, which weren't
uploaded to GET-Evidence. At the moment you only have the source file
that GET-Evidence used. I think Sasha has been meaning to get the full
data onto an FTP server somewhere. We've also been trying to get a
redone build 37 version for the ten genomes (you might want to wait
for that improved data).
- Madeleine