[GET-dev] pgp sequence format
Ruth McCole
rmccole at genetics.med.harvard.edu
Fri Jan 28 14:24:41 EST 2011
Hi,
I have been working with PGP sequences that I downloaded in GFF format
from here: http://evidence.personalgenomes.org/genomes
I was wondering why insertions are represented with a start coordinate
greater than the end coordinate. I understand this is meant to
distinguish insertions from deletions and SNPs, but the violation of the
GFF format has been pretty annoying. In order to analyze the file in
either Galaxy or bedtools, I have changed it so that if start > end,
'INDEL' is replaced by 'INS' and the start and end coordinates are
swapped.
Is there any reason why I shouldn't do this? Please don't ask for the
Python script that does this, because it has a bug which means it's OK
for the sequence I'm primarily working on, but not necessarily for any
other. I am trying to fix this.
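The conversion described above (swap the coordinates and relabel the feature when start > end) could be sketched roughly as follows. This is only an illustrative sketch, not the script mentioned in this message: the function name `fix_insertion_line` is hypothetical, and it assumes tab-separated GFF lines with the feature type in column 3 and the start and end coordinates in columns 4 and 5, with the literal label 'INDEL' appearing in the feature column.

```python
def fix_insertion_line(line):
    """Return a GFF line with start <= end; relabel INDEL as INS if swapped.

    Assumes tab-separated GFF columns: feature type in column 3,
    start in column 4, end in column 5 (hypothetical helper).
    """
    # Leave comment lines and blank lines untouched.
    if line.startswith("#") or not line.strip():
        return line

    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return line  # not a feature line we know how to handle

    try:
        start, end = int(fields[3]), int(fields[4])
    except ValueError:
        return line  # non-numeric coordinates: pass through unchanged

    if start > end:
        # Swap the coordinates so the line satisfies start <= end.
        fields[3], fields[4] = str(end), str(start)
        # Assumption: 'INDEL' is the feature-type label in column 3.
        if fields[2] == "INDEL":
            fields[2] = "INS"

    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

Applied line by line to a file, this leaves every record with start <= end unchanged, so deletions and SNPs pass through as they were.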
Many thanks,
Ruth
--
Ruth McCole
Postdoctoral Researcher
Wu Lab
Department of Genetics
Harvard Medical School
77 Avenue Louis Pasteur
Boston, MA 02115