[GET-dev] GFF3 to Wig?

Erik Garrison erik.garrison at gmail.com
Fri Jan 7 08:22:40 EST 2011


On Thu, Jan 6, 2011 at 5:31 PM, Alexander Wait Zaranek <awaitz at post.harvard.edu> wrote:

> On Tue, Jan 4, 2011 at 7:10 PM, Leon Peshkin <peshkin at gmail.com> wrote:
> > Could someone point me to a tool that would let me create a SAM/BAM
> > file from GFF3 and then export it to Wig/BED files? I think BAM->BED
> > is possible in bamtools, but how about GFF3->BAM and BAM->Wig?
> > I am attaching a chunk of my GFF3 file; it contains alignments of
> > reads against scaffolds.
> >
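
If going through BAM is not a requirement, the cDNA_match rows can be turned into BED12 directly: every row sharing an ID becomes one block of a single BED item. The Python below is a minimal sketch under my own assumptions, not an existing tool. Rows are grouped by the ID attribute (as in the chain_N rows of the attached chunk), the BED name is the first word of Target, the BED score is left at 0 because the GFF3 score column is not carried over, and the strand comes from GFF3 column 7. The function names are hypothetical.

```python
def parse_cdna_matches(lines):
    """Group GFF3 cDNA_match segments into chains keyed by their ID attribute.

    Columns are split on whitespace into at most nine fields, so tab-separated
    GFF3 and a space-padded copy parse the same way; the attribute column keeps
    its internal spaces (e.g. "Target=GGWHCZS01CVNEB 1 52 +").
    """
    chains = {}  # dicts keep insertion order, so chains come out in file order
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and ##directives
        cols = line.split(None, 8)
        if len(cols) != 9 or cols[2] != "cDNA_match":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        chain_id = attrs["ID"]
        name = attrs.get("Target", chain_id).split()[0]
        chains.setdefault(chain_id, []).append(
            (cols[0], int(cols[3]), int(cols[4]), cols[6], name))
    return chains


def chain_to_bed12(segments):
    """Turn one chain's segments into a single BED12 line (one block per segment).

    GFF3 intervals are 1-based and closed; BED is 0-based and half-open, so
    each start moves down by one and each end stays as it is.
    """
    seqids = {seg[0] for seg in segments}
    if len(seqids) != 1:
        raise ValueError(f"chain spans several sequences: {sorted(seqids)}")
    seqid, _, _, strand, name = segments[0]
    blocks = sorted((start - 1, end) for _, start, end, _, _ in segments)
    chrom_start = blocks[0][0]
    chrom_end = max(end for _, end in blocks)
    sizes = ",".join(str(end - start) for start, end in blocks)
    starts = ",".join(str(start - chrom_start) for start, _ in blocks)
    fields = [seqid, chrom_start, chrom_end, name, 0, strand,
              chrom_start, chrom_end, 0, len(blocks), sizes, starts]
    return "\t".join(map(str, fields))


def gff3_to_bed12(lines):
    """Yield one BED12 line per chain, in the order chains first appear."""
    for segments in parse_cdna_matches(lines).values():
        yield chain_to_bed12(segments)
```

For example, looping over `gff3_to_bed12(open("matches.gff3"))` (file name hypothetical) turns the three chain_22 rows into one three-block item on scaffold36407 starting at 418185. Going straight to BED this way skips GFF3->BAM entirely; bamtools only comes into play once the data really is in BAM.
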
> Since the PGP gets whole-genome and other genomic data from multiple
> platforms, we convert this data into a simpler format we created for
> the PGP. Most of our data was produced by Complete Genomics and is
> not in BAM format. BAM doesn't have a way to faithfully represent
> CGI reads due to format limitations. (The CGI->BAM converters are
> lossy, AFAIK.)
>

We've been working on format extensions to BAM that will let it represent
multiple splits per read fragment.
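
For chains whose segments are collinear on a single scaffold, which all the multi-segment chains in Leon's attachment appear to be, ordinary SAM can already hold the split: each segment becomes an M run and each gap between segments an N run, the way spliced RNA-seq alignments are stored. Below is a minimal Python sketch of that GFF3->SAM step; it is an illustration under assumptions, not part of bamtools or of the extension work, and the function names are hypothetical. Two caveats come from the data itself. Without a Gap attribute the rows don't say where indels fall inside a segment (chain_8's first segment covers Target 1-83 but only 81 scaffold bases), so the CIGAR is exact for reference coverage only and SEQ/QUAL stay '*'. And judging from the coordinates, Target positions run in the direction of GFF3 column 7 whatever the Target strand says (chain_8 on '-' walks down the scaffold, chain_25 on '+' walks up despite a '-' Target), so FLAG 16 is taken from column 7 alone.

```python
def cigar_from_segments(segments):
    """Build a CIGAR string from (start, end) scaffold segments, 1-based closed.

    Each segment becomes an M run and the scaffold gap between consecutive
    segments becomes an N (skipped reference) run.
    """
    ops = []
    prev_end = None
    for start, end in sorted(segments):
        if prev_end is not None:
            gap = start - prev_end - 1
            if gap < 0:
                raise ValueError(f"segment starting at {start} overlaps the previous one")
            if gap > 0:
                ops.append([gap, "N"])
        length = end - start + 1
        if ops and ops[-1][1] == "M":
            ops[-1][0] += length  # abutting segments: extend the current M run
        else:
            ops.append([length, "M"])
        prev_end = end
    return "".join(f"{n}{op}" for n, op in ops)


def sam_line(qname, rname, segments, strand):
    """Return one tab-separated SAM record covering a whole chain.

    FLAG 16 marks a reverse-strand alignment (taken from GFF3 column 7),
    MAPQ 255 means "not available", and SEQ/QUAL are '*' because the GFF3
    rows carry no read bases.
    """
    flag = 16 if strand == "-" else 0
    pos = min(start for start, _ in segments)  # SAM POS is 1-based, like GFF3
    fields = [qname, flag, rname, pos, 255, cigar_from_segments(segments),
              "*", 0, 0, "*", "*"]
    return "\t".join(map(str, fields))
```

For chain_22, `sam_line("GGWHCZS01A7VUK", "scaffold36407", [(418186, 418221), (419038, 419166), (419794, 419868)], "+")` gives POS 418186 and CIGAR 36M816N129M627N75M. Getting from there to BAM still needs scaffold lengths for the @SQ header, which the GFF3 doesn't carry; with the scaffold FASTA, `samtools faidx` followed by `samtools view -b -t scaffolds.fa.fai` is one way to supply them (file names hypothetical).
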

We didn't consider CGI data, which I believe differs somewhat in structure
from the patterns we anticipated. We could add support for it to the
extension.

> The latest version of the CGI pipeline, with sample files, is described
> here:
> * http://www.completegenomics.com/sequence-data/download-data/


Does CGI only provide evidence for called alleles, or is this merely done to
make the example data smaller?


> Visualizing CGI/Illumina/SOLiD data is definitely an important
> project.  It would be great if a community member could volunteer to
> take it on!   Perhaps we could use this mailing list to work out a
> "spec" for what needs to be done?
>
> Thanks,
> Sasha
>
> >
> > scaffold38913	alignAssembly-sacc454pasa_LV15	cDNA_match	51483	51536	96	+	.	ID=chain_1;Target=GGWHCZS01CVNEB 1 52 +
> > scaffold36419	alignAssembly-sacc454pasa_LV15	cDNA_match	181791	182015	100	+	.	ID=chain_2;Target=GGWHCZS01B8DBA 8 232 +
> > scaffold36419	alignAssembly-sacc454pasa_LV15	cDNA_match	182349	182461	98	+	.	ID=chain_2;Target=GGWHCZS01B8DBA 233 345 +
> > scaffold36953	alignAssembly-sacc454pasa_LV15	cDNA_match	12449	12685	97	-	.	ID=chain_3;Target=GGWHCZS01BYL46 1 241 +
> > scaffold38915	alignAssembly-sacc454pasa_LV15	cDNA_match	143091	143273	100	+	.	ID=chain_4;Target=GGWHCZS01B68J0 1 183 +
> > scaffold38408	alignAssembly-sacc454pasa_LV15	cDNA_match	17300	17392	96	-	.	ID=chain_5;Target=GGWHCZS01D8PA1 1 92 +
> > scaffold36419	alignAssembly-sacc454pasa_LV15	cDNA_match	153548	153684	99	-	.	ID=chain_6;Target=GGWHCZS01BUD6C 8 143 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909581	909657	100	+	.	ID=chain_7;Target=GGWHCZS01B8A85 1 77 +
> > scaffold36419	alignAssembly-sacc454pasa_LV15	cDNA_match	154234	154314	95	-	.	ID=chain_8;Target=GGWHCZS01EMFO0 1 83 +
> > scaffold36419	alignAssembly-sacc454pasa_LV15	cDNA_match	153545	153735	98	-	.	ID=chain_8;Target=GGWHCZS01EMFO0 84 275 +
> > scaffold36409	alignAssembly-sacc454pasa_LV15	cDNA_match	261048	261102	100	+	.	ID=chain_9;Target=GGWHCZS01EUQ0J 1 55 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909591	909657	100	+	.	ID=chain_10;Target=GGWHCZS01DNMTI 1 67 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909602	909657	100	+	.	ID=chain_11;Target=GGWHCZS01CRCLW 1 56 +
> > scaffold38911	alignAssembly-sacc454pasa_LV15	cDNA_match	143736	143833	98	-	.	ID=chain_12;Target=GGWHCZS01D1K2W 1 98 +
> > scaffold37916	alignAssembly-sacc454pasa_LV15	cDNA_match	81940	81992	100	+	.	ID=chain_13;Target=GGWHCZS01C6JD5 1 53 +
> > scaffold37916	alignAssembly-sacc454pasa_LV15	cDNA_match	82836	82926	100	+	.	ID=chain_13;Target=GGWHCZS01C6JD5 54 144 +
> > scaffold38933	alignAssembly-sacc454pasa_LV15	cDNA_match	135829	135927	98	-	.	ID=chain_14;Target=GGWHCZS01CQ9ZX 1 99 +
> > scaffold38933	alignAssembly-sacc454pasa_LV15	cDNA_match	134108	134179	95	-	.	ID=chain_14;Target=GGWHCZS01CQ9ZX 100 171 +
> > scaffold36910	alignAssembly-sacc454pasa_LV15	cDNA_match	94607	94679	97	-	.	ID=chain_15;Target=GGWHCZS01CFNTH 1 71 +
> > scaffold36910	alignAssembly-sacc454pasa_LV15	cDNA_match	92066	92099	100	-	.	ID=chain_15;Target=GGWHCZS01CFNTH 72 105 +
> > scaffold36419	alignAssembly-sacc454pasa_LV15	cDNA_match	154234	154314	97	-	.	ID=chain_16;Target=GGWHCZS01DS3UM 1 81 +
> > scaffold36419	alignAssembly-sacc454pasa_LV15	cDNA_match	153545	153735	100	-	.	ID=chain_16;Target=GGWHCZS01DS3UM 82 272 +
> > scaffold36953	alignAssembly-sacc454pasa_LV15	cDNA_match	12449	12683	97	-	.	ID=chain_17;Target=GGWHCZS01CTGNU 1 239 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909592	909664	100	+	.	ID=chain_18;Target=GGWHCZS01CBNFR 1 73 +
> > scaffold36419	alignAssembly-sacc454pasa_LV15	cDNA_match	154234	154314	98	-	.	ID=chain_19;Target=GGWHCZS01CR372 1 81 +
> > scaffold36419	alignAssembly-sacc454pasa_LV15	cDNA_match	153545	153735	100	-	.	ID=chain_19;Target=GGWHCZS01CR372 82 272 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909588	909667	97	+	.	ID=chain_20;Target=GGWHCZS01DOL8E 1 80 +
> > scaffold36419	alignAssembly-sacc454pasa_LV15	cDNA_match	182394	182451	98	+	.	ID=chain_21;Target=GGWHCZS01E37KQ 1 58 +
> > scaffold36407	alignAssembly-sacc454pasa_LV15	cDNA_match	418186	418221	100	+	.	ID=chain_22;Target=GGWHCZS01A7VUK 1 36 +
> > scaffold36407	alignAssembly-sacc454pasa_LV15	cDNA_match	419038	419166	100	+	.	ID=chain_22;Target=GGWHCZS01A7VUK 37 165 +
> > scaffold36407	alignAssembly-sacc454pasa_LV15	cDNA_match	419794	419868	100	+	.	ID=chain_22;Target=GGWHCZS01A7VUK 166 240 +
> > scaffold36421	alignAssembly-sacc454pasa_LV15	cDNA_match	118515	118699	99	-	.	ID=chain_23;Target=GGWHCZS01CJEXS 1 184 +
> > scaffold37410	alignAssembly-sacc454pasa_LV15	cDNA_match	361706	361809	100	-	.	ID=chain_24;Target=GGWHCZS01BB26P 1 104 +
> > scaffold37410	alignAssembly-sacc454pasa_LV15	cDNA_match	360904	360952	100	-	.	ID=chain_24;Target=GGWHCZS01BB26P 105 153 +
> > scaffold37910	alignAssembly-sacc454pasa_LV15	cDNA_match	82980	83092	100	+	.	ID=chain_25;Target=GGWHCZS01DHIW0 1 113 -
> > scaffold37910	alignAssembly-sacc454pasa_LV15	cDNA_match	83537	83584	94	+	.	ID=chain_25;Target=GGWHCZS01DHIW0 114 163 -
> > scaffold36430	alignAssembly-sacc454pasa_LV15	cDNA_match	5533	5619	97	-	.	ID=chain_26;Target=GGWHCZS01CINS0 1 89 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	912517	912653	97	-	.	ID=chain_27;Target=GGWHCZS01CY8FF 1 137 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	912168	912225	91	-	.	ID=chain_27;Target=GGWHCZS01CY8FF 138 195 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909589	909659	100	+	.	ID=chain_28;Target=GGWHCZS01CVFMM 1 71 +
> > scaffold36419	alignAssembly-sacc454pasa_LV15	cDNA_match	154234	154314	100	-	.	ID=chain_29;Target=GGWHCZS01B40FY 1 81 +
> > scaffold36419	alignAssembly-sacc454pasa_LV15	cDNA_match	153547	153735	100	-	.	ID=chain_29;Target=GGWHCZS01B40FY 82 270 +
> > scaffold36919	alignAssembly-sacc454pasa_LV15	cDNA_match	113299	113345	100	+	.	ID=chain_30;Target=GGWHCZS01DBSZR 1 47 +
> > scaffold36953	alignAssembly-sacc454pasa_LV15	cDNA_match	12449	12685	97	-	.	ID=chain_31;Target=GGWHCZS01AVC8Y 1 235 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909677	909715	100	+	.	ID=chain_32;Target=GGWHCZS01DC0OZ 1 39 -
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909891	909920	100	+	.	ID=chain_32;Target=GGWHCZS01DC0OZ 40 69 -
> > scaffold36419	alignAssembly-sacc454pasa_LV15	cDNA_match	156139	156176	100	-	.	ID=chain_33;Target=GGWHCZS01CZ1CL 1 38 -
> > scaffold36419	alignAssembly-sacc454pasa_LV15	cDNA_match	154845	154973	96	-	.	ID=chain_33;Target=GGWHCZS01CZ1CL 39 165 -
> > scaffold37919	alignAssembly-sacc454pasa_LV15	cDNA_match	122261	122312	96	+	.	ID=chain_34;Target=GGWHCZS01EF1CO 1 51 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909683	909715	100	+	.	ID=chain_35;Target=GGWHCZS01C6B5W 1 33 -
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909891	909915	100	+	.	ID=chain_35;Target=GGWHCZS01C6B5W 34 58 -
> > scaffold37910	alignAssembly-sacc454pasa_LV15	cDNA_match	266441	266510	98	+	.	ID=chain_36;Target=GGWHCZS01BR14F 1 69 +
> > scaffold38922	alignAssembly-sacc454pasa_LV15	cDNA_match	137914	137970	100	-	.	ID=chain_37;Target=GGWHCZS01CPA1V 1 57 +
> > scaffold37917	alignAssembly-sacc454pasa_LV15	cDNA_match	52798	52892	97	-	.	ID=chain_38;Target=GGWHCZS01CMW2K 1 93 +
> > scaffold36935	alignAssembly-sacc454pasa_LV15	cDNA_match	22274	22398	100	-	.	ID=chain_39;Target=GGWHCZS01C1GEQ 1 125 +
> > scaffold38911	alignAssembly-sacc454pasa_LV15	cDNA_match	143736	143833	98	-	.	ID=chain_40;Target=GGWHCZS01CAS23 1 98 +
> > scaffold36433	alignAssembly-sacc454pasa_LV15	cDNA_match	188	343	98	+	.	ID=chain_41;Target=GGWHCZS01D4XLL 1 154 +
> > scaffold36926	alignAssembly-sacc454pasa_LV15	cDNA_match	91417	91535	100	-	.	ID=chain_42;Target=GGWHCZS01CIJEU 1 119 -
> > scaffold36926	alignAssembly-sacc454pasa_LV15	cDNA_match	90394	90498	100	-	.	ID=chain_42;Target=GGWHCZS01CIJEU 120 224 -
> > scaffold38913	alignAssembly-sacc454pasa_LV15	cDNA_match	51431	51614	99	+	.	ID=chain_43;Target=GGWHCZS01DAAC8 1 184 +
> > scaffold36414	alignAssembly-sacc454pasa_LV15	cDNA_match	498987	499090	98	+	.	ID=chain_44;Target=GGWHCZS01BIZ3D 1 103 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909675	909715	100	+	.	ID=chain_45;Target=GGWHCZS01BLTD9 2 42 -
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909891	909913	100	+	.	ID=chain_45;Target=GGWHCZS01BLTD9 43 65 -
> > scaffold38913	alignAssembly-sacc454pasa_LV15	cDNA_match	51490	51614	99	+	.	ID=chain_46;Target=GGWHCZS01DCKJA 1 124 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909524	909715	98	+	.	ID=chain_47;Target=GGWHCZS01CSVIX 1 192 -
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909891	909920	100	+	.	ID=chain_47;Target=GGWHCZS01CSVIX 193 222 -
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909602	909659	100	+	.	ID=chain_48;Target=GGWHCZS01D0PFQ 1 58 +
> > scaffold38416	alignAssembly-sacc454pasa_LV15	cDNA_match	6762	6810	97	+	.	ID=chain_49;Target=GGWHCZS01D0TX7 1 48 +
> > scaffold38416	alignAssembly-sacc454pasa_LV15	cDNA_match	9003	9119	96	+	.	ID=chain_49;Target=GGWHCZS01D0TX7 49 165 +
> > scaffold38907	alignAssembly-sacc454pasa_LV15	cDNA_match	833004	833038	97	+	.	ID=chain_50;Target=GGWHCZS01CTCD7 1 34 +
> > scaffold38907	alignAssembly-sacc454pasa_LV15	cDNA_match	838060	838125	100	+	.	ID=chain_50;Target=GGWHCZS01CTCD7 35 100 +
> > scaffold38907	alignAssembly-sacc454pasa_LV15	cDNA_match	838570	838591	90	+	.	ID=chain_50;Target=GGWHCZS01CTCD7 101 122 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909608	909720	98	+	.	ID=chain_51;Target=GGWHCZS01B9U4E 1 113 +
> > scaffold37912	alignAssembly-sacc454pasa_LV15	cDNA_match	387939	388102	98	+	.	ID=chain_52;Target=GGWHCZS01CC4MR 1 163 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909587	909659	100	+	.	ID=chain_53;Target=GGWHCZS01BPMT8 1 73 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	908939	909006	98	+	.	ID=chain_54;Target=GGWHCZS01DPL5E 1 68 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909592	909715	98	+	.	ID=chain_55;Target=GGWHCZS01BHI27 1 124 -
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909891	909920	100	+	.	ID=chain_55;Target=GGWHCZS01BHI27 125 154 -
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	908626	908682	96	+	.	ID=chain_56;Target=GGWHCZS01ED8IJ 1 57 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909603	909659	100	+	.	ID=chain_57;Target=GGWHCZS01BCQYE 1 57 +
> > scaffold36907	alignAssembly-sacc454pasa_LV15	cDNA_match	249216	249286	97	+	.	ID=chain_58;Target=GGWHCZS01ATA7N 1 69 +
> > scaffold36414	alignAssembly-sacc454pasa_LV15	cDNA_match	259865	259962	100	-	.	ID=chain_59;Target=GGWHCZS01DYGRT 1 98 +
> > scaffold38915	alignAssembly-sacc454pasa_LV15	cDNA_match	116917	117033	97	-	.	ID=chain_60;Target=GGWHCZS01BMW8R 1 119 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	911227	911253	100	-	.	ID=chain_61;Target=GGWHCZS01CB12C 1 27 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	910741	910883	97	-	.	ID=chain_61;Target=GGWHCZS01CB12C 28 170 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	910330	910350	100	-	.	ID=chain_61;Target=GGWHCZS01CB12C 171 191 +
> > scaffold36935	alignAssembly-sacc454pasa_LV15	cDNA_match	40763	40908	100	-	.	ID=chain_62;Target=GGWHCZS01B2SBA 1 146 +
> > scaffold36935	alignAssembly-sacc454pasa_LV15	cDNA_match	40763	40920	100	-	.	ID=chain_63;Target=GGWHCZS01EHM0P 1 158 +
> > scaffold37410	alignAssembly-sacc454pasa_LV15	cDNA_match	611218	611305	100	-	.	ID=chain_64;Target=GGWHCZS01BMOMV 1 88 +
> > scaffold38919	alignAssembly-sacc454pasa_LV15	cDNA_match	102147	102213	100	+	.	ID=chain_65;Target=GGWHCZS01BVMOL 1 67 -
> > scaffold38919	alignAssembly-sacc454pasa_LV15	cDNA_match	107678	107788	98	+	.	ID=chain_65;Target=GGWHCZS01BVMOL 68 178 -
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909610	909715	97	+	.	ID=chain_66;Target=GGWHCZS01C2HAX 1 107 -
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909891	909920	100	+	.	ID=chain_66;Target=GGWHCZS01C2HAX 108 137 -
> > scaffold37408	alignAssembly-sacc454pasa_LV15	cDNA_match	16679	16777	98	+	.	ID=chain_67;Target=GGWHCZS01BTFN0 1 98 +
> > scaffold38932	alignAssembly-sacc454pasa_LV15	cDNA_match	135072	135188	95	-	.	ID=chain_68;Target=GGWHCZS01CURJ7 1 119 +
> > scaffold37907	alignAssembly-sacc454pasa_LV15	cDNA_match	909586	909657	100	+	.	ID=chain_69;Target=GGWHCZS01CB9IJ 1 72 +
> > scaffold38911	alignAssembly-sacc454pasa_LV15	cDNA_match	136895	136999	100	+	.	ID=chain_70;Target=GGWHCZS01DBI1M 1 105 +
> > scaffold38911	alignAssembly-sacc454pasa_LV15	cDNA_match	138219	138319	100	+	.	ID=chain_70;Target=GGWHCZS01DBI1M 106 206 +
> > scaffold38933	alignAssembly-sacc454pasa_LV15	cDNA_match	135829	135906	97	-	.	ID=chain_71;Target=GGWHCZS01C0GBU 1 77 +
> > scaffold38933	alignAssembly-sacc454pasa_LV15	cDNA_match	134108	134179	95	-	.	ID=chain_71;Target=GGWHCZS01C0GBU 78 149 +
> >
> --
> Alexander (Sasha) Wait Zaranek, PhD
> Research Fellow in Genetics
> Director Informatics
> Personal Genome Project
> Harvard Medical School
>
> http://openwetware.org/wiki/User:Alexander_Wait_Zaranek
>
> _______________________________________________
> GET-dev mailing list
> GET-dev at lists.freelogy.org
> http://lists.freelogy.org/mailman/listinfo/get-dev
>
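
On the BAM->Wig half of the question: the records already give scaffold intervals, so a coverage Wig can be computed from the GFF3 without going through BAM. Below is a minimal Python sketch, assuming each cDNA_match row counts once over its own interval (segments of one chain are not merged, so overlapping segments within a chain would be double-counted) and that per-base variableStep output is acceptable. The function names and the default track name are made up for the example.

```python
from collections import defaultdict


def read_segments(lines):
    """Yield (seqid, start, end) for every cDNA_match row, 1-based closed.

    Splits on whitespace into at most nine fields, so the space-padded copy
    of the records in this thread parses the same as tab-separated GFF3.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split(None, 8)
        if len(cols) == 9 and cols[2] == "cDNA_match":
            yield cols[0], int(cols[3]), int(cols[4])


def coverage_runs(segments):
    """Sweep +1/-1 events to yield (seqid, start, end, depth) runs with depth > 0."""
    events = defaultdict(lambda: defaultdict(int))
    for seqid, start, end in segments:
        events[seqid][start] += 1
        events[seqid][end + 1] -= 1  # coverage drops after the last base
    for seqid in sorted(events):
        depth, prev = 0, None
        for pos in sorted(events[seqid]):
            if prev is not None and depth > 0:
                yield seqid, prev, pos - 1, depth
            depth += events[seqid][pos]
            prev = pos


def to_wig(runs, name="cDNA_match_coverage"):
    """Render runs as variableStep Wig: one 'position depth' line per covered base."""
    out = [f'track type=wiggle_0 name="{name}"']
    current = None
    for seqid, start, end, depth in runs:
        if seqid != current:
            out.append(f"variableStep chrom={seqid}")
            current = seqid
        out.extend(f"{pos} {depth}" for pos in range(start, end + 1))
    return "\n".join(out) + "\n"
```

In the attached chunk, chain_3, chain_17 and chain_31 all start at 12449 on scaffold36953, so this gives depth 3 over 12449-12683 and depth 2 at 12684-12685. For larger inputs, UCSC's `wigToBigWig`, given a chrom.sizes file of scaffold lengths, can turn the text Wig into an indexed bigWig.
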


More information about the Arvados mailing list