[GET-dev] GFF3 to Wig?
Alexander Wait Zaranek
awaitz at post.harvard.edu
Thu Jan 6 12:31:38 EST 2011
On Tue, Jan 4, 2011 at 7:10 PM, Leon Peshkin <peshkin at gmail.com> wrote:
> Could someone point me to a tool that would allow
> me to create a SAM/BAM file from GFF3 and then export it to Wig/BED files?
> I think BAM->BED is possible in bamtools, but how about
> GFF3->BAM and BAM->Wig?
> I am attaching a chunk of my GFF3 file; it is an alignment of reads
> against scaffolds.
>
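Each cDNA_match line already carries the scaffold, coordinates, score and strand. So BED can probably be written straight from the GFF3 without an intermediate BAM.

Below is a minimal Python sketch, not a tested tool, built on a few assumptions:
- The input is tab-delimited, as the GFF3 spec requires. The excerpt quoted below shows spaces instead.
- The read name from the Target attribute is used as the BED name.
- The function name gff3_to_bed is hypothetical and does not refer to any existing package.

```python
from typing import Iterable, Iterator


def gff3_to_bed(lines: Iterable[str]) -> Iterator[str]:
    """Yield one BED6 line per GFF3 feature line (hypothetical helper).

    GFF3 coordinates are 1-based and inclusive; BED is 0-based and
    half-open, so only the start coordinate shifts by one.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a 9-column, tab-delimited feature line
        seqid, _source, _type, start, end, score, strand, _phase, attributes = cols
        # Column 9 holds ";"-separated key=value pairs.
        attrs = dict(kv.split("=", 1) for kv in attributes.split(";") if "=" in kv)
        # Prefer the read name from "Target=<id> <start> <end> [<strand>]".
        if "Target" in attrs:
            name = attrs["Target"].split(" ")[0]
        else:
            name = attrs.get("ID", ".")
        # BED scores are integers in 0..1000, so round and clamp.
        bed_score = "0" if score == "." else str(max(0, min(1000, round(float(score)))))
        # BED accepts "+", "-" or "."; GFF3 may also use "?".
        bed_strand = strand if strand in ("+", "-") else "."
        yield "\t".join([seqid, str(int(start) - 1), end, name, bed_score, bed_strand])
```

This writes one BED line per aligned segment. A spliced chain spans several GFF3 lines sharing one ID (e.g. chain_22). Such a chain could instead be merged into a single BED12 record with one block per segment.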
Since the PGP gets whole-genome and other genomic data from multiple
platforms, we convert this data into a simpler format we created for
the PGP. Most of our data was produced by Complete Genomics (CGI) and is
not in BAM format. BAM has no way to faithfully represent CGI reads
because of format limitations. (The CGI->BAM converters are
lossy, AFAIK.)
The latest version of the CGI pipeline, with sample files, is described here:
* http://www.completegenomics.com/sequence-data/download-data/
Visualizing CGI/Illumina/SOLiD data is definitely an important
project. It would be great if a community member could volunteer to
take it on! Perhaps we could use this mailing list to work out a
"spec" for what needs to be done?
Thanks,
Sasha
>
> scaffold38913 alignAssembly-sacc454pasa_LV15 cDNA_match 51483 51536 96 + . ID=chain_1;Target=GGWHCZS01CVNEB 1 52 +
> scaffold36419 alignAssembly-sacc454pasa_LV15 cDNA_match 181791 182015 100 + . ID=chain_2;Target=GGWHCZS01B8DBA 8 232 +
> scaffold36419 alignAssembly-sacc454pasa_LV15 cDNA_match 182349 182461 98 + . ID=chain_2;Target=GGWHCZS01B8DBA 233 345 +
> scaffold36953 alignAssembly-sacc454pasa_LV15 cDNA_match 12449 12685 97 - . ID=chain_3;Target=GGWHCZS01BYL46 1 241 +
> scaffold38915 alignAssembly-sacc454pasa_LV15 cDNA_match 143091 143273 100 + . ID=chain_4;Target=GGWHCZS01B68J0 1 183 +
> scaffold38408 alignAssembly-sacc454pasa_LV15 cDNA_match 17300 17392 96 - . ID=chain_5;Target=GGWHCZS01D8PA1 1 92 +
> scaffold36419 alignAssembly-sacc454pasa_LV15 cDNA_match 153548 153684 99 - . ID=chain_6;Target=GGWHCZS01BUD6C 8 143 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909581 909657 100 + . ID=chain_7;Target=GGWHCZS01B8A85 1 77 +
> scaffold36419 alignAssembly-sacc454pasa_LV15 cDNA_match 154234 154314 95 - . ID=chain_8;Target=GGWHCZS01EMFO0 1 83 +
> scaffold36419 alignAssembly-sacc454pasa_LV15 cDNA_match 153545 153735 98 - . ID=chain_8;Target=GGWHCZS01EMFO0 84 275 +
> scaffold36409 alignAssembly-sacc454pasa_LV15 cDNA_match 261048 261102 100 + . ID=chain_9;Target=GGWHCZS01EUQ0J 1 55 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909591 909657 100 + . ID=chain_10;Target=GGWHCZS01DNMTI 1 67 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909602 909657 100 + . ID=chain_11;Target=GGWHCZS01CRCLW 1 56 +
> scaffold38911 alignAssembly-sacc454pasa_LV15 cDNA_match 143736 143833 98 - . ID=chain_12;Target=GGWHCZS01D1K2W 1 98 +
> scaffold37916 alignAssembly-sacc454pasa_LV15 cDNA_match 81940 81992 100 + . ID=chain_13;Target=GGWHCZS01C6JD5 1 53 +
> scaffold37916 alignAssembly-sacc454pasa_LV15 cDNA_match 82836 82926 100 + . ID=chain_13;Target=GGWHCZS01C6JD5 54 144 +
> scaffold38933 alignAssembly-sacc454pasa_LV15 cDNA_match 135829 135927 98 - . ID=chain_14;Target=GGWHCZS01CQ9ZX 1 99 +
> scaffold38933 alignAssembly-sacc454pasa_LV15 cDNA_match 134108 134179 95 - . ID=chain_14;Target=GGWHCZS01CQ9ZX 100 171 +
> scaffold36910 alignAssembly-sacc454pasa_LV15 cDNA_match 94607 94679 97 - . ID=chain_15;Target=GGWHCZS01CFNTH 1 71 +
> scaffold36910 alignAssembly-sacc454pasa_LV15 cDNA_match 92066 92099 100 - . ID=chain_15;Target=GGWHCZS01CFNTH 72 105 +
> scaffold36419 alignAssembly-sacc454pasa_LV15 cDNA_match 154234 154314 97 - . ID=chain_16;Target=GGWHCZS01DS3UM 1 81 +
> scaffold36419 alignAssembly-sacc454pasa_LV15 cDNA_match 153545 153735 100 - . ID=chain_16;Target=GGWHCZS01DS3UM 82 272 +
> scaffold36953 alignAssembly-sacc454pasa_LV15 cDNA_match 12449 12683 97 - . ID=chain_17;Target=GGWHCZS01CTGNU 1 239 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909592 909664 100 + . ID=chain_18;Target=GGWHCZS01CBNFR 1 73 +
> scaffold36419 alignAssembly-sacc454pasa_LV15 cDNA_match 154234 154314 98 - . ID=chain_19;Target=GGWHCZS01CR372 1 81 +
> scaffold36419 alignAssembly-sacc454pasa_LV15 cDNA_match 153545 153735 100 - . ID=chain_19;Target=GGWHCZS01CR372 82 272 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909588 909667 97 + . ID=chain_20;Target=GGWHCZS01DOL8E 1 80 +
> scaffold36419 alignAssembly-sacc454pasa_LV15 cDNA_match 182394 182451 98 + . ID=chain_21;Target=GGWHCZS01E37KQ 1 58 +
> scaffold36407 alignAssembly-sacc454pasa_LV15 cDNA_match 418186 418221 100 + . ID=chain_22;Target=GGWHCZS01A7VUK 1 36 +
> scaffold36407 alignAssembly-sacc454pasa_LV15 cDNA_match 419038 419166 100 + . ID=chain_22;Target=GGWHCZS01A7VUK 37 165 +
> scaffold36407 alignAssembly-sacc454pasa_LV15 cDNA_match 419794 419868 100 + . ID=chain_22;Target=GGWHCZS01A7VUK 166 240 +
> scaffold36421 alignAssembly-sacc454pasa_LV15 cDNA_match 118515 118699 99 - . ID=chain_23;Target=GGWHCZS01CJEXS 1 184 +
> scaffold37410 alignAssembly-sacc454pasa_LV15 cDNA_match 361706 361809 100 - . ID=chain_24;Target=GGWHCZS01BB26P 1 104 +
> scaffold37410 alignAssembly-sacc454pasa_LV15 cDNA_match 360904 360952 100 - . ID=chain_24;Target=GGWHCZS01BB26P 105 153 +
> scaffold37910 alignAssembly-sacc454pasa_LV15 cDNA_match 82980 83092 100 + . ID=chain_25;Target=GGWHCZS01DHIW0 1 113 -
> scaffold37910 alignAssembly-sacc454pasa_LV15 cDNA_match 83537 83584 94 + . ID=chain_25;Target=GGWHCZS01DHIW0 114 163 -
> scaffold36430 alignAssembly-sacc454pasa_LV15 cDNA_match 5533 5619 97 - . ID=chain_26;Target=GGWHCZS01CINS0 1 89 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 912517 912653 97 - . ID=chain_27;Target=GGWHCZS01CY8FF 1 137 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 912168 912225 91 - . ID=chain_27;Target=GGWHCZS01CY8FF 138 195 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909589 909659 100 + . ID=chain_28;Target=GGWHCZS01CVFMM 1 71 +
> scaffold36419 alignAssembly-sacc454pasa_LV15 cDNA_match 154234 154314 100 - . ID=chain_29;Target=GGWHCZS01B40FY 1 81 +
> scaffold36419 alignAssembly-sacc454pasa_LV15 cDNA_match 153547 153735 100 - . ID=chain_29;Target=GGWHCZS01B40FY 82 270 +
> scaffold36919 alignAssembly-sacc454pasa_LV15 cDNA_match 113299 113345 100 + . ID=chain_30;Target=GGWHCZS01DBSZR 1 47 +
> scaffold36953 alignAssembly-sacc454pasa_LV15 cDNA_match 12449 12685 97 - . ID=chain_31;Target=GGWHCZS01AVC8Y 1 235 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909677 909715 100 + . ID=chain_32;Target=GGWHCZS01DC0OZ 1 39 -
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909891 909920 100 + . ID=chain_32;Target=GGWHCZS01DC0OZ 40 69 -
> scaffold36419 alignAssembly-sacc454pasa_LV15 cDNA_match 156139 156176 100 - . ID=chain_33;Target=GGWHCZS01CZ1CL 1 38 -
> scaffold36419 alignAssembly-sacc454pasa_LV15 cDNA_match 154845 154973 96 - . ID=chain_33;Target=GGWHCZS01CZ1CL 39 165 -
> scaffold37919 alignAssembly-sacc454pasa_LV15 cDNA_match 122261 122312 96 + . ID=chain_34;Target=GGWHCZS01EF1CO 1 51 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909683 909715 100 + . ID=chain_35;Target=GGWHCZS01C6B5W 1 33 -
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909891 909915 100 + . ID=chain_35;Target=GGWHCZS01C6B5W 34 58 -
> scaffold37910 alignAssembly-sacc454pasa_LV15 cDNA_match 266441 266510 98 + . ID=chain_36;Target=GGWHCZS01BR14F 1 69 +
> scaffold38922 alignAssembly-sacc454pasa_LV15 cDNA_match 137914 137970 100 - . ID=chain_37;Target=GGWHCZS01CPA1V 1 57 +
> scaffold37917 alignAssembly-sacc454pasa_LV15 cDNA_match 52798 52892 97 - . ID=chain_38;Target=GGWHCZS01CMW2K 1 93 +
> scaffold36935 alignAssembly-sacc454pasa_LV15 cDNA_match 22274 22398 100 - . ID=chain_39;Target=GGWHCZS01C1GEQ 1 125 +
> scaffold38911 alignAssembly-sacc454pasa_LV15 cDNA_match 143736 143833 98 - . ID=chain_40;Target=GGWHCZS01CAS23 1 98 +
> scaffold36433 alignAssembly-sacc454pasa_LV15 cDNA_match 188 343 98 + . ID=chain_41;Target=GGWHCZS01D4XLL 1 154 +
> scaffold36926 alignAssembly-sacc454pasa_LV15 cDNA_match 91417 91535 100 - . ID=chain_42;Target=GGWHCZS01CIJEU 1 119 -
> scaffold36926 alignAssembly-sacc454pasa_LV15 cDNA_match 90394 90498 100 - . ID=chain_42;Target=GGWHCZS01CIJEU 120 224 -
> scaffold38913 alignAssembly-sacc454pasa_LV15 cDNA_match 51431 51614 99 + . ID=chain_43;Target=GGWHCZS01DAAC8 1 184 +
> scaffold36414 alignAssembly-sacc454pasa_LV15 cDNA_match 498987 499090 98 + . ID=chain_44;Target=GGWHCZS01BIZ3D 1 103 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909675 909715 100 + . ID=chain_45;Target=GGWHCZS01BLTD9 2 42 -
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909891 909913 100 + . ID=chain_45;Target=GGWHCZS01BLTD9 43 65 -
> scaffold38913 alignAssembly-sacc454pasa_LV15 cDNA_match 51490 51614 99 + . ID=chain_46;Target=GGWHCZS01DCKJA 1 124 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909524 909715 98 + . ID=chain_47;Target=GGWHCZS01CSVIX 1 192 -
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909891 909920 100 + . ID=chain_47;Target=GGWHCZS01CSVIX 193 222 -
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909602 909659 100 + . ID=chain_48;Target=GGWHCZS01D0PFQ 1 58 +
> scaffold38416 alignAssembly-sacc454pasa_LV15 cDNA_match 6762 6810 97 + . ID=chain_49;Target=GGWHCZS01D0TX7 1 48 +
> scaffold38416 alignAssembly-sacc454pasa_LV15 cDNA_match 9003 9119 96 + . ID=chain_49;Target=GGWHCZS01D0TX7 49 165 +
> scaffold38907 alignAssembly-sacc454pasa_LV15 cDNA_match 833004 833038 97 + . ID=chain_50;Target=GGWHCZS01CTCD7 1 34 +
> scaffold38907 alignAssembly-sacc454pasa_LV15 cDNA_match 838060 838125 100 + . ID=chain_50;Target=GGWHCZS01CTCD7 35 100 +
> scaffold38907 alignAssembly-sacc454pasa_LV15 cDNA_match 838570 838591 90 + . ID=chain_50;Target=GGWHCZS01CTCD7 101 122 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909608 909720 98 + . ID=chain_51;Target=GGWHCZS01B9U4E 1 113 +
> scaffold37912 alignAssembly-sacc454pasa_LV15 cDNA_match 387939 388102 98 + . ID=chain_52;Target=GGWHCZS01CC4MR 1 163 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909587 909659 100 + . ID=chain_53;Target=GGWHCZS01BPMT8 1 73 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 908939 909006 98 + . ID=chain_54;Target=GGWHCZS01DPL5E 1 68 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909592 909715 98 + . ID=chain_55;Target=GGWHCZS01BHI27 1 124 -
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909891 909920 100 + . ID=chain_55;Target=GGWHCZS01BHI27 125 154 -
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 908626 908682 96 + . ID=chain_56;Target=GGWHCZS01ED8IJ 1 57 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909603 909659 100 + . ID=chain_57;Target=GGWHCZS01BCQYE 1 57 +
> scaffold36907 alignAssembly-sacc454pasa_LV15 cDNA_match 249216 249286 97 + . ID=chain_58;Target=GGWHCZS01ATA7N 1 69 +
> scaffold36414 alignAssembly-sacc454pasa_LV15 cDNA_match 259865 259962 100 - . ID=chain_59;Target=GGWHCZS01DYGRT 1 98 +
> scaffold38915 alignAssembly-sacc454pasa_LV15 cDNA_match 116917 117033 97 - . ID=chain_60;Target=GGWHCZS01BMW8R 1 119 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 911227 911253 100 - . ID=chain_61;Target=GGWHCZS01CB12C 1 27 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 910741 910883 97 - . ID=chain_61;Target=GGWHCZS01CB12C 28 170 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 910330 910350 100 - . ID=chain_61;Target=GGWHCZS01CB12C 171 191 +
> scaffold36935 alignAssembly-sacc454pasa_LV15 cDNA_match 40763 40908 100 - . ID=chain_62;Target=GGWHCZS01B2SBA 1 146 +
> scaffold36935 alignAssembly-sacc454pasa_LV15 cDNA_match 40763 40920 100 - . ID=chain_63;Target=GGWHCZS01EHM0P 1 158 +
> scaffold37410 alignAssembly-sacc454pasa_LV15 cDNA_match 611218 611305 100 - . ID=chain_64;Target=GGWHCZS01BMOMV 1 88 +
> scaffold38919 alignAssembly-sacc454pasa_LV15 cDNA_match 102147 102213 100 + . ID=chain_65;Target=GGWHCZS01BVMOL 1 67 -
> scaffold38919 alignAssembly-sacc454pasa_LV15 cDNA_match 107678 107788 98 + . ID=chain_65;Target=GGWHCZS01BVMOL 68 178 -
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909610 909715 97 + . ID=chain_66;Target=GGWHCZS01C2HAX 1 107 -
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909891 909920 100 + . ID=chain_66;Target=GGWHCZS01C2HAX 108 137 -
> scaffold37408 alignAssembly-sacc454pasa_LV15 cDNA_match 16679 16777 98 + . ID=chain_67;Target=GGWHCZS01BTFN0 1 98 +
> scaffold38932 alignAssembly-sacc454pasa_LV15 cDNA_match 135072 135188 95 - . ID=chain_68;Target=GGWHCZS01CURJ7 1 119 +
> scaffold37907 alignAssembly-sacc454pasa_LV15 cDNA_match 909586 909657 100 + . ID=chain_69;Target=GGWHCZS01CB9IJ 1 72 +
> scaffold38911 alignAssembly-sacc454pasa_LV15 cDNA_match 136895 136999 100 + . ID=chain_70;Target=GGWHCZS01DBI1M 1 105 +
> scaffold38911 alignAssembly-sacc454pasa_LV15 cDNA_match 138219 138319 100 + . ID=chain_70;Target=GGWHCZS01DBI1M 106 206 +
> scaffold38933 alignAssembly-sacc454pasa_LV15 cDNA_match 135829 135906 97 - . ID=chain_71;Target=GGWHCZS01C0GBU 1 77 +
> scaffold38933 alignAssembly-sacc454pasa_LV15 cDNA_match 134108 134179 95 - . ID=chain_71;Target=GGWHCZS01C0GBU 78 149 +
>
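For the Wig half of the question, per-base coverage can likely be computed directly from match segments like those above, again with no BAM step. The sketch below is illustrative only and rests on these assumptions:
- gff3_to_wig is a hypothetical name.
- The input is tab-delimited GFF3.
- Only cDNA_match lines are counted.

It records a +1/-1 depth change at the ends of each segment for every scaffold. It then writes variableStep Wig, which, like GFF3, uses 1-based positions.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def gff3_to_wig(lines: Iterable[str], feature_type: str = "cDNA_match") -> Iterator[str]:
    """Yield variableStep Wig lines giving per-base depth (hypothetical helper).

    Assumes tab-delimited GFF3 with 1-based, inclusive coordinates. Each
    feature line counts once, so a multi-segment chain adds depth only
    over its aligned segments, not over the gaps between them.
    """
    # seqid -> {position: net change in depth at that position}
    delta = defaultdict(lambda: defaultdict(int))
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue  # not a feature line of the requested type
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        delta[seqid][start] += 1    # depth rises at the first covered base
        delta[seqid][end + 1] -= 1  # and falls just after the last one

    for seqid in sorted(delta):
        yield f"variableStep chrom={seqid}"
        depth = 0
        positions = sorted(delta[seqid])
        # Depth is constant between consecutive change points; after the
        # last change point it is back to zero, so nothing is emitted there.
        for pos, next_pos in zip(positions, positions[1:]):
            depth += delta[seqid][pos]
            if depth > 0:
                for p in range(pos, next_pos):
                    yield f"{p} {depth}"
```

This writes one line per covered base, which keeps the code simple. For large inputs, a bedGraph or a fixed-span Wig section would be far more compact.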
--
Alexander (Sasha) Wait Zaranek, PhD
Research Fellow in Genetics
Director Informatics
Personal Genome Project
Harvard Medical School
http://openwetware.org/wiki/User:Alexander_Wait_Zaranek