[maker-devel] CIGAR string explanation

Jacques Dainat jacques.dainat at nbis.se
Tue Oct 23 07:56:09 MDT 2018


Hello,

Here is an example of the CIGAR string output from exonerate (run with exactly the same command that MAKER launches):

cigar: P46461.1 3 740 . genome 460484 439594 - 2580  M 84 I 1 D 56 M 154 I 3 M 54 D 1554 M 145 D 3346 M 137 D 120 M 160 D 197 M 182 D 145 M 165 D 415 M 170 D 5037 M 321 D 124 M 158 D 116 M 183 D 1819 M 157 D 5776 M 115
vulgar: P46461.1 3 740 . genome 460484 439594 - 2580 M 28 84 G 1 0 S 0 2 5 0 2 I 0 50 3 0 2 S 1 1 M 51 153 G 3 0 M 18 54 S 0 2 5 0 2 I 0 1548 3 0 2 S 1 1 M 48 144 S 0 1 5 0 2 I 0 3341 3 0 2 S 1 2 M 45 135 S 0 2 5 0 2 I 0 114 3 0 2 S 1 1 M 53 159 S 0 1 5 0 2 I 0 192 3 0 2 S 1 2 M 60 180 5 0 2$
-- completed exonerate analysis


And here is the result we get in the protein2genome.gff output from MAKER:

@000426F|arrow|arrow    protein2genome  protein_match   439595  460484  2580    -       .       ID=@000426F|arrow|arrow:hit:153696:3.10.0.4;Name=P46461.1;target_length=745;aligned_coverage=98.93;aligned_identity=72.6
@000426F|arrow|arrow    protein2genome  match_part      460399  460484  2580    -       .       ID=@000426F|arrow|arrow:hsp:233933:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 4 32;Gap=F2 I1 M28
@000426F|arrow|arrow    protein2genome  match_part      460135  460344  2580    -       .       ID=@000426F|arrow|arrow:hsp:233934:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 33 105;Gap=F2 M18 I3 M52 R2
@000426F|arrow|arrow    protein2genome  match_part      458437  458582  2580    -       .       ID=@000426F|arrow|arrow:hsp:233935:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 106 154;Gap=F1 M49 R2
@000426F|arrow|arrow    protein2genome  match_part      454953  455091  2580    -       .       ID=@000426F|arrow|arrow:hsp:233936:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 155 200;Gap=F2 M46 R1
@000426F|arrow|arrow    protein2genome  match_part      454674  454834  2580    -       .       ID=@000426F|arrow|arrow:hsp:233937:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 201 254;Gap=F1 M54 R2
@000426F|arrow|arrow    protein2genome  match_part      454296  454477  2580    -       .       ID=@000426F|arrow|arrow:hsp:233938:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 255 315;Gap=M61 R1
@000426F|arrow|arrow    protein2genome  match_part      453985  454150  2580    -       .       ID=@000426F|arrow|arrow:hsp:233939:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 316 370;Gap=F1 M55
@000426F|arrow|arrow    protein2genome  match_part      453401  453570  2580    -       .       ID=@000426F|arrow|arrow:hsp:233940:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 371 427;Gap=M57 R1
@000426F|arrow|arrow    protein2genome  match_part      448042  448363  2580    -       .       ID=@000426F|arrow|arrow:hsp:233941:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 428 534;Gap=F1 M107
@000426F|arrow|arrow    protein2genome  match_part      447761  447918  2580    -       .       ID=@000426F|arrow|arrow:hsp:233942:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 535 587;Gap=M53 R1
@000426F|arrow|arrow    protein2genome  match_part      447460  447644  2580    -       .       ID=@000426F|arrow|arrow:hsp:233943:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 588 648;Gap=F2 M61
@000426F|arrow|arrow    protein2genome  match_part      445484  445642  2580    -       .       ID=@000426F|arrow|arrow:hsp:233944:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 649 701;Gap=F2 M53 R2
@000426F|arrow|arrow    protein2genome  match_part      439595  439709  2580    -       .       ID=@000426F|arrow|arrow:hsp:233945:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 702 740;Gap=M39 R2

MAKER apparently processes the CIGAR string and saves the result in the Gap attribute. The value looks like a CIGAR string, but it is different. The letters we can find are M, D, I, R and F. I guess M=match, D=deletion and I=insertion, but I don't understand the meaning of R and F.
Could you explain their meanings?
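
To make the question concrete, here is a minimal Python sketch that tokenizes a Gap value and computes the protein and genomic spans it would imply. It rests on unconfirmed assumptions: that M, I and D count amino acids (with M and D each covering 3 nt on the genome), and that F and R count single nucleotides added to or removed from the genomic span. The function name gap_spans is hypothetical, and D does not occur in the Gap values above, so its handling is a guess.

```python
import re

def gap_spans(gap):
    """Return (protein_residues, genome_bases) implied by a Gap value.

    Assumed, not confirmed: M/I/D are counted in amino acids,
    F/R are counted in nucleotides.
    """
    residues = 0
    bases = 0
    for op, count in re.findall(r"([MIDFR])(\d+)", gap):
        n = int(count)
        if op == "M":      # aligned residues: 1 aa per 3 nt
            residues += n
            bases += 3 * n
        elif op == "I":    # residues present only in the protein
            residues += n
        elif op == "D":    # codons present only in the genome (guess)
            bases += 3 * n
        elif op == "F":    # assumed: extra nucleotides on the genome
            bases += n
        elif op == "R":    # assumed: nucleotides taken back from the genome
            bases -= n
    return residues, bases

# First match_part above: Target 4..32, genome 460399..460484
print(gap_spans("F2 I1 M28"))   # (29, 86)
```

Under these assumptions the first match_part gives 29 residues and 86 bases, which agrees with Target 4 32 and with 460399..460484, and "F1 M49 R2" gives 49 residues and 146 bases, which agrees with the third match_part. That only shows the arithmetic is consistent; it does not tell me what F and R actually stand for.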

Best regards,

/Jacques
-------------------------------------------------
Jacques Dainat, Ph.D.
NBIS (National Bioinformatics Infrastructure Sweden)
Genome Annotation Service
http://nbis.se/about/staff/jacques-dainat/
http://nbis.se/

—	Contact	— 
Address: Uppsala University, Biomedicinska Centrum
Department of Medical Biochemistry Microbiology, Genomics
Husargatan 3, box 582
S-75123 Uppsala Sweden
Phone: +46 18 471 46 25
