<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Once upon a time the link in the official GFF3 specification to the CIGAR string documentation actually worked, and it would bring you to a nice page that explained everything. It described how F and R are used in protein space alignments (F is a forward frame shift and R is a reverse frame shift in the alignment). In protein space, M1 is an amino acid match (it matches 3 bp in nucleotide space); this was clear in the now-broken link. Likewise, I1 is an amino acid insertion (3 bp in nucleotide space) and D1 is an amino acid deletion (3 bp in nucleotide space). F and R therefore allow single-bp movement to the left or right within amino acid space. Sometimes this happens in Exonerate where it appears as a slightly shifted codon (the codons look stacked), but it also happens when an amino acid is split across a splice site (the first part of the codon is on one exon and the second part is on the next exon). The raw Exonerate cigar you show below doesn’t have this because it’s only half the cigar and it’s in nucleotide space. The value shown in Gap= has to be in the same space as the Target= feature, which in this case is a protein. So we build the protein cigar string from the vulgar string according to the now-broken documentation on Gap attributes. 
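<div class=""><br class=""></div><div class="">As a rough sanity check of that reading, here is a minimal Python sketch (not MAKER's own code) that adds up how many genome bp and protein residues a protein-space Gap value covers. The names parse_gap and gap_lengths are made up for illustration. The arithmetic is an assumption: M and D count 3 bp each on the genome, M and I count one residue each on the protein, F adds bp and R subtracts bp. It reproduces the coordinates of the match_part rows quoted below; none of those rows contains a D, so that branch is untested against real output.</div>
<pre class="">
```python
import re

# Assumed semantics for a protein-space Gap attribute (not taken from MAKER's code):
#   M n: n amino acids matched, spanning 3*n bp of genome
#   I n: n amino acids present in the protein only
#   D n: n codons (3*n bp) present in the genome only
#   F n: shift n bp forward on the genome
#   R n: shift n bp backward on the genome
GAP_OP = re.compile(r"([MIDFR])(\d+)")


def parse_gap(gap):
    """Split a Gap value such as 'F2 I1 M28' into (letter, count) pairs."""
    ops = []
    for token in gap.split():
        match = GAP_OP.fullmatch(token)
        if match is None:
            raise ValueError("unrecognised Gap token: " + token)
        ops.append((match.group(1), int(match.group(2))))
    return ops


def gap_lengths(gap):
    """Return (genome_bp, protein_aa) covered by a protein-space Gap value."""
    genome_bp = 0
    protein_aa = 0
    for letter, count in parse_gap(gap):
        if letter == "M":
            genome_bp += 3 * count
            protein_aa += count
        elif letter == "I":
            protein_aa += count
        elif letter == "D":
            genome_bp += 3 * count
        elif letter == "F":
            genome_bp += count
        else:  # "R"
            genome_bp -= count
    return genome_bp, protein_aa


# First match_part: genome 460399..460484 (86 bp), Target 4..32 (29 aa)
print(gap_lengths("F2 I1 M28"))          # (86, 29)
# Second match_part: genome 460135..460344 (210 bp), Target 33..105 (73 aa)
print(gap_lengths("F2 M18 I3 M52 R2"))   # (210, 73)
```
</pre>
<div class="">If the two numbers agree with the feature's start/end and with its Target= range, the Gap value is at least internally consistent with the row it sits on.</div><div class=""><br class=""></div>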
In the first match_part below (Gap=F2 I1 M28), you have 28 amino acid matches, 1 insertion, and then an amino acid split across the intron (1 bp of the codon on one side and 2 bp on the other side), and it’s flipped because the alignment happens on the opposite strand.<div class=""><div class=""><div class=""><br class=""></div><div class="">--Carson</div><div class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Oct 23, 2018, at 7:56 AM, Jacques Dainat &lt;<a href="mailto:jacques.dainat@nbis.se" class="">jacques.dainat@nbis.se</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class="">Hello,</div><div class=""><br class=""></div><div class="">Here is an example of the cigar string output from exonerate (exactly the same command as launched by MAKER)</div><div class=""><br class=""></div><div class=""><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(255, 255, 255); background-color: rgb(43, 102, 201);" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">cigar: P46461.1 3 740 . genome 460484 439594 - 2580 M 84 I 1 D 56 M 154 I 3 M 54 D 1554 M 145 D 3346 M 137 D 120 M 160 D 197 M 182 D 145 M 165 D 415 M 170 D 5037 M 321 D 124 M 158 D 116 M 183 D 1819 M 157 D 5776 M 115</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(255, 255, 255); background-color: rgb(43, 102, 201);" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">vulgar: P46461.1 3 740 . 
genome 460484 439594 - 2580 M 28 84 G 1 0 S 0 2 5 0 2 I 0 50 3 0 2 S 1 1 M 51 153 G 3 0 M 18 54 S 0 2 5 0 2 I 0 1548 3 0 2 S 1 1 M 48 144 S 0 1 5 0 2 I 0 3341 3 0 2 S 1 2 M 45 135 S 0 2 5 0 2 I 0 114 3 0 2 S 1 1 M 53 159 S 0 1 5 0 2 I 0 192 3 0 2 S 1 2 M 60 180 5 0 2$</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal; font-family: Menlo; color: rgb(255, 255, 255); background-color: rgb(43, 102, 201);" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">-- completed exonerate analysis</span></div></div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">and here the result we get in the protein2genome.gff output from MAKER</div><div class=""><br class=""></div><div class=""><div class="">@000426F|arrow|arrow protein2genome protein_match 439595 460484 2580 - . ID=@000426F|arrow|arrow:hit:153696:3.10.0.4;Name=P46461.1;target_length=745;aligned_coverage=98.93;aligned_identity=72.6</div><div class="">@000426F|arrow|arrow protein2genome match_part 460399 460484 2580 - . ID=@000426F|arrow|arrow:hsp:233933:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 4 32;Gap=F2 I1 M28</div><div class="">@000426F|arrow|arrow protein2genome match_part 460135 460344 2580 - . ID=@000426F|arrow|arrow:hsp:233934:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 33 105;Gap=F2 M18 I3 M52 R2</div><div class="">@000426F|arrow|arrow protein2genome match_part 458437 458582 2580 - . ID=@000426F|arrow|arrow:hsp:233935:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 106 154;Gap=F1 M49 R2</div><div class="">@000426F|arrow|arrow protein2genome match_part 454953 455091 2580 - . ID=@000426F|arrow|arrow:hsp:233936:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 155 200;Gap=F2 M46 R1</div><div class="">@000426F|arrow|arrow protein2genome match_part 454674 454834 2580 - . 
ID=@000426F|arrow|arrow:hsp:233937:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 201 254;Gap=F1 M54 R2</div><div class="">@000426F|arrow|arrow protein2genome match_part 454296 454477 2580 - . ID=@000426F|arrow|arrow:hsp:233938:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 255 315;Gap=M61 R1</div><div class="">@000426F|arrow|arrow protein2genome match_part 453985 454150 2580 - . ID=@000426F|arrow|arrow:hsp:233939:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 316 370;Gap=F1 M55</div><div class="">@000426F|arrow|arrow protein2genome match_part 453401 453570 2580 - . ID=@000426F|arrow|arrow:hsp:233940:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 371 427;Gap=M57 R1</div><div class="">@000426F|arrow|arrow protein2genome match_part 448042 448363 2580 - . ID=@000426F|arrow|arrow:hsp:233941:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 428 534;Gap=F1 M107</div><div class="">@000426F|arrow|arrow protein2genome match_part 447761 447918 2580 - . ID=@000426F|arrow|arrow:hsp:233942:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 535 587;Gap=M53 R1</div><div class="">@000426F|arrow|arrow protein2genome match_part 447460 447644 2580 - . ID=@000426F|arrow|arrow:hsp:233943:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 588 648;Gap=F2 M61</div><div class="">@000426F|arrow|arrow protein2genome match_part 445484 445642 2580 - . ID=@000426F|arrow|arrow:hsp:233944:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 649 701;Gap=F2 M53 R2</div><div class="">@000426F|arrow|arrow protein2genome match_part 439595 439709 2580 - . ID=@000426F|arrow|arrow:hsp:233945:3.10.0.4;Parent=@000426F|arrow|arrow:hit:153696:3.10.0.4;Target=P46461.1 702 740;Gap=M39 R2</div></div><div class=""><br class=""></div><div class="">MAKER apparently process the CIGAR string and save it into the Gap attribute. 
The value looks like a CIGAR string but it is different. Here are the different letters we can find (M, D, I, R, F). I guess M=match, D=deletion and I=insertion, but I don’t get the meaning of R and F.</div><div class="">Could you explain their meanings?</div><div class=""><br class=""></div>Best regards,<div class=""><br class=""></div><div class="">/Jacques<br class=""><div class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="caret-color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="caret-color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">-------------------------------------------------<br class="">Jacques Dainat, Ph.D.<br class="">NBIS (National Bioinformatics Infrastructure Sweden)<br class="">Genome Annotation Service<br class=""><a href="http://nbis.se/about/staff/jacques-dainat/" class="">http://nbis.se/about/staff/jacques-dainat</a></div><div dir="auto" style="caret-color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><a href="http://nbis.se/" class="">http://nbis.se</a></div><div dir="auto" style="caret-color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><br class="">—<span 
class="Apple-tab-span" style="white-space: pre;"> </span><b class="">Contact</b><span class="Apple-tab-span" style="white-space: pre;"> </span>— <br class=""><b class="">Address</b>: Uppsala University, Biomedicinska Centrum<br class="">Department of Medical Biochemistry Microbiology, Genomics<br class="">Husargatan 3, box 582<br class="">S-75123 Uppsala Sweden<br class=""><b class="">Phone</b>: +46 18 471 46 25</div></div></div>
</div>
<br class=""></div></div>_______________________________________________<br class="">maker-devel mailing list<br class=""><a href="mailto:maker-devel@box290.bluehost.com" class="">maker-devel@box290.bluehost.com</a><br class="">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org<br class=""></div></blockquote></div><br class=""></div></div></div></body></html>