<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class="">Thanks for this. Here is some information on how MAKER sets the start and stop codons. MAKER keeps the reading frame and codons chosen by the gene predictor, so if a longer ORF exists in another reading frame, MAKER will not switch to it (the gene predictor is effectively saying that the other reading frame is less probable). MAKER can extend the ORF in the same reading frame, but only if the initial prediction is partial, an mRNA alignment suggests an extension, and the ORF extends to a canonical start in the same frame as the prediction. MAKER can also truncate the ORF if the gene prediction is partial to begin with and there is mRNA evidence of UTR (this can indicate an assembly error in the gene that artificially splits the ORF, or a false merge of neighboring genes caused by a bad mRNA-seq assembly). The always_complete option adds one extra step that is not necessarily biologically correct, but does help make genes more canonical. When it is set, MAKER walks off the edge of the CDS in both directions, without requiring mRNA evidence, and extends to the first canonical start or canonical stop that it encounters. 
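<br class=""><br class="">A minimal sketch of the always_complete walk described above is shown below in Python. It is not MAKER's code: it assumes a forward-strand CDS given as one contiguous 0-based, half-open interval, ATG as the only canonical start, and TAA/TAG/TGA as the canonical stops. It also assumes (an added rule, not stated above) that an in-frame stop met while walking upstream ends the search for a start. The function name extend_to_complete is hypothetical.
<pre class="">
```python
# Hypothetical sketch of MAKER's always_complete walk; NOT MAKER's own code.
# Assumptions: forward strand only, one contiguous CDS at [start, end)
# (0-based, half-open) whose length is a multiple of 3, ATG as the only
# canonical start, TAA/TAG/TGA as canonical stops. Stopping the upstream
# walk at an in-frame stop codon is an added assumption.

STARTS = {"ATG"}
STOPS = {"TAA", "TAG", "TGA"}


def extend_to_complete(genome: str, start: int, end: int) -> tuple[int, int]:
    """Return (new_start, new_end), extended in the prediction's own frame."""
    genome = genome.upper()
    new_start, new_end = start, end

    # Upstream: walk codon by codon only if the CDS lacks a start codon.
    if genome[start:start + 3] not in STARTS:
        pos = start - 3
        while pos >= 0:
            codon = genome[pos:pos + 3]
            if codon in STOPS:
                break  # assumption: an in-frame stop blocks further extension
            if codon in STARTS:
                new_start = pos  # first (closest) canonical start found
                break
            pos -= 3

    # Downstream: walk codon by codon only if the CDS lacks a stop codon.
    if genome[end - 3:end] not in STOPS:
        pos = end
        while len(genome) - pos >= 3:
            codon = genome[pos:pos + 3]
            if codon in STOPS:
                new_end = pos + 3  # include the stop codon in the CDS
                break
            pos += 3

    return new_start, new_end


if __name__ == "__main__":
    seq = "CCATGCCCAAAGGGTAACC"
    # Partial prediction covering only CCC AAA GGG.
    print(extend_to_complete(seq, 5, 14))  # prints (2, 17)
```
</pre>
On this toy sequence, the partial prediction CCCAAAGGG (coordinates 5 to 14) is extended to ATGCCCAAAGGGTAA (2 to 17) without changing the reading frame. Stepping by whole codons is what keeps the predictor's frame fixed, matching the point that MAKER does not switch to a longer ORF in another frame.<br class=""><br class="">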
The always_complete option is beneficial when using protein2genome alignments for homology-based annotation, since the alignments can be fuzzy near the edges.</div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Carson</div><div class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Mar 25, 2021, at 3:03 PM, Jacques Dainat <<a href="mailto:jacques.dainat@nbis.se" class="">jacques.dainat@nbis.se</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi,<div class=""><br class=""></div><div class="">I have run into this problem in some projects where the ORFs were not well defined. The ORF chosen in the mRNA was not the longest one, which is not necessarily wrong, but here it was obviously not the correct one, probably because of poor training of my ab initio tools. </div><div class="">I ended up developing a script that fixes the predictions by using the longest ORF as the CDS. 
The script is called <a class="js-navigation-open Link--primary" title="agat_sp_fix_longest_ORF.pl" data-pjax="#repo-content-pjax-container" href="https://github.com/NBISweden/AGAT/blob/master/bin/agat_sp_fix_longest_ORF.pl" style="box-sizing: border-box; text-decoration: none; caret-color: rgb(234, 247, 255); font-family: -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, Helvetica, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;; font-size: 14.000000953674316px; white-space: nowrap; color: var(--color-text-primary) !important;">agat_sp_fix_longest_ORF.pl</a> </div><div class="">It is available in AGAT (<a href="https://github.com/NBISweden/AGAT" class="">https://github.com/NBISweden/AGAT</a>).</div><div class=""><br class=""></div><div class="">I hope this helps,</div><div class=""><br class=""></div><div class="">Best regards,</div><div class=""><br class=""></div><div class=""><div class=""><div dir="auto" style="caret-color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="caret-color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class="">Jacques Dainat, Ph.D.<br class=""><br class=""></div></div></div>
</div>
<div class=""><br class=""><blockquote type="cite" class=""><div class="">On 8 Mar 2021, at 11:25, 廖家緯 <<a href="mailto:jwliao25@gmail.com" class="">jwliao25@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div dir="ltr" class="">Hi maker-devel group,<div class=""><br class=""></div><div class="">I used the maker with SNAP and Augustus for annotating a green algae genome. I always set the 'always_complete=1' from the first round of annotation.</div><div class=""><br class=""></div><div class="">After Augustus training, I still get around 1111 and 208 genes that don't have the correct start and stop codon. (Total annotated gene number is 12696)</div><div class=""><br class=""></div><div class="">I provided the close species proteome and the green algae its own RNA-seq data for EST hint.</div><div class=""><br class=""></div><div class="">Does that make sense? Does there have any way to improve the result or fix the incorrect start and stop codon for those gene?</div><div class=""><br class=""></div><div class="">best,</div><div class="">Chai-Wei Liao</div></div><br clear="all" class=""><div class=""><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr" class=""><p class="MsoNormal" style="color:rgb(34,34,34)"><font face="arial, sans-serif" class="">--------------------------------------------------</font></p><p class="MsoNormal" style="color:rgb(34,34,34)"><font face="arial, sans-serif" class="">Chia-Wei Liao 廖家緯</font></p><p class="MsoNormal" style="color:rgb(34,34,34)"><font face="arial, sans-serif" class="">Research Assistant</font></p><p class="MsoNormal" style="color:rgb(34,34,34)"><font face="arial, sans-serif" class="">Institute of Molecular Biology,<u class=""></u><u class=""></u></font></p><p class="MsoNormal" style="color:rgb(34,34,34)"><span style="background-position: initial initial; background-repeat: initial initial;" class=""><font face="arial, sans-serif" 
class="">Academia Sinica, Taipei City, Taiwan<u class=""></u><u class=""></u></font></span></p><p class="MsoNormal" style="color:rgb(34,34,34)"><span style="background-position: initial initial; background-repeat: initial initial;" class=""><font face="arial, sans-serif" class="">Phone: 886-2-2789-9216 (Lab)</font></span></p></div></div></div></div>
_______________________________________________<br class="">maker-devel mailing list<br class=""><a href="mailto:maker-devel@yandell-lab.org" class="">maker-devel@yandell-lab.org</a><br class=""><a href="http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org" class="">http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org</a><br class=""></div></blockquote></div><br class=""></div></div>_______________________________________________<br class="">maker-devel mailing list<br class=""><a href="mailto:maker-devel@yandell-lab.org" class="">maker-devel@yandell-lab.org</a><br class="">http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org<br class=""></div></blockquote></div><br class=""></div></body></html>