<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">EVM works extremely well when the evidence closely matches the predictions and there are no assembly anomalies affecting the ORF. Otherwise, EVM performs very poorly. Also, I would not set unmask=1; it adds noise to the calls.<div class=""><br class=""></div><div class="">Note that in all the cases shown, the gene models come from Augustus (MAKER itself doesn’t make predictions). MAKER just provides hints that Augustus can use for the second call set. Hints boost a model’s score whenever one of its features matches a hint. The Augustus match/match_part features you see are just a reference showing what Augustus calls without hints.</div><div class=""><br class=""></div><div class="">So if I tell Augustus there is probably an exon/intron at location X, then any model that includes that exon/intron gets a score bump, causing Augustus to keep the models that match the hints and report them over models that don’t. However, if there is an issue with the evidence (e.g. a falsely merged mRNA-seq assembly) or with the genome assembly (a base change that creates a premature stop codon or causes a frameshift), then Augustus may truncate or skip an exon in order to capture the bonus from downstream hints. In that case there is probably no workable model that captures the exact intron/exon structure, because that structure breaks the ORF at some point. So Augustus instead produces the best model it can, one that captures as many hint bonuses as possible.</div><div class=""><br class=""></div><div class="">That being said, look for odd hint sources, such as very poor protein or transcript evidence alignments. Eliminating bad hints will improve performance (for example, if you are using mRNA-seq assemblies, Trinity has a jaccard_clip option that helps avoid false merging of transcript evidence). 
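</div><div class=""><br class=""></div><div class="">To make the hint-bonus reasoning above concrete, here is a toy Python sketch. It is hypothetical and is not how Augustus actually scores gene models (Augustus is built on a generalized HMM with its own hint weighting); the candidate models, feature coordinates, evidence labels, quality values, bonus size, and the 0.8 quality cutoff are all made up for illustration. Each candidate gets a hint-free base log score plus a fixed log-space bonus for every feature that matches a hint, and the top scorer is reported. Two spurious hints, standing in for a falsely merged transcript, let the weaker model win; filtering out the low-quality hints restores the better one.</div><pre class="">
```python
import math
from dataclasses import dataclass

# Toy illustration only: NOT how Augustus scores models internally.
# Feature tuples, scores, evidence labels and the bonus value are hypothetical.

Feature = tuple  # (feature_type, start, end), e.g. ("intron", 120, 480)


@dataclass(frozen=True)
class Hint:
    feature: Feature
    source: str      # hypothetical evidence label
    quality: float   # hypothetical alignment quality, 0.0 to 1.0


# Candidate models: (set of features, hint-free base log score).
CANDIDATES = {
    "model_A": ({("exon", 1, 119), ("intron", 120, 480), ("exon", 481, 900)}, -50.0),
    "model_B": ({("exon", 1, 119), ("intron", 120, 700), ("exon", 701, 900)}, -52.0),
}

# Hypothetical bonus: each hint-supported feature multiplies the model's
# probability by 10, i.e. adds log(10) to its log score.
LOG_BONUS = math.log(10.0)


def score(features, base, hints):
    """Base log score plus LOG_BONUS for every feature that matches a hint."""
    hinted = {h.feature for h in hints}
    return base + LOG_BONUS * len(features.intersection(hinted))


def best_model(hints):
    """Return the name of the highest-scoring candidate given these hints."""
    return max(CANDIDATES, key=lambda name: score(*CANDIDATES[name], hints))


def filter_hints(hints, min_quality=0.8):
    """Drop hints whose (hypothetical) alignment quality is below min_quality."""
    return [h for h in hints if h.quality >= min_quality]


if __name__ == "__main__":
    hints = [
        Hint(("intron", 120, 480), "rnaseq_alignment", 0.95),
        # Two spurious hints, e.g. from a falsely merged transcript assembly.
        Hint(("intron", 120, 700), "merged_assembly", 0.40),
        Hint(("exon", 701, 900), "merged_assembly", 0.40),
    ]
    print(best_model(hints))                # model_B: collects two bonuses
    print(best_model(filter_hints(hints)))  # model_A once the bad hints are gone
```
</pre><div class="">Run as a script, it prints model_B and then model_A: the scoring rule never changes, only the evidence fed into it.</div><div class=""><br class=""></div><div class="">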
Similarly, if an organism you used for protein evidence consistently produces bad protein alignments, you may want to drop it from the evidence completely.</div><div class=""><br class=""></div><div class="">Finally, training Augustus on the genome being annotated will help improve performance. Note that just because a species is closely related evolutionarily does not mean its HMMs will perform well on your genome; this is a common fallacy about ab initio prediction, discussed in the SNAP paper. Also try adding another gene predictor such as SNAP to see whether it helps or hurts.</div><div class=""><br class=""></div><div class="">—Carson</div><div class=""><br class=""></div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On May 21, 2017, at 1:48 AM, Salim Bougouffa &lt;<a href="mailto:mjfi2sb3@gmail.com" class="">mjfi2sb3@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="gmail_default" style="font-family:&quot;trebuchet ms&quot;,sans-serif;color:rgb(153,0,255)">Hi<span class="inbox-inbox-Apple-converted-space"> </span><span class="inbox-inbox-il">Maker</span><span class="inbox-inbox-Apple-converted-space"> </span>folks,</div><div class="gmail_default" style="font-family:&quot;trebuchet ms&quot;,sans-serif;color:rgb(153,0,255)"><br class=""></div><div class="gmail_default" style="font-family:&quot;trebuchet ms&quot;,sans-serif;color:rgb(153,0,255)">I have several issues with a plant genome annotation I am currently working on, but perhaps the most recurrent are:</div><div class="gmail_default" style="font-family:&quot;trebuchet ms&quot;,sans-serif;color:rgb(153,0,255)"><br class=""></div><div class="gmail_default" style="font-family:&quot;trebuchet ms&quot;,sans-serif;color:rgb(153,0,255)">1/ CDSs that are missed even where there is significant RNA-seq evidence (figure artemis01)</div>
<div class="gmail_default" style="font-family:&quot;trebuchet ms&quot;,sans-serif;color:rgb(153,0,255)">2/ vice versa, where one or two exons are added without RNA-seq evidence/intron hints (figure artemis02)<br class=""></div><div class="gmail_default" style="font-family:&quot;trebuchet ms&quot;,sans-serif;color:rgb(153,0,255)"><br class=""></div><div class="gmail_default" style="font-family:&quot;trebuchet ms&quot;,sans-serif;color:rgb(153,0,255)">Info about the runs:</div><div class="gmail_default" style="font-family:&quot;trebuchet ms&quot;,sans-serif;color:rgb(153,0,255)">1/ using Augustus with a pre-existing model for a related plant that has high homology to the one I am annotating</div><div class="gmail_default" style="font-family:&quot;trebuchet ms&quot;,sans-serif;color:rgb(153,0,255)">2/ unmask=1 (seems to do better than unmask=0; is this a good thing to do?)</div><div class="gmail_default" style="font-family:&quot;trebuchet ms&quot;,sans-serif;color:rgb(153,0,255)">3/ evm=1 (seems to perform better than evm=0)</div><div class="gmail_default" style="font-family:&quot;trebuchet ms&quot;,sans-serif;color:rgb(153,0,255)">4/ repeat masking (de novo + Repbase)</div><div class="gmail_default" style="font-family:&quot;trebuchet ms&quot;,sans-serif;color:rgb(153,0,255)"><br class=""></div><div class="gmail_default" style="font-family:&quot;trebuchet ms&quot;,sans-serif;color:rgb(153,0,255)">Best,</div><div class="gmail_default" style="font-family:&quot;trebuchet ms&quot;,sans-serif;color:rgb(153,0,255)">/SB</div><br class="inbox-inbox-Apple-interchange-newline"></div>
<span id="cid:15c29fada6279f8bdb11"><artemis01.png></span><span id="cid:15c29fada8c7a06d5322"><artemis02.png></span>_______________________________________________<br class="">maker-devel mailing list<br class=""><a href="mailto:maker-devel@box290.bluehost.com" class="">maker-devel@box290.bluehost.com</a><br class="">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org<br class=""></div></blockquote></div><br class=""></div></body></html>