<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">In general, if you want to know whether the ab initio predictors are trained well, look at their predictions in a browser such as Apollo. If the SNAP and Augustus predictions look like each other, and both look like the final hint-based models, then they are trained well.<div class=""><br class=""></div><div class="">AED is more of a relative (correlative) measure than an absolute one. In general, the lower the value, the better the model. If you have gold-standard models, you can get sensitivity and specificity metrics from programs like EVAL from WashU, but that’s not really an option for newly sequenced organisms.</div><div class=""><br class=""></div><div class="">—Carson</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><div class=""><br class=""></div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Mar 15, 2016, at 2:19 PM, Daniel Ence <<a href="mailto:dence@genetics.utah.edu" class="">dence@genetics.utah.edu</a>> wrote:</div><br class="Apple-interchange-newline"><div class="">


<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">

Hi Wenbo, sorry for giving you a bogus suggestion; I should have realized that wouldn’t work. The defaults for the parameters you’re asking about are all “0.5”, i.e. half of the exons, splice sites, etc. must be supported by EST alignment. Whether those are acceptable cutoffs for selecting your next set of training genes is your judgment call. We use those settings for all our training runs, and they generally give good results.
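<div class="">As a rough illustration of how the three maker2zff cutoffs combine, here is a minimal Python sketch. The <code>ModelSupport</code> record and its field names are hypothetical (maker2zff computes these fractions itself from MAKER output); the mapping of -c to EST-confirmed splice sites, -e to exons overlapping EST alignments, and -o to exons overlapping any evidence is an assumption based on maker2zff's usage text, so check it against your installed version. The sketch also assumes that a fraction exactly equal to the cutoff passes.</div>
<pre class="">
```python
from dataclasses import dataclass


@dataclass
class ModelSupport:
    """Hypothetical summary of how well one MAKER gene model is supported."""
    name: str
    splice_sites_confirmed_by_est: float   # criterion assumed behind maker2zff -c
    exons_overlapping_est: float           # criterion assumed behind maker2zff -e
    exons_overlapping_any_evidence: float  # criterion assumed behind maker2zff -o


def passes_filters(m: ModelSupport, c: float = 0.5, e: float = 0.5, o: float = 0.5) -> bool:
    """True if every support fraction reaches its cutoff (0.5 defaults, as in the thread)."""
    return (
        m.splice_sites_confirmed_by_est >= c
        and m.exons_overlapping_est >= e
        and m.exons_overlapping_any_evidence >= o
    )


def select_training_models(models, c=0.5, e=0.5, o=0.5):
    """Names of the models that would be kept for SNAP training."""
    return [m.name for m in models if passes_filters(m, c, e, o)]


models = [
    ModelSupport("gene1", 1.0, 0.8, 1.0),
    ModelSupport("gene2", 0.4, 0.9, 1.0),  # too few EST-confirmed splice sites
    ModelSupport("gene3", 0.5, 0.5, 0.5),  # exactly at the default cutoffs
]
print(select_training_models(models))         # ['gene1', 'gene3']
print(select_training_models(models, c=0.8))  # ['gene1']
```
</pre>
<div class="">Raising a cutoff, as in the second call, trades training-set size for stricter evidence support.</div>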

<div class=""><br class="">

</div>

<div class="">~Daniel</div>

<div class=""><br class="">

</div>

<div class=""><br class="">

<div class=""><br class="">

</div>

<div class=""><br class="">

</div>

<div class=""><br class="">

<div class="">Daniel Ence<br class="">

Graduate Student<br class="">

Eccles Institute of Human Genetics<br class="">

University of Utah<br class="">

15 North 2030 East, Room 2100<br class="">

Salt Lake City, UT 84112-5330 </div>

<br class="">

<div class="">

<blockquote type="cite" class="">

<div class="">On Mar 15, 2016, at 2:07 PM, 陈文博 <<a href="mailto:chenwenbo1020@gmail.com" class="">chenwenbo1020@gmail.com</a>> wrote:</div>

<br class="Apple-interchange-newline">

<div class="">

<div dir="ltr" class="">Hi <span style="font-size:14px" class="">Daniel,</span>

<div class=""><span style="font-size:14px" class=""><br class="">

</span></div>

<div class=""><span style="font-size:14px" class="">Thanks for your help.</span></div>

<div class=""><span style="font-size:14px" class=""><br class="">

</span></div>

<div class=""><span style="font-size:14px" class="">"</span><span style="font-size:14px" class="">In order to evaluate your final SNAP training files, you might try running SNAP with MAKER without any evidence and compare the distributions of AED (annotation

 edit distance) values with the distribution of AED values from your prior MAKER runs"</span></div>

<div class=""><span style="font-size:14px" class=""><br class="">

</span></div>

<div class=""><span style="font-size:14px" class="">---- If I run SNAP in MAKER without any evidence, the AED will be 1 for every gene model, so I can't compare its AED distribution with those of my prior runs.</span></div>
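<div class="">Why the AED is 1 in that case follows from how AED is defined at the nucleotide level: sensitivity SN is the fraction of evidence bases covered by the model, specificity SP is the fraction of model bases covered by the evidence, and AED = 1 - (SN + SP) / 2. With no evidence there is no overlap, so SN = SP = 0. The minimal Python sketch below implements only that formula, assuming exons and evidence are given as 1-based, inclusive (start, end) intervals; it is not MAKER's own implementation, which works on its evidence clusters.</div>
<pre class="">
```python
def covered_bases(intervals):
    """Set of positions covered by 1-based, inclusive (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(model_exons, evidence):
    """Nucleotide-level annotation edit distance: 1 - (SN + SP) / 2."""
    model = covered_bases(model_exons)
    ev = covered_bases(evidence)
    if not model or not ev:
        # Nothing can overlap, so SN = SP = 0 and AED = 1.
        return 1.0
    shared = len(model.intersection(ev))
    sensitivity = shared / len(ev)     # fraction of evidence bases in the model
    specificity = shared / len(model)  # fraction of model bases in the evidence
    return 1.0 - (sensitivity + specificity) / 2.0


model = [(100, 199), (300, 399)]             # two exons, 200 bp in total
print(aed(model, [(100, 199), (300, 399)]))  # 0.0 (perfect agreement)
print(aed(model, [(100, 199)]))              # 0.25
print(aed(model, []))                        # 1.0 (no evidence at all)
```
</pre>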

<div class=""><span style="font-size:14px" class=""><br class="">

</span></div>

<div class=""><span style="font-size:14px" class="">When I examined the gene models in Apollo, I noticed that the introns predicted by SNAP are longer than those from the other predictors. Is there a parameter that controls this? Also, when using the maker2zff script to filter the input models for training SNAP, do you have any suggestions for the "-c", "-e", and "-o" parameters?</span></div>
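<div class="">To check the longer SNAP introns seen in Apollo with numbers rather than by eye, one option is to compute intron lengths from each predictor's exon coordinates and compare their medians. This is a minimal Python sketch, assuming the exons have already been extracted from each predictor's GFF3 output (that parsing step is omitted) and using made-up coordinates.</div>
<pre class="">
```python
from statistics import median


def intron_lengths(exons):
    """Intron lengths between consecutive exons given as 1-based, inclusive (start, end)."""
    ordered = sorted(exons)
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(ordered, ordered[1:])]


def median_intron_length(models):
    """Median intron length over a list of models (each a list of exons), or None."""
    lengths = [n for exons in models for n in intron_lengths(exons)]
    return median(lengths) if lengths else None


# Hypothetical exon coordinates from two predictors at the same locus.
snap_models = [[(1000, 1100), (2101, 2200), (4201, 4300)]]
augustus_models = [[(1000, 1100), (1401, 1500), (1801, 1900)]]
print(intron_lengths(snap_models[0]))         # [1000, 2000]
print(median_intron_length(snap_models))      # 1500.0
print(median_intron_length(augustus_models))  # 300.0
```
</pre>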

<div class=""><span style="font-size:14px" class=""><br class="">

</span></div>

<div class=""><span style="font-size:14px" class="">Here are my parameters in the CTL file:</span></div>

<div class=""><span style="font-size:14px" class=""><br class="">

</span></div>

<div class=""><span style="font-size:14px" class="">alt_splice=0</span></div>

<div class=""><span style="font-size:14px" class="">always_complete=1</span></div>

<div class=""><span style="font-size:14px" class="">split_hit=257022</span></div>

<div class=""><span style="font-size:14px" class="">max_dna_len=1700000</span></div>

<div class=""><span style="font-size:14px" class=""><br class="">

</span></div>

<div class=""><span style="font-size:14px" class="">Thanks a lot!</span></div>

<div class=""><span style="font-size:14px" class=""><br class="">

</span></div>

<div class=""><span style="font-size:14px" class="">Best,</span></div>

<div class=""><span style="font-size:14px" class="">Wenbo</span></div>

<div class=""><span style="font-size:14px" class="">  </span></div>

</div>

<div class="gmail_extra"><br class="">

<div class="gmail_quote">2016-03-14 12:17 GMT-04:00 Daniel Ence <span dir="ltr" class="">

<<a href="mailto:dence@genetics.utah.edu" target="_blank" class="">dence@genetics.utah.edu</a>></span>:<br class="">

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi Wenbo, MAKER has been evaluated against gold-standard annotations in the MAKER, MAKER2, and MAKER-P publications. The difficulty when working with relatively unstudied organisms is that there might not be a gold standard for any given genome.<br class="">

<br class="">

I think that the process you describe (using RNA-seq data, protein sequences, proteome sequences of related insects, and Swiss-Prot) would probably produce gene models that are ready for manual curation, not just training data for another ab initio predictor (SNAP).<br class="">

<br class="">

To answer your specific questions:<br class="">

<br class="">

1) Ab initio training is evaluated in terms of accuracy, sensitivity, and specificity. This is described in more detail in a review that Mark and I wrote several years ago:

<a href="http://www.nature.com/nrg/journal/v13/n5/full/nrg3174.html" rel="noreferrer" target="_blank" class="">

http://www.nature.com/nrg/journal/v13/n5/full/nrg3174.html</a><br class="">

Augustus reports accuracy, sensitivity, and specificity during its training procedure, although I can’t recall exactly where it reports them. I believe GeneMark provides similar reports during its own training process; I’m not certain about SNAP. To evaluate your final SNAP training files, you might try running SNAP within MAKER without any evidence and comparing the distribution of AED (annotation edit distance) values with the distribution of AED values from your prior MAKER runs. I’d be surprised if two rounds of training improved the AED scores much, though.<br class="">
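<div class="">One way to compare AED distributions between runs is a cumulative summary: for each AED threshold, the fraction of gene models at or below it. A run whose curve rises faster has more models in close agreement with the evidence. This Python sketch uses made-up AED values; in practice they would be collected from each run's gene models.</div>
<pre class="">
```python
def cumulative_fraction(aed_values, thresholds):
    """Fraction of gene models whose AED is at or below each threshold."""
    if not aed_values:
        return {t: 0.0 for t in thresholds}
    total = len(aed_values)
    return {t: sum(1 for a in aed_values if t >= a) / total for t in thresholds}


# Hypothetical AED values from two MAKER runs on the same assembly.
run_a = [0.05, 0.10, 0.20, 0.40, 0.90]
run_b = [0.02, 0.05, 0.10, 0.15, 0.30]
thresholds = [0.1, 0.25, 0.5]
print(cumulative_fraction(run_a, thresholds))  # {0.1: 0.4, 0.25: 0.6, 0.5: 0.8}
print(cumulative_fraction(run_b, thresholds))  # {0.1: 0.6, 0.25: 0.8, 0.5: 1.0}
```
</pre>
<div class="">Here run_b has a larger share of models at every threshold, which, since lower AED generally means a better-supported model, suggests its models agree more closely with the evidence.</div>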

<br class="">

2) If you have EST evidence that complements the RNA-seq data you already used, feel free to include it. MAKER treats loci that are only partially supported by EST sequences the same way it treats all other loci: it evaluates the alignment evidence and chooses the ab initio prediction that is best supported by that evidence. Partial models result from loci where none of the predictors you used produced a complete ab initio prediction.<br class="">
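<div class="">To illustrate why partial EST support does not by itself make a model partial: the evidence is used to choose among complete ab initio predictions, not to build the model. The sketch below is a deliberately simplified stand-in, not MAKER's actual selection logic; its hypothetical scoring rule ranks each prediction by the fraction of its exons overlapped by any evidence and keeps the best one.</div>
<pre class="">
```python
def overlaps(a, b):
    """True if two 1-based, inclusive (start, end) intervals share a base."""
    return b[1] >= a[0] and a[1] >= b[0]


def exon_support(model, evidence):
    """Fraction of the model's exons overlapped by at least one evidence alignment."""
    hits = sum(1 for exon in model if any(overlaps(exon, ev) for ev in evidence))
    return hits / len(model)


def pick_prediction(predictions, evidence):
    """Hypothetical rule: keep the prediction with the highest exon support."""
    return max(predictions, key=lambda name: exon_support(predictions[name], evidence))


# A partial EST alignment covering only the first part of the locus.
partial_est = [(100, 199), (300, 350)]
predictions = {
    "snap": [(100, 199), (300, 399), (600, 699)],  # complete three-exon model
    "augustus": [(150, 199), (800, 899)],
}
best = pick_prediction(predictions, partial_est)
print(best, predictions[best])  # snap [(100, 199), (300, 399), (600, 699)]
```
</pre>
<div class="">The chosen model keeps all three exons; the partial EST only influenced which prediction was picked, not the extent of the model.</div>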

<br class="">

3) See above.<br class="">

<br class="">

Let me know if that helps,<br class="">

Daniel<br class="">

<br class="">

<br class="">


<div class="">

<div class="h5"><br class="">

> On Mar 13, 2016, at 8:22 PM, 陈文博 <<a href="mailto:chenwenbo1020@gmail.com" class="">chenwenbo1020@gmail.com</a>> wrote:<br class="">

><br class="">

> Hi All,<br class="">

><br class="">

> I am using MAKER to annotate an insect genome. First, I trained Augustus and GeneMark-ET outside of MAKER using aligned RNA-seq data, and then passed them to MAKER. The evidence included assembled RNA-seq data, protein sequences of my insect, proteome sequences of three related insects, and Swiss-Prot. Finally, I used the gene models generated by MAKER with AED < 0.01 to train SNAP for two rounds. My questions are:<br class="">
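<div class="">Selecting training models by an AED cutoff can be scripted directly from MAKER's GFF3 output. This Python sketch assumes the <code>_AED=</code> attribute that MAKER writes on mRNA lines (check that it is present in your own output) and keeps mRNAs whose AED is strictly below the cutoff; the example lines are made up.</div>
<pre class="">
```python
def low_aed_mrnas(gff_lines, max_aed=0.01):
    """IDs of mRNA features whose _AED attribute is strictly below max_aed."""
    keep = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # skip other feature types and any embedded FASTA lines
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        if "_AED" in attrs and max_aed > float(attrs["_AED"]):
            keep.append(attrs.get("ID", ""))
    return keep


# Made-up MAKER-style GFF3 lines for illustration.
example = [
    "##gff-version 3",
    "ctg1\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "ctg1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=gene1-mRNA-1;Parent=gene1;_AED=0.00",
    "ctg1\tmaker\tmRNA\t1200\t2000\t.\t-\t.\tID=gene2-mRNA-1;Parent=gene2;_AED=0.12",
]
print(low_aed_mrnas(example))  # ['gene1-mRNA-1']
```
</pre>
<div class="">With a real file, an open file handle can be passed in place of the list, since it yields lines one at a time.</div>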

><br class="">

> 1. How should I evaluate the results of ab initio training? How can I know whether these gene finders are well trained?<br class="">

><br class="">

> 2. Should I add EST evidence? How does MAKER handle a locus where there is only partial EST evidence? Will partial EST sequences cause gene models to be partial?<br class="">

><br class="">

> 3. Is there a gold standard for evaluating the results of gene prediction? How can I improve them?<br class="">

><br class="">

> Thank you!<br class="">

><br class="">

> Best regards,<br class="">

> Wenbo<br class="">

</div>

</div>

> _______________________________________________<br class="">

> maker-devel mailing list<br class="">

> <a href="mailto:maker-devel@box290.bluehost.com" class="">maker-devel@box290.bluehost.com</a><br class="">

> <a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org" rel="noreferrer" target="_blank" class="">

http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a><br class="">

<br class="">

</blockquote>

</div>

<br class="">

</div>

</div>

</blockquote>

</div>

<br class="">

</div>

</div>

</div>

</div></blockquote></div><br class=""></div></div></body></html>