[maker-devel] How to evaluate the results of gene prediction

Mon Mar 14 10:17:31 MDT 2016

Hi Wenbo, MAKER has been evaluated against gold-criteria in the MAKER, MAKER2, and MAKER-P publications. The difficulty when working with relatively unstudied organisms is that there might not be any gold-criteria for a given genome.

I think that the process you describe (using RNA-seq data, protein sequences, proteome sequences of related insects, and Swiss-Prot) would result in gene models that are probably ready for manual curation, and not just useful as training data for another ab-initio predictor (SNAP).

To answer your specific questions:

1) Evaluation of ab-initio training is in terms of accuracy, sensitivity and specificity. This is described in more detail in this review that Mark and I wrote several years ago: http://www.nature.com/nrg/journal/v13/n5/full/nrg3174.html
Augustus provides measures of accuracy, sensitivity, and specificity during its training procedures, although I can’t recall exactly where it provides those. I believe that GeneMark provides similar reports during its own training process. I’m not certain about SNAP. In order to evaluate your final SNAP training files, you might try running SNAP with MAKER without any evidence and comparing the distribution of AED (annotation edit distance) values with the distribution of AED values from your prior MAKER runs. I’d be surprised if two rounds of training improved the AED scores much, though.
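
The AED comparison suggested above could be sketched roughly like this in Python. This is only a minimal sketch, assuming the MAKER GFF3 output stores the score as an `_AED=` attribute in column 9 of each mRNA feature; the function names and the thresholds are hypothetical choices made for illustration, not part of MAKER.

```python
import re

# Assumption: mRNA features carry "_AED=<value>" in the attributes column.
# The (?:^|;) prefix keeps the pattern from matching "_eAED=".
AED_PATTERN = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def read_aed_values(gff_lines):
    """Collect the _AED value of every mRNA feature from GFF3 lines."""
    values = []
    for line in gff_lines:
        if line.startswith("#"):
            continue  # skip comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # skip non-feature lines and other feature types
        match = AED_PATTERN.search(cols[8])
        if match:
            values.append(float(match.group(1)))
    return values


def cumulative_fraction(values, thresholds=(0.1, 0.25, 0.5, 0.75, 1.0)):
    """Fraction of AED values at or below each threshold."""
    if not values:
        return {t: 0.0 for t in thresholds}
    return {t: sum(v <= t for v in values) / len(values) for t in thresholds}
```

To use it, pass an open file handle for each run's GFF3 file to read_aed_values() and compare the two cumulative_fraction() results side by side. A run whose models sit closer to the evidence should have a larger fraction of transcripts at the low AED thresholds.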

2) If you have EST evidence that complements the RNA-seq data that you already used, then feel free to include it. MAKER treats loci that are partially supported by EST sequences the same as it does all other loci. MAKER evaluates the alignment evidence and chooses the ab-initio prediction that is best supported by it. Partial models result from loci where none of the predictors that you used produced a complete ab-initio prediction.
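
To illustrate what "best supported by the alignment evidence" means, here is a simplified base-level sketch in Python. It assumes sensitivity is the fraction of evidence bases covered by the prediction, specificity is the fraction of predicted bases covered by evidence, and AED = 1 - (SN + SP) / 2. MAKER's actual selection logic is more involved than this, and all of the function names here are hypothetical.

```python
def covered_bases(intervals):
    """Set of positions covered by 1-based inclusive (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def sn_sp_aed(prediction_exons, evidence_intervals):
    """Base-level sensitivity, specificity and AED of one prediction."""
    pred = covered_bases(prediction_exons)
    evid = covered_bases(evidence_intervals)
    overlap = len(pred & evid)
    sn = overlap / len(evid) if evid else 0.0  # evidence bases recovered
    sp = overlap / len(pred) if pred else 0.0  # predicted bases supported
    return sn, sp, 1.0 - (sn + sp) / 2.0


def best_supported(predictions, evidence_intervals):
    """Name of the prediction with the lowest AED against the evidence."""
    return min(
        predictions,
        key=lambda name: sn_sp_aed(predictions[name], evidence_intervals)[2],
    )
```

For example, with evidence covering 1-50 and 61-100, a two-exon prediction matching those intervals gets an AED of 0, while a single-exon prediction spanning 1-100 is penalized for the ten unsupported bases.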

3) See above.

Let me know if that helps, 
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

> On Mar 13, 2016, at 8:22 PM, 陈文博 <chenwenbo1020 at gmail.com> wrote:
> 
> Hi All,
> 
> I am using MAKER to annotate a insect genome. Firstly, I trained Augustus and GeneMark-ET outside of Maker using aligned RNA-seq data. Then, I gave them to Maker. The evidences included assembled RNA-seq data, protein sequences of my insect, proteome sequences of three related insects and Swiss-Prot. At last, I used the gene models generated by Maker with AED < 0.01 to train SNAP for two rounds. So my questions are:
> 
> 1. how to evaluate the results of ab initio training. How can I know these gene finders were well trained?
> 
> 2. Should I add EST evidences? How does Maker work on the locus where there is only partial EST evidence? Will the partial EST sequences cause gene models to be partial? 
> 
> 3. Is there some gold-criteria to evaluate the results of gene prediction? How to improve it?
> 
> Thank you!
> 
> Best regards,
> Wenbo