[maker-devel] Some questions regarding ab-initio training

Marc Höppner marc.hoeppner at bils.se
Tue May 27 02:12:07 MDT 2014


Hi,

I wanted to get some feedback regarding the training of ab-initio gene finders - it’s not strictly MAKER-related, but I suppose there are many people on this list who have encountered and solved this issue in one way or another.

Specifically, I am trying to train Augustus (and possibly SNAP) for a plant genome. This has always been a very frustrating process for me, and while I now have a better idea of how to do it, I still don’t get the accuracy I am hoping for. A quick run-through of my process:

Evidence build with MAKER using level 1 and 2 proteins from UniProt plus Sanger-sequenced EST data

Filtered for models with an AED <= 0.3
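
A minimal sketch of the AED filter above, in Python. It assumes the gene models are in a GFF3 file where each mRNA feature carries its score in an `_AED` attribute in column 9; the function names `parse_attributes` and `mrna_ids_passing_aed` are hypothetical and not part of MAKER.

```python
def parse_attributes(column9):
    """Turn a GFF3 attribute string (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def mrna_ids_passing_aed(gff_lines, max_aed=0.3):
    """Return the IDs of mRNA features whose _AED is <= max_aed.

    Assumption: the score is stored as an `_AED` attribute on mRNA lines.
    """
    kept = []
    for line in gff_lines:
        # Skip comments, directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        if "_AED" in attrs and float(attrs["_AED"]) <= max_aed:
            kept.append(attrs.get("ID"))
    return kept
```

The returned IDs can then be used to pull the matching gene, exon and CDS lines out of the same file.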

Loaded that into WebApollo, together with an existing reference annotation and the evidence tracks

Manually curated/selected 750 gene models using the following rules:
- Must have start and stop codons
- Must have as many exons as possible
- Must agree with evidence
- Must be >= 2 kb apart from other gene models (provided as flanking regions for Augustus to train on intergenic sequence)
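
The spacing rule in the list above can be checked automatically. The following Python sketch assumes gene models are given as `(seqid, start, end, gene_id)` tuples with 1-based inclusive coordinates, and keeps only genes with at least `min_gap` bases of free sequence on both sides. The function name `well_separated` is hypothetical, and the list passed in should contain every gene model on the sequence, not just the curated candidates.

```python
from collections import defaultdict


def well_separated(genes, min_gap=2000):
    """Return IDs of genes at least min_gap bp away from all other genes."""
    by_seq = defaultdict(list)
    for seqid, start, end, gene_id in genes:
        by_seq[seqid].append((start, end, gene_id))

    kept = []
    for seqid, items in by_seq.items():
        items.sort()  # sort by start coordinate
        max_end_before = None  # furthest end among all genes to the left
        for i, (start, end, gene_id) in enumerate(items):
            # Gap to the left: compare with the furthest-reaching earlier gene.
            left_ok = (max_end_before is None
                       or start - max_end_before - 1 >= min_gap)
            # Gap to the right: the next gene in sort order starts first.
            right_ok = (i + 1 == len(items)
                        or items[i + 1][0] - end - 1 >= min_gap)
            if left_ok and right_ok:
                kept.append(gene_id)
            max_end_before = end if max_end_before is None else max(max_end_before, end)
    return kept
```

Tracking the furthest end seen so far, rather than only the previous gene, also catches a long gene that spans several shorter neighbours.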

From these models, I created a GenBank (GBK) file, split it into 650 (train) and 100 (test) models, and created a new profile using the documented procedure.
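
As an illustration of the train/test split, here is a minimal Python sketch. It assumes a flat GenBank file in which every record ends with a line containing only `//`; the function names `read_records` and `split_records` are hypothetical, and this is not necessarily how the documented procedure performs the split.

```python
import random


def read_records(lines):
    """Group lines into GenBank records; each record ends with a '//' line."""
    records, current = [], []
    for line in lines:
        current.append(line)
        if line.strip() == "//":
            records.append("".join(current))
            current = []
    return records


def split_records(records, n_test, seed=42):
    """Randomly split records into (train, test) with n_test test records."""
    if n_test > len(records):
        raise ValueError("n_test exceeds the number of records")
    shuffled = list(records)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]
```

With 750 records and `n_test=100`, this yields 650 training and 100 test records with no overlap between the two sets.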

But:

While the plain ab-initio models created through MAKER get a lot of genes ‘sort of right’, I still see too many issues to be really satisfied. Problems include:

- random exon calls which are not supported by any line of evidence (~1 per gene model, I would guess)
- poor congruency with some gene models (especially ones not used for training/testing)
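
To put a number on problems like these, predicted exons can be compared with a curated reference set. The sketch below uses the usual gene-prediction convention: exon-level sensitivity is the fraction of reference exons predicted exactly, and exon-level specificity is the fraction of predicted exons that exactly match a reference exon. Exons are assumed to be `(seqid, start, end, strand)` tuples, and the function name `exon_level_accuracy` is hypothetical.

```python
def exon_level_accuracy(reference_exons, predicted_exons):
    """Return (sensitivity, specificity) for exact exon matches."""
    ref = set(reference_exons)
    pred = set(predicted_exons)
    true_pos = len(ref & pred)  # exons with identical coordinates and strand
    sensitivity = true_pos / len(ref) if ref else 0.0
    specificity = true_pos / len(pred) if pred else 0.0
    return sensitivity, specificity
```

Unsupported extra exon calls lower the specificity while leaving the sensitivity unchanged, so tracking both values across training runs shows which kind of error a change in the training set actually affects.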

Is there any best-practice guide on how to improve this? The Augustus website is unfortunately quite poor on detail… My impression so far is that increasing the number of training models does not help much beyond a certain point (I tried 400, 500 and 750).

Regards,

Marc


Marc P. Hoeppner, PhD
Team Leader
BILS Genome Annotation Platform
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at bils.se




