[maker-devel] error: training genemodel with SNAP and GeneMark & run time to generate AUGUSTUS species file

Carson Holt carsonhh at gmail.com
Tue Nov 29 10:34:31 MST 2016


How to train Augustus: http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html

Step 2 shows how to create an empty species to start training with. Step 4 (optimize_augustus.pl) is the one that takes a while.
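
As a rough sketch of that workflow, something like the following could be used. The species name pminiata comes from your autoAug.pl command; train.gb, the run helper, and the --cpus value are placeholders of mine, so check every option against the tutorial and the scripts shipped with your Augustus install. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch of the Augustus training steps (not a definitive recipe).
# Assumptions: the Augustus scripts are on PATH, AUGUSTUS_CONFIG_PATH is set,
# and train.gb is a GenBank-format training set you have already built.
SPECIES=pminiata   # species name taken from the original question
TRAIN=train.gb     # hypothetical training file name

# Hypothetical helper: print each command, execute it only when DRY_RUN=0.
run() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Step 2 of the tutorial: create an empty species to start training with.
run new_species.pl --species="$SPECIES"
# Initial training on the training set.
run etraining --species="$SPECIES" "$TRAIN"
# Step 4 of the tutorial: the slow step. --cpus (assumed to be supported by
# your version) spreads the cross-validation runs over several cores.
run optimize_augustus.pl --species="$SPECIES" --cpus=8 "$TRAIN"
# Retrain so the optimized meta parameters are used.
run etraining --species="$SPECIES" "$TRAIN"
```

If optimize_augustus.pl is still too slow, a smaller training set is the usual lever, at some cost in accuracy.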

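For GeneMark, you must set the locations of the required GeneMark executables (the gmhmme3 and probuild values named in your error message) in the maker_exe.ctl file. The gmhmm entry in maker_opts.ctl only points to the model file.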
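
For illustration, the relevant entries in maker_exe.ctl would look roughly like this. The two keys are the ones named in your error message; the paths are placeholders and must point to wherever GeneMark is installed on your system.

```
#-----GeneMark executables (maker_exe.ctl)
gmhmme3=/path/to/genemark/gmhmme3 #placeholder path, adjust to your install
probuild=/path/to/genemark/probuild #placeholder path, adjust to your install
```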

After all predictors are trained, run a few contigs and take a moment to review predictor performance manually in something like Apollo. It is not uncommon for one or more predictors to perform poorly on an organism (they should each produce similar predictions). If one is significantly off relative to the other predictors and the evidence, it should be dropped. A badly behaved predictor will reduce overall annotation quality.
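
Before opening Apollo, a quick tally of how many features each predictor produced on a contig can flag an obvious outlier. This is only a sketch and no substitute for the manual review. It assumes the annotations are in a tab-delimited GFF3 file whose second column names the source program; count_by_source is a hypothetical helper name, and the file path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Count features per (source, type) pair in a GFF3 file.
# Comment lines are skipped and reading stops at an embedded ##FASTA section.
count_by_source() {
    awk -F'\t' '
        /^##FASTA/     { exit }             # stop before sequence data
        /^#/ || NF < 9 { next }             # skip comments and malformed lines
        { n[$2 "\t" $3]++ }                 # key: source <TAB> feature type
        END { for (k in n) print k "\t" n[k] }
    ' "$1" | sort
}

# Usage (placeholder path):
#   count_by_source path/to/contig.gff
```

If one predictor reports several times as many features as the others on the same contigs, that is a good place to start the manual review.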

—Carson




> On Nov 29, 2016, at 10:13 AM, Kudtarkar, Parul V. <parulk at caltech.edu> wrote:
> 
> 
>> Dear Maker developers,
>> 
>> 1. We use assembled RNA-seq transcripts (from the same species) and protein evidence (from evolutionarily close species) to generate training gene structures (1st iteration, est2genome=1, protein2genome=1).
>> 
>> 2. This is then used to train the ab initio gene predictors SNAP and AUGUSTUS.
>> 
>> 3. GeneMark-ES (version: GeneMark-ES / ET v.4.32) is used to produce a training data set with the command
>> 
>> gmes_petap.pl --sequence pmin_jelly.fa
>> 
>> 4. We would then predict genes using results from SNAP, GeneMark, and AUGUSTUS (2nd iteration, est2genome=0, protein2genome=0).
>> 
>> I have a couple of questions relating to GeneMark and AUGUSTUS.
>> 
>> 1. AUGUSTUS
>> 
>> We do not have a species file for our species of interest or for an evolutionarily close species.
>> 
>> The following command is used to generate the species file:
>> 
>> 
>> /autoAug.pl --genome=pmin_jelly.fa --species=pminiata --cdna=pmin_transcripts.fa --trainingset=genome.gff3 --singleCPU -v --useexisting 
>> AUGUSTUS is taking too long to compute the species file; is there a solution for this issue? Using a species file from another organism might generate false positives. Is it advisable in such situations not to use an AUGUSTUS model?
>> 
>> 2. GeneMark
>> 
>> I used the gmhmm file generated in the GeneMark output directory; however, I encounter the following error:
>> 
>> 
>> -------------------------
>> 
>> STATUS: Parsing control files...
>> ERROR: You have failed to provide a value for 'gmhmme3' in the control files.
>> ERROR: You have failed to provide a value for 'probuild' in the control files.
>> ---------------------
>> FYI
>> 
>> -----
>> 
>> maker_opts.ctl
>> 
>> 
>> #-----Gene Prediction
>> snaphmm=/home/parul/Pmin_new/maker_snap/pmin1.hmm #SNAP HMM file
>> gmhmm=/home/parul/Pmin_new/maker_snap/gmhmm.mod #GeneMark HMM file
>> 
>> -----
>> 
>> Using SNAP for training the gene model yields over 6000-7000 additional genes. The model has a good cumulative AED value.
>> 
>> I was hoping that, in addition to SNAP, I could use AUGUSTUS and GeneMark to train the gene model and fuse dispersed models so that the gene count is within the expected range.
>> 
>> 
>> Thanks and regards,
>> 
>> Parul
>> 
> 
