[maker-devel] Augustus retraining
Carson Holt
carsonhh at gmail.com
Tue Mar 24 08:14:51 MDT 2015
Hi Panos,
ESTs and mRNA-seq assemblies will by their nature be partial. After a first round of training, you can run MAKER with protein and EST evidence together with the newly trained Augustus species file. Because MAKER passes hints to Augustus as it runs, the models it produces will be better than what Augustus would produce running on its own. Then take these gene models and use them to retrain Augustus. This is the standard bootstrap retraining procedure, and it can be repeated as needed.
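A minimal Python sketch of that bootstrap loop, showing only which MAKER options change between rounds. It assumes the usual `maker_opts.ctl` keys `est2genome`, `protein2genome` and `augustus_species`. The `run_maker` and `retrain_augustus` callables are hypothetical placeholders for however you actually launch MAKER and the Augustus training scripts.

```python
from typing import Callable, Dict, List

Options = Dict[str, str]


def round_options(round_index: int, species: str) -> Options:
    """maker_opts.ctl overrides for one bootstrap round (sketch)."""
    if round_index == 0:
        # No trained species file yet: build models directly from the
        # aligned EST and protein evidence.
        return {"est2genome": "1", "protein2genome": "1"}
    # Later rounds: let MAKER run Augustus with the species trained in the
    # previous round, with the evidence alignments passed as hints.
    return {"est2genome": "0", "protein2genome": "0", "augustus_species": species}


def to_ctl_lines(opts: Options) -> List[str]:
    """Render overrides as key=value lines in maker_opts.ctl style."""
    return [f"{key}={value}" for key, value in opts.items()]


def bootstrap(
    n_rounds: int,
    species: str,
    run_maker: Callable[[Options], str],
    retrain_augustus: Callable[[str, str], None],
) -> List[Options]:
    """Alternate MAKER annotation and Augustus retraining n_rounds times.

    run_maker (hypothetical) runs MAKER with the given overrides and returns
    the path of the GFF3 it produced; retrain_augustus (hypothetical) trains
    `species` from that GFF3.
    """
    history: List[Options] = []
    for i in range(n_rounds):
        opts = round_options(i, species)
        gff3_path = run_maker(opts)
        retrain_augustus(species, gff3_path)
        history.append(opts)
    return history
```

With three rounds, this gives one evidence-only round followed by two hint-based rounds. How many rounds are worthwhile is best judged from prediction accuracy rather than fixed in advance.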
More info on bootstrap training is here (the info is for SNAP, but the procedure is similar for Augustus) -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors
Here is an excellent explanation of Augustus training -> http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html
And here is a script to convert SNAP training files to Augustus training files (MAKER comes with a tool that converts GFF3 to SNAP training files, so just take that output and convert it for Augustus) -> https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl
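Before converting, it can help to inspect the SNAP training file. The sketch below is not the linked conversion script. It only illustrates the kind of input that script consumes. It assumes the short ZFF layout, in which each feature line reads `label begin end strand group` and the exon labels are `Einit`, `Exon`, `Eterm` and `Esngl`. It groups exons into gene models per sequence and flags models that lack an initial or terminal exon.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Exon = Tuple[str, int, int, str]  # (label, begin, end, strand)


def parse_zff(text: str) -> Dict[str, Dict[str, List[Exon]]]:
    """Group short-format ZFF features as {sequence: {gene: [exons]}}."""
    genes: Dict[str, Dict[str, List[Exon]]] = defaultdict(lambda: defaultdict(list))
    seq = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the sequence name is the first word after '>'.
            seq = line[1:].split()[0]
            continue
        fields = line.split()
        label, begin, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[3]
        # Assumption: the optional 5th column names the gene model.
        group = fields[4] if len(fields) > 4 else "unnamed"
        genes[seq][group].append((label, begin, end, strand))
    return {s: dict(g) for s, g in genes.items()}


def is_complete(exons: List[Exon]) -> bool:
    """True for a lone Esngl exon, or exactly one Einit plus one Eterm."""
    labels = [e[0] for e in exons]
    if labels == ["Esngl"]:
        return True
    return (
        labels.count("Einit") == 1
        and labels.count("Eterm") == 1
        and "Esngl" not in labels
    )
```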
Finally, you can also manually edit the GFF3 file in Apollo (the legacy standalone version is easier to use) and then convert that file for bootstrap training.
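If you prefer to drop the incomplete models that etraining warns about (the `in sequence ...:` lines in Panos's message below), one approach is to collect the flagged sequence names and remove those records from the GenBank training file. Augustus's own `filterGenes.pl` script is commonly used for the filtering step. The Python sketch below does both steps explicitly. It assumes warnings shaped like the quoted ones, and a training file whose records begin with a `LOCUS` line naming the sequence and end with `//`.

```python
import re
from typing import Iterable, List, Set

# Assumption: warnings look like the ones quoted in this thread, e.g.
# "... in sequence scaffold4|size2210279_2021791-2044735: Initial exon ..."
WARNING_RE = re.compile(r"in sequence (\S+):")


def bad_sequences(etraining_log: str) -> Set[str]:
    """Collect the sequence names mentioned in etraining warnings."""
    return set(WARNING_RE.findall(etraining_log))


def split_records(genbank_text: str) -> List[str]:
    """Split a multi-record GenBank file at its '//' terminator lines."""
    records: List[str] = []
    current: List[str] = []
    for line in genbank_text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "//":
            records.append("".join(current))
            current = []
    if "".join(current).strip():
        # Keep a trailing record that lacks a closing '//'.
        records.append("".join(current))
    return records


def locus_name(record: str) -> str:
    """Second field of the LOCUS line (assumed to be the sequence name)."""
    for line in record.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else ""
    return ""


def filter_training_set(genbank_text: str, bad: Iterable[str]) -> str:
    """Drop every record whose LOCUS name was flagged by etraining."""
    bad_set = set(bad)
    kept = [r for r in split_records(genbank_text) if locus_name(r) not in bad_set]
    return "".join(kept)
```

This discards whole records, so a sequence carrying one partial and one complete gene would lose both.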
—Carson
> On Mar 24, 2015, at 6:24 AM, Panos Ioannidis <panos.ioannidis at gmail.com> wrote:
>
> Hi Xabier,
>
> Thanks for your quick reply!
>
> No, I haven't used WebAugustus, but I just checked it out and it looks like my training set is too big (~300 Mbp), so I can't even upload it!
>
> Anyway, I prefer to train it locally because I have better control over each step. Also, I have done the entire training procedure with fewer genes, but gene-level sensitivity was poor (~5%). So now I'm trying to repeat it using more of my scaffolds, but it appears I get a lot more incomplete models from exonerate (run through MAKER).
>
> P
>
>
>
> On Tue, Mar 24, 2015 at 1:06 PM, Xabier Vázquez Campos <xvazquezc at gmail.com> wrote:
> Hi Panos,
>
> Have you tried using webAugustus for the (re)training? I found it very convenient for generating the models for Augustus.
>
> Cheers,
>
> 2015-03-24 19:29 GMT+11:00 Panos Ioannidis <panos.ioannidis at gmail.com>:
> Hello All,
>
> I'm trying to retrain Augustus using EST data from the same species and realized that quite a few of the gene models I get based on EST data are incomplete (i.e. no start and/or stop codon).
>
> Now, when I get to the "etraining" step in Augustus retraining (right after the time-consuming "optimize_augustus.pl" step), I get a warning for each gene that doesn't contain a start or stop codon.
>
> .....
> gene maker-scaffold4|size2210279-exonerate_est2genome-gene-20.1-mRNA-1 transcr. 1 in sequence scaffold4|size2210279_2021791-2044735: Initial exon does not begin with start codon but with acg
> gene maker-scaffold4|size2210279-exonerate_est2genome-gene-20.2-mRNA-1 transcr. 1 in sequence scaffold4|size2210279_2045713-2064983: Terminal exon doesn't end in stop codon. Variable stopCodonExcludedFromCDS set right?
> ....
>
> Does anyone know whether training is compromised by such incomplete gene models? Do you usually exclude them from the training set?
>
> Oh, and by the way, the best guide to retraining Augustus is here <http://avrilomics.blogspot.ch/2013/04/training-augustus-gene-finding-software.html>. The official <http://bioinf.uni-greifswald.de/augustus/binaries/retraining.html> web page isn't bad, but it doesn't explain certain things in detail.
>
> Thanks,
> Panos
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com <mailto:maker-devel at box290.bluehost.com>
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org <http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org>
>
>
>
>
> --
> Xabier Vázquez Campos
> PhD Candidate
> Water Research Centre
> School of Civil and Environmental Engineering
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA
>