[maker-devel] Augustus retraining
Panos Ioannidis
panos.ioannidis at gmail.com
Tue Mar 24 08:31:04 MDT 2015
Hi Carson,
So you think it's okay to include incomplete gene models when training
Augustus?
I'll certainly try the bootstrap method you're suggesting. Even though I
did it for SNAP, for some weird reason I forgot it for Augustus :p Do you
think, however, that I can get a big improvement in gene-level sensitivity?
Currently, I have only 6%...
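
For reference, gene-level sensitivity here means the fraction of annotated genes whose coding structure is predicted exactly. Below is a minimal Python sketch of that calculation. It assumes each gene is reduced to a list of (start, end) coding exons on a single sequence and strand. The function name and the toy coordinates are hypothetical, and this is not Augustus's own evaluation code.

```python
def gene_level_sensitivity(annotated, predicted):
    """Fraction of annotated genes reproduced exactly by some prediction.

    Both arguments map a gene name to a list of (start, end) coding exons.
    Assumption: all genes lie on the same sequence and strand.
    """
    if not annotated:
        return 0.0
    # Normalise each predicted gene to a sorted tuple of exons for lookup.
    predicted_structures = {tuple(sorted(exons)) for exons in predicted.values()}
    exact = sum(
        1 for exons in annotated.values()
        if tuple(sorted(exons)) in predicted_structures
    )
    return exact / len(annotated)


# Toy data (made up): only geneA is predicted with every exon boundary right.
annotated = {
    "geneA": [(100, 200), (300, 400)],
    "geneB": [(1000, 1100)],
    "geneC": [(2000, 2100), (2200, 2300)],
    "geneD": [(5000, 5200)],
}
predicted = {
    "pred1": [(100, 200), (300, 400)],   # exact match to geneA
    "pred2": [(1000, 1090)],             # one boundary off, so geneB is missed
    "pred3": [(2000, 2100)],             # missing an exon, so geneC is missed
}
sensitivity = gene_level_sensitivity(annotated, predicted)
print(f"gene-level sensitivity: {sensitivity:.0%}")
```

Because a single wrong exon boundary makes the whole gene count as missed, this number is usually much lower than exon-level or nucleotide-level sensitivity.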
Thanks,
Panos
On Tue, Mar 24, 2015 at 3:14 PM, Carson Holt <carsonhh at gmail.com> wrote:
> Hi Panos,
>
> ESTs and mRNA-seq assemblies will by their nature be partial. After a
> first round of training you can run MAKER together with protein and EST
> evidence and the newly trained Augustus species file. Because MAKER gives
> hints to Augustus as it runs, the models it produces will be improved over
> what it would get from just running Augustus on its own. Then take these
> gene models and use them to retrain Augustus. This is the standard
> bootstrap retraining procedure, and can be repeated as needed.
>
> More info on bootstrap training here (info is for SNAP but procedure is
> similar to Augustus):
> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors
> Here is an excellent explanation of Augustus training:
> http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html
> and here are tools to convert SNAP training files to Augustus training
> files (MAKER comes with a tool that converts GFF3 for SNAP training so just
> take that and convert it for Augustus):
> https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl
>
> Finally, you can also manually edit the GFF3 file in Apollo (easier to use
> the legacy standalone version), and then convert that file for bootstrap
> training.
>
> —Carson
>
>
> On Mar 24, 2015, at 6:24 AM, Panos Ioannidis <panos.ioannidis at gmail.com>
> wrote:
>
> Hi Xabier,
>
> Thanks for your quick reply!
>
> No, I haven't used WebAugustus, but I just checked it out and it looks
> like my training set is too big (~300 Mbp), so I can't even upload it!
>
> Anyway, I prefer to train it locally because I have better control over
> each step. Also, I have done the entire training procedure with fewer genes,
> but didn't get a good gene-level sensitivity (~5%). So now I'm trying to
> replicate it using more of my scaffolds, but it appears that I get a lot more
> incomplete models from exonerate (run through MAKER).
>
> P
>
>
>
> On Tue, Mar 24, 2015 at 1:06 PM, Xabier Vázquez Campos <
> xvazquezc at gmail.com> wrote:
>
>> Hi Panos,
>>
>> Have you tried using webAugustus for the (re)training? I found it very
>> convenient for generating the models for Augustus.
>>
>> Cheers,
>>
>> 2015-03-24 19:29 GMT+11:00 Panos Ioannidis <panos.ioannidis at gmail.com>:
>>
>>> Hello All,
>>>
>>> I'm trying to retrain Augustus using EST data from the same species and
>>> realized that quite a few of the gene models I get based on EST data are
>>> incomplete (i.e. no start and/or stop codon).
>>>
>>> Now, when I get to the "etraining" step in Augustus retraining (right
>>> after the time-consuming "optimize_augustus.pl" step), I get a warning
>>> for each gene that doesn't contain a start or stop codon.
>>>
>>> .....
>>> gene maker-scaffold4|size2210279-exonerate_est2genome-gene-20.1-mRNA-1
>>> transcr. 1 in sequence scaffold4|size2210279_2021791-2044735: Initial exon
>>> does not begin with start codon but with acg
>>> gene maker-scaffold4|size2210279-exonerate_est2genome-gene-20.2-mRNA-1
>>> transcr. 1 in sequence scaffold4|size2210279_2045713-2064983: Terminal exon
>>> doesn't end in stop codon. Variable stopCodonExcludedFromCDS set right?
>>> ....
>>>
>>> Does anyone know whether training is compromised by such incomplete gene
>>> models? Do you usually exclude them from the training set?
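
One way to see how many training genes are affected is to sort the models into complete and incomplete before training. Below is a minimal Python sketch, assuming the spliced, forward-strand coding sequence of each model has already been extracted and that the standard genetic code applies. The function names, the shortened gene names, and the toy sequences are hypothetical. The stop_included flag is only an illustration of the situation the stopCodonExcludedFromCDS warning refers to, not an Augustus option.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def cds_problems(cds, stop_included=True):
    """List what is missing from a spliced, forward-strand CDS string."""
    seq = cds.upper()
    problems = []
    if not seq.startswith("ATG"):
        problems.append("no_start")
    # Only look for a stop codon when the CDS is expected to contain it.
    if stop_included and seq[-3:] not in STOP_CODONS:
        problems.append("no_stop")
    return problems


def split_models(models, stop_included=True):
    """Split {name: cds} into (complete, incomplete) dictionaries."""
    complete, incomplete = {}, {}
    for name, cds in models.items():
        if cds_problems(cds, stop_included):
            incomplete[name] = cds
        else:
            complete[name] = cds
    return complete, incomplete


# Toy sequences (made up) mirroring the two kinds of warning above.
models = {
    "gene-20.1": "ACGGCTTGA",  # begins with acg instead of a start codon
    "gene-20.2": "ATGGCTGCC",  # does not end in a stop codon
    "gene-20.3": "ATGGCTTAA",  # complete
}
complete, incomplete = split_models(models)
print("complete:", sorted(complete))
print("incomplete:", sorted(incomplete))
```

Whether the incomplete set should then be dropped or kept is exactly the open question here; the sketch only makes the split explicit so both options can be compared.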
>>>
>>> Oh, and by the way, the best guide to retraining Augustus is here:
>>> http://avrilomics.blogspot.ch/2013/04/training-augustus-gene-finding-software.html
>>> The official web page
>>> (http://bioinf.uni-greifswald.de/augustus/binaries/retraining.html)
>>> isn't bad, but doesn't explain certain things in detail.
>>>
>>> Thanks,
>>> Panos
>>>
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>
>>>
>>
>>
>> --
>> Xabier Vázquez Campos
>> *PhD Candidate*
>> Water Research Centre
>> School of Civil and Environmental Engineering
>> The University of New South Wales
>> Sydney NSW 2052 AUSTRALIA
>>
>