[maker-devel] Augustus retraining
Panos Ioannidis
panos.ioannidis at gmail.com
Tue Mar 24 09:05:54 MDT 2015
Yes, 6% is gene-level sensitivity. Exon-level is 62% and nucleotide-level
is 88%. I only mentioned gene-level because that's the only metric
reported on the Augustus web site.
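
For anyone comparing the three levels, here is a rough Python sketch of why the same prediction can score very differently at each one. It is a toy illustration only: the coordinates are made up, each gene has a single transcript, and the function names are hypothetical, not part of Augustus or Maker. It also assumes a non-empty reference set.

```python
# Toy illustration only: hypothetical data structures, not Augustus/Maker output.

def nucleotides(gene):
    """Set of positions covered by a gene's exons (inclusive coordinates)."""
    return {pos for start, end in gene for pos in range(start, end + 1)}

def sensitivity(reference, predicted):
    """Return (gene, exon, nucleotide) sensitivity.

    Each gene is a tuple of (start, end) exons on one sequence, and every
    gene is assumed to have exactly one transcript.
    """
    ref_genes, pred_genes = set(reference), set(predicted)
    ref_exons = {exon for gene in reference for exon in gene}
    pred_exons = {exon for gene in predicted for exon in gene}
    ref_nt = set().union(*(nucleotides(g) for g in reference))
    pred_nt = set().union(*(nucleotides(g) for g in predicted))
    return (
        len(ref_genes & pred_genes) / len(ref_genes),  # whole gene must match
        len(ref_exons & pred_exons) / len(ref_exons),  # both exon ends must match
        len(ref_nt & pred_nt) / len(ref_nt),           # per-base overlap
    )

# One reference gene with three exons; the prediction gets the last exon
# end wrong by 10 bp.
reference = [((100, 200), (300, 400), (500, 600))]
predicted = [((100, 200), (300, 400), (500, 590))]

gene_sn, exon_sn, nt_sn = sensitivity(reference, predicted)
print(f"gene {gene_sn:.2f}, exon {exon_sn:.2f}, nucleotide {nt_sn:.2f}")
# prints: gene 0.00, exon 0.67, nucleotide 0.97
```

A single misplaced exon boundary is enough to lose the whole gene at gene level while barely moving the nucleotide-level number.
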
I got these numbers outside of Maker. Actually, I only used Maker to
generate the gff files needed to start the training (ran it using only EST
evidence and only on a subset of my assembly, using this
<http://avrilomics.blogspot.ch/2013/04/training-augustus-gene-finding-software.html>
as a guide).
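
Since EST-only models are often partial (see the etraining warnings in my first message, quoted at the bottom), here is a minimal Python sketch for flagging which models have both a start and a stop codon. It assumes you already have the spliced, forward-strand CDS of each model as a string and that the standard stop codons apply; the function name, the `stop_included` parameter and the example sequences are all hypothetical. Whether incomplete models should actually be dropped is a separate question.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code

def is_complete_cds(cds, stop_included=True):
    """True if a spliced, forward-strand CDS starts with ATG and ends with a stop.

    With stop_included=False the stop codon is assumed not to be part of the
    CDS, so only the start codon and the reading frame can be checked.
    """
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    if not cds.startswith("ATG"):
        return False
    if stop_included:
        return cds[-3:] in STOP_CODONS
    return True

# Hypothetical example sequences, not real data.
models = {
    "gene-20.1": "ACGGCTTTTTAA",  # begins with acg instead of a start codon
    "gene-20.2": "ATGGCTTTTGGC",  # does not end in a stop codon
    "gene-20.3": "ATGGCTTTTTAA",  # complete
}

complete = [name for name, seq in models.items() if is_complete_cds(seq)]
print(complete)  # prints: ['gene-20.3']
```
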
Now, I've started running the second round of training, as you suggested.
Since, however, I don't have data from closely related species, I'm only
using Uniref50 as protein evidence.
P
On Tue, Mar 24, 2015 at 3:39 PM, Carson Holt <carsonhh at gmail.com> wrote:
> On your first round it is fine. It gives the predictor enough to work
> with, then on the second round you use improved models. When you say 6%
> sensitivity, is that Augustus running on its own? If it’s inside of MAKER,
> that means you are not providing sufficient protein evidence (you need the
> full proteome of at least two related species). Also, is that the gene
> level, exon level, or nucleotide level sensitivity? If you are looking at
> the gene level sensitivity measure, you only get a match when you perfectly
> match all transcripts in a gene (models that may not be correct in the
> first place). This value will rarely go above 10% for any predictor. You
> need to use the nucleotide level sensitivity/specificity metrics. The gene
> and exon level metrics are basically meaningless (unless it’s Drosophila
> which is the only species annotated correctly enough to use them).
>
> —Carson
>
>
> On Mar 24, 2015, at 8:31 AM, Panos Ioannidis <panos.ioannidis at gmail.com>
> wrote:
>
> Hi Carson,
>
> So you think it's okay to include incomplete gene models when training
> Augustus?
>
> I'll certainly try the bootstrap method you're suggesting. Even though I
> did it for SNAP, for some weird reason I forgot it for Augustus :p Do you
> think, however, that I can get a big improvement in gene-level sensitivity?
> Currently, I have only 6%...
>
> Thanks,
> Panos
>
>
> On Tue, Mar 24, 2015 at 3:14 PM, Carson Holt <carsonhh at gmail.com> wrote:
>
>> Hi Panos,
>>
>> ESTs and mRNA-seq assemblies will by their nature be partial. After a
>> first round of training you can run MAKER together with protein and EST
>> evidence and the newly trained Augustus species file. Because MAKER gives
>> hints to Augustus as it runs, the models it produces will be improved over
>> what it would get from just running Augustus on its own. Then take these
>> gene models and use them to retrain Augustus. This is the standard
>> bootstrap retraining procedure, and can be repeated as needed.
>>
>> More info on bootstrap training here (info is for SNAP but procedure is
>> similar to Augustus):
>> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors
>> Here is an excellent explanation of Augustus training:
>> http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html
>> and here are tools to convert SNAP training files to Augustus training
>> files (MAKER comes with a tool that converts GFF3 for SNAP training so just
>> take that and convert it for Augustus):
>> https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl
>>
>> Finally, you can also manually edit the GFF3 file in Apollo (easier to use
>> the legacy stand-alone version), and then convert that file for bootstrap
>> training.
>>
>> —Carson
>>
>>
>> On Mar 24, 2015, at 6:24 AM, Panos Ioannidis <panos.ioannidis at gmail.com>
>> wrote:
>>
>> Hi Xabier,
>>
>> Thanks for your quick reply!
>>
>> No, I haven't used WebAugustus, but I just checked it out and it looks
>> like my training set is too big (~300 Mbp), so I can't even upload it!
>>
>> Anyway, I prefer to train it locally because I have better control over
>> each step. Also, I have done the entire training procedure with fewer genes,
>> but didn't get a good gene-level sensitivity (~5%). So now I'm trying to
>> replicate it using more of my scaffolds, but it appears I get a lot more
>> incomplete models from exonerate (run through Maker).
>>
>> P
>>
>>
>>
>> On Tue, Mar 24, 2015 at 1:06 PM, Xabier Vázquez Campos <
>> xvazquezc at gmail.com> wrote:
>>
>>> Hi Panos,
>>>
>>> Have you tried using webAugustus for the (re)training? I found it very
>>> convenient for generating the models for Augustus.
>>>
>>> Cheers,
>>>
>>> 2015-03-24 19:29 GMT+11:00 Panos Ioannidis <panos.ioannidis at gmail.com>:
>>>
>>>> Hello All,
>>>>
>>>> I'm trying to retrain Augustus using EST data from the same species and
>>>> realized that quite a few of the gene models I get based on EST data are
>>>> incomplete (i.e. no start and/or stop codon).
>>>>
>>>> Now, when I get to the "etraining" step in Augustus retraining (right
>>>> after the time-consuming "optimize_augustus.pl" step), I get a warning
>>>> for each gene that doesn't contain a start or stop codon.
>>>>
>>>> .....
>>>> gene maker-scaffold4|size2210279-exonerate_est2genome-gene-20.1-mRNA-1
>>>> transcr. 1 in sequence scaffold4|size2210279_2021791-2044735: Initial exon
>>>> does not begin with start codon but with acg
>>>> gene maker-scaffold4|size2210279-exonerate_est2genome-gene-20.2-mRNA-1
>>>> transcr. 1 in sequence scaffold4|size2210279_2045713-2064983: Terminal exon
>>>> doesn't end in stop codon. Variable stopCodonExcludedFromCDS set right?
>>>> ....
>>>>
>>>> Does anyone know whether training is compromised by such incomplete
>>>> gene models? Do you usually exclude them from the training set?
>>>>
>>>> Oh, and by the way, the best guide to retraining Augustus is here
>>>> <http://avrilomics.blogspot.ch/2013/04/training-augustus-gene-finding-software.html>.
>>>> The official
>>>> <http://bioinf.uni-greifswald.de/augustus/binaries/retraining.html>
>>>> web page isn't bad, but doesn't explain certain things in detail.
>>>>
>>>> Thanks,
>>>> Panos
>>>>
>>>>
>>>> _______________________________________________
>>>> maker-devel mailing list
>>>> maker-devel at box290.bluehost.com
>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>
>>>>
>>>
>>>
>>> --
>>> Xabier Vázquez Campos
>>> *PhD Candidate*
>>> Water Research Centre
>>> School of Civil and Environmental Engineering
>>> The University of New South Wales
>>> Sydney NSW 2052 AUSTRALIA
>>>
>>
>
>