[maker-devel] augustus underpredicting

Xabier Vázquez-Campos xvazquezc at gmail.com
Tue Sep 19 18:02:04 MDT 2017


Thanks Carson.

One last quick question. After the first run (before using the gene
predictors), I ran fasta_merge to get an idea of the numbers I should be
looking for. In summary, I got 14000 genes using only Swiss-Prot and a
closely related, highly curated reference genome (to avoid any "fake" or
partial proteins from draft annotations), plus assembled RNA-seq from my
genome.
Can I use this number as a guide, and if so, how? Should I be aiming for it
as a minimum number of genes, a maximum, or something around that?

PS: my genome (fungus) is 80+ Mbp in just 150 contigs, so I expect very few
fragments due to assembly (sequencing errors aside).
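
For comparing gene counts like the ones discussed in this thread, a minimal Python sketch follows. It assumes the annotations are in a tab-separated GFF3 file with nine columns, where column 2 is the source and column 3 is the feature type. The function name `count_genes_by_source` and the toy records are hypothetical and only illustrate the tallying; they are not MAKER output.

```python
from collections import Counter


def count_genes_by_source(gff_lines):
    """Count 'gene' features per source (column 2) in an iterable of GFF3 lines."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "gene":
            counts[cols[1]] += 1
    return counts


def _row(seqid, source, ftype, start, end, attrs):
    # Helper that builds one tab-separated GFF3 record for the toy example.
    return "\t".join([seqid, source, ftype, str(start), str(end), ".", "+", ".", attrs])


# Toy records (hypothetical, for illustration only).
example_lines = [
    "##gff-version 3",
    _row("contig1", "maker", "gene", 100, 900, "ID=g1"),
    _row("contig1", "maker", "mRNA", 100, 900, "ID=g1.t1;Parent=g1"),
    _row("contig2", "maker", "gene", 50, 700, "ID=g2"),
    _row("contig2", "snap", "gene", 60, 650, "ID=s1"),
    "##FASTA",
    ">contig1",
    "ACGT",
]

for source, n in sorted(count_genes_by_source(example_lines).items()):
    print(source, n)
```

On a real file, the same function could be fed an open file handle instead of the toy list, since it only iterates over lines.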

On 20 September 2017 at 07:34, Carson Holt <carsonhh at gmail.com> wrote:

> Gene predictors tend to overpredict, so I would not take the high numbers
> given by SNAP and GeneMark as true counts. You will probably end up with
> something like 7-10k in the final results. But now that Augustus is giving
> a higher count, you should be good to start running MAKER.
>
> —Carson
>
>
>
>
> On Sep 17, 2017, at 7:12 PM, Xabier Vázquez-Campos <xvazquezc at gmail.com>
> wrote:
>
> I did it that way, and Augustus is now predicting a more reasonable number
> of genes: about 12500 in MAKER, but about 19000 in the model assessment
> step. In comparison, SNAP gives 16000 and GeneMark 19000.
>
> I haven't found any reference on this, but would it be a good idea to train
> Augustus on the masked genome instead?
> Thanks,
>
>
>
> On 12 September 2017 at 02:50, Carson Holt <carsonhh at gmail.com> wrote:
>
>> BUSCO may be generating too few models. BUSCO also identifies classes of
>> conserved short genes that may not represent enough training diversity for
>> your organism. Try running MAKER in protein2genome or est2genome mode, and
>> then train with those results.
>>
>> —Carson
>>
>>
>> On Sep 10, 2017, at 7:03 PM, Xabier Vázquez-Campos <xvazquezc at gmail.com>
>> wrote:
>>
>> Hi,
>> I have been annotating a fungal genome as usual, using Busco-trained
>> Augustus (in addition to GeneMark and SNAP), but for some reason, Augustus
>> is predicting a mere 207 genes compared to 15-20k from the other two.
>> I've never had this problem before. The genome has an unusual repeat
>> content, close to 50%; I'm not sure whether that might be a problem.
>> Has anybody come across a similar issue?
>> I have also asked the BUSCO developers whether they have any idea:
>> https://gitlab.com/ezlab/busco/issues/49
>> Cheers,
>> Xabi
>>
>> --
>> Xabier Vázquez-Campos, *PhD*
>> *Research Associate*
>> NSW Systems Biology Initiative
>> School of Biotechnology and Biomolecular Sciences
>> The University of New South Wales
>> Sydney NSW 2052 AUSTRALIA
>>
>>
>>
>
>


-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA