[maker-devel] Help debugging a MAKER result

Xabier Vázquez-Campos xvazquezc at gmail.com
Mon Oct 1 18:00:43 MDT 2018


Hi Lior,

Without going into a lot of detail: a good model of the repeats in your
genome is extremely important, especially in genomes with a lot of
repeats. If the repeat library does not provide appropriate coverage,
anything based on the masked genome will be affected.
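
One rough way to see how much of the genome the repeat library actually
masks is to measure the fraction of soft-masked bases. The following is
only a minimal Python sketch, assuming the masked genome is a soft-masked
FASTA (repeats in lowercase); the function name is hypothetical and not
part of MAKER, and hard-masked (N) bases are not counted.

```python
def masked_fraction(fasta_lines):
    """Return the fraction of bases written in lowercase (soft-masked).

    Assumption: repeats were soft-masked, so masked bases are lowercase.
    Header lines (starting with '>') and empty lines are skipped.
    """
    masked = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        total += len(line)
        masked += sum(1 for base in line if base.islower())
    return masked / total if total else 0.0


# Hypothetical usage on a file (the file name is a placeholder):
# with open("genome.softmasked.fa") as handle:
#     print(masked_fraction(handle))
```

Comparing this number against the repeat content expected for the species
can hint at whether the repeat library is missing something.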

The evidence you pass to Augustus to generate the gene model can have a
huge impact. Aside from the repeats, BUSCO-generated gene models can
under-predict:
https://groups.google.com/forum/?hl=en-GB#!topic/maker-devel/ocnDG4nq1A8
We have also seen in our lab that the gene models generated by Augustus
can be very different depending on whether you provide a haploid assembly,
haploid + alternate contigs, or a diploid assembly. In general, a purely
haploid assembly generates a less biased model, as it contains fewer
duplicated conserved genes, which would otherwise skew the gene model
towards them (at least in BUSCO-based models, but this should extend to
any Augustus model).
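
To get a feel for how many duplicated conserved genes an assembly carries
before training on it, one could tally the statuses in the BUSCO full
table. This is just a sketch, assuming a tab-separated table whose first
two columns are the BUSCO id and its status (Complete, Duplicated,
Fragmented or Missing) and whose comment lines start with "#"; the function
name is hypothetical.

```python
from collections import Counter


def busco_status_counts(full_table_lines):
    """Count distinct BUSCO ids per status.

    Assumption: column 1 is the BUSCO id and column 2 its status.
    A duplicated BUSCO appears on several rows, so ids are de-duplicated
    before counting.
    """
    status_by_id = {}
    for line in full_table_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip malformed rows
        status_by_id[fields[0]] = fields[1]
    return Counter(status_by_id.values())


# Hypothetical usage (the file name is a placeholder):
# with open("full_table.tsv") as handle:
#     counts = busco_status_counts(handle)
#     print(counts["Duplicated"], "duplicated of", sum(counts.values()))
```

A high share of Duplicated entries in haploid + alternate or diploid
assemblies would be consistent with the bias described above.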

Note that in the end the generated annotation is just a model/hypothesis
and may require more than a bit of curation, usually more so for more
complex genomes.

Cheers,
Xabi

On Tue, 2 Oct 2018 at 05:23, Lior Glick <liorglic at mail.tau.ac.il> wrote:

> Hi MAKER users,
> I am new to Maker and have just finished running my first annotations.
> Although the results make sense in general, I have reasons to suspect some
> gene models are wrong and would like your help in understanding and
> optimizing the results.
> My research project involves the annotation of multiple tomato varieties
> (individuals) which are a bit different from the published reference
> genome. To this end, I created de-novo assemblies of these genomes and also
> generated an evidence set to be used as input for Maker. The evidence consists
> of a large set of transcripts from various tomato varieties and conditions,
> as well as full protein sets from 6 plant species, including the proteins
> derived from the annotation of the reference - called ITAG.
> For an initial QA, I tried annotating the reference genome using my
> evidence data and Augustus as gene predictor. This should allow me to
> compare my result to the ITAG annotation, which I assume to be the
> "correct" answer, and see how well I'm doing. I should mention that ITAG
> annotation was also created using Maker, followed by manual curation.
> I started by comparing the protein sets from my result and the ITAG set.
> Specifically, I ran an all-vs-all blast and took the top hits. I discovered
> that only about 70% of the ITAG proteins are covered by a protein from my
> result with a high-quality alignment (evalue < 10e-5, coverage > 90%). I
> further investigated by running BUSCO on both protein sets and looking at
> BUSCOs found in ITAG but missing in my result. Attached is a screenshot
> from a genome browser where you can see such a case. Top track is the ITAG
> gene model, below is my result. Third track is the protein evidence
> alignments (i.e. blastx and protein2genome features), and the bottom track
> shows masked repeats.
> As you can see, there seem to be two issues with my result:
> 1. The two genes in ITAG were fused into one. I guess this is a difficult
> case as the genes are really close together.
> 2. The last (3') CDS of the ITAG gene was predicted to be the 3' UTR in my
> result. This is in fact the reason I ended up with a truncated protein and
> a missing BUSCO.
> This is a bit surprising to me, since there seems to be quite a lot of
> protein evidence supporting this region as a CDS. Can you help me figure
> out why this is the result? Could it be due to the small repeats detected in
> this region?
> Any ideas on how my result can be improved without manual curation?
>
> Many thanks!


-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
NSW Systems Biology Initiative
School of Biotechnology and Biomolecular Sciences
The University of New South Wales
Sydney NSW 2052 AUSTRALIA