[maker-devel] Maker not predicting many genes

Carson Holt carsonhh at gmail.com
Mon Feb 17 12:26:05 MST 2014


From your control file, it looks like your primary shortcomings are not
setting single_exon=1 and using only UniProt rather than supplying complete
proteomes from related species.  I’d set correct_est_fusion=1 as well.

—Carson


From:  Carson Holt <carson.holt at genetics.utah.edu>
Date:  Monday, February 17, 2014 at 12:22 PM
To:  "Valero Jimenez, Claudio" <claudio.valero at wur.nl>,
"'maker-devel at yandell-lab.org'" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Maker not predicting many genes

You also need to look at the contigs in a browser like Apollo.  That will
allow you to see both the predictions and the evidence in context.  You can
then see whether genes are being dropped because they are supported only by
single exon evidence, because they have no evidence support whatsoever, or
because they are being excluded due to UTR overlap.  That last one is a
common problem for fungi when using assembled mRNA-seq reads.  Fungal genes
are so close together that they often overlap in the UTR.  As a result,
mRNA-seq assemblers falsely assemble neighboring genes into single
transcripts.  The result is really long UTRs on some of your gene models,
which force other models to be excluded.  If this is the case, rerun
something like Trinity with the Jaccard clip option (--jaccard_clip) set to
avoid transcript fusion.  Then set correct_est_fusion=1 in the MAKER control
files to get those long false UTRs clipped off.
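
As a complement to inspecting contigs in a browser, here is a minimal
Python sketch for listing transcripts with suspiciously long UTRs in a GFF3
file.  It assumes the GFF3 contains five_prime_UTR and three_prime_UTR
features with a Parent attribute pointing at the transcript.  The function
names and the 1000 bp cutoff are hypothetical choices for illustration, not
MAKER defaults.

```python
from collections import defaultdict

UTR_TYPES = {"five_prime_UTR", "three_prime_UTR"}


def utr_lengths(gff3_lines):
    """Sum UTR bases per parent transcript ID from an iterable of GFF3 lines."""
    totals = defaultdict(int)
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in UTR_TYPES:
            continue
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                totals[parent] += end - start + 1  # GFF3 coordinates are inclusive
    return dict(totals)


def flag_long_utrs(gff3_lines, min_utr_bp=1000):
    """Return sorted transcript IDs whose total UTR length is >= min_utr_bp.

    min_utr_bp is an illustrative cutoff (assumption); tune it for your genome.
    """
    lengths = utr_lengths(gff3_lines)
    return sorted(t for t, n in lengths.items() if n >= min_utr_bp)
```

A typical use would be to open the GFF3 file and pass the file handle to
flag_long_utrs(), then look at the flagged models in the browser to see
whether they span what should be two neighboring genes.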

If it is a lack of evidence overlap, make sure you provided at least 1
proteome from a related species to the protein= option.  At least 2
proteomes are recommended though (these are not proteins from the same
species but rather complete proteomes from related species).  Also,
comprehensive databases like UniProt/Swiss-Prot are not sufficient on their
own, but can supplement the other proteome data.  Are you also providing EST
data?  Note that EST/mRNA-seq data without a proteome from a related species
is also not sufficient (because both the quality and the comprehensiveness
of EST/mRNA-seq databases vary so widely, and they may capture as little as
30% of the genes).

Another thing that comes into play is single exon evidence.  In anything
but fungi, single exon evidence is mostly caused by spurious alignments.
But fungi have so many single exon genes that this is not the case for
them.  Make sure single_exon=1 is set to allow that evidence to be kept, and
set the minimum length of single exon evidence to keep to something like
250 bp.
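
The settings discussed above can be checked with a small script.  This is a
minimal Python sketch, assuming the control file uses key=value lines with
trailing # comments and that protein= takes a comma-separated list of FASTA
files.  The helper names are hypothetical, and counting files cannot tell a
complete proteome from a general database such as UniProt, so treat the
output only as a reminder.

```python
def parse_ctl(text):
    """Parse key=value lines from a MAKER-style control file into a dict."""
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def check_opts(opts):
    """Return warnings for the settings recommended in this thread."""
    warnings = []
    if opts.get("single_exon") != "1":
        warnings.append("single_exon is not 1: single exon evidence is dropped")
    if opts.get("correct_est_fusion") != "1":
        warnings.append("correct_est_fusion is not 1: fused UTRs are not clipped")
    proteins = [p for p in opts.get("protein", "").split(",") if p.strip()]
    if len(proteins) < 2:
        warnings.append(
            "protein= lists %d file(s): at least 2 related proteomes recommended"
            % len(proteins)
        )
    return warnings


if __name__ == "__main__":
    # Illustrative values only; file names here are made up.
    example = "protein=uniprot.fasta\nsingle_exon=0 #comment\ncorrect_est_fusion=0\n"
    for warning in check_opts(parse_ctl(example)):
        print("WARNING:", warning)
```

In practice you would read the text of your own maker_opts.ctl and pass it
to parse_ctl().  The minimum length for single exon evidence is, as far as I
know, the single_length option in that same file.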

Thanks,
Carson






From: "Valero Jimenez, Claudio" <claudio.valero at wur.nl>
Date: Monday, February 17, 2014 at 2:23 AM
To: "'maker-devel at yandell-lab.org'" <maker-devel at yandell-lab.org>
Subject: Maker not predicting many genes

Dear list,
 
I’m trying to annotate a fungal genome, and I’m surprised that Maker does
not predict many genes (3697). I have trained SNAP and followed all the
tutorials available. Ab initio predictors are able to predict between
8000 and 10000 genes. Is there something wrong in my configuration file?
I attach the opts file and the SOBA summary of the annotation.
 
Regards,
 
Claudio
 
 