[maker-devel] Can maker select a gene model based on #algoritham predicted it

Carson Holt carsonhh at gmail.com
Fri Jun 1 12:41:53 MDT 2012


While I could add an option to keep predictions when more than one overlaps
at a locus, the actual implementation is not as trivial as it seems.  On some
organisms, like fungi and oomycetes, the predictions that don't overlap
evidence tend to be similar to each other across predictors.  But on other
eukaryotes with difficult and complex intron/exon structure, like lamprey or
even planaria, about the only time two predictors produce similar results is
when there is evidence supporting them, and all the unsupported regions are
messy, with weird partial overlaps (sometimes even conflicting reading
frames).  I have a figure in the MAKER2 paper showing how poorly these
algorithms perform on such organisms and how the additional evidence-based
feedback provided by MAKER produces dramatically improved results.

The way I get around these issues when choosing the non-redundant,
non-overlapping proteins recorded at the end of a MAKER run is to use a
complex variant of the AED calculation across the alternate predictions to
build a consensus.  So in short, it's not as simple as just saying there are
two predictions at a given locus.  It would require some thought (as well as
good documentation), but it could probably be done.
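
For readers unfamiliar with AED, here is a minimal sketch of the basic pairwise calculation between two alternate predictions, assuming the base-level definition AED = 1 - (SN + SP)/2. It is not the "complex variant" mentioned above, whose details are not given in this thread; the function names and the exon-interval representation are illustrative assumptions.

```python
def covered_bases(exons):
    """Return the set of positions covered by (start, end) exons (1-based, inclusive)."""
    bases = set()
    for start, end in exons:
        bases.update(range(start, end + 1))
    return bases


def pairwise_aed(exons_a, exons_b):
    """Base-level AED between two predictions: 0.0 = identical, 1.0 = no overlap."""
    a = covered_bases(exons_a)
    b = covered_bases(exons_b)
    if not a or not b:
        return 1.0
    shared = len(a & b)
    sn = shared / len(b)  # fraction of prediction B recovered by A
    sp = shared / len(a)  # fraction of prediction A confirmed by B
    return 1.0 - (sn + sp) / 2.0


# Made-up exon coordinates for two predictions at the same locus.
pred_a = [(100, 200), (300, 400)]
pred_b = [(100, 200), (300, 350)]
print(pairwise_aed(pred_a, pred_a))
print(round(pairwise_aed(pred_a, pred_b), 3))
```

A low pairwise value only says two predictions cover similar bases; it does not check reading frame, which is one of the complications described above.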

--Carson

From:  Barry Moore <barry.moore at genetics.utah.edu>
Date:  Friday, 1 June, 2012 2:22 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Gowthaman Ramasamy <gowthaman.ramasamy at seattlebiomed.org>,
"maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Can maker select a gene model based on
#algoritham predicted it

Carson,

How hard would it be to have maker take an option, something like
'require_abinits=2', that would instruct maker to promote predictions that
overlap with 2, 3, or more other predictions?  Seems like maker might
already have all that info in one place at some point?
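
To make the idea concrete, here is a rough sketch of that kind of filter done outside of maker. Note that `require_abinits` is only a proposed name, not an existing maker option, and the `Prediction` tuple and function names below are hypothetical. The sketch keeps a prediction when at least `min_predictors` distinct predictors (counting its own) have a prediction overlapping it on the same sequence and strand; it deliberately ignores exon structure and reading frame.

```python
from collections import namedtuple

# Hypothetical minimal record; coordinates are 1-based and inclusive.
Prediction = namedtuple("Prediction", "source seqid start end strand")


def overlaps(p, q):
    """True if two predictions share at least one base on the same strand."""
    return (p.seqid == q.seqid and p.strand == q.strand
            and p.start <= q.end and q.start <= p.end)


def supported_predictions(preds, min_predictors=2):
    """Keep predictions overlapped by at least min_predictors distinct sources."""
    kept = []
    for p in preds:
        # p overlaps itself, so its own source is always counted.
        sources = {q.source for q in preds if overlaps(p, q)}
        if len(sources) >= min_predictors:
            kept.append(p)
    return kept


# Made-up example predictions.
preds = [
    Prediction("augustus", "contig1", 100, 900, "+"),
    Prediction("genemark", "contig1", 150, 950, "+"),
    Prediction("augustus", "contig1", 2000, 2600, "-"),
]
for p in supported_predictions(preds):
    print(p.source, p.seqid, p.start, p.end, p.strand)
```

This still returns every overlapping prediction at a supported locus rather than one consensus model, so a further selection step would be needed.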

Gowthaman, your contributions to the maker tutorial would be most welcome.
I've got an offline copy of a newer tutorial wiki that is more up to date
than the GMOD version.  It's on a server right now that we've got locked
behind a firewall, but I'm hoping to move that to a public facing server in
the next week and I'd be happy to give you an account on the wiki.

B

On May 30, 2012, at 6:54 AM, Carson Holt wrote:

> It's not an option in exactly the way you are specifying, but there is
> something I usually do for annotation that works well.  I run InterProScan
> or rpsblast on the non_overlapping.proteins.fasta file and select just
> those non-overlapping models that have a recognizable protein domain (just
> searching the Pfam domain space is more than sufficient).  Then I provide
> the selected results to model_gff, and provide the previous maker results
> to the maker_gff option (with all reannotation pass options set to 1 and
> all analysis options turned off).  This adds models that at least have
> recognizable domains (as even multiple gene predictors can overpredict in
> a similar way).
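
The domain-based selection step described above could be sketched roughly as follows. This is not the attached script. It assumes a tab-separated domain-search report in the InterProScan 5 layout (protein ID in column 1, analysis name in column 4), which may differ from the output of the version in use; the function names and example records are made up for illustration.

```python
import io


def ids_with_pfam_hits(tsv_handle, analysis="Pfam"):
    """Collect protein IDs that have at least one hit from the given analysis."""
    ids = set()
    for line in tsv_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and fields[3] == analysis:
            ids.add(fields[0])
    return ids


def filter_fasta(fasta_handle, keep_ids):
    """Yield the lines of FASTA records whose ID (first header word) is in keep_ids."""
    keep = False
    for line in fasta_handle:
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in keep_ids
        if keep:
            yield line


# Made-up example input: pred1 has a Pfam hit, pred2 does not.
example_tsv = io.StringIO(
    "pred1\tmd5a\t300\tPfam\tPF00069\tProtein kinase domain\t10\t250\t1e-50\tT\n"
    "pred2\tmd5b\t120\tCoils\tCoil\tCoil\t5\t30\t-\tT\n"
)
example_fasta = io.StringIO(">pred1 some description\nMKT\n>pred2\nMAA\n")

hits = ids_with_pfam_hits(example_tsv)
print("".join(filter_fasta(example_fasta, hits)), end="")
```

The same set of IDs can then be used to pull the matching predictions out of the GFF3 before passing them to model_gff.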
> 
> Attached is a script to help select predictions and upgrade them to models
> in GFF3 format.  If you have questions, let me know.
> 
> Thanks,
> Carson
> 
> 
> 
> On 12-05-29 5:54 PM, "Gowthaman Ramasamy"
> <gowthaman.ramasamy at seattlebiomed.org> wrote:
> 
>> Hi Carson,
>> Thanks for all the help during the long weekend, in spite of that long
>> drive. I am still trying to imagine that.
>> 
>> I now have maker considering our own predictions via pred_gff, and using
>> augustus and GeneMark (with our training model). And I was able to use
>> altest and protein evidence. Maker happily picks one gene model when
>> there is an overlap between three different predictions. But when I look
>> at the gff, it seems like it picks a gene model only when there is
>> est/protein evidence. It leaves out some genes even though they are
>> predicted by all three algorithms. Of course, keep_preds=1 helps to keep
>> all the models, but that leads to overprediction.
>> 
>> But I am looking for something in between, and would like to know if
>> that is possible:
>> 1) Pick a gene model if it has evidence (est/prot etc.),
>> irrespective of how many algorithms predicted it.
>> 2) In the absence of extrinsic evidence (est/prot etc.), pick a gene model
>> if it is predicted by at least two algorithms.
>> 
>> Or even simpler:
>> I have ab initio predictions from three algorithms. Can I output those
>> genes that are supported by at least two of them? I care less about the
>> exactness of gene boundaries.
>> 
>> Thanks,
>> Gowthaman
>> 
>> PS: With my recent attempts, I learned a couple of things about maker and
>> associated tools that are not documented in the gmod-maker wiki. Is it
>> possible/ok if I add content to it? I am okay with running it by you
>> before making it public.
> 
> <gff3_preds2models>

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543





