[maker-devel] Can maker select a gene model based on #algoritham predicted it
Gowthaman Ramasamy
gowthaman.ramasamy at seattlebiomed.org
Fri Jun 1 13:07:18 MDT 2012
That sounds really good.
Just wondering what that floating-point value would mean?
Is it the fraction of gene prediction algorithms that predicted the region to contain a gene (irrespective of whether the boundaries match), so that 0.2 means 20% of the algorithms predicted it?
Or
does it just indicate the level of concordance (in maker language), so that the user has to try different values before settling on one?
Thanks,
gowthaman
________________________________________
From: Carson Holt [carsonhh at gmail.com]
Sent: Friday, June 01, 2012 11:52 AM
To: Barry Moore
Cc: Gowthaman Ramasamy; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Can maker select a gene model based on #algoritham predicted it
One idea related to this. I could have keep_preds be a floating point value between 0 and 1. This would then represent a threshold for an internal MAKER value called the ab-initio AED (it already exists deep inside MAKER). 0 would turn keep_preds off (as it does now), 1 would keep everything (as it does now), and values in between would allow the user to dial in the degree of consensus among overlapping predictions when considering them without evidence. The ab-initio AED already works similarly to AED, with 0 being perfect concordance and 1 being complete discordance.
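To make the threshold idea concrete, here is a minimal sketch in Python. It is not MAKER's code: how the internal ab-initio AED is computed is not described in this thread, so the sketch substitutes the ordinary nucleotide-level AED, 1 - (SN + SP) / 2, averaged over the other predictions at a locus. The function names, the averaging step, and the "keep if score <= threshold" rule are all assumptions made for illustration.

```python
def exon_bases(exons):
    """Return the set of base positions covered by (start, end) exons (1-based, inclusive)."""
    bases = set()
    for start, end in exons:
        bases.update(range(start, end + 1))
    return bases


def aed(pred, other):
    """Nucleotide-level AED between two exon lists: 1 - (SN + SP) / 2."""
    pred_bases, other_bases = exon_bases(pred), exon_bases(other)
    if not pred_bases or not other_bases:
        return 1.0
    shared = len(pred_bases & other_bases)
    sensitivity = shared / len(other_bases)
    specificity = shared / len(pred_bases)
    return 1.0 - (sensitivity + specificity) / 2.0


def consensus_aed(pred, others):
    """Hypothetical stand-in for the ab-initio AED: mean pairwise AED.

    A prediction with nothing to compare against scores 1.0 (complete discordance).
    """
    if not others:
        return 1.0
    return sum(aed(pred, other) for other in others) / len(others)


def keep_prediction(pred, others, threshold):
    """Assumed rule: 0 disables keeping, 1 keeps everything, in between is a cutoff."""
    if threshold <= 0:
        return False
    return consensus_aed(pred, others) <= threshold


if __name__ == "__main__":
    augustus = [(1, 100)]
    genemark = [(1, 100)]
    shifted = [(51, 150)]
    print(keep_prediction(augustus, [genemark], 0.2))  # identical predictions
    print(keep_prediction(augustus, [shifted], 0.2))   # half-overlapping predictions
```

Under these assumptions the value is not "the fraction of algorithms that agree" but a cutoff on how discordant the overlapping predictions may be, so some trial and error with different values would still be expected.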
--Carson
From: Carson Holt <carsonhh at gmail.com>
Date: Friday, 1 June, 2012 2:41 PM
To: Barry Moore <barry.moore at genetics.utah.edu>
Cc: Gowthaman Ramasamy <gowthaman.ramasamy at seattlebiomed.org>, "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Can maker select a gene model based on #algoritham predicted it
While I could add an option to keep them if there is more than one, the actual implementation is not as trivial as it seems. On some organisms, like fungi and oomycetes, the predictions that don't overlap evidence tend to be similar to each other across predictors. But on other eukaryotes with difficult and complex intron/exon structure, like lamprey or even planaria, about the only time two predictors produce similar results is when there is evidence supporting them, and the unsupported regions are messy, with weird partial overlaps (sometimes even conflicting reading frames). I have a figure in the MAKER2 paper showing how poorly these algorithms perform on such organisms and how the additional evidence-based feedback provided by MAKER produces dramatically improved results.
To get around these issues when choosing the non-redundant, non-overlapping proteins recorded at the end of a MAKER run, I use a complex variant of the AED calculation across the alternate predictions to build a consensus. So in short, it's not as simple as just saying there are two predictions at a given locus. It would require some thought (as well as good documentation), but it could probably be done.
--Carson
From: Barry Moore <barry.moore at genetics.utah.edu>
Date: Friday, 1 June, 2012 2:22 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: Gowthaman Ramasamy <gowthaman.ramasamy at seattlebiomed.org>, "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Can maker select a gene model based on #algoritham predicted it
Carson,
How hard would it be to have maker take an option, something like 'require_abinits=2', that would instruct maker to promote predictions that overlap with (2, 3 or more) other predictions? Seems like maker might already have all that info in one place at some point?
Gowthaman, your contributions to the maker tutorial would be most welcome. I've got an offline copy of a newer tutorial wiki that is more up to date than the GMOD version. It's on a server right now that we've got locked behind a firewall, but I'm hoping to move it to a public-facing server in the next week, and I'd be happy to give you an account on the wiki.
B
On May 30, 2012, at 6:54 AM, Carson Holt wrote:
It's not an option in exactly the way you are specifying, but there is
something I usually do for annotation that works well. I run interproscan
or rpsblast on the non_overlapping.proteins.fasta file and select just
those non-overlapping models that have a recognizable protein domain
(searching just the Pfam domain space is more than sufficient). Then I
provide the selected results to model_gff, and provide the previous maker
results to the maker_gff option (with all reannotation pass options set to
1 and all analysis options turned off). This adds only those models that
have at least a recognizable domain (since even multiple gene predictors
can overpredict in a similar way).
Attached is a script to help select predictions and upgrade them to models
in GFF3 format. If you have questions, let me know.
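The attached script is not reproduced in the archive. As a rough illustration of the selection step only (not the upgrade of predictions to gene models), here is a minimal Python sketch. It assumes a tab-separated domain-hit table whose first column is the protein ID and whose fourth column names the analysis (for example "Pfam"), and it assumes those protein IDs match the ID or Parent attributes in the GFF3. Both assumptions may not hold for real MAKER or interproscan output, and all function names are hypothetical.

```python
def ids_with_pfam_hit(hit_lines):
    """Collect protein IDs that have at least one Pfam hit.

    Assumes tab-separated lines: column 1 = protein ID, column 4 = analysis name.
    """
    ids = set()
    for line in hit_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and "pfam" in fields[3].lower():
            ids.add(fields[0])
    return ids


def parse_attributes(column9):
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def select_gff3_lines(gff_lines, keep_ids):
    """Keep GFF3 feature lines whose ID, or any Parent, is in keep_ids."""
    kept = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed feature line
        attrs = parse_attributes(fields[8])
        parents = attrs.get("Parent", "").split(",")
        if attrs.get("ID") in keep_ids or any(p in keep_ids for p in parents):
            kept.append(line.rstrip("\n"))
    return kept
```

The selected lines would still need to be converted from predictions into gene models before being passed to model_gff, which is what the attached script is for.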
Thanks,
Carson
On 12-05-29 5:54 PM, "Gowthaman Ramasamy"
<gowthaman.ramasamy at seattlebiomed.org> wrote:
Hi Carson,
Thanks for all the help during the long weekend, in spite of that long
drive. I am still trying to imagine that.
I now have maker considering our own predictions via pred_gff, and using
augustus and GeneMark (with our training model). I was also able to use
altest and protein evidence. Maker happily picks one gene model when
there is an overlap between three different predictions. But when I look
at the gff, it seems to pick a gene model only when there is
est/protein evidence. It leaves out some genes even though they are
predicted by all three algorithms. Of course, keep_preds=1 helps to keep
all the models, but that tends to lead to overprediction.
But I am looking for something in between, and would like to know if
that is possible:
1) Pick a gene model if it has extrinsic evidence (est/prot etc.),
irrespective of how many algorithms predicted it.
2) In the absence of extrinsic evidence (est/prot etc.), pick a gene model
if it is predicted by at least two algorithms.
Or even simpler:
I have ab-initio predictions from three algorithms. Can I output those
genes that are supported by at least two of them? I care less about the
exactness of gene boundaries.
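For the "even simpler" case, the idea can be sketched outside of maker in a few lines of Python. This is only an illustration under strong assumptions: predictions are reduced to (source, start, end) spans on a single sequence and strand, exon structure and reading frame are ignored, and any overlap at all counts as support. The function name and the input layout are hypothetical.

```python
def loci_with_support(predictions, min_sources=2):
    """Merge overlapping spans into loci and keep loci seen by enough predictors.

    predictions: iterable of (source, start, end) on one sequence and strand.
    Returns a list of (locus_start, locus_end, sorted_source_names).
    """
    loci = []
    current = None  # [start, end, set of sources] for the locus being built
    for source, start, end in sorted(predictions, key=lambda p: (p[1], p[2])):
        if current is not None and start <= current[1]:
            # Overlaps the current locus: extend it and record the predictor.
            current[1] = max(current[1], end)
            current[2].add(source)
        else:
            if current is not None:
                loci.append(current)
            current = [start, end, {source}]
    if current is not None:
        loci.append(current)
    return [(s, e, sorted(src)) for s, e, src in loci if len(src) >= min_sources]


if __name__ == "__main__":
    preds = [
        ("augustus", 100, 500),
        ("genemark", 120, 480),
        ("custom", 900, 1200),  # predicted by one algorithm only
        ("augustus", 2000, 2500),
        ("genemark", 2400, 2900),
        ("custom", 2450, 2600),
    ]
    for locus in loci_with_support(preds):
        print(locus)
```

Because chains of partial overlaps are merged into a single locus, this simple rule can lump neighboring genes together on genomes where unsupported predictions overlap messily.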
Thanks,
Gowthaman
PS: With my recent attempts, I learned a couple of things about maker and
other associated tools that are not documented in the gmod-maker wiki. Is
it possible/ok if I add content to it? I am okay with running it by you
before making it public.
<gff3_preds2models>
Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543