<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">I like this a lot, Carson, for two reasons. First, it sounds fairly easy to implement with the data and code that already exist within MAKER. Second, it sounds like the right way to do this: the more the ab initio predictors agree, the more likely they are to be correct.<div><br></div><div>B</div><div><br><div><div>On Jun 1, 2012, at 1:23 PM, Carson Holt wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>The metric AED is Annotation Edit Distance (original paper --><br><a href="http://www.biomedcentral.com/1471-2105/10/67">http://www.biomedcentral.com/1471-2105/10/67</a>). It's roughly related to<br>the sensitivity/specificity measure used to quantify the performance of<br>gene predictors and can be used to measure changes in gene models across<br>releases, and I further adapted it for a slightly different purpose<br>than given in the original paper above.<br><br>This is copied from the MAKER2 paper --><br>"Given a gene prediction i and a reference j, the base pair level<br>sensitivity can be calculated using the formula SN = |i&cap;j|/|j|; where<br>|i&cap;j| represents the number overlapping nucleotides between i and j, and<br>|j| represents the total number of nucleotides in the reference j.<br>Alternatively, specificity is calculated using the formula SP = |i&cap;j|/|i|,<br>and accuracy is the average of the two. Because we are not comparing to a<br>high quality reference (reference is arbitrary for AED), it is more<br>correct to refer to the average of sensitivity and specificity as the<br>congruency rather than accuracy; where C = (SN+SP)/2. 
The incongruency, or<br>distance between i and j, then becomes D = 1-C, with a value of 0<br>indicating complete agreement of an annotation to the evidence, and values<br>at or near 1 indicating disagreement or no evidence support."<br><br><br><br>The ab-initio AED, in comparison, is the pairwise AED calculated between<br>each pair of overlapping predictions and then averaged. Each prediction then has a<br>score representing its average distance from the overlapping set of<br>predictions as a whole. So a value of 0.1 would be 10% average<br>incongruency, or 90% average congruency.<br><br>Thanks,<br>Carson<br><br><br><br>On 12-06-01 3:07 PM, "Gowthaman Ramasamy"<br><<a href="mailto:gowthaman.ramasamy@seattlebiomed.org">gowthaman.ramasamy@seattlebiomed.org</a>> wrote:<br><br><blockquote type="cite">That sounds really good.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Just wondering what that floating point value would mean?<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Is it the fraction of gene prediction algorithms that predicted that region to contain a<br></blockquote><blockquote type="cite">gene (irrespective of boundaries matching), so 0.2 means 20% of algorithms<br></blockquote><blockquote type="cite">predicted it? 
<br></blockquote><blockquote type="cite">Or <br></blockquote><blockquote type="cite">does it just indicate the level of concordance (in maker language), and the user has<br></blockquote><blockquote type="cite">to try different values before settling on one?<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Thanks,<br></blockquote><blockquote type="cite">gowthaman<br></blockquote><blockquote type="cite">________________________________________<br></blockquote><blockquote type="cite">From: Carson Holt [<a href="mailto:carsonhh@gmail.com">carsonhh@gmail.com</a>]<br></blockquote><blockquote type="cite">Sent: Friday, June 01, 2012 11:52 AM<br></blockquote><blockquote type="cite">To: Barry Moore<br></blockquote><blockquote type="cite">Cc: Gowthaman Ramasamy; <a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a><br></blockquote><blockquote type="cite">Subject: Re: [maker-devel] Can maker select a gene model based on<br></blockquote><blockquote type="cite">#algoritham predicted it<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">One idea related to this: I could have keep_preds be a floating point<br></blockquote><blockquote type="cite">value between 0 and 1. This would then represent a threshold for an<br></blockquote><blockquote type="cite">internal MAKER value called the ab-initio AED (it already exists<br></blockquote><blockquote type="cite">internally deep in MAKER). 0 would turn keep_preds off (as it does now),<br></blockquote><blockquote type="cite">1 would keep everything (as it does now), and values in between would<br></blockquote><blockquote type="cite">allow the user to dial in the degree of consensus among overlapping<br></blockquote><blockquote type="cite">predictions when considering them without evidence. 
The ab-initio AED<br></blockquote><blockquote type="cite">already works similar to AED, with 0 being perfect concordance and 1<br></blockquote><blockquote type="cite">being complete discordance.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">--Carson<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">From: Carson Holt <<a href="mailto:carsonhh@gmail.com">carsonhh@gmail.com</a><<a href="mailto:carsonhh@gmail.com">mailto:carsonhh@gmail.com</a>>><br></blockquote><blockquote type="cite">Date: Friday, 1 June, 2012 2:41 PM<br></blockquote><blockquote type="cite">To: Barry Moore <br></blockquote><blockquote type="cite"><<a href="mailto:barry.moore@genetics.utah.edu">barry.moore@genetics.utah.edu</a><<a href="mailto:barry.moore@genetics.utah.edu">mailto:barry.moore@genetics.utah.edu</a>>><br></blockquote><blockquote type="cite">Cc: Gowthaman Ramasamy<br></blockquote><blockquote type="cite"><<a href="mailto:gowthaman.ramasamy@seattlebiomed.org">gowthaman.ramasamy@seattlebiomed.org</a><mailto:gowthaman.ramasamy@seattlebio<br></blockquote><blockquote type="cite"><a href="http://med.org">med.org</a>>>, <br></blockquote><blockquote type="cite">"<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a><<a href="mailto:maker-devel@yandell-lab.org">mailto:maker-devel@yandell-lab.org</a>>"<br></blockquote><blockquote type="cite"><<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a><<a href="mailto:maker-devel@yandell-lab.org">mailto:maker-devel@yandell-lab.org</a>>><br></blockquote><blockquote type="cite">Subject: Re: [maker-devel] Can maker select a gene model based on<br></blockquote><blockquote type="cite">#algoritham predicted it<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">While I could add an option to keep them if there are more than one, 
the<br></blockquote><blockquote type="cite">actual implementation is not as trivial as it seems. On some organisms<br></blockquote><blockquote type="cite">like fungi and oomycetes, the predictions that don't overlap evidence<br></blockquote><blockquote type="cite">tend to be similar to each other across predictors, but on other<br></blockquote><blockquote type="cite">eukaryotes with difficult and complex intron/exon structure, like lamprey<br></blockquote><blockquote type="cite">or even planaria, about the only time two predictors will produce similar<br></blockquote><blockquote type="cite">results is when there is evidence supporting them, and all<br></blockquote><blockquote type="cite">the unsupported regions are messy with weird partial overlaps (sometimes<br></blockquote><blockquote type="cite">even conflicting reading frames). I have a figure in the MAKER2 paper<br></blockquote><blockquote type="cite">showing how poorly these algorithms perform on such organisms and how<br></blockquote><blockquote type="cite">additional evidence-based feedback provided by MAKER produces<br></blockquote><blockquote type="cite">dramatically improved results.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">The way I get around these issues when choosing the non-redundant<br></blockquote><blockquote type="cite">non-overlapping proteins recorded at the end of a MAKER run is to use a<br></blockquote><blockquote type="cite">complex variant of the AED calculation across the alternate predictions<br></blockquote><blockquote type="cite">to build a consensus. So in short it's not exactly as simple as just<br></blockquote><blockquote type="cite">saying there are two predictions at a given locus. 
It would require some<br></blockquote><blockquote type="cite">thought (as well as good documentation), but it could probably be done.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">--Carson<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">From: Barry Moore <br></blockquote><blockquote type="cite"><<a href="mailto:barry.moore@genetics.utah.edu">barry.moore@genetics.utah.edu</a><<a href="mailto:barry.moore@genetics.utah.edu">mailto:barry.moore@genetics.utah.edu</a>>><br></blockquote><blockquote type="cite">Date: Friday, 1 June, 2012 2:22 PM<br></blockquote><blockquote type="cite">To: Carson Holt <<a href="mailto:carsonhh@gmail.com">carsonhh@gmail.com</a><<a href="mailto:carsonhh@gmail.com">mailto:carsonhh@gmail.com</a>>><br></blockquote><blockquote type="cite">Cc: Gowthaman Ramasamy<br></blockquote><blockquote type="cite"><<a href="mailto:gowthaman.ramasamy@seattlebiomed.org">gowthaman.ramasamy@seattlebiomed.org</a><mailto:gowthaman.ramasamy@seattlebio<br></blockquote><blockquote type="cite"><a href="http://med.org">med.org</a>>>, <br></blockquote><blockquote type="cite">"<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a><<a href="mailto:maker-devel@yandell-lab.org">mailto:maker-devel@yandell-lab.org</a>>"<br></blockquote><blockquote type="cite"><<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a><<a href="mailto:maker-devel@yandell-lab.org">mailto:maker-devel@yandell-lab.org</a>>><br></blockquote><blockquote type="cite">Subject: Re: [maker-devel] Can maker select a gene model based on<br></blockquote><blockquote type="cite">#algoritham predicted it<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Carson,<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">How hard would it be to have maker take an option something like<br></blockquote><blockquote type="cite">'require_abinits=2' that would 
instruct maker to promote predictions that<br></blockquote><blockquote type="cite">overlap with (2, 3 or more) other predictions? Seems like maker<br></blockquote><blockquote type="cite">might have all that info in one place at some point already?<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Gowthaman, your contributions to the maker tutorial would be most<br></blockquote><blockquote type="cite">welcome. I've got an offline copy of a newer tutorial wiki that is more<br></blockquote><blockquote type="cite">up to date than the GMOD version. It's on a server right now that we've<br></blockquote><blockquote type="cite">got locked behind a firewall, but I'm hoping to move that to a public<br></blockquote><blockquote type="cite">facing server in the next week and I'd be happy to give you an account on<br></blockquote><blockquote type="cite">the wiki.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">B<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">On May 30, 2012, at 6:54 AM, Carson Holt wrote:<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">It's not an option in exactly the way you are specifying, but there is<br></blockquote><blockquote type="cite">something I usually do for annotation that works well. I run interproscan<br></blockquote><blockquote type="cite">or rpsblast on the non_overlapping.proteins.fasta file and select just<br></blockquote><blockquote type="cite">those non-overlapping models that have a recognizable protein domain (just<br></blockquote><blockquote type="cite">searching the pfam domain space is more than sufficient). 
Then I provide<br></blockquote><blockquote type="cite">the selected results to model_gff, and provide the previous maker results<br></blockquote><blockquote type="cite">to the maker_gff option with (all reannotation pass options set to 1 and<br></blockquote><blockquote type="cite">all analysis options turned off). This adds models with at least<br></blockquote><blockquote type="cite">recognizable domains (as even multiple gene predictors can overpredict in<br></blockquote><blockquote type="cite">a similar way).<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Attached is a script to help select predictions and upgrade them to models<br></blockquote><blockquote type="cite">in GFF3 format. If you have question let me know.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Thanks,<br></blockquote><blockquote type="cite">Carson<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">On 12-05-29 5:54 PM, "Gowthaman Ramasamy"<br></blockquote><blockquote type="cite"><<a href="mailto:gowthaman.ramasamy@seattlebiomed.org">gowthaman.ramasamy@seattlebiomed.org</a><mailto:gowthaman.ramasamy@seattlebio<br></blockquote><blockquote type="cite"><a href="http://med.org">med.org</a>>> wrote:<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Hi Carson,<br></blockquote><blockquote type="cite">Thanks for all the help during the long weekend, in spite of that long<br></blockquote><blockquote type="cite">drive. I am still trying to imagine that.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">I now have maker to consider our own prediction via pred_gff, and use<br></blockquote><blockquote type="cite">augustus and gene mark (with our training model). And i was able to use<br></blockquote><blockquote type="cite">altest and protein evidences. 
Maker happily picks one gene model when<br></blockquote><blockquote type="cite">there is a overlap between three different predictions. But, when I look<br></blockquote><blockquote type="cite">at the gff, it seems like it picks a gene model only when there is an<br></blockquote><blockquote type="cite">est/protein evidence. It leaves out some genes even though, they are<br></blockquote><blockquote type="cite">predicted by all three algorithms. Of course, keep_pred=1 helps to keep<br></blockquote><blockquote type="cite">all the models. This kind of leads to over prediction.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">But, I am looking for something in between. And would like to know if<br></blockquote><blockquote type="cite">that is possible?<br></blockquote><blockquote type="cite">1) Pick a gene model if it has an evidence from (est/prot etc...)<br></blockquote><blockquote type="cite">irrespective of how many algorithms predicted it<br></blockquote><blockquote type="cite">2) In the absence of extrinsic evidence (est/prot etc), pick a gene model<br></blockquote><blockquote type="cite">if that is predicted by at least two algorithms.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Or even simpler:<br></blockquote><blockquote type="cite">I have ab-initio predictions from three algorithms, Can I output, those<br></blockquote><blockquote type="cite">genes that is supported by at least two of them. I care less about<br></blockquote><blockquote type="cite">exactness of gene boundaries.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Thanks,<br></blockquote><blockquote type="cite">Gowthaman<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">PS: With my recent attempts, i learned couple things about maker/other<br></blockquote><blockquote type="cite">associated tools that is not documented in gmod-maker wiki. 
Is it<br></blockquote><blockquote type="cite">possible/ok if I add contents to it. I am okay with running it by you<br></blockquote><blockquote type="cite">before making it public.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><gff3_preds2models>_______________________________________________<br></blockquote><blockquote type="cite">maker-devel mailing list<br></blockquote><blockquote type="cite"><a href="mailto:maker-devel@box290.bluehost.com">maker-devel@box290.bluehost.com</a><<a href="mailto:maker-devel@box290.bluehost.com">mailto:maker-devel@box290.bluehost.com</a>><br></blockquote><blockquote type="cite"><a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a><br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Barry Moore<br></blockquote><blockquote type="cite">Research Scientist<br></blockquote><blockquote type="cite">Dept. of Human Genetics<br></blockquote><blockquote type="cite">University of Utah<br></blockquote><blockquote type="cite">Salt Lake City, UT 84112<br></blockquote><blockquote type="cite">--------------------------------------------<br></blockquote><blockquote type="cite">(801) 585-3543<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><br></blockquote><br><br></div></blockquote></div><br><div>
<span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><div><span class="Apple-style-span" style="font-family: Arial; font-size: 12px; "><div>Barry Moore</div><div>Research Scientist</div><div>Dept. of Human Genetics</div><div>University of Utah</div><div>Salt Lake City, UT 84112</div><div>--------------------------------------------</div><div>(801) 585-3543</div><div><br class="khtml-block-placeholder"></div></span></div><div><br></div></span><br class="Apple-interchange-newline">
</div>
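<div><br></div>
<div>For reference, the AED formulas quoted above (SN, SP, C = (SN+SP)/2, D = 1-C) and the averaged pairwise "ab-initio AED" can be sketched roughly as follows. This is only an illustration of the arithmetic as described in the thread, not MAKER's internal code: the function names, the representation of a prediction as a list of 1-based inclusive (start, end) exons, and the choice to return 1.0 when nothing overlaps are all assumptions made for the example.</div>
<pre>
```python
def covered_bases(exons):
    """Set of base positions covered by 1-based inclusive (start, end) exons."""
    bases = set()
    for start, end in exons:
        bases.update(range(start, end + 1))
    return bases


def aed(pred_i, ref_j):
    """Distance D = 1 - (SN + SP) / 2 between two exon lists."""
    i_bases = covered_bases(pred_i)
    j_bases = covered_bases(ref_j)
    if not i_bases or not j_bases:
        return 1.0  # assumption: an empty feature counts as full disagreement
    shared = len(i_bases.intersection(j_bases))
    sn = shared / len(j_bases)  # SN = |i intersect j| / |j|
    sp = shared / len(i_bases)  # SP = |i intersect j| / |i|
    return 1.0 - (sn + sp) / 2.0


def ab_initio_aed(pred, others):
    """Average pairwise AED between pred and each prediction overlapping it."""
    pred_bases = covered_bases(pred)
    distances = [
        aed(pred, other)
        for other in others
        if pred_bases.intersection(covered_bases(other))
    ]
    if not distances:
        return 1.0  # assumption: no overlapping prediction means no consensus
    return sum(distances) / len(distances)


# Hypothetical exon coordinates for two predictors at the same locus.
augustus = [(100, 199), (300, 399)]  # 200 bases
genemark = [(100, 199), (300, 349)]  # 150 bases, all shared with augustus

print(aed(augustus, genemark))  # SN = 1.0, SP = 0.75, so D = 0.125
```
</pre>
<div>Under the keep_preds idea discussed in the thread, a prediction without evidence would presumably be kept when its averaged score is at or below the chosen threshold; exactly how the threshold would be applied is not spelled out here.</div>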
<br></div></body></html>