<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; "><div>One idea related to this.  I could have keep_preds be a floating-point value between 0 and 1.  This would then represent a threshold for an internal MAKER value called the ab-initio AED (it already exists deep inside MAKER).  0 would turn keep_preds off (as it does now), 1 would keep everything (as it does now), and values in between would allow the user to dial in the degree of consensus required among overlapping predictions when considering them without evidence.  The ab-initio AED already works similarly to AED, with 0 being perfect concordance and 1 being complete discordance.</div><div><br></div><div>--Carson</div><div><br></div><div><br></div><div><br></div><span id="OLK_SRC_BODY_SECTION"><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span> Carson Holt <<a href="mailto:carsonhh@gmail.com">carsonhh@gmail.com</a>><br><span style="font-weight:bold">Date: </span> Friday, 1 June, 2012 2:41 PM<br><span style="font-weight:bold">To: </span> Barry Moore <<a href="mailto:barry.moore@genetics.utah.edu">barry.moore@genetics.utah.edu</a>><br><span style="font-weight:bold">Cc: </span> Gowthaman Ramasamy <<a href="mailto:gowthaman.ramasamy@seattlebiomed.org">gowthaman.ramasamy@seattlebiomed.org</a>>, "<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>" <<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>><br><span style="font-weight:bold">Subject: </span> Re: [maker-devel] Can maker select a gene model based on #algoritham predicted it<br></div><div><br></div><div><div
style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; "><div>While I could add an option to keep them if there is more than one, the actual implementation is not as trivial as it seems.  On some organisms like fungi and oomycetes, the predictions that don't overlap evidence tend to be similar to each other across predictors, but on other eukaryotes with difficult and complex intron/exon structure, like lamprey or even planaria, about the only time two predictors produce similar results is when there is evidence supporting them, and all the unsupported regions are messy with weird partial overlaps (sometimes even conflicting reading frames).  I have a figure in the MAKER2 paper showing how poorly these algorithms perform on such organisms and how additional evidence-based feedback provided by MAKER produces dramatically improved results.</div><div><br></div><div>To get around these issues when choosing the non-redundant, non-overlapping proteins recorded at the end of a MAKER run, I use a complex variant of the AED calculation across the alternate predictions to build a consensus.  So in short, it's not as simple as just saying there are two predictions at a given locus.  
It would require some thought (as well as good documentation), but it could probably be done.</div><div><br></div><div>--Carson</div><div><br></div><span id="OLK_SRC_BODY_SECTION"><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span> Barry Moore <<a href="mailto:barry.moore@genetics.utah.edu">barry.moore@genetics.utah.edu</a>><br><span style="font-weight:bold">Date: </span> Friday, 1 June, 2012 2:22 PM<br><span style="font-weight:bold">To: </span> Carson Holt <<a href="mailto:carsonhh@gmail.com">carsonhh@gmail.com</a>><br><span style="font-weight:bold">Cc: </span> Gowthaman Ramasamy <<a href="mailto:gowthaman.ramasamy@seattlebiomed.org">gowthaman.ramasamy@seattlebiomed.org</a>>, "<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>" <<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>><br><span style="font-weight:bold">Subject: </span> Re: [maker-devel] Can maker select a gene model based on #algoritham predicted it<br></div><div><br></div><div><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Carson,</div><div><br></div><div>How hard would it be to have maker take an option something like 'require_abinits=2' that would instruct maker to promote predictions that overlap with (2, 3 or more) other predictions?  Seems like the maker might have all that info in one place at some point already?</div><div><br></div><div><div><div>Gowthaman, your contributions to the maker tutorial would be most welcome.  I've got an offline copy of a newer tutorial wiki that is more up to date than the GMOD version.  
It's on a server right now that we've got locked behind a firewall, but I'm hoping to move that to a public-facing server in the next week and I'd be happy to give you an account on the wiki.</div></div></div><div><br></div><div>B</div><br><div><div>On May 30, 2012, at 6:54 AM, Carson Holt wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>It's not an option in exactly the way you are specifying, but there is<br>something I usually do for annotation that works well.  I run interproscan<br>or rpsblast on the non_overlapping.proteins.fasta file and select just<br>those non-overlapping models that have a recognizable protein domain (just<br>searching the Pfam domain space is more than sufficient).  Then I provide<br>the selected results to model_gff, and provide the previous maker results<br>to the maker_gff option (with all reannotation pass options set to 1 and<br>all analysis options turned off).  This adds models that at least have<br>recognizable domains (as even multiple gene predictors can overpredict in<br>a similar way).<br><br>Attached is a script to help select predictions and upgrade them to models<br>in GFF3 format.  If you have questions, let me know.<br><br>Thanks,<br>Carson<br><br><br><br>On 12-05-29 5:54 PM, "Gowthaman Ramasamy"<br><<a href="mailto:gowthaman.ramasamy@seattlebiomed.org">gowthaman.ramasamy@seattlebiomed.org</a>> wrote:<br><br><blockquote type="cite">Hi Carson,<br></blockquote><blockquote type="cite">Thanks for all the help during the long weekend, in spite of that long<br></blockquote><blockquote type="cite">drive. I am still trying to imagine that.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">I now have maker considering our own predictions via pred_gff, and using<br></blockquote><blockquote type="cite">Augustus and GeneMark (with our training model). And I was able to use<br></blockquote><blockquote type="cite">altest and protein evidence. 
Maker happily picks one gene model when<br></blockquote><blockquote type="cite">there is an overlap between three different predictions. But when I look<br></blockquote><blockquote type="cite">at the gff, it seems like it picks a gene model only when there is<br></blockquote><blockquote type="cite">est/protein evidence. It leaves out some genes even though they are<br></blockquote><blockquote type="cite">predicted by all three algorithms. Of course, keep_preds=1 helps to keep<br></blockquote><blockquote type="cite">all the models. This kind of leads to overprediction.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">But I am looking for something in between, and would like to know if<br></blockquote><blockquote type="cite">that is possible:<br></blockquote><blockquote type="cite">1) Pick a gene model if it has evidence (est/prot etc.),<br></blockquote><blockquote type="cite">irrespective of how many algorithms predicted it<br></blockquote><blockquote type="cite">2) In the absence of extrinsic evidence (est/prot etc.), pick a gene model<br></blockquote><blockquote type="cite">if it is predicted by at least two algorithms.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Or even simpler:<br></blockquote><blockquote type="cite">I have ab-initio predictions from three algorithms. Can I output those<br></blockquote><blockquote type="cite">genes that are supported by at least two of them? I care less about<br></blockquote><blockquote type="cite">the exactness of gene boundaries.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Thanks,<br></blockquote><blockquote type="cite">Gowthaman<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">PS: With my recent attempts, I learned a couple of things about maker/other<br></blockquote><blockquote type="cite">associated tools that are not documented in the gmod-maker wiki. 
Is it<br></blockquote><blockquote type="cite">possible/ok if I add contents to it. I am okay with running it by you<br></blockquote><blockquote type="cite">before making it public.<br></blockquote><br><span><gff3_preds2models></span><br></div></blockquote></div><br><div><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><div><span class="Apple-style-span" style="font-family: Arial; font-size: 12px; "><div>Barry Moore</div><div>Research Scientist</div><div>Dept. of Human Genetics</div><div>University of Utah</div><div>Salt Lake City, UT 84112</div><div>--------------------------------------------</div><div>(801) 585-3543</div><div><br class="khtml-block-placeholder"></div></span></div><div><br></div></span><br class="Apple-interchange-newline"></div><br></div></div></span></div></div></span></body></html>