<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; "><div>I concur with everything Barry said. Let me also add that alt-ESTs do not get polished around splice sites (exonerate won't handle them), whereas ESTs and proteins do. This means alt-ESTs contribute nothing to UTR or splice-site information. They just function as an anchor to maintain gene models that might otherwise have been rejected. It also means that alt_est=some.fasta together with est2genome=1 will produce virtually no additional results, because est2genome requires splice sites that make sense. If you don’t have ESTs and want partial models for training, you are better off using proteins with protein2genome=1. Once you have a trained ab initio gene predictor, turn the est2genome and protein2genome options off; otherwise they will give weird partial models that decrease the quality of your final annotations (partial models are OK for training, but that's about it).</div><div><br></div><div>If you are getting too low a gene count with keep_preds=0, then you probably need to add more evidence. Try adding all proteins from a couple of related species (the protein= option accepts a comma-separated list of files). If your genome is from a fungus, an oomycete, or a prokaryote, then setting keep_preds=1 is usually safe. These genomes have high gene density and simple gene structure, so ab initio predictors do really well and don't need as much help from the evidence. 
For other organisms, leave it set to 0, or you will get a lot of false positives (on some genomes with complex gene structure, false positives can outnumber real genes by 10 to 1 if you turn this on).</div><div><br></div><div>Thanks,</div><div>Carson</div><div><br></div><div><br></div><div><br></div><div><br></div><span id="OLK_SRC_BODY_SECTION"><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span> Barry Moore &lt;<a href="mailto:barry.moore@genetics.utah.edu">barry.moore@genetics.utah.edu</a>&gt;<br><span style="font-weight:bold">Date: </span> Friday, 31 August, 2012 12:52 PM<br><span style="font-weight:bold">To: </span> Christoph Hahn &lt;<a href="mailto:chrisi.hahni@gmail.com">chrisi.hahni@gmail.com</a>&gt;<br><span style="font-weight:bold">Cc: </span> &lt;<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>&gt;<br><span style="font-weight:bold">Subject: </span> Re: [maker-devel] keep_preds option?<br></div><div><br></div><div><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi Christoph,<div><br></div><div>Comments below:</div><div><br><div><div>On Aug 31, 2012, at 6:43 AM, Christoph Hahn wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>Hello maker users and developers,<br><br>I am new to gene prediction and I am trying to use maker 2.25 on the newly assembled draft genome of a non-model organism. Within maker I use GeneMark, SNAP and Augustus. I have a few questions:<br><br></div></blockquote><div><br></div><div>Welcome!</div><br><blockquote type="cite"><div>1. 
I was wondering what exactly the keep_preds option means.<br><br>I found two slightly different explanations of the option:<br>#Add unsupported gene prediction to final annotation set (maker2.25)<br>#Add non-overlapping ab-inito gene prediction to final annotation set (found on the net - probably older maker version)<br><br></div></blockquote><div><br></div><div>It means to keep ab initio gene predictions for which there is no physical evidence. By default, every MAKER annotation requires two pieces of information: a gene prediction and physical evidence supporting it. MAKER runs the ab initio gene predictors and aligns transcript (EST/cDNA/mRNA-Seq) and protein sequences to the genome. For each locus where one or more gene predictions exist, MAKER checks whether there is any physical evidence for gene expression at that locus (RNA/protein sequence alignments). If there is evidence (spliced EST or protein alignments) overlapping the predictions, MAKER decides which prediction best matches the evidence and promotes that prediction to an annotation. If no evidence overlaps any of the predictions, those predictions are not included in the output annotation file (although they are saved). If you set keep_preds=1, then for each locus where prediction(s) exist, MAKER keeps one and promotes it to an annotation even though there is no physical evidence. The description 'non-overlapping ab-initio' means that MAKER clusters all ab initio predictions at a locus and chooses one representative transcript to output.</div><br><blockquote type="cite"><div>As far as I understood, keep_preds=0 only retains gene models for which the ab initio predictions agree. But how many, all three? 
two of three?<br>keep_preds=1 instead keeps all gene models regardless of whether the different programs agree, right?<br><br></div></blockquote><div><br></div><div>MAKER does not treat the presence of multiple ab initio predictions as evidence, so in the absence of aligned physical evidence MAKER will not output an annotation, even if all three ab initio tools predict a gene at that locus.</div><br><blockquote type="cite"><div>In my case I get a substantial difference in the number of gene models between the two settings, while with =1 I get a number that is close to what we would expect. How would you interpret that? What would you recommend I do? Obviously =0 is the safer option.<br></div></blockquote><div><br></div><div>If you think the number of genes you are getting from a MAKER run is too low, you could run MAKER with keep_preds=1. After the run is finished, use a tool like IPRScan to search all MAKER predictions for protein domain content, and push that IPRScan output back into the MAKER GFF file with the ipr_update_gff script. Then, as a final step, you can run over the GFF file and remove any gene model that has neither physical evidence (AED &lt; 1) nor protein domain content (Dbxref=PFAM:XXX, etc.). Sorry, there's no script prepackaged with MAKER for that yet.</div><div><br></div><blockquote type="cite"><div><br>2. I tried to use EST data from an alternative organism in altest= (#EST/cDNA sequence file in fasta format from an alternate organism). The organism is quite distantly related, but it's the closest I have, so I thought I'd give it a shot. I ran maker twice with identical settings except for altest and est2genome=0/1. The number of genes predicted is identical with both approaches, so I am not sure whether the EST data was actually used or whether it's just too distantly related. 
Is there an easy way to assess this?<br></div></blockquote><div><br></div><div>Typically, EST evidence from another organism supplied via alt_est will add little in the way of additional support (compared to just using protein evidence from, say, Swiss-Prot), and this is especially true if your alt_est source organism is distantly related. I'm not sure I understand your alt_est/est2genome combinations well enough to comment in more detail. I can see four possible combinations there; which two gave identical results?</div><br><blockquote type="cite"><div><br>3. I am running maker in several passes, and after each pass I am training SNAP using the result of the previous pass. Then for every pass I run maker from scratch. Would you recommend supplying the GFF of the previous pass in "#-----Re-annotation Using MAKER Derived GFF3<br>maker_gff= #re-annotate genome based on this gff3 file" instead?<br><br></div></blockquote><div><br></div><div>No. 'Re-annotation Using MAKER Derived GFF3' is for re-annotating a genome when you want certain parts of the previous run to be passed through unchanged. When retraining SNAP, you want MAKER to re-evaluate each annotation in light of the new predictions made by the retrained SNAP. MAKER should run much faster on every run after the first, because as long as you haven't changed the evidence files it won't have to redo any of the alignments.</div><div><br></div><div><br></div>B</div><div><br><blockquote type="cite"><div>Thanks in advance for any thoughts/advice on these things! 
I appreciate your help!<br><br>much obliged,<br>Christoph<br><br>_______________________________________________<br>maker-devel mailing list<br><a href="mailto:maker-devel@box290.bluehost.com">maker-devel@box290.bluehost.com</a><br><a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a><br></div></blockquote></div><br><div><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><div><span class="Apple-style-span" style="font-family: Arial; font-size: 12px; "><div>Barry Moore</div><div>Research Scientist</div><div>Dept. of Human Genetics</div><div>University of Utah</div><div>Salt Lake City, UT 84112</div><div>--------------------------------------------</div><div>(801) 585-3543</div><div><br class="khtml-block-placeholder"></div></span></div><div><br></div></span><br class="Apple-interchange-newline"></div><br></div></div></div>

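<div><br></div><div>Barry notes that MAKER does not ship a script for the final filtering step (keep a gene model only if it has physical evidence, i.e. AED below 1, or protein domain content added by ipr_update_gff). Below is a minimal Python sketch of such a filter. It assumes, and you should check this against your own file, that each mRNA carries its AED in an _AED attribute and that domain hits appear in a Dbxref attribute with a PFAM: or InterPro: prefix; the script name filter_maker_gff.py and the DOMAIN_PREFIXES and CHILD_TYPES sets are hypothetical choices. An mRNA that fails both tests is dropped together with its exon, CDS and UTR features, a gene left with no mRNA is dropped, and everything else (evidence alignments, comments, the ##FASTA section) passes through unchanged.</div>
<pre>
```python
#!/usr/bin/env python3
"""Hypothetical filter_maker_gff.py: post-filter a MAKER GFF3 run with keep_preds=1.

Keeps an mRNA only if it has evidence support (_AED below 1) or a protein
domain Dbxref added by ipr_update_gff. Attribute names are assumptions.
"""
import sys

# Assumed Dbxref prefixes for domain hits; adjust to what your file contains.
DOMAIN_PREFIXES = ("PFAM:", "Pfam:", "InterPro:")
# Feature types hanging off an mRNA in MAKER gene models (assumed set).
CHILD_TYPES = {"exon", "CDS", "five_prime_UTR", "three_prime_UTR"}


def parse_attrs(col9):
    """Turn 'ID=x;Parent=y' into a dict (values are left URL-encoded)."""
    attrs = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def transcript_passes(attrs):
    """True if the mRNA has evidence (AED below 1) or a domain Dbxref."""
    aed = float(attrs.get("_AED", "1"))  # missing AED treated as unsupported
    has_domain = any(x.startswith(DOMAIN_PREFIXES)
                     for x in attrs.get("Dbxref", "").split(","))
    return 1.0 > aed or has_domain


def filter_maker_gff(lines):
    """Return the lines of a MAKER GFF3 with unsupported gene models removed."""
    # Pass 1: collect IDs of passing mRNAs and of their parent genes.
    kept_ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "mRNA":
            attrs = parse_attrs(cols[8])
            if transcript_passes(attrs):
                kept_ids.add(attrs["ID"])
                kept_ids.update(attrs.get("Parent", "").split(","))
    # Pass 2: drop genes/mRNAs not kept, and children of dropped mRNAs.
    out = []
    in_fasta = False
    for line in lines:
        in_fasta = in_fasta or line.startswith("##FASTA")
        cols = line.rstrip("\n").split("\t")
        if in_fasta or len(cols) != 9:
            out.append(line)  # comments, blank lines, FASTA sequence
            continue
        attrs = parse_attrs(cols[8])
        if cols[2] in ("gene", "mRNA"):
            if attrs.get("ID") not in kept_ids:
                continue
        elif cols[2] in CHILD_TYPES:
            parents = attrs.get("Parent", "").split(",")
            if not any(p in kept_ids for p in parents):
                continue
        out.append(line)  # kept models plus all evidence/prediction features
    return out


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python filter_maker_gff.py input.gff > filtered.gff
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(filter_maker_gff(handle.readlines()))
```
</pre>
<div>Run it as python filter_maker_gff.py input.gff &gt; filtered.gff after ipr_update_gff. Filtering per mRNA rather than per gene means a supported isoform survives even when a sibling isoform of the same gene is unsupported.</div>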
</span></body></html>