<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif;"><div>"I have to specify an ab initio gene model for any locus that I wish to annotate using evidence alignment (i.e. there must be a preexisting model)?"</div><div><br></div><div>Not exactly.  You need to supply an HMM for SNAP or a species file for Augustus, etc.  MAKER doesn't generate gene predictions; SNAP does. You cannot get updated models unless you've provided a way for those models to be updated.  MAKER will provide SNAP/Augustus with hints to make them perform better based on the evidence, but those hints won't even be generated and the programs won't even run unless you provide the HMM.  Also, if you provide models in GFF3 format to pred_gff, there is no hint feedback (because there is no program to receive the hints - just an immutable GFF3 file).</div><div><br></div><div>If you don't have an HMM for SNAP for your organism, you can generate one using the documentation here (from GMOD 2014 tutorial) --> <a href="http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors">http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors</a></div><div><br></div><div>--Carson</div><div><br></div><div><br></div><div><br></div><span id="OLK_SRC_BODY_SECTION"><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span> Saad Arif <<a href="mailto:saad.arif@tuebingen.mpg.de">saad.arif@tuebingen.mpg.de</a>><br><span style="font-weight:bold">Date: </span> Wednesday, June 18, 2014 at 
10:42 AM<br><span style="font-weight:bold">To: </span> Daniel Ence <<a href="mailto:dence@genetics.utah.edu">dence@genetics.utah.edu</a>><br><span style="font-weight:bold">Cc: </span> "<<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>>" <<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>><br><span style="font-weight:bold">Subject: </span> Re: [maker-devel] Help with updating an annotation<br></div><div><br></div><div><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Thanks Daniel. I think it's clearer to me now.<div><br></div><div>So if I understand correctly now: I have to specify an ab initio gene model for any locus that I wish to annotate using evidence alignment (i.e. there must be a preexisting model)? These ab initio gene models can be trained internally in MAKER with SNAP using my cufflinks output as EST evidence. Alternatively, I can provide alternative ab initio predictions (for regions not present in my Ensembl reference passed to model_gff) for regions overlapping my cufflinks output via the pred_gff option? </div><div><br></div><div>Since I'm interested in unannotated regions, I'm also passing in reference proteomes of closely related species as protein homology evidence.</div><div><br></div><div>As such, I should be able to keep only evidence-supported predictions (for regions not present in my model_gff, and/or better-supported models for regions that are present) from my pred_gff and merge them with the Ensembl annotations from the model_gff?</div><div><br></div><div>Let me know if I'm still missing something here. 
</div><div><br></div><div>Thanks in advance.</div><div><br></div><div>best,</div><div>Saad</div><div><div><div>On 18 Jun 2014, at 17:21, Daniel Ence wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><meta http-equiv="Content-Type" content="text/html; charset=Windows-1252"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">

Hi Saad, 

<div><br></div><div>MAKER doesn't view EST or protein evidence as gene models in themselves. There's a good reason for this: aligners like BLAST don't guarantee complete gene models with accurate start and stop codons and splice sites. With its default settings, MAKER

 won't make a gene model unless there's evidence that overlaps an ab-initio prediction (or something from the pred_gff option). </div><div><br></div><div>You can use est2genome to promote everything from the est_gff option to a gene model, but this will probably give you many spurious results. What you're saying with est2genome is, "Everything that this tool found is a complete gene model." I don't think

 that's true even for cufflinks output. </div><div><br></div><div>One of the gene predictors that can run internally is SNAP. It's really easy to train; here's a link to a tutorial for training it: <a href="http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors">http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Training_ab_initio_Gene_Predictors</a></div><div><br></div><div>Let me know if that helps, or if you have more questions.</div><div><br></div><div><br></div><div>~Daniel</div><div><br></div><div><div apple-content-edited="true"><div><span style="font-family: Tahoma; font-size: small; ">Daniel Ence</span></div><div><span style="font-family: Tahoma; font-size: small; ">Graduate Student</span></div><div><a href="mailto:dence@genetics.utah.edu">dence@genetics.utah.edu</a><br style="font-family: Tahoma; font-size: small; "><span style="font-family: Tahoma; font-size: small; ">Eccles Institute of Human Genetics</span><br style="font-family: Tahoma; font-size: small; "><span style="font-family: Tahoma; font-size: small; ">University of Utah</span><br style="font-family: Tahoma; font-size: small; "><span style="font-family: Tahoma; font-size: small; ">15 North 2030 East, Room 2100</span><br style="font-family: Tahoma; font-size: small; "><span style="font-family: Tahoma; font-size: small; ">Salt Lake City, UT 84112-5330</span></div></div><br><div><div>On Jun 18, 2014, at 5:09 AM, Saad Arif <<a href="mailto:saad.arif@tuebingen.mpg.de">saad.arif@tuebingen.mpg.de</a>></div><div> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Thank you for the response. I still have one question though, with these options:<br><br>

est_GFF=cufflinksout.GFF<br><br>

model_GFF= ensembl reference.GFF<br><br>

What happens to cufflinks-assembled transcripts that are not confined to current gene loci (i.e. novel genes in the cufflinks output)? Would I have to prepare ab initio gene predictions for each of these predicted 'new' genes?

<br>

Is there a simple way to combine adding new genes with improving an existing annotation?<br><br>

Any feedback on this would be greatly appreciated.<br><br>

saad<br><br>

On 13 Jun 2014, at 17:59, Carson Holt wrote:<br><br><blockquote type="cite">Use the cufflinks instead of the tophat features (tophat tends to be<br>

really noisy).  Give the existing models to model_gff (they will then<br>

always be kept unless something better is found).  There is no option to<br>

keep models and then just add isoforms.  The model_gff input will either<br>

be kept as is (unchanged), or replaced with an updated model suggested by<br>

the evidence (the updated model may contain multiple isoforms though), and<br>

map_forward=1 can be used to pull names forward from the old model onto<br>

the new models.<br><br>

Thanks,<br>

Carson<br><br><br>

On 6/13/14, 5:03 AM, "Saad Arif" <<a href="mailto:saad.arif@tuebingen.mpg.de">saad.arif@tuebingen.mpg.de</a>> wrote:<br><br><blockquote type="cite">Dear All,<br><br>

I would like to use the MAKER pipeline to expand a current annotation (new<br>

isoforms and novel genes with respect to the current annotation) and was<br>

wondering if anyone had experience with this and/or suggestions regarding my<br>

questions.<br><br>

Briefly:<br><br>

I have tophat splice junctions from RNAseq data or, alternatively,<br>

cufflinks-generated transcript models (FASTA format) that I want to use<br>

as my new data (est_gff or est).<br><br>

I want to provide the current Ensembl annotation for gene prediction but<br>

I want this annotation to remain unchanged. Hence, I’m not sure if I<br>

should provide this annotation as pred_gff<br>

or model_gff. Can the model_gff be used for gene prediction or is this<br>

just a subset of pred_gff that remains unaltered? Can we provide the same<br>

annotation for both options (pred_gff and model_gff)?<br><br><br><br>

Importantly, my main goal is to use the new RNAseq data to add more<br>

isoforms and (any) novel genes to the existing Ensembl annotation. Any<br>

thoughts or suggestions on how to go about this would be sincerely<br>

appreciated.<br><br><br>

Thanks in advance,<br>

saad<br><br><br><br><br>

_______________________________________________<br>

maker-devel mailing list<br><a href="mailto:maker-devel@box290.bluehost.com">maker-devel@box290.bluehost.com</a><br><a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a><br></blockquote><br><br></blockquote><br><br>

</blockquote></div><br></div></div></blockquote></div><br></div></div></div>

</span></body></html>