<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif;"><div>Just ^C. If you change options, MAKER will restart from a point determined by what the change affects. Since repeat masking affects everything downstream, everything will start from zero. If it were a step like changing the HMM or altering depth_blastn, it would be less disruptive and MAKER could reuse all existing raw reports. Unfortunately that is not the case when altering repeat masking options.</div><div><br></div><div>--Carson</div><div><br></div><div><br></div><span id="OLK_SRC_BODY_SECTION"><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span> Nathaniel Jue <<a href="mailto:n.jue@uconn.edu">n.jue@uconn.edu</a>><br><span style="font-weight:bold">Date: </span> Monday, July 7, 2014 at 11:21 AM<br><span style="font-weight:bold">To: </span> Carson Holt <<a href="mailto:carsonhh@gmail.com">carsonhh@gmail.com</a>><br><span style="font-weight:bold">Cc: </span> Daniel Ence <<a href="mailto:dence@genetics.utah.edu">dence@genetics.utah.edu</a>>, "<<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>>" <<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>><br><span style="font-weight:bold">Subject: </span> Re: [maker-devel] Couple quick questions about Maker<br></div><div><br></div><div dir="ltr">Thanks for the input guys. I'm guessing the error was probably from not turning off the repeat prediction or from the GFF file. I have a script for converting RepeatMasker output to GFF, so maybe I'll just try that if I want to follow up on it. 
Thanks for the tips on parameter adjustments and the thoughts on running the program as well. Just a few more quick follow-up questions: is there a preferred way to stop a job so that it can restart while getting the most benefit from the run.log files, etc.? Or should I just Ctrl-C it and start over? It seems that if I adjust those parameter values it may restart from the very beginning, since changing the opts file sometimes does that. Is that to be expected? If so, should I just bite the bullet and restart from the beginning, or is it best to finish a run and then change options?<div><br></div><div>Thanks,<br>Nate<br><br><div id="WISESTAMP_SIG_gmail_session" href="http://WISESTAMP_SIG_gmail_session"><div style="font-size:13px;font-family:Verdana,Arial,Helvetica,sans-serif"><div style="margin:0 0 8px 0"><p style="margin:0"><span><strong>Nathaniel Jue, Ph.D.</strong></span></p><p style="margin:0"><span><span>Department of Molecular and Cell Biology</span></span></p><p style="margin:0"><span><span>University of Connecticut</span></span></p><p style="margin:0"><span><span>Storrs, CT 06269</span></span></p><p style="margin:0"><strong><span><br></span></strong></p><div style="clear:both"></div></div><a href="http://s.wisestamp.com/links?url=http%3A%2F%2Fwww.linkedin.com%2Fpub%2Fnathaniel-jue%2F1%2F531%2F176%2F&sn=" style="text-decoration:underline"><img width="16" height="16" alt="LinkedIn" style="padding: 0px 0px 5px 0px; vertical-align: middle;" border="0" src="https://s3.amazonaws.com/images.wisestamp.com/linkedin.png"></a><br></div></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">
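<div>For the RepeatMasker-to-GFF conversion mentioned above, here is a minimal Python sketch, not a tested MAKER workflow. It reads the standard RepeatMasker <code>.out</code> table and emits GFF3 lines. The <code>match</code>/<code>match_part</code> layout, the <code>repeatmasker</code> source tag, the <code>rm1</code>-style IDs, and the <code>repeat_class</code> attribute are assumptions chosen for illustration; check them against what your MAKER version accepts for <code>rm_gff</code>.</div>
<pre>
```python
from urllib.parse import quote


def rmout_to_gff3(lines, source="repeatmasker"):
    """Convert rows of a RepeatMasker .out table into GFF3 lines."""
    gff = ["##gff-version 3"]
    count = 0
    for line in lines:
        fields = line.split()[:14]
        # Header and blank lines have no numeric score in column 1; skip them.
        if len(fields) != 14 or not fields[0].isdigit():
            continue
        score, seqid, start, end = fields[0], fields[4], fields[5], fields[6]
        # RepeatMasker writes "C" (complement) for the minus strand.
        on_plus = fields[8] == "+"
        strand = "+" if on_plus else "-"
        # On the complement strand the repeat start sits in the last position column.
        t_start = fields[11] if on_plus else fields[13]
        t_end = fields[12]
        name = quote(fields[9])
        count += 1
        hit_id = "rm" + str(count)  # hypothetical ID scheme
        target = "Target=" + name + " " + t_start + " " + t_end + " +"
        core = [seqid, source, "match", start, end, score, strand, "."]
        attrs = ("ID=" + hit_id + ";Name=" + name
                 + ";repeat_class=" + quote(fields[10]) + ";" + target)
        gff.append("\t".join(core + [attrs]))
        # One child feature per hit, covering the same interval (assumed layout).
        core[2] = "match_part"
        part = "ID=" + hit_id + ":part;Parent=" + hit_id + ";" + target
        gff.append("\t".join(core + [part]))
    return gff
```
</pre>
<div>To try it on a real file, pass an open file handle (for example <code>rmout_to_gff3(open("genome.fa.out"))</code>, where the file name is a placeholder) and write the returned lines joined with newlines.</div>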
On Mon, Jul 7, 2014 at 12:26 PM, Carson Holt <span dir="ltr"><<a href="mailto:carsonhh@gmail.com" target="_blank">carsonhh@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;color:rgb(0,0,0);font-size:14px;font-family:Calibri,sans-serif"><div>I don't think RepeatMasker produces GFF3. I believe it is GFF2 with the -gff option (which is pretty different). Also, if you provide GFF3 files for repeats, you will still need to turn off repeat masking in the control files by blanking out the options. Also, MAKER uses a step called RepeatRunner against an internal transposable element protein database, which is probably still running (and is slow because it's a search in translated protein space).</div><div><br></div><div>For performance, you may want to give a larger max_dna_len for the MAKER run, given that you have a large-RAM machine. Also set all the depth_blast* options (depth_blastn, depth_blastx, depth_tblastx) in maker_bopts.ctl to 15 or 20.</div><div><br></div><div>
CEGMA is convenient for training predictors because it finds genes that are expected to be present in every eukaryote (i.e. high confidence). You can combine these with est2genome/protein2genome results from MAKER if you want. You can then use the resulting HMM for a larger MAKER run with experimental evidence, and then train again on those results. But beware that there is rarely any benefit from training beyond that second round. More training actually tends to make things worse (the overtraining paradox).</div><div><br></div><div>--Carson</div><div><br></div><div><br></div><div><br></div><span><div style="font-family:Calibri;font-size:11pt;text-align:left;color:black;BORDER-BOTTOM:medium none;BORDER-LEFT:medium none;PADDING-BOTTOM:0in;PADDING-LEFT:0in;PADDING-RIGHT:0in;BORDER-TOP:#b5c4df 1pt solid;BORDER-RIGHT:medium none;PADDING-TOP:3pt"><span style="font-weight:bold">From: </span> Daniel Ence <<a href="mailto:dence@genetics.utah.edu" target="_blank">dence@genetics.utah.edu</a>><br><span style="font-weight:bold">Date: </span> Monday, July 7, 2014 at 10:00 AM<br><span style="font-weight:bold">To: </span> Nathaniel Jue <<a href="mailto:n.jue@uconn.edu" target="_blank">n.jue@uconn.edu</a>><br><span style="font-weight:bold">Cc: </span> "<<a href="mailto:maker-devel@yandell-lab.org" target="_blank">maker-devel@yandell-lab.org</a>>" <<a href="mailto:maker-devel@yandell-lab.org" target="_blank">maker-devel@yandell-lab.org</a>><br><span style="font-weight:bold">Subject: </span> Re: [maker-devel] Couple quick questions about Maker<br></div><div><div class="h5"><div><br></div><div><div style="word-wrap:break-word">
Hi Nathaniel,
<div><br></div><div>1) We'll need to see the error messages that MAKER produced to understand what went wrong with the RepeatMasker GFF3 file. If you could run MAKER on one of your scaffolds with your current settings and send us the complete output, we can
start to figure out what happened. </div><div><br></div><div>2) MAKER interacts with its gene predictors (Augustus, SNAP, and the others listed in the control files) in a way that improves their performance (by passing them hints and so on). When you supply predictions through the pred_gff parameter, MAKER can't give
that performance improvement, so there's something of a tradeoff there. I think the performance improvement is a key part of MAKER's success, so I would definitely recommend running the ab initio tools internally. </div><div><br></div><div>MAKER tries to save you time by keeping results from run to run and only rerunning the tools (usually the BLAST tools) whose parameters changed in the control files. Taking advantage of that will probably be the biggest time saver for you. Something else
that could save you almost as much time would be to set a reasonable lower bound on the size of contigs that MAKER will try to annotate (usually 5 kbp or 10 kbp, depending on your genome), so that shorter contigs are skipped. This is set with the min_contig parameter. </div><div><br></div><div>I'll have to check with my lab mates about the Repeat ORF searching and how they use CEGMA results. I think you can probably just put them all in there at once though. </div><div><br></div><div>~Daniel</div><div><br></div><div><br></div><div><br></div><div><br><div><div><span style="font-family:Tahoma;font-size:small">Daniel Ence</span></div><div><span style="font-family:Tahoma;font-size:small">Graduate Student</span></div><div><a href="mailto:dence@genetics.utah.edu" target="_blank">dence@genetics.utah.edu</a><br style="font-family:Tahoma;font-size:small"><span style="font-family:Tahoma;font-size:small">Eccles Institute of Human Genetics</span><br style="font-family:Tahoma;font-size:small"><span style="font-family:Tahoma;font-size:small">University of Utah</span><br style="font-family:Tahoma;font-size:small"><span style="font-family:Tahoma;font-size:small">15 North 2030 East, Room 2100</span><br style="font-family:Tahoma;font-size:small"><span style="font-family:Tahoma;font-size:small">Salt Lake City, UT 84112-5330</span></div></div><br><div><div>On Jul 7, 2014, at 9:26 AM, Nathaniel Jue <<a href="mailto:n.jue@uconn.edu" target="_blank">n.jue@uconn.edu</a>></div><div> wrote:</div><br><blockquote type="cite"><div dir="ltr">Hi,
<div><br></div><div>I'm trying to run MAKER on a couple of genomes right now and was wondering if folks had any thoughts on ways to speed it up a bit. I'm running it on a 48-processor supercomputer (lots of RAM, usually used for genome assembly). Both of these genomes are a little
fragmented, so there are lots of contigs, which slows down the whole process. I am looking for ways to speed things up and was wondering about a couple of things:</div><div><br></div><div>1) I'm currently just at the first round of MAKER predictions, using EST and protein evidence to build models. I had already done the repeat masking, so I thought I'd just input the resulting GFF to speed things up. MAKER didn't seem to like the GFF, so two questions: i) any
thoughts on why that GFF wasn't acceptable? It's the one that RepeatMasker outputs if you ask it to; and ii) providing this GFF should generally allow the program to bypass the RepeatMasking step, correct? Does it also make it bypass the Repeat ORF searching
step?</div><div><br></div><div>2) I plan to run both SNAP and Augustus on these genomes as well. The two-step SNAP training from the tutorials seems straightforward, but I was wondering about the Augustus step. From what I can tell, simply providing an Augustus "trained" species name
should turn on Augustus, and blast/blat-like hints generated within MAKER are then used in gene prediction. Any thoughts on whether it's either more accurate or faster to do the Augustus predictions outside of the MAKER pipeline and then import them using the pred_gff
parameter in the maker_opts file?</div><div><br></div><div>3) Finally, I noticed that you have a script for converting CEGMA GFF files to ZFF files for SNAP training. Currently, I am using predicted transcripts for this species and protein sequences from related species for training. Does anyone have any insight into
using CEGMA results as well? Do you work iteratively with them? For instance, start by using hints from more distant taxa (i.e. CEGMA) and then work your way closer? Or just throw everything in at once and retrain after that?</div><div><br></div><div>Thanks in advance for any advice and insight.</div><div><br></div><div>Cheers,<br>
Nate<br><br><br><div href="http://WISESTAMP_SIG_gmail_session"><div style="font-size:13px;font-family:Verdana,Arial,Helvetica,sans-serif"><div style="margin:0 0 8px 0"><div style="margin:0px"><span><strong>Nathaniel Jue, Ph.D.</strong></span></div><div style="margin:0px"><span>Department of Molecular and Cell Biology</span></div><div style="margin:0px"><span>University of Connecticut</span></div><div style="margin:0px"><span>Storrs, CT 06269</span></div><div style="margin:0px"><strong><br></strong></div><div style="clear:both"></div></div><a href="http://s.wisestamp.com/links?url=http%3A%2F%2Fwww.linkedin.com%2Fpub%2Fnathaniel-jue%2F1%2F531%2F176%2F&sn=" style="text-decoration:underline" target="_blank"><img width="16" height="16" alt="LinkedIn" style="padding:0px 0px 5px 0px;vertical-align:middle" border="0" src="https://s3.amazonaws.com/images.wisestamp.com/linkedin.png"></a><br></div></div></div></div>
_______________________________________________<br>
maker-devel mailing list<br><a href="mailto:maker-devel@box290.bluehost.com" target="_blank">maker-devel@box290.bluehost.com</a><br><a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org" target="_blank">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a><br></blockquote></div><br></div></div></div>
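<div>The control-file changes suggested in this thread can be sketched as a small Python helper. This is a sketch under stated assumptions, not part of MAKER: the helper name <code>set_ctl_options</code> is hypothetical, the option names are assumed to match the <code>maker_opts.ctl</code> and <code>maker_bopts.ctl</code> files generated by your MAKER version, and the values and the <code>repeats.gff3</code> path are illustrative only.</div>
<pre>
```python
import re

# Illustrative settings based on the advice in this thread; values are examples.
OPTS_UPDATES = {
    "model_org": "",           # blank out RepeatMasker masking
    "rmlib": "",
    "repeat_protein": "",      # blank out the RepeatRunner protein search
    "rm_gff": "repeats.gff3",  # hypothetical path to pre-computed repeats (GFF3)
    "max_dna_len": "300000",   # larger chunks for a large-RAM machine
    "min_contig": "10000",     # skip contigs shorter than 10 kbp
}
BOPTS_UPDATES = {"depth_blastn": "20", "depth_blastx": "20", "depth_tblastx": "20"}


def set_ctl_options(text, updates):
    """Return control-file text with the given key=value settings replaced."""
    pending = dict(updates)
    result = []
    for line in text.splitlines():
        # Control-file lines look like "key=value #comment"; keep the comment.
        m = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if m and m.group(1) in pending:
            key = m.group(1)
            comment = m.group(3) or ""
            line = (key + "=" + pending.pop(key) + " " + comment).rstrip()
        result.append(line)
    if pending:
        # Fail loudly rather than silently ignoring a misspelled option name.
        raise KeyError("options not found: " + ", ".join(sorted(pending)))
    return "\n".join(result) + "\n"
```
</pre>
<div>One would read each control file, pass its text through <code>set_ctl_options</code> with the matching dictionary, and write the result back before starting the run. As noted earlier in the thread, changing the repeat-masking options makes MAKER start over from zero, so it is worth settling them before a long run.</div>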
</div></div></span></div></blockquote></div><br></div></span></body></html>