<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Agree. 500,000 is about the highest you ever want to go with max_dna_len. Increasing the value decreases parallelization and increases memory usage. The only biological reason to ever increase it is if genes are really long and don’t fit into windows of this size.<div class=""><br class=""></div><div class="">Also test out the mpiexec command with something like ‘hostname’ to make sure it works.</div><div class=""><br class=""></div><div class="">Example:</div><div class="">mpiexec -mca btl ^openib -n 128 hostname<br class=""><div class=""><br class=""></div><div class="">This should print 128 lines identifying all hosts in the communication ring. If it prints the same host ID every time, then there is a problem and you may need to provide a hostfile to let mpiexec know all the hosts it can run across.</div><div class=""><br class=""></div><div class="">-Carson</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Feb 16, 2016, at 9:42 AM, Daniel Ence <<a href="mailto:dence@genetics.utah.edu" class="">dence@genetics.utah.edu</a>> wrote:</div><br class="Apple-interchange-newline"><div class="">

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" class="">

<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">

Hi Florian, I don’t think you want est2genome or protein2genome turned on for this run. est2genome is usually only used if you don’t have any ab initio predictors trained; protein2genome should only be used if you have good reason not to expect any introns

 at all (for example, a prokaryotic genome).

<div class=""><br class="">

</div>

<div class="">Also, you set the max_dna_len parameter for 2.1Mbp, which is larger than your N50. Setting this too large prevents MAKER from speeding up it’s analysis by splitting contigs/scaffolds across multiple processors. There’s usually no reason to change

 this from the default setting.</div>
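<div class="">To make the effect concrete, here is a rough sketch that counts how many pieces a scaffold would be cut into for a given max_dna_len. It assumes plain non-overlapping windows, which is a simplification of whatever MAKER does internally; the function name chunk_count is made up for illustration, and 100000 is assumed here to be the stock default value.</div>

<pre class="">
```python
import math


def chunk_count(scaffold_len, max_dna_len):
    """Pieces a scaffold yields if cut into windows of at most max_dna_len bp.

    Assumption: simple non-overlapping windows (hypothetical model, not
    taken from MAKER's source).
    """
    return max(1, math.ceil(scaffold_len / max_dna_len))


# Scaffold lengths quoted in this thread (bp).
scaffolds = {"longest": 2056324, "N50": 272065, "median": 34136}

# 100000 is assumed to be the default; 500000 and 2100000 come from the thread.
for limit in (100000, 500000, 2100000):
    counts = {name: chunk_count(length, limit) for name, length in scaffolds.items()}
    print(limit, counts)
```
</pre>

<div class="">Under this model, the 2.1 Mbp setting leaves every scaffold as a single piece, so the longest scaffold can only ever be worked on by one process at a time.</div>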

<div class=""><br class="">

</div>

<div class="">With a good N50 like you have, you’ll probably get good results. </div>

<div class=""><br class="">

</div>

<div class="">~Daniel</div>

<div class=""><br class="">

</div>

<div class=""><br class="">

</div>

<div class=""><br class="">

</div>

<div class=""><br class="">

<div class="">Daniel Ence<br class="">

Graduate Student<br class="">

Eccles Institute of Human Genetics<br class="">

University of Utah<br class="">

15 North 2030 East, Room 2100<br class="">

Salt Lake City, UT 84112-5330 </div>

<br class="">

<div class="">

<blockquote type="cite" class="">

<div class="">On Feb 16, 2016, at 3:10 AM, Florian <<a href="mailto:fdolze@students.uni-mainz.de" class="">fdolze@students.uni-mainz.de</a>> wrote:</div>

<br class="Apple-interchange-newline">

<div class="">

<div text="#000000" bgcolor="#FFFFFF" class="">Hi all,<br class="">

<br class="">

I am trying to run MAKER on a project of mine, and since this is my first time using MAKER, I'd like to ask more experienced users what to expect in terms of resource consumption and runtime.<br class="">

<br class="">

My genome data is:<br class="">

<br class="">

<ul class="">

<li class="">180.652.019 bp genome length<br class="">

</li><li class="">5.292 Scaffolds </li><li class="">34.136 bp median scaffold length<br class="">

</li><li class="">2.056.324 bp longest <br class="">

</li><li class="">272.065 bp N50 </li></ul>

- I use a 73 MB transcriptome assembly as EST evidence<br class="">

- SwissProt as Protein Homology Evidence<br class="">

- 60kb custom repeat library for RepeatMasker<br class="">

<br class="">

<br class="">

<br class="">

For gene prediction I am running with a SNAP hmm I generated using CEGMA, GeneMark, and Augustus trained by their webservice.<br class="">

I have the options est2genome and protein2genome turned on (=1) and use tRNAscan and snoscan. The other options are as follows:

<br class="">

<br class="">

<small class="">#-----MAKER Behavior Options<br class="">

max_dna_len=2100000 #length for dividing up contigs into chunks (increases/decreases memory usage)   <--- Is this reasonable?<br class="">

min_contig=1 #skip genome contigs below this length (under 10kb are often useless)<br class="">

<br class="">

pred_flank=200 #flank for extending evidence clusters sent to gene predictors<br class="">

pred_stats=0 #report AED and QI statistics for all predictions as well as models<br class="">

AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)<br class="">

min_protein=0 #require at least this many amino acids in predicted proteins<br class="">

alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no<br class="">

always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no<br class="">

map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no<br class="">

keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)<br class="">

<br class="">

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)<br class="">

single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no<br class="">

single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'<br class="">

correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes<br class="">

</small><br class="">

The maker_bopts.ctl file is unchanged.<br class="">

<br class="">

(Basically I follow this guide <a class="moz-txt-link-freetext" href="https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md">

https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md</a>)<br class="">

<br class="">

<br class="">

At the moment I am running this with openMPI as: <br class="">

<br class="">

mpiexec -mca btl ^openib -n 128 /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base maker_run1 -fix_nucleotides<br class="">

<br class="">

on 128 cores with 130GB of memory.<br class="">
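<br class="">
Before a long run it may be worth checking that the ranks really land on more than one host. A minimal sketch, assuming the output of running a command such as hostname under the same mpiexec options has been captured as text, one host name per line; the function name ranks_per_host and the sample host names are invented for illustration.<br class="">

<pre class="">
```python
from collections import Counter


def ranks_per_host(hostname_output):
    """Count how many lines (one per MPI rank) each host name produced."""
    lines = [line.strip() for line in hostname_output.splitlines()]
    return Counter(line for line in lines if line)


# Made-up sample; real input would be the captured mpiexec output.
sample = "node01\nnode01\nnode02\nnode02\n"
counts = ranks_per_host(sample)
print(dict(counts))

if len(counts) == 1:
    # Only one distinct host: on a multi-node allocation this may mean
    # mpiexec was not told about the other hosts.
    print("all ranks reported the same host; a hostfile may be needed")
```
</pre>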

<br class="">

<br class="">

First of all, are the options I use viable?<br class="">

<br class="">

Is it possible to guesstimate the runtime I can expect? 5 days? 20 days? And is it reasonable to use additional cores or will this not benefit much?<br class="">

<br class="">

Thanks for your insights,<br class="">

Florian<br class="">

<br class="">

<br class="">

<br class="">

<br class="">

</div>

_______________________________________________<br class="">

maker-devel mailing list<br class="">

<a href="mailto:maker-devel@box290.bluehost.com" class="">maker-devel@box290.bluehost.com</a><br class="">

<a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org" class="">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a><br class="">

</div>

</blockquote>

</div>

<br class="">

</div>

</div>

_______________________________________________<br class="">maker-devel mailing list<br class=""><a href="mailto:maker-devel@box290.bluehost.com" class="">maker-devel@box290.bluehost.com</a><br class="">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org<br class=""></div></blockquote></div><br class=""></div></div></body></html>