[maker-devel] Estimated runtime on 180mb genome @ 128 cores?

Tue Feb 16 09:53:55 MST 2016

Agreed. 500,000 is about the highest you would ever want to go with max_dna_len. Increasing the value reduces parallelization and increases memory usage. The only biological reason to increase it is if genes are very long and don’t fit into windows of this size.
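
As a rough illustration of why a larger max_dna_len reduces parallelization, here is a minimal sketch that estimates how many chunks a single scaffold would be split into. It assumes plain ceiling division and ignores overlap or any other detail of how MAKER actually divides sequences. chunks_for is a hypothetical helper name, 2056324 is the longest scaffold quoted below, and 100000 is what I believe the stock maker_opts.ctl ships with.

```shell
# Estimate the number of chunks for one sequence (assumption: simple
# ceiling division; MAKER's real chunking may differ in detail).
chunks_for() {
    # $1 = sequence length in bp, $2 = max_dna_len
    echo $(( ($1 + $2 - 1) / $2 ))
}

chunks_for 2056324 2100000   # 1  (whole scaffold handled as one piece)
chunks_for 2056324 500000    # 5
chunks_for 2056324 100000    # 21
```

With max_dna_len=2100000 even the longest scaffold stays in one piece, so there is nothing within a scaffold to spread across processors.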

Also test out the mpiexec command with something like ‘hostname’ to make sure it works.

Example:
mpiexec -mca btl ^openib -n 128 hostname

This should print 128 lines identifying all hosts in the communication ring. If it prints the same host ID every time, then there is a problem and you may need to provide a hostfile to let mpiexec know all the hosts it can run across.
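
To check the result without reading 128 lines by eye, the output can be piped through a small counter. This is only a sketch: count_distinct_hosts is a hypothetical helper, and my_hosts.txt is a placeholder hostfile name (--hostfile is the Open MPI option for supplying one). The mpiexec lines are left as comments so the snippet runs on a machine without MPI.

```shell
# Count the distinct host names read from standard input.
count_distinct_hosts() {
    sort -u | wc -l | tr -d ' '
}

# Stand-in for real mpiexec output: two ranks on node01, one on node02.
printf 'node01\nnode01\nnode02\n' | count_distinct_hosts   # prints 2

# Real usage (not run here):
#   mpiexec -mca btl ^openib -n 128 hostname | count_distinct_hosts
# Per-host rank counts, using a hostfile (placeholder file name):
#   mpiexec -mca btl ^openib --hostfile my_hosts.txt -n 128 hostname | sort | uniq -c
```

A result of 1 on a multi-node allocation would be the "same host ID every time" case described above.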

—Carson

> On Feb 16, 2016, at 9:42 AM, Daniel Ence <dence at genetics.utah.edu> wrote:
> 
> Hi Florian, I don’t think you want est2genome or protein2genome turned on for this run. Est2genome is usually only used if you don’t have any ab-initio predictors trained; protein2genome should only be used if you have good reason not to expect any introns at all (for example, a prokaryotic genome).
> 
> Also, you set the max_dna_len parameter to 2.1 Mbp, which is larger than your N50. Setting this too large prevents MAKER from speeding up its analysis by splitting contigs/scaffolds across multiple processors. There’s usually no reason to change this from the default setting.
> 
> With a good N50 like you have, you’ll probably get good results. 
> 
> ~Daniel
> 
> 
> 
> 
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> 
>> On Feb 16, 2016, at 3:10 AM, Florian <fdolze at students.uni-mainz.de> wrote:
>> 
>> Hi all,
>> 
>> I am trying to run MAKER on a project of mine, and since this is the first time I have used MAKER, I'd like to ask some more experienced users what I can expect in terms of resource consumption and runtime.
>> 
>> My genome data is:
>> 
>> 180.652.019 bp genome length
>> 5.292 Scaffolds
>> 34.136 bp median scaffold length
>> 2.056.324 bp longest 
>> 272.065 bp N50
>> - I use a 73mb transcriptome assembly as EST Evidence
>> - SwissProt as Protein Homology Evidence
>> - 60kb custom repeat library for RepeatMasker
>> 
>> 
>> 
>> For gene prediction I am running with a SNAP hmm I generated using CEGMA, GeneMark, and Augustus trained by their webservice.
>> I have options est2genome and protein2genome turned on (=1) and use tRNAscan and snoscan. The other options are as follows:
>> 
>> #-----MAKER Behavior Options
>> max_dna_len=2100000 #length for dividing up contigs into chunks (increases/decreases memory usage)   <--- Is this reasonable?
>> min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
>> 
>> pred_flank=200 #flank for extending evidence clusters sent to gene predictors
>> pred_stats=0 #report AED and QI statistics for all predictions as well as models
>> AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
>> min_protein=0 #require at least this many amino acids in predicted proteins
>> alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
>> always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
>> map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
>> keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
>> 
>> split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
>> single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
>> single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
>> correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
>> 
>> The maker_bopts.ctl file is unchanged.
>> 
>> (Basically I follow this guide: https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md)
>> 
>> 
>> At the moment I am running this with openMPI as: 
>> 
>> mpiexec -mca btl ^openib -n 128 /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base maker_run1 -fix_nucleotides
>> 
>> on 128 cores with 130GB of memory.
>> 
>> 
>> First of all, are those options I use viable?
>> 
>> Is it possible to guesstimate the runtime I can expect? 5 days? 20 days? And is it reasonable to use additional cores or will this not benefit much?
>> 
>> Thanks for your insights,
>> Florian
>> 
>> 
>> 
>> 
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
