[maker-devel] Estimated runtime on 180mb genome @ 128 cores?

Daniel Ence dence at genetics.utah.edu
Tue Feb 16 09:42:51 MST 2016


Hi Florian, I don’t think you want est2genome or protein2genome turned on for this run. The est2genome option is usually only used if you don’t have any ab-initio predictors trained; protein2genome should only be used if you have good reason not to expect any introns at all (for example, a prokaryotic genome).
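
As a minimal sketch of that change, assuming your SNAP, GeneMark, and Augustus models are already trained as you describe, the two lines in maker_opts.ctl would look roughly like this (the trailing comments are paraphrased for illustration, not copied from your control file):

```
est2genome=0 #do not infer gene models directly from EST alignments (ab-initio predictors are trained)
protein2genome=0 #do not infer gene models directly from protein alignments (introns are expected)
```

The transcriptome and SwissProt alignments are still used as evidence with these set to 0; they are just not promoted directly to gene models.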

Also, you set the max_dna_len parameter to 2.1 Mbp, which is larger than your N50. Setting this too large prevents MAKER from speeding up its analysis by splitting contigs/scaffolds across multiple processors. There’s usually no reason to change this from the default setting.
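
To get a rough feel for the effect, here is a back-of-the-envelope sketch. It assumes a scaffold is simply cut into ceil(length / max_dna_len) pieces and ignores any overlap or edge handling MAKER does internally; the `chunks` helper is made up for illustration, and 100000 is what I believe the stock maker_opts.ctl ships with, so check your own copy.

```shell
#!/bin/sh
# Hypothetical helper: approximate number of chunks one sequence is split into,
# computed as ceil(length / max_dna_len) using integer arithmetic.
chunks() {
    length=$1
    max_dna_len=$2
    echo $(( (length + max_dna_len - 1) / max_dna_len ))
}

# Scaffold lengths taken from the original post.
chunks 2056324 2100000   # longest scaffold, max_dna_len as set in the post
chunks 2056324 100000    # longest scaffold, assumed default max_dna_len
chunks 272065 100000     # a scaffold at the reported N50, assumed default
```

Under these assumptions the longest scaffold stays in one piece at 2100000 but becomes 21 pieces at 100000, and an N50-sized scaffold becomes 3 pieces, which gives MAKER more independent units of work to hand out to the 128 processes.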

With a good N50 like you have, you’ll probably get good results.

~Daniel




Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On Feb 16, 2016, at 3:10 AM, Florian <fdolze at students.uni-mainz.de> wrote:

Hi all,

I am trying to run MAKER on a project of mine, and since this is the first time I have used MAKER, I'd like to ask some more experienced users what I can expect with regard to MAKER's resource consumption and runtime.

My genome data is:


  *   180.652.019 bp genome length
  *   5.292 Scaffolds
  *   34.136 bp median scaffold length
  *   2.056.324 bp longest
  *   272.065 bp N50

- I use a 73mb transcriptome assembly as EST Evidence
- SwissProt as Protein Homology Evidence
- 60kb custom repeat library for RepeatMasker



For gene prediction I am running with a SNAP HMM I generated using CEGMA, GeneMark, and Augustus trained by their web service.
I have the est2genome and protein2genome options turned on (=1) and use tRNAscan and snoscan. The other options are as follows:

#-----MAKER Behavior Options
max_dna_len=2100000 #length for dividing up contigs into chunks (increases/decreases memory usage)   <--- Is this reasonable?
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

The maker_bopts.ctl file is unchanged.

(Basically I follow this guide https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md)


At the moment I am running this with Open MPI as:

mpiexec -mca btl ^openib -n 128 /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base maker_run1 -fix_nucleotides

on 128 cores with 130GB of memory.


First of all, are the options I am using viable?

Is it possible to guesstimate the runtime I can expect? 5 days? 20 days? And is it reasonable to use additional cores or will this not benefit much?

Thanks for your insights,
Florian



