[maker-devel] Estimated runtime on 180mb genome @ 128 cores?

Tue Feb 16 03:10:03 MST 2016

Hi all,

I am trying to run MAKER on a project of mine, and since this is the 
first time I have used MAKER I'd like to ask more experienced users what 
I can expect in terms of resource consumption and runtime.

My genome data is:

  * 180,652,019 bp genome length
  * 5,292 scaffolds
  * 34,136 bp median scaffold length
  * 2,056,324 bp longest scaffold
  * 272,065 bp N50
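
In case the definitions matter for comparing numbers, here is a minimal sketch of how statistics like these can be computed from a list of scaffold lengths. The function name `assembly_stats` and the toy input are hypothetical, not the script or data I actually used, and N50 is taken in its usual sense: the length of the scaffold at which the running sum, from longest to shortest, first reaches half of the total.

```python
from statistics import median


def assembly_stats(lengths):
    """Return basic assembly statistics for a list of scaffold lengths (bp)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    # Walk from the longest scaffold downwards until the running sum
    # covers at least half of the total assembly length.
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "total_bp": total,
        "scaffolds": len(ordered),
        "median_bp": median(ordered),
        "longest_bp": ordered[0],
        "n50_bp": n50,
    }


# Toy input for illustration only, not my real scaffold lengths.
stats = assembly_stats([10, 20, 30, 40, 100])
print(stats)
```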

- I use a 73mb transcriptome assembly as EST Evidence
- SwissProt as Protein Homology Evidence
- 60kb custom repeat library for RepeatMasker

For gene prediction I am running SNAP (with an HMM I generated using 
CEGMA), GeneMark, and Augustus (trained through their web service).
I have the options est2genome and protein2genome turned on (=1) and use 
tRNAscan and snoscan. The other options are as follows:

#-----MAKER Behavior Options
max_dna_len=2100000 #length for dividing up contigs into chunks (increases/decreases memory usage)   <--- Is this reasonable?
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
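
To make the max_dna_len question concrete, here is a rough sketch of the arithmetic. It assumes MAKER simply cuts each scaffold into consecutive pieces of at most max_dna_len bases; the real chunking may differ in detail, and the function name `chunks_per_scaffold` and the comparison value of 300000 are hypothetical. Under that assumption, my current setting is larger than my longest scaffold, so every scaffold would be processed as a single chunk.

```python
def chunks_per_scaffold(length_bp, max_dna_len):
    """Number of pieces when a scaffold is cut into chunks of at most max_dna_len bp."""
    if length_bp <= 0 or max_dna_len <= 0:
        raise ValueError("lengths must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-length_bp // max_dna_len)


longest = 2_056_324  # longest scaffold in my assembly
setting = 2_100_000  # max_dna_len from the options above

one_chunk = chunks_per_scaffold(longest, setting)
# Hypothetical smaller value, purely for comparison.
smaller = chunks_per_scaffold(longest, 300_000)
print(one_chunk, smaller)
```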

The maker_bopts.ctl file is unchanged.

(Basically I follow this guide 
https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md)

At the moment I am running this with Open MPI as:

mpiexec -mca btl ^openib -n 128 /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base maker_run1 -fix_nucleotides

on 128 cores with 130GB of memory.
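
For a sense of scale, here is the memory available per MPI process as a small sketch. It assumes the 130GB is the total shared by all 128 processes and is split evenly between them; both are simplifying assumptions, and the function name is hypothetical.

```python
def memory_per_process_gb(total_gb, n_processes):
    """Evenly divide the total memory (GB) across the MPI processes."""
    if n_processes <= 0:
        raise ValueError("n_processes must be positive")
    return total_gb / n_processes


per_proc = memory_per_process_gb(130, 128)
print(f"{per_proc:.2f} GB per process")
```

That is roughly 1 GB per process, which is part of why I am asking about max_dna_len and memory usage.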

First of all, are the options I am using reasonable?

Is it possible to guesstimate the runtime I can expect? 5 days? 20 days? 
And would adding more cores help, or would the benefit be small?

Thanks for your insights,
Florian
