[maker-devel] Estimated runtime on 180mb genome @ 128 cores?
Florian
fdolze at students.uni-mainz.de
Tue Feb 16 03:10:03 MST 2016
Hi all,
I am trying to run MAKER on a project of mine. Since this is the first
time I have used MAKER, I'd like to ask more experienced users what I
can expect in terms of resource consumption and runtime.
My genome data is:
* 180,652,019 bp genome length
* 5,292 scaffolds
* 34,136 bp median scaffold length
* 2,056,324 bp longest scaffold
* 272,065 bp N50
- I use a 73mb transcriptome assembly as EST Evidence
- SwissProt as Protein Homology Evidence
- 60kb custom repeat library for RepeatMasker
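For reference, assembly statistics like the ones listed above can be recomputed with a short script. This is only a minimal sketch, assuming a plain multi-line FASTA file; the function names and the file name `genome.fasta` in the usage note are hypothetical, and N50 is taken here as the length of the scaffold at which the running total of length-sorted scaffolds first reaches half the assembly size.

```python
from statistics import median


def scaffold_lengths(fasta_lines):
    """Return the length of every sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the scaffold at which the sorted running total reaches half the assembly."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def assembly_stats(lengths):
    """Summarise a list of scaffold lengths (assumes at least one scaffold)."""
    return {
        "total_bp": sum(lengths),
        "scaffolds": len(lengths),
        "median": median(lengths),
        "longest": max(lengths),
        "n50": n50(lengths),
    }
```

It could be used as `with open("genome.fasta") as fh: print(assembly_stats(scaffold_lengths(fh)))`.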
For gene prediction I am running SNAP with an HMM I generated using
CEGMA, plus GeneMark, and Augustus trained via its web service.
I have the options est2genome and protein2genome turned on (=1) and use
tRNAscan and snoscan. The other options are as follows:
#-----MAKER Behavior Options
max_dna_len=2100000 #length for dividing up contigs into chunks (increases/decreases memory usage) <--- Is this reasonable?
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
The maker_bopts.ctl file is unchanged.
(Basically, I am following this guide:
https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md)
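Settings in the `key=value #comment` layout shown above can be sanity-checked with a few lines of code before launching a long run. The following is a minimal sketch under stated assumptions: `parse_ctl` and `out_of_range` are hypothetical helper names, not part of MAKER, and the only rule checked is the "bound by 0 and 1" note that the comments attach to `AED_threshold` and `keep_preds`.

```python
def parse_ctl(lines):
    """Parse 'key=value #comment' lines into a dict of option name -> string value."""
    options = {}
    for line in lines:
        setting = line.split("#", 1)[0].strip()  # drop the trailing comment
        if "=" not in setting:
            continue  # section headers, blank lines, stray comment text
        key, value = setting.split("=", 1)
        key = key.strip()
        if not key.isidentifier():
            continue  # e.g. a wrapped comment fragment such as "yes, 0 = no"
        options[key] = value.strip()
    return options


def out_of_range(options, keys=("AED_threshold", "keep_preds"), low=0.0, high=1.0):
    """Return the names of the given options whose value lies outside [low, high]."""
    return [
        key
        for key in keys
        if key in options and not (low <= float(options[key]) <= high)
    ]
```

Reading the control file with `open(...)` and passing the file handle to `parse_ctl` would give a dictionary that can be printed or compared between runs.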
At the moment I am running this with Open MPI as:
mpiexec -mca btl ^openib -n 128 /project/molgen/Bio/maker-2.31.8_MPI-1.8.1/bin/maker -base maker_run1 -fix_nucleotides
on 128 cores with 130 GB of memory.
First of all, are the options I am using viable?
Is it possible to guesstimate the runtime I can expect? 5 days? 20 days?
And would it be reasonable to use additional cores, or would that not
help much?
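One crude way to guesstimate the remaining runtime once a run is under way is to extrapolate linearly from the base pairs finished so far. The sketch below assumes that the run's master datastore index log holds three tab-separated columns (sequence name, output directory, status) and that completed sequences are marked `FINISHED`; that layout is an assumption and should be checked against the actual file. The function names and the `lengths` mapping (sequence name to length in bp) are hypothetical.

```python
def finished_sequences(index_lines):
    """Collect the names of sequences marked FINISHED in the index log lines."""
    done = set()
    for line in index_lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: name <TAB> directory <TAB> status
        if len(fields) == 3 and fields[2] == "FINISHED":
            done.add(fields[0])
    return done


def estimate_remaining_hours(index_lines, lengths, elapsed_hours):
    """Linear extrapolation: remaining bp divided by the bp-per-hour rate so far."""
    if elapsed_hours <= 0:
        raise ValueError("elapsed_hours must be positive")
    done = finished_sequences(index_lines)
    done_bp = sum(lengths[name] for name in done if name in lengths)
    if done_bp == 0:
        raise ValueError("no finished sequence yet, nothing to extrapolate from")
    total_bp = sum(lengths.values())
    rate = done_bp / elapsed_hours  # base pairs completed per hour so far
    return (total_bp - done_bp) / rate
```

The result is only a rough guide: scaffold lengths vary widely, and gene-dense or repeat-rich scaffolds may take far longer per base pair than others.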
Thanks for your insights,
Florian