[maker-devel] Sharing benchmarks of maker

Mon Mar 4 08:12:06 MST 2013

Performance is highly dependent on the size of evidence datasets used
(proteins/ESTs) as well as the IO performance of a system when running via
MPI (you can hit IO bottlenecks well before cpu bottlenecks depending on
cluster configuration).

The Arabidopsis genome (120 Mb assembly) running SNAP and Augustus, with a
1.1 Gb EST dataset and a 10 Mb protein dataset, takes ~1 hour 30 min on
1,500 cpus with OpenMPI.
The Maize genome (2.1 Gb) running SNAP and Augustus, with a 3 Gb EST dataset
and a 16 Mb protein dataset, takes ~4 hours 30 min on 2,200 cpus.
A human-sized genome would take 5-6 days on 100 cpus.
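
For a rough comparison across these runs, the figures above can be converted
to CPU-hours (wall-clock hours times cpu count). The sketch below is only
that arithmetic; the function and variable names are illustrative, the
timings are the approximate ones quoted above, and since the evidence
datasets differ and IO can bottleneck before cpu, the per-Mb numbers should
not be read as a linear scaling rule.

```python
# Rough CPU-hour arithmetic for the runs quoted above.
# Names are illustrative; timings are the approximate ("~") figures from the post.

# (label, genome size in Mb, wall-clock hours, cpu count)
RUNS = [
    ("Arabidopsis", 120, 1.5, 1500),
    ("Maize", 2100, 4.5, 2200),
]


def cpu_hours(wall_hours, cpus):
    """Total CPU-hours consumed: wall-clock hours times number of cpus."""
    return wall_hours * cpus


def summarize(runs):
    """Return (label, total CPU-hours, CPU-hours per Mb of genome) per run."""
    rows = []
    for label, genome_mb, wall_hours, cpus in runs:
        total = cpu_hours(wall_hours, cpus)
        rows.append((label, total, total / genome_mb))
    return rows


for label, total, per_mb in summarize(RUNS):
    print(f"{label}: {total:.0f} CPU-hours, {per_mb:.2f} CPU-hours per Mb")

# Human-sized genome: 5-6 days on 100 cpus, expressed as a CPU-hour range.
human_low = cpu_hours(5 * 24, 100)
human_high = cpu_hours(6 * 24, 100)
print(f"Human-sized: {human_low}-{human_high} CPU-hours")
```

The much lower per-Mb cost for Maize than for Arabidopsis in this arithmetic
is a reminder that evidence dataset size, not just genome size, drives the
run time.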

MAKER is fully restartable (it keeps a log of progress), so if there is any
failure or the user kills it in the middle of a job, it will pick up at the
point it left off on restart (so you don't waste all that processing time).
2 Gb of RAM per processing core is recommended when parallelizing MAKER via
MPI, but fragmented genomes with smaller contigs can get by with less than
1 Gb per core.  MAKER version 2.28, which has additional optimization for
OpenMPI and a lower memory footprint, will be available in a couple of weeks.
Until then, 2.27 is recommended over 2.1 for MPI.  2.27 should also work with
OpenMPI.  2.1 only works with older versions of MPICH2 using the mpd launcher,
not the current hydra launcher.
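
As a quick check of the 2 Gb per core guideline against a given machine, a
minimal sketch follows. The helper names are hypothetical, and the 250 Gb /
32 processor figures are the ones from the quoted message below; actual
memory use will vary with contig size and evidence datasets.

```python
# Sizing check based on the "2 Gb of RAM per core" recommendation.
# Helper names are hypothetical; this is arithmetic, not a MAKER API.

RECOMMENDED_GB_PER_CORE = 2.0


def ram_needed_gb(cores, gb_per_core=RECOMMENDED_GB_PER_CORE):
    """RAM (Gb) needed to run `cores` MPI processes at the given per-core budget."""
    return cores * gb_per_core


def max_cores_for_ram(total_ram_gb, gb_per_core=RECOMMENDED_GB_PER_CORE):
    """Largest number of MPI processes that fits in `total_ram_gb`."""
    return int(total_ram_gb // gb_per_core)


# The server described in the quoted message: 32 processors, 250 Gb of memory.
print(ram_needed_gb(32))       # RAM required for 32 processes at 2 Gb each
print(max_cores_for_ram(250))  # processes that 250 Gb could support at 2 Gb each
```

By this estimate a 32-processor, 250 Gb server has ample memory headroom, so
on such a machine the limit is more likely to be cpu or IO than RAM.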

Thanks,
Carson

From:  "Carlos A. Canchaya" <canchaya at uvigo.es>
Date:  Monday, 4 March, 2013 6:10 AM
To:  <maker-devel at yandell-lab.org>
Subject:  [maker-devel] Sharing benchmarks of maker

Hi,

I've just installed maker2 on our server and run a first test with our data.
The input was about 30 000 sequences (9.6 Mb), and it was run on just one
server with 32 processors for 36 hours with mpich2. Our server has 250 Gb
of memory and cpus of 2,4 Gb. The test was simple because it only ran
repeatmasker and SNAP. Since we would like to use other gene
prediction/annotation tools available in MAKER, I wonder if you could share
some of your benchmarks so that we can judge whether this would scale up
well to our production cluster, where we want to annotate our 1.6 Gb draft
genome.
Best,

Carlos

Carlos A. Canchaya, PhD
IPP Research Fellow
Department of Biochemistry, Genetics and Immunology
Faculty of Biology
Campus Universitario
University of Vigo
36310 Vigo
Spain

http://darwin.uvigo.es/~ccanchaya/
email: canchaya at uvigo.es
Tel :  +34 986 130048
Fax:  +34 986 812556

