[maker-devel] second maker2 benchmark, this time, on a cluster
Ramón Fallon
ramonfallon at gmail.com
Thu Apr 4 11:03:43 MDT 2013
Hi
I've run another benchmark of my own on the Maker2 svn code (rev 1017).
Last time I went up to 12 processes; this time I aimed for 48. In contrast
to the earlier 12-core speed check, the target hardware was a computer
cluster running the Gridengine queue manager. The same 4.019-megabase
data set was used as before (the dpp_contig.fasta sequence repeated 125
times in one file, each copy under a different name).
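For anyone who wants to build a similar test file, here is a minimal sketch
in Python of how a FASTA record can be repeated under unique names. The
function name replicate_fasta and the "_copyNNN" naming scheme are my own
assumptions for illustration; the actual names used in the benchmark file
are not given here.

```python
def replicate_fasta(fasta_text, copies):
    """Return FASTA text in which every record appears `copies` times,
    each copy under a unique name (hypothetical "_copyNNN" suffix)."""
    records = []
    header, seq_lines = None, []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, seq_lines))
            header, seq_lines = line[1:].strip(), []
        elif line.strip():
            seq_lines.append(line.strip())
    if header is not None:
        records.append((header, seq_lines))

    out = []
    for i in range(1, copies + 1):
        for header, seq_lines in records:
            # Only the sequence ID (first word) is changed; any description is kept.
            name, _, rest = header.partition(" ")
            new_header = ">{}_copy{:03d}".format(name, i)
            if rest:
                new_header += " " + rest
            out.append(new_header)
            out.extend(seq_lines)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Tiny stand-in sequence; the real input would be dpp_contig.fasta.
    demo = ">dpp_contig example\nACGTACGT\nGGCC\n"
    print(replicate_fasta(demo, 3), end="")
```

With the real file one would read dpp_contig.fasta, call
replicate_fasta(text, 125) and write the result to a new file.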
The nodes in the cluster are (again) HP ProLiant SL390, each with two Intel
X5675 CPUs @ 3.07GHz, this time with only 48GB RAM and a 1TB local disk,
running CentOS 6.2 with (as before) a 2.6.32 Linux kernel. A marked
difference is that Maker2 was launched from an NFS3 shared home directory,
although the /tmp directories are local to the process running on each
node. Nodes are interconnected via quadspeed InfiniBand and, because of
hyperthreading, each can offer 24 "process-cores" to a job. No overlap
between runs was allowed.
Results were:
#processes   time(secs)   Megabases/hr
         1      6585.00           2.20
         2      7137.00           2.03
         4      2479.00           5.84
         8      1088.00          13.30
        10       866.00          16.71
        12       715.00          20.24
        14       666.00          21.72
        16       651.00          22.22
        18       613.00          23.60
        24       559.00          25.88
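The Megabases/hr column is just 4.019 Mb divided by the run time in hours.
As a rough sketch, the same timings can also be turned into speedup and
parallel efficiency relative to the single-process run. The names
scaling_table, TOTAL_MB and TIMINGS are mine, and the speedup and
efficiency columns are derived here, not part of the original measurements.

```python
TOTAL_MB = 4.019  # size of the test data set in megabases

# process count -> wall-clock seconds, copied from the results table
TIMINGS = {1: 6585, 2: 7137, 4: 2479, 8: 1088, 10: 866,
           12: 715, 14: 666, 16: 651, 18: 613, 24: 559}


def scaling_table(timings, total_mb):
    """Return rows of (processes, secs, Mb/hr, speedup, efficiency)."""
    base = timings[1]  # single-process time is the reference
    rows = []
    for n in sorted(timings):
        secs = timings[n]
        rate = total_mb / secs * 3600.0   # megabases per hour
        speedup = base / secs             # relative to 1 process
        efficiency = speedup / n          # 1.0 would be perfect scaling
        rows.append((n, secs, rate, speedup, efficiency))
    return rows


if __name__ == "__main__":
    print("{:>5} {:>8} {:>8} {:>8} {:>6}".format(
        "procs", "secs", "Mb/hr", "speedup", "eff"))
    for n, secs, rate, speedup, eff in scaling_table(TIMINGS, TOTAL_MB):
        print("{:>5} {:>8} {:>8.2f} {:>8.2f} {:>6.2f}".format(
            n, secs, rate, speedup, eff))
```

Run on the numbers above, this gives a speedup of about 0.92 for 2
processes (that run was slower than the single-process one) and about 11.8
for 24 processes, i.e. an efficiency of roughly 0.49.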
Graph is attached to this mail. Some notes:
* A free queue on the gridengine was used, so there was no load on these
nodes during the runs. Two nodes are available on this queue, giving a max
of 48 simultaneous processes.
* Some process counts (6, 20, etc.) were left out because I couldn't
guarantee "no load" conditions during those runs, and I had one or two
anomalies, so I'd rather not include them right now. However, I expect them
to be in line with the other results.
* In general the graph shows more consistent performance than last time,
but unfortunately I got incomplete runs beyond 24 processes. Because this
is also the max number of processes per node, it's possible that the
interconnect between the nodes had something to do with runs > 24 processes
being inconsistent. However, that's not usually an issue in other programs,
because quadspeed (40Gbit/s) is already a fairly fast interconnect.
* Runs with 26, 28 and 30 processes would almost, but not quite, finish
(just a few sequences unfinished), but beyond that the analysis would
hardly get off the ground, seeming to get stuck at the RepeatMasker phase.
I suppose this is our main concern at the moment: that we can't speed up
beyond 24 processes.
Cheers / Ramón.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 48proc.png
Type: image/png
Size: 24644 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20130404/5baab7ac/attachment-0002.png>