[maker-devel] second maker2 benchmark, this time, on a cluster

Thu Apr 4 11:03:43 MDT 2013

Hi

I've done another of my own benchmarks with the Maker2 svn (rev 1017) code.
Last time I went up to 12 processes; this time I aimed for 48. In contrast
to the last 12-core speed check, the target hardware was a computer
cluster managed by the Gridengine queue manager. The same data set of 4.019
megabases was used as before (the dpp_contig.fasta sequence repeated 125
times in one file, each copy under a different name).
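
For anyone wanting to build a comparable test file, here is a minimal Python
sketch of one way to repeat a FASTA file N times with unique record names.
The post does not say how the file was actually made, so this is an
assumption. The function name replicate_fasta and the "_copyNNN" suffix are
made up for illustration.

```python
def replicate_fasta(text, copies):
    """Return FASTA text with every record repeated `copies` times.

    Each copy gets a unique name by appending a hypothetical "_copyNNN"
    suffix to the first word of the original header line.
    """
    # Parse the input into (header, [sequence lines]) records.
    records = []
    header, seq = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, seq))
            header, seq = line[1:].strip(), []
        elif line.strip():
            seq.append(line.strip())
    if header is not None:
        records.append((header, seq))

    # Emit every record once per copy, renaming as we go.
    out = []
    for i in range(1, copies + 1):
        for hdr, lines in records:
            name = (hdr.split() or ["seq"])[0]  # fall back if header is empty
            out.append(">%s_copy%03d" % (name, i))
            out.extend(lines)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Assumed file names; adjust to your own paths.
    with open("dpp_contig.fasta") as src:
        big = replicate_fasta(src.read(), 125)
    with open("dpp_contig_x125.fasta", "w") as dst:
        dst.write(big)
```

With 125 copies adding up to 4.019 megabases, each copy works out to roughly
32 kilobases.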

The nodes in the cluster are (again) HP Proliant SL390 with two Intel X5675
@ 3.07GHz, this time with only 48GB RAM and 1TB local disk, running CentOS
6.2 with (as before) the 2.6.32 Linux kernel. A marked difference is that
Maker2 was launched from an NFSv3 shared home directory, although the /tmp
directories are local to the process running on each node. Nodes are
interconnected via quad-speed InfiniBand and, because of hyperthreading,
each can offer 24 "process-cores" to a job. No overlap between runs was
allowed.

Results were:
     #processes      time(secs)    Megabases/hr
              1         6585.00            2.20
              2         7137.00            2.03
              4         2479.00            5.84
              8         1088.00           13.30
             10          866.00           16.71
             12          715.00           20.24
             14          666.00           21.72
             16          651.00           22.22
             18          613.00           23.60
             24          559.00           25.88
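
The Megabases/hr column is just 4.019 Mb divided by the run time and scaled
to an hour. A small Python sketch that reproduces that column from the
timings above, and also derives speedup and parallel efficiency relative to
the 1-process run, might look like this. The speedup and efficiency figures
are not in the table above; they are computed here for illustration, and all
function names are my own.

```python
DATASET_MB = 4.019  # size of the test data set, in megabases

# processes -> wall time in seconds, copied from the table above
TIMINGS = {
    1: 6585.0, 2: 7137.0, 4: 2479.0, 8: 1088.0, 10: 866.0,
    12: 715.0, 14: 666.0, 16: 651.0, 18: 613.0, 24: 559.0,
}


def throughput_mb_per_hr(seconds, size_mb=DATASET_MB):
    """Megabases processed per hour for a run of `seconds`."""
    return size_mb / seconds * 3600.0


def speedup(nproc, timings=TIMINGS):
    """Wall-time speedup relative to the 1-process run."""
    return timings[1] / timings[nproc]


def efficiency(nproc, timings=TIMINGS):
    """Speedup divided by process count (1.0 would be ideal scaling)."""
    return speedup(nproc, timings) / nproc


def report(timings=TIMINGS):
    """Print one line per run: processes, time, Mb/hr, speedup, efficiency."""
    print("%10s %12s %12s %9s %11s"
          % ("#processes", "time(secs)", "Mb/hr", "speedup", "efficiency"))
    for nproc in sorted(timings):
        secs = timings[nproc]
        print("%10d %12.2f %12.2f %9.2f %11.2f"
              % (nproc, secs, throughput_mb_per_hr(secs),
                 speedup(nproc, timings), efficiency(nproc, timings)))


if __name__ == "__main__":
    report()
```

By this measure the 24-process run is a bit under 12 times faster than the
1-process run, i.e. roughly half of ideal scaling. Note that the 2-process
run in the table is slower than the 1-process run, so its speedup comes out
below 1.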

Graph is attached to this mail. Some notes:
* A free queue on the Gridengine was used, so there was no other load on
these nodes during the runs. Two nodes are available on this queue, giving
a maximum of 48 simultaneous processes.
* Some process counts (6, 20, etc.) were left out because I couldn't
guarantee "no load" conditions during those runs, and I had one or two
anomalies, so I'd rather not include them right now. However, I expect them
to be in line with the other results.
* In general the graph shows more consistent performance than last time,
but unfortunately I got incomplete runs above 24 processes. Because this is
also the maximum number of processes per node, it's possible that the
interconnect between the nodes had something to do with runs of more than
24 processes being inconsistent. However, the interconnect is not usually
an issue for other programs, because quadspeed (40Gbit/s) is already fairly
fast.
* The runs with 26, 28 and 30 processes would almost, but not quite, finish
(just a few sequences unfinished). Beyond that number, the analysis would
hardly get off the ground, seeming to get stuck at the RepeatMasker phase.
I suppose this is our main concern at the moment: that we can't speed up
beyond 24 processes.

Cheers / Ramón.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 48proc.png
Type: image/png
Size: 24644 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20130404/5baab7ac/attachment-0002.png>