[maker-devel] second maker2 benchmark, this time, on a cluster

Ramón Fallon ramonfallon at gmail.com
Fri Apr 5 10:00:53 MDT 2013


Thanks for the replies Carson,

Our cluster has got busy all of a sudden, so I will have to wait a bit
before I can run the hostname test. However, I'm fairly sure (not 100%,
mind you) that when the process count goes over 24 it will run the extra
processes on a separate node, and so do a proper cross-node launch.
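For the record, the check Carson suggests below can be scripted. This is just a sketch: the `mpiexec -n 24` invocation is a placeholder for however maker is actually being launched under Gridengine here, not the real command.

```shell
# Launch `hostname` the same way maker is launched. If all 24 lines of
# output name the same host, MPI is packing every rank onto one node.
# The launch command is a placeholder for the real one:
#
#   mpiexec -n 24 hostname | tee hosts.txt
#
# Counting distinct hosts in the captured output:
count_hosts() {
    sort | uniq | wc -l
}

# Example with fabricated output from a correct two-node launch:
printf 'node01\nnode01\nnode02\nnode02\n' | count_hosts   # prints 2
```

With 24 processes spread over two 12-core nodes you would expect two distinct hostnames; a count of one would mean a single-node launch.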


On Thu, Apr 4, 2013 at 7:40 PM, Carson Holt <Carson.Holt at oicr.on.ca> wrote:

>   One more thought.  If the 26-, 28-, and 30-process jobs are failing, this
> could also be because they are not starting across nodes correctly (all end
> up on the same node).  You would then start to run into memory problems and
> the job would freeze.  So validating the proper cross-node launch of MPI
> using the 'hostname' command is still probably the first thing to do.
>
>  --Carson
>
>
>    From: Carson Holt <carson.holt at oicr.on.ca>
> Date: Thursday, 4 April, 2013 1:29 PM
> To: Ramón Fallon <ramonfallon at gmail.com>, "maker-devel at yandell-lab.org" <
> maker-devel at yandell-lab.org>
> Subject: Re: second maker2 benchmark, this time, on a cluster
>
>    Since you are using 12-core nodes (hyperthreaded cores are virtual –
> you still only have 12 cores of power, not 24) and your performance curve
> drops off at 12, I'm thinking there is a possibility that the other
> processes did not start on a separate node.  Try launching the Linux
> command 'hostname' the same way you are launching maker.  If all 24 lines
> of output from hostname have the same host, then maker is only getting
> launched on a single node.  Since there are really only 12 cores (not
> 24), you would not see any significant performance improvement above 12:
> each process above 12 just reduces the power allocated to the remaining
> processes.  The difference from 12 to 24 (~25% performance gain) is just
> what can be gained from process saturation (not all maker processes are
> always at 100% CPU usage, because of calls to IO, so adding a few more
> processes than you have CPU cores sometimes runs a little faster).
>
>  Thanks,
> Carson
>
>
>
>   From: Ramón Fallon <ramonfallon at gmail.com>
> Date: Thursday, 4 April, 2013 1:03 PM
> To: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: second maker2 benchmark, this time, on a cluster
>
>  Hi
>
>  I've done another of my own benchmarks with the Maker2 svn (rev 1017)
> code. Last time I went up to 12 processes; this time I aimed for 48. In
> contrast to the last 12-core speed check, the target hardware was a
> computer cluster with the Gridengine queue manager. The same data set of
> 4.019 megabases was used as before (the dpp_contig.fasta sequence repeated
> 125 times in one file under different names).
>
>  The nodes in the cluster are (again) HP Proliant SL390 with two Intel
> X5675 CPUs @ 3.07GHz, this time with only 48GB RAM and a 1TB local disk,
> running Centos 6.2 with (as before) a 2.6.32 linux kernel. A marked
> difference is that Maker2 was launched from an NFS3-shared home directory,
> although the /tmp directories are local to the process running on each
> node. Nodes are interconnected via quad-speed infiniband and, because of
> hyperthreading, can offer 24 "process-cores" to a job. No overlap between
> runs was allowed.
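For context, the runs were submitted through Gridengine; a minimal submission script might look like the sketch below. The queue name, parallel environment name, and maker invocation are all placeholders, not the actual site configuration.

```shell
#!/bin/bash
# Hypothetical Grid Engine submission script for one benchmark run.
# Queue and parallel-environment names are placeholders.
#$ -cwd
#$ -q free.q            # the unloaded queue (placeholder name)
#$ -pe orte 24          # request 24 slots across the two nodes
mpiexec -n "$NSLOTS" maker
```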
>
>  Results were:
>  #processes      time(secs)    Megabases/hr
>               1         6585.00            2.20
>               2         7137.00            2.03
>               4         2479.00            5.84
>               8         1088.00           13.30
>              10          866.00           16.71
>              12          715.00           20.24
>              14          666.00           21.72
>              16          651.00           22.22
>              18          613.00           23.60
>              24          559.00           25.88
>
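As a sanity check on the table above, the Megabases/hr column is simply the 4.019 Mb dataset size divided by the wall time; for example, for the 24-process row:

```shell
# Megabases/hr = dataset_size_Mb * 3600 / time_seconds
# 24-process row: 4.019 Mb in 559 s
awk 'BEGIN { printf "%.2f\n", 4.019 * 3600 / 559 }'   # prints 25.88
```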
>  Graph is attached to this mail. Some notes:
> * A free queue on the gridengine was used, so there was no load on these
> nodes during the runs. Two nodes are available on this queue, giving a max
> of 48 simultaneous processes.
> * Some process counts (6, 20, etc.) were omitted because I couldn't
> guarantee "no load" conditions during those runs, and I had one or two
> anomalies, so I'd rather not include them right now. However, I expect them
> to be in line with the other results.
> * In general the graph shows more consistent performance than last time,
> but unfortunately I got incomplete runs above 24 processes. Because this is
> also the max number of processes per node, it's possible that the
> interconnect between the nodes had something to do with runs of more than
> 24 processes being inconsistent; however, that's not usually an issue in
> other programs, because quadspeed (40 Gbit/s) is already a fairly fast
> interconnect.
> * Runs with 26, 28, and 30 processes would almost - but not quite - finish
> (just a few sequences unfinished), but beyond that number the analysis
> would hardly get off the ground, seeming to get stuck at the Repeatmasker
> phase. I suppose this is our main concern at the moment: that we can't
> speed up beyond 24 processes.
>
>  Cheers / Ramón.
>


More information about the maker-devel mailing list