<div dir="ltr"><div>Hi</div><div><br></div><div>I've done another of my own benchmarks with the Maker2 svn (rev 1017) code. Last time I went up to 12 processes, this time I aimed for 48. In contrast to the last 12 core speed check, the target hardware was a computer cluster, with the Gridengine queue manager. The same data set of 4.019 megabases was used as before (125 times the dpp_contig.fasta sequence in one file with different names).</div>

The nodes in the cluster are (again) HP ProLiant SL390s with two Intel X5675 CPUs @ 3.07 GHz, this time with only 48 GB RAM and a 1 TB local disk, running CentOS 6.2 with (as before) a 2.6.32 Linux kernel. A marked difference is that Maker2 was launched from an NFSv3-shared home directory, although the /tmp directories are local to the process running on each node. Nodes are interconnected via quad-data-rate (QDR) InfiniBand, and because of hyperthreading each node can offer 24 "process-cores" to a job. No overlap between runs was allowed.

Results were:

    #processes   time (secs)   megabases/hr
         1         6585.00         2.20
         2         7137.00         2.03
         4         2479.00         5.84
         8         1088.00        13.30
        10          866.00        16.71
        12          715.00        20.24
        14          666.00        21.72
        16          651.00        22.22
        18          613.00        23.60
        24          559.00        25.88
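
For anyone checking the numbers: the megabases/hr column is just the 4.019 Mb data set divided by wall time. A small Python sketch that reproduces the column and also derives speedup relative to the single-process run:

    # Recompute megabases/hr from the wall times above and derive speedup
    # relative to the 1-process run.
    DATASET_MB = 4.019  # size of the test file in megabases

    times = {1: 6585.0, 2: 7137.0, 4: 2479.0, 8: 1088.0, 10: 866.0,
             12: 715.0, 14: 666.0, 16: 651.0, 18: 613.0, 24: 559.0}

    base = times[1]
    for procs in sorted(times):
        secs = times[procs]
        mb_per_hr = DATASET_MB * 3600.0 / secs
        print("%2d procs: %6.2f Mb/hr, speedup %5.2fx" % (procs, mb_per_hr, base / secs))

At 24 processes that comes out at 25.88 Mb/hr, i.e. a speedup of roughly 11.8x over a single process.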

Graph is attached to this mail. Some notes:

* A free queue on the Grid Engine was used, so there was no load on these nodes during the runs. Two nodes are available on this queue, giving a maximum of 48 simultaneous processes.
* Some process counts (6, 20, etc.) were dropped because I couldn't guarantee "no load" conditions during those runs, and I had one or two anomalies, so I'd rather not include them right now. However, I expect them to be in line with the other results.
* In general the graph shows more consistent performance than last time, but unfortunately I got incomplete runs beyond 24 processes. Because 24 is also the maximum number of processes per node, it's possible that the inter-node interconnect had something to do with runs above 24 processes being inconsistent; however, that isn't usually an issue for other programs, because QDR (40 Gbit/s) is already a fairly fast interconnect.
* Runs with 26, 28 and 30 processes would almost, but not quite, finish (just a few sequences unfinished; a quick way to list them is sketched below), but beyond that the analysis would hardly get off the ground, seeming to get stuck at the RepeatMasker phase. I suppose this is our main concern at the moment: that we can't speed up beyond 24 processes.
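
For the almost-complete runs, one way to spot the stragglers is to tally statuses in Maker2's master datastore index log. A rough sketch, assuming the usual tab-separated name/dir/status layout where later lines supersede earlier ones for the same contig (adjust the file name to your own output directory):

    # Report contigs whose last recorded status is not FINISHED in a MAKER
    # master datastore index log. Assumes tab-separated lines of the form
    # name <tab> dir <tab> status, appended as processing progresses.
    import sys

    latest = {}
    with open(sys.argv[1]) as fh:  # e.g. <base>_master_datastore_index.log
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3:
                latest[fields[0]] = fields[2]  # last status per contig wins

    unfinished = sorted(n for n, s in latest.items() if s != "FINISHED")
    print("%d of %d contigs unfinished" % (len(unfinished), len(latest)))
    for name in unfinished:
        print(" ", name, latest[name])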

Cheers / Ramón.