[maker-devel] MPI selection
Carson Holt
carsonhh at gmail.com
Thu Feb 8 10:10:04 MST 2018
Yes, you can oversubscribe and match the hyperthread count (you will need ~1GB of RAM per process, though). But you should still keep the overall number of MPI processes given to mpiexec below ~200. This is because of how I structured the MPI controller in MAKER: there is one manager process, and all the other processes are workers. If there are too many workers, they can overwhelm the manager with communications, and scaling efficiency drops. Starting another MAKER job launches a new manager with its own workers. That way, you get around the communication bottleneck by launching multiple large MAKER jobs, and scaling efficiency returns to near linear. You can keep adding jobs until you start hitting IO bottlenecks (MAKER generates high IOPS but not necessarily high bandwidth).
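As a rough illustration of that sizing rule, here is a minimal Python sketch that splits the available slots into jobs of at most ~200 MPI processes each, optionally capped by the ~1GB-per-process RAM estimate. The function name `plan_jobs` and its parameters are hypothetical and are not part of MAKER; the numbers are just the rules of thumb above.

```python
import math


def plan_jobs(total_slots, max_procs_per_job=200, gb_per_proc=1.0, ram_gb=None):
    """Return a list of MPI process counts, one entry per MAKER job.

    Hypothetical helper (not part of MAKER). Assumptions taken from the
    rules of thumb above: keep each job under ~200 processes and budget
    ~1 GB of RAM per process.
    """
    # Optionally limit the slot count by the available RAM.
    if ram_gb is not None:
        total_slots = min(total_slots, int(ram_gb // gb_per_proc))
    # Each job needs one manager and at least one worker.
    if total_slots < 2:
        raise ValueError("need at least 2 processes: one manager plus one worker")
    # Fewest jobs that keep every job at or below the per-job cap.
    n_jobs = math.ceil(total_slots / max_procs_per_job)
    # Spread the slots as evenly as possible across those jobs.
    base, extra = divmod(total_slots, n_jobs)
    return [base + 1 if i < extra else base for i in range(n_jobs)]


if __name__ == "__main__":
    # 256 hyperthreaded slots become two jobs of 128 processes each.
    print(plan_jobs(256))
```

Each entry would be the process count given to mpiexec for one job, with each job running on its own portion of the dataset. One process per job acts as the manager, so a 128-process job has 127 workers.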
—Carson
> On Feb 8, 2018, at 5:53 PM, Chandler <admin at genome.arizona.edu> wrote:
>
> Thanks Carson, one other question though.
>
> When you mentioned to keep jobs under 200 CPU cores, did you mean as in what is emulated to the OS, or physical hardware? We are using hyperthreading which emulates more CPUs to the OS, and so we have about 256 emulated CPUs available. Some of our applications are better optimized for this, so I keep it that way.
>
> So suppose we keep hyperthreading enabled, should we specify in the machine list file that mpiexec uses to only use 128 of the emulated cores? We have noticed with using all 256 hyperthreaded cores that the load can get high, although everything still works great.
>
> Thanks
>
>
> Chandler / Systems Administrator
> Arizona Genomics Institute
> www.genome.arizona.edu
>
>
> Carson Holt wrote on 01/30/2018 10:37 AM:
>> MAKER does not really move a lot of data with MPI, it’s just moving around command lines and small variables. So not getting full infiniband performance will not hurt you. I doubt you will see any issues using ib0. For MPI flavor, I get the best performance with Intel MPI followed by OpenMPI. Overall you will find that MAKER is IO bound as opposed to CPU or communications bound. So pointing it at your best performing network based storage will be the greatest performance factor (if you have Lustre storage, point it there for example). Pull back on job size and count if other users have issues accessing the disk (too many jobs can bring NFS to its knees). The one suggestion I have as far as job size is to keep job sizes under 200 CPU cores. Over that, you will get better performance by splitting up datasets and submitting multiple jobs. Also MAKER keeps a log of its progress, so you can kill jobs or restart failed jobs, and they pick up right where they left off.
>> —Carson
>>> On Jan 30, 2018, at 10:24 AM, admin at genome.arizona.edu wrote:
>>>
>>> Carson Holt wrote on 01/30/2018 09:47 AM:
>>>> The libraries used by MVAPICH2, Intel MPI, and OpenMPI to access infiniband have a known bug. For performance reasons, infiniband libraries use registered memory in a way that makes it impossible to do system calls to external programs under MPI (doing so results in seg faults). MAKER has to call out to external programs like BLAST, exonerate, etc., so it triggers this bug.
>>>> The infiniband bug is well known, and unfortunately will not be fixed because fixing it causes infiniband to lose some advertised features like direct memory access.
>>>
>>>
>>> Well that stinks! Maybe that's why we got such a good deal on new-old-stock infiniband equipment! Still it has allowed us to use full speed of our NFS RAIDs, which has been nice. I will try with using ib0, the speed is still about 10Gb, but I was under the impression using IPoIB would cause packet loss or other problems...
>>>
>>> Thanks for clearing that up. So is there a fabric/protocol you would recommend for clusters running maker?
>>>
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org