[maker-devel] MPI selection
Chandler
admin at genome.arizona.edu
Thu Feb 8 09:53:13 MST 2018
Thanks Carson, one other question though.
When you mentioned keeping jobs under 200 CPU cores, did you mean
logical CPUs as seen by the OS, or physical cores? We are using
Hyper-Threading, which presents more logical CPUs to the OS, so we have
about 256 logical CPUs available. Some of our applications are better
optimized for this, so I keep it that way.
If we keep Hyper-Threading enabled, should we list only 128 of the
logical cores in the machine file that mpiexec uses? We have noticed
that the load can get high when all 256 hyperthreaded cores are in use,
although everything still works great.
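One way to act on the 128-core idea is to cap each host's slot count in an Open MPI hostfile at its physical core count rather than its logical CPU count. This is a minimal sketch, assuming x86 Linux nodes whose /proc/cpuinfo reports `physical id` and `core id` fields; the function names are hypothetical, and whether capping actually helps MAKER is the open question above, not something this code settles.

```python
def count_physical_cores(cpuinfo_text):
    """Count physical cores in /proc/cpuinfo-style text (x86 Linux assumed).

    Each logical CPU gets its own stanza; hyperthread siblings share the same
    "physical id" (socket) and "core id", so unique pairs = physical cores.
    """
    cores = set()
    physical_id = core_id = None
    # A trailing "" acts as a sentinel so the last stanza is always flushed.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # Blank line ends one logical CPU's stanza.
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            physical_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return len(cores)


def hostfile_lines(cpuinfo_by_host):
    """Build Open MPI hostfile lines ("host slots=N"), N = physical cores.

    cpuinfo_by_host is a hypothetical mapping of hostname -> the text of that
    node's /proc/cpuinfo, gathered however your cluster allows.
    """
    return [f"{host} slots={count_physical_cores(text)}"
            for host, text in cpuinfo_by_host.items()]
```

Open MPI reads `host slots=N` lines via `mpiexec --hostfile FILE`; MPICH-derived launchers such as Intel MPI use a `host:N` machinefile format instead, so the line format would need adjusting there.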
Thanks
Chandler / Systems Administrator
Arizona Genomics Institute
www.genome.arizona.edu
Carson Holt wrote on 01/30/2018 10:37 AM:
> MAKER does not really move a lot of data with MPI; it just moves command lines and small variables around. So not getting full InfiniBand performance will not hurt you, and I doubt you will see any issues using ib0. For MPI flavor, I get the best performance with Intel MPI, followed by OpenMPI. Overall you will find that MAKER is IO bound rather than CPU or communications bound, so pointing it at your best-performing network storage will be the greatest performance factor (if you have Lustre storage, for example, point it there). Pull back on job size and count if other users have issues accessing the disk (too many jobs can bring NFS to its knees). My one suggestion on job size is to keep jobs under 200 CPU cores. Above that, you will get better performance by splitting up datasets and submitting multiple jobs. Also, MAKER keeps a log of its progress, so you can kill jobs or restart failed jobs, and they pick up right where they left off.
>
> —Carson
>
>
>
>> On Jan 30, 2018, at 10:24 AM, admin at genome.arizona.edu wrote:
>>
>> Carson Holt wrote on 01/30/2018 09:47 AM:
>>> The libraries used by MVAPICH2, Intel MPI, and OpenMPI to access InfiniBand have a known bug. For performance reasons, InfiniBand libraries use registered memory in a way that makes it impossible to make system calls that launch external programs under MPI (doing so results in segfaults). MAKER has to call out to external programs like BLAST, exonerate, etc., so it triggers this bug.
>>> The InfiniBand bug is well known and unfortunately will not be fixed, because fixing it would cost InfiniBand some of its advertised features, such as remote direct memory access.
>>
>>
>> Well, that stinks! Maybe that's why we got such a good deal on new-old-stock InfiniBand equipment! Still, it has allowed us to use the full speed of our NFS RAIDs, which has been nice. I will try using ib0; the speed is still about 10 Gb/s, but I was under the impression that using IPoIB would cause packet loss or other problems...
>>
>> Thanks for clearing that up. Is there a fabric/protocol you would recommend for clusters running MAKER?
>>
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
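Carson's advice to stay under 200 cores per job, combined with Chandler's plan to run over ib0, can be sketched as a small job planner. This is a minimal sketch under stated assumptions: the input is one FASTA file split round-robin by record, each chunk would run as its own MAKER job, and the launch line uses Open MPI MCA parameters (`btl ^openib` to skip the verbs transport, `btl_tcp_if_include` to pin TCP traffic to the IPoIB interface) that apply to Open MPI releases still shipping the openib BTL. The helper names are hypothetical.

```python
import math

MAX_CORES_PER_JOB = 200  # ceiling suggested in the reply above


def plan_jobs(total_cores, max_per_job=MAX_CORES_PER_JOB):
    """Split total_cores into the fewest jobs that each stay <= max_per_job."""
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    n_jobs = math.ceil(total_cores / max_per_job)
    base, extra = divmod(total_cores, n_jobs)
    # Spread any remainder one core at a time over the first jobs.
    return [base + (1 if i < extra else 0) for i in range(n_jobs)]


def split_fasta(fasta_text, n_chunks):
    """Split FASTA text into n_chunks strings, assigning records round-robin."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    records = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            records.append([line])      # header starts a new record
        elif records:
            records[-1].append(line)    # sequence line joins current record
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append("\n".join(rec))
    return ["\n".join(c) + "\n" if c else "" for c in chunks]


def openmpi_command(n_procs, tcp_iface="ib0"):
    """Hypothetical Open MPI launch line for one MAKER job over IPoIB."""
    return ["mpiexec",
            "--mca", "btl", "^openib",               # avoid the verbs transport
            "--mca", "btl_tcp_if_include", tcp_iface,  # send TCP over IPoIB
            "-n", str(n_procs), "maker"]
```

For example, 401 cores would become three jobs of 134, 134, and 133 processes, each with its own FASTA chunk and working directory. Round-robin assignment keeps chunks roughly even in record count; if contig lengths vary widely, balancing by total sequence length would spread the work more evenly.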