[maker-devel] Maker-Error when started with IMPI

Rainer Rutka rainer.rutka at uni-konstanz.de
Wed Mar 1 05:30:39 MST 2017


Hi Carson.

Again THANK YOU for your efforts :-)



On 24.02.2017 at 18:30, Carson Holt wrote:
> Specific things.
>
> 1. Do not set LD_PRELOAD. That is only for OpenMPI, but it will cause problems with other MPI's.
OK, I removed this environment variable. It is no longer set.

> 2. Make sure you recompiled MAKER for Intel MPI (MPI code always has to be compiled for the flavor you are using, so make sure you have a separate installation of MAKER for Intel MPI). Also validate that the mpicc and libmpi.h listed during the MAKER install belong to Intel MPI. Don’t just assume they do because you loaded the module. Manually verify the paths during MAKER’s setup.
I validated:

UC:[kn at uc1n996 bwhpc-examples]$ module list
Currently Loaded Modulefiles:
1) compiler/intel/16.0(default)
2) mpi/impi/5.1.3-intel-16.0(default)

FOR MPICC:
UC:[kn at uc1n996 bwhpc-examples]$ type mpicc
mpicc is /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpicc

FOR LIBMPI:
UC:[kn at uc1n996 bwhpc-examples]$ echo $MPIDIR
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64
UC:[kn at uc1n996 bwhpc-examples]$ find $MPIDIR -name '*'mpi.h -print
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/include/mpi.h

Here I can find an mpi.h but not a libmpi.h. I think this is OK, though,
because the software compiled and linked without any errors or missing
libraries.
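
The manual check above can also be scripted, so that it can be repeated on any node. This is only a minimal sketch, assuming bash and that MPIDIR points at the Intel MPI root as it does on this cluster; the helper names path_is_under and check_tool are made up for illustration and are not part of MAKER or Intel MPI:

```shell
#!/bin/bash
# Return 0 if path $1 lies inside the directory tree rooted at $2, else 1.
path_is_under() {
    case "$1" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Report whether a tool found on PATH belongs to the given MPI tree.
check_tool() {
    local tool="$1" root="$2" resolved
    resolved="$(command -v "$tool" 2>/dev/null)" || {
        echo "$tool: not found on PATH"
        return 1
    }
    if path_is_under "$resolved" "$root"; then
        echo "$tool: OK ($resolved)"
    else
        echo "$tool: MISMATCH ($resolved is outside $root)"
        return 1
    fi
}

# Run the checks only when MPIDIR is set (assumption: the MPI module exports it).
if [ -n "${MPIDIR:-}" ]; then
    check_tool mpicc "$MPIDIR"
    check_tool mpiexec "$MPIDIR"
    if [ -f "$MPIDIR/include/mpi.h" ]; then
        echo "mpi.h: OK ($MPIDIR/include/mpi.h)"
    else
        echo "mpi.h: not found under $MPIDIR/include"
    fi
fi
```

Running the same script on the build node and inside the batch job would show whether both resolve mpicc and mpiexec to the same installation.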

> 3. The error you got previously should not even be possible with the current version of Intel MPI,
> which is why I say that when you called mpiexec, something else (that was not Intel MPI) was launched.
> Easy solution is to give the full path of mpiexec in your job, so you are not relying on PATH to be unaltered in your job.
mpiexec is in the PATH and the right one is/was used, too.

MPIEXEC:
UC:[kn at uc1n996 bwhpc-examples]$ type mpiexec
mpiexec is /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec
UC:[kn at bwhpc-examples]$

> Do not do —>  mpiexec -nc 1 maker
> Do this for example —> /opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -nc maker
OK, so I did:

[...]
#MSUB -l nodes=1:ppn=1
#MSUB -l mem=20gb
[...]
echo " "
echo "### Runing Maker example"
echo " "
export OMPI_MCA_mpi_warn_on_fork=0
/opt/bwhpc/common/compiler/intel/compxe.2016.4.258/impi/5.1.3.223/intel64/bin/mpiexec -nc maker
[...]

> 4. Build and run on the same node for your test. If you build on one node and run on another, you may
> be changing your environment in ways you don’t realize that break things. So if you can build and test on
> the same node and it works, then it fails when you test it elsewhere, then you have to track down how your
> environment is changing.
OK, I did. Same node: uc1n996


UNFORTUNATELY I GOT THE SAME ERROR:

[...]
### Runing Maker example

LD_PRELOAD=/opt/bwhpc/common/mpi/openmpi/2.0.1-intel-16.0/lib/libmpi.so
OMPI_MCA_mpi_warn_on_fork=0
I_MPI_CPUINFO=proc
I_MPI_PMI_LIBRARY=/opt/bwhpc/common/mpi/openmpi/2.0.1-intel-16.0/lib/libpmi.so
I_MPI_PIN_DOMAIN=node
I_MPI_FABRICS=shm:tcp
I_MPI_HYDRA_IFACE=ib0
mpiexec_uc1n342.localdomain: cannot connect to local mpd (/scratch/mpd2.console_uc1n342.localdomain_kn_pop235844); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
[...]
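
The environment dump above contains two values that point into an openmpi directory (LD_PRELOAD and I_MPI_PMI_LIBRARY). A small filter can list such entries from inside the job before mpiexec is called. This is only a sketch, assuming bash and grep are available on the compute node; the function name find_foreign_mpi_vars is hypothetical:

```shell
#!/bin/bash
# Read "NAME=value" lines on stdin and print those whose value mentions
# the given pattern (case-insensitive). Default pattern: openmpi.
find_foreign_mpi_vars() {
    local pattern="${1:-openmpi}"
    # grep exits with 1 when nothing matches; that is not an error here.
    grep -i -- "=.*${pattern}" || true
}

# In a job script, feed it the live environment just before launching:
env | find_foreign_mpi_vars openmpi
```

An empty result means that no variable value references the other MPI installation; any line printed is a candidate to unset or correct before the launch.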


> —Carson

To be continued? :-)

THANX

-- 
Rainer Rutka
Universität Konstanz
Kommunikations-, Informations-, Medienzentrum (KIM)
* KIM Ausbildung
* Wissenschaftliches Rechnen/bwHPC-C5
* KIM Basisdienste, KIM Support
Raum: V511
78457 Konstanz
+49 7531 88-5413
