[maker-devel] mpi issue on computing cluster
Scott Geib
smg283 at gmail.com
Fri Apr 13 13:00:29 MDT 2012
Hi,
I am trying to run MAKER 2.24 on a compute cluster and get the following
error (I am not worried about the Signal.pm error):
an into unknown state (hex char: 29) at /mnt/work/scratch/scottge/maker-2.24/maker/bin/../lib/Proc/Signal.pm line 138.
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(388)........:
MPID_Init(139)...............: channel initialization failed
MPIDI_CH3_Init(49)...........: progress_init failed
MPIDI_CH3I_Progress_init(808): This version of MPICH requires the SIGUSR1 signal, but the application has already installed a handler
[proxy:0:0@r01n11.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
[proxy:0:0@r01n11.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@r01n11.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event
[proxy:0:1@r01n13.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
[proxy:0:1@r01n13.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1@r01n13.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event
[proxy:0:3@r07n27.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
[proxy:0:3@r07n27.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:3@r07n27.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event
[mpiexec@r01n11.local] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
[mpiexec@r01n11.local] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): launcher returned error waiting for completion
[mpiexec@r01n11.local] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for completion
[mpiexec@r01n11.local] main (./ui/mpich/mpiexec.c:404): process manager error waiting for completion
I do not know how MPICH2 was compiled. Could this be an
--enable-sharedlibs issue?
I may need to contact my cluster support, but I thought I would try here
first.
Thanks
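
The MPICH message in the error stack above says that a SIGUSR1 handler was already installed when MPI_Init ran. As a rough illustration of that condition only, here is a minimal Python sketch (not MAKER's Perl code, and not a fix) that checks whether the current process has a non-default SIGUSR1 disposition. The names sigusr1_handler_installed and placeholder_handler are hypothetical and exist only for this example.

```python
import signal  # SIGUSR1 is only available on Unix-like systems


def sigusr1_handler_installed():
    """Return True if this process has a non-default SIGUSR1 disposition."""
    return signal.getsignal(signal.SIGUSR1) != signal.SIG_DFL


def placeholder_handler(signum, frame):
    # Hypothetical handler standing in for whatever an application installs.
    pass


# Remember the current disposition so it can be restored afterwards.
previous = signal.getsignal(signal.SIGUSR1)

# Install a handler, as an application might do before MPI is initialized.
signal.signal(signal.SIGUSR1, placeholder_handler)
during = sigusr1_handler_installed()

# Restore the earlier disposition (fall back to the default if it is unknown).
signal.signal(signal.SIGUSR1, previous if previous is not None else signal.SIG_DFL)

print("handler installed while placeholder was active:", during)
```

This only inspects the Python process that runs it. It cannot say which part of an application installed the handler, or whether a differently configured MPICH2 build would avoid the SIGUSR1 requirement.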