[maker-devel] mpi issue on computing cluster
Carson Holt
carsonhh at gmail.com
Tue Apr 17 11:09:32 MDT 2012
If it's a sharedlibs issue then 'maker -help' would cause the same error.
Try that.
Are you sure the Signal.pm error is not what is causing the MPI failure?
Try changing
/mnt/work/scratch/scottge/maker-2.24/maker/bin/../lib/Proc/Signal.pm lines
136-143 from this -->
    require Proc::ProcessTable;
    my $obj = new Proc::ProcessTable;
    foreach my $p (@{$obj->table}) {
        #now check for the id
        return $p if ($p->pid == $id);
    }
    return undef;
To this -->
    my $select;
    eval{
        require Proc::ProcessTable;
        my $obj = new Proc::ProcessTable;
        foreach my $p (@{$obj->table}) {
            #now check for the id
            if ($p->pid == $id){
                $select = $p;
                last;
            }
        }
    };
    return $select;
If it works, I can generate a cleaner workaround, but I'd like to know if
that is the root of the problem.
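For testing the idea outside of MAKER, the same eval-guarded lookup can be sketched as a stand-alone script. This is only an illustration under assumptions: the subroutine name find_process and the warning text are hypothetical and are not part of Signal.pm. The point is that any die raised while Proc::ProcessTable loads or builds its table is trapped by eval, so the caller gets undef instead of a fatal error.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the lookup; find_process is an
# illustrative name, not a function that exists in MAKER's Signal.pm.
sub find_process {
    my ($id) = @_;
    my $select;

    # eval traps any die raised while loading Proc::ProcessTable or
    # while it reads the process table.
    my $ok = eval {
        require Proc::ProcessTable;
        my $obj = Proc::ProcessTable->new;
        foreach my $p (@{ $obj->table }) {
            # stop at the first entry whose pid matches
            if ($p->pid == $id) {
                $select = $p;
                last;
            }
        }
        1;    # reached only if nothing died
    };
    warn "process table lookup failed: $@" unless $ok;

    return $select;    # undef if not found or if the lookup failed
}

# Look up the current process as a quick check.
my $self = find_process($$);
print defined $self
    ? "found pid $$\n"
    : "pid $$ not found (or lookup failed)\n";
```

If the script reports a failed lookup on the compute nodes but not on the head node, that would suggest the process table read is the part that is failing there.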
Thanks,
Carson
From: Scott Geib <smg283 at gmail.com>
Date: Fri, 13 Apr 2012 09:00:29 -1000
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] mpi issue on computing cluster
Hi,
I am trying to run maker 2.24 on a compute cluster and get the following
error (not worried about Signal.pm error):
an into unknown state (hex char: 29) at
/mnt/work/scratch/scottge/maker-2.24/maker/bin/../lib/Proc/Signal.pm line
138.
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(388)........:
MPID_Init(139)...............: channel initialization failed
MPIDI_CH3_Init(49)...........: progress_init failed
MPIDI_CH3I_Progress_init(808): This version of MPICH requires the SIGUSR1
signal, but the application has already installed a handler
[proxy:0:0 at r01n11.local] HYD_pmcd_pmip_control_cmd_cb
(./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
[proxy:0:0 at r01n11.local] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at r01n11.local] main (./pm/pmiserv/pmip.c:208): demux engine error
waiting for event
[proxy:0:1 at r01n13.local] HYD_pmcd_pmip_control_cmd_cb
(./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
[proxy:0:1 at r01n13.local] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1 at r01n13.local] main (./pm/pmiserv/pmip.c:208): demux engine error
waiting for event
[proxy:0:3 at r07n27.local] HYD_pmcd_pmip_control_cmd_cb
(./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
[proxy:0:3 at r07n27.local] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:3 at r07n27.local] main (./pm/pmiserv/pmip.c:208): demux engine error
waiting for event
[mpiexec at r01n11.local] HYDT_bscu_wait_for_completion
(./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated
badly; aborting
[mpiexec at r01n11.local] HYDT_bsci_wait_for_completion
(./tools/bootstrap/src/bsci_wait.c:18): launcher returned error waiting for
completion
[mpiexec at r01n11.local] HYD_pmci_wait_for_completion
(./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for
completion
[mpiexec at r01n11.local] main (./ui/mpich/mpiexec.c:404): process manager
error waiting for completion
I do not know how mpich2 was compiled. Could this be an
--enable-sharedlibs issue?
I may need to contact my cluster support, but I thought I would try here
first.
Thanks