[maker-devel] Help with MPI installation and configuration
Tim Fallon
tfallon at mit.edu
Mon Apr 17 09:35:31 MDT 2017
Hi there,
I’ve been troubleshooting a maker 3.0.0 beta installation with MPI together with our institute’s IT department. Despite many attempts, including recompiling from scratch, we haven’t been able to get it to work (non-MPI maker works fine, albeit slowly). I’ve attached the crash log from our most recent attempt to run maker with MPI. Any tips? I hadn’t used MPI before maker, so I’m not very familiar with troubleshooting it.
My most recent exchange with IT is quoted below. We are submitting the job through the LSF cluster scheduler (bsub), but I don’t think that is related to the problem.
"
One option is to try a build on a host without the full suite of cluster software. It may actually help; if nothing else, we would be able to say that the problem also occurs on a (relatively) vanilla Ubuntu Linux host.
Peter Macfarlane / WIBR IT
Description: Hi Peter,
I was hoping to hear back from you on this. It does seem like we’ve exhausted the reasonable troubleshooting steps for running maker under MPI, but is it worth another shot before I ask the developers?
All the best,
-Tim
On Apr 7, 2017, at 9:11 AM, Tim Fallon <tfallon at mit.edu> wrote:
Hi Peter,
I’ve tested with the following command:
bsub -o bsub_maker_with_mpi_log.txt "/nfs/apps/test/openmpi/bin/mpirun -n 8 /nfs/apps/test/maker/bin/maker"
Unfortunately, it still fails to run, with errors such as:
"mca_base_component_repository_open: unable to open mca_patcher_overwrite: /nfs/apps/test/openmpi/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)"
It also fails when I run it directly on Tak4.
I’ve attached the relevant error log file. Any other suggestions?
All the best,
-Tim
On Apr 5, 2017, at 3:50 PM, IT Systems Group <unix-help at wi.mit.edu> wrote:
I rebuilt maker 3 against its own Open MPI instance, which I also built from source. So now:
- /nfs/apps/test/maker is maker 3.00
- /nfs/apps/test/openmpi is the Open MPI build that was used to compile that maker
I think all you'll need to do differently is invoke the mpirun in /nfs/apps/test/openmpi/bin.
"
All the best,
-Tim
Timothy R. Fallon
PhD candidate
Laboratory of Jing-Ke Weng
Department of Biology
MIT
tfallon at mit.edu
-------------- next part --------------
[it-c05b10:00466] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /nfs/apps/test/openmpi/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
[it-c05b10:00467] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /nfs/apps/test/openmpi/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
[it-c05b10:00465] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /nfs/apps/test/openmpi/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
[it-c05b10:00482] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /nfs/apps/test/openmpi/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
[it-c05b10:00468] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /nfs/apps/test/openmpi/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
[it-c05b10:00473] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /nfs/apps/test/openmpi/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
[it-c05b10:00476] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /nfs/apps/test/openmpi/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
[it-c05b10:00481] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /nfs/apps/test/openmpi/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
[it-c05b10:00466] mca_base_component_repository_open: unable to open mca_shmem_mmap: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[it-c05b10:00467] mca_base_component_repository_open: unable to open mca_shmem_mmap: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[it-c05b10:00468] mca_base_component_repository_open: unable to open mca_shmem_mmap: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[it-c05b10:00466] mca_base_component_repository_open: unable to open mca_shmem_posix: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
[it-c05b10:00476] mca_base_component_repository_open: unable to open mca_shmem_mmap: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[it-c05b10:00467] mca_base_component_repository_open: unable to open mca_shmem_posix: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
[it-c05b10:00468] mca_base_component_repository_open: unable to open mca_shmem_posix: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
[it-c05b10:00473] mca_base_component_repository_open: unable to open mca_shmem_mmap: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[it-c05b10:00482] mca_base_component_repository_open: unable to open mca_shmem_mmap: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[it-c05b10:00467] mca_base_component_repository_open: unable to open mca_shmem_sysv: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
[it-c05b10:00476] mca_base_component_repository_open: unable to open mca_shmem_posix: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
[it-c05b10:00466] mca_base_component_repository_open: unable to open mca_shmem_sysv: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
[it-c05b10:00473] mca_base_component_repository_open: unable to open mca_shmem_posix: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
[it-c05b10:00468] mca_base_component_repository_open: unable to open mca_shmem_sysv: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
[it-c05b10:00465] mca_base_component_repository_open: unable to open mca_shmem_mmap: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[it-c05b10:00481] mca_base_component_repository_open: unable to open mca_shmem_mmap: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
[it-c05b10:00482] mca_base_component_repository_open: unable to open mca_shmem_posix: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
[it-c05b10:00476] mca_base_component_repository_open: unable to open mca_shmem_sysv: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
[it-c05b10:00473] mca_base_component_repository_open: unable to open mca_shmem_sysv: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
[it-c05b10:00482] mca_base_component_repository_open: unable to open mca_shmem_sysv: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
[it-c05b10:00481] mca_base_component_repository_open: unable to open mca_shmem_posix: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
[it-c05b10:00465] mca_base_component_repository_open: unable to open mca_shmem_posix: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
[it-c05b10:00481] mca_base_component_repository_open: unable to open mca_shmem_sysv: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
[it-c05b10:00465] mca_base_component_repository_open: unable to open mca_shmem_sysv: /nfs/apps/test/openmpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_init failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_init failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_init failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_init failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_init failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_init failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_init failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_init failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[it-c05b10:466] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[it-c05b10:467] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[it-c05b10:473] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[it-c05b10:476] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[it-c05b10:482] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[30092,1],1]
Exit code: 1
--------------------------------------------------------------------------
------------------------------------------------------------
Sender: LSF System <lsfadmin at it-c05b10>
Subject: Job 1703425: </nfs/apps/test/openmpi/bin/mpirun -n 8 /nfs/apps/test/maker/bin/maker> Exited
Job </nfs/apps/test/openmpi/bin/mpirun -n 8 /nfs/apps/test/maker/bin/maker> was submitted from host <tak4> by user <tfallon> in cluster <irken>.
Job was executed on host(s) <it-c05b10>, in queue <normal>, as user <tfallon> in cluster <irken>.
</home/tfallon> was used as the home directory.
</lab/solexa_weng/Seq_data/Projects/Tim_Fallon/ppyralis_genome/Genome_project_reference_assemblies/version1/analyses/maker_mpi_troubleshooting_withmpi> was used as the working directory.
Started at Fri Apr 7 09:04:14 2017
Results reported at Fri Apr 7 09:04:18 2017
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
/nfs/apps/test/openmpi/bin/mpirun -n 8 /nfs/apps/test/maker/bin/maker
------------------------------------------------------------
Exited with exit code 1.
Resource usage summary:
CPU time : 5.99 sec.
Max Memory : 3 MB
Max Swap : 37 MB
Max Processes : 1
Max Threads : 1
The output (if any) is above this job summary.
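The attached log is long but highly repetitive, since all 8 ranks print the same component-load failures. Below is a minimal sketch for condensing it: a hypothetical POSIX shell helper (`summarize_mca_errors` is not part of maker, Open MPI, or LSF) that collapses each `unable to open ... undefined symbol: ...` line into one `COUNT COMPONENT SYMBOL` line per distinct pair. It assumes the message format shown in the log above and ignores every other line.

```shell
#!/bin/sh
# summarize_mca_errors (hypothetical helper): collapse Open MPI lines of the form
#   ... unable to open mca_X: /path/mca_X.so: undefined symbol: SYM (ignored)
# into one "COUNT mca_X SYM" line per distinct (component, symbol) pair,
# most frequent first. Reads the file named in $1, or stdin if $1 is empty.
summarize_mca_errors() {
    # sed keeps only matching lines, reduced to "component symbol";
    # sort | uniq -c counts each distinct pair; sort -rn puts the most
    # frequent pairs first; awk strips the padding that uniq -c adds.
    sed -n 's/.*unable to open \(mca_[A-Za-z0-9_]*\): .*undefined symbol: \([A-Za-z0-9_]*\).*/\1 \2/p' ${1:+"$1"} |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2, $3 }'
}
```

For example, `summarize_mca_errors bsub_maker_with_mpi_log.txt` on the attached log should print four pairs (`mca_patcher_overwrite`, `mca_shmem_mmap`, `mca_shmem_posix`, `mca_shmem_sysv`), each with a count of 8, i.e. one report per rank. All three shmem components failing to load is consistent with the `opal_shmem_base_select failed` message that follows them.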