[maker-devel] Issues with locking in MPI mode
Michele Vidotto
michele.vidotto at gmail.com
Thu Jan 21 06:30:21 MST 2021
Dear all,
As reported in the subject, I'm having issues with the locking mechanism of
MAKER when it runs in parallel mode through MPI.
I'm using MAKER version 3.01.03, but the same happens on my system when I
build and install version 2.31.11.
All prerequisites were installed in a conda environment. Perl 5.26.2 was
installed from the anaconda channel. Hard-coded paths to the compilers
were fixed. The necessary Perl modules were installed via cpanm:
"DBD::SQLite",
"DBI",
"Error",
"Error::Simple",
"File::NFSLock",
"File::Which",
"forks",
"forks::shared",
"Inline",
"Inline::C",
"IO::All",
"IO::Prompt",
"LWP::Simple",
"Perl::Unsafe::Signals",
"PerlIO::gzip",
"Proc::Simple",
"URI::Escape",
"DBD::Pg"
Additional libraries and components were installed via conda:
- gcc_linux-64=7.3.0
- gxx_linux-64=7.3.0
- openmpi=4.1.0
- zlib=1.2.11
- libdb=6.1.26
- expat=2.2.9
- libxml2=2.9.10
- exonerate=2.4.0
- snoscan=1.0
- rapsearch=2.24
Other components were installed manually. MAKER compiles and installs with no
errors, but when I execute the program via MPI with:
# to avoid an Open MPI segmentation fault
export THREADS_DAEMON_MODEL=1
mpiexec -mca btl ^openib -n 1 \
maker \
-force \
-cpus 8 \
--fix_nucleotides \
maker_opts.ctl \
maker_bopts.ctl \
maker_exe.ctl
it always ends with the following error:
STATUS: Parsing control files...
ERROR: The directory is locked. Perhaps by an instance of MAKER.
--> rank=NA, hostname=april.corp.igatechnology.com
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status,
thus causing
the job to be terminated. The first process to do so was:
Process name: [[19321,1],0]
Exit code: 10
--------------------------------------------------------------------------
If I look inside the *.maker.output directory, a lock file remains:
.NFSLock.gi_lock.NFSLock
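A minimal sketch for spotting such leftovers, assuming they all follow the `.NFSLock*` naming seen in the single file name above (the pattern is an extrapolation). `list_nfs_locks` and `OUT_DIR` are hypothetical names used only for illustration. Deleting a leftover lock is presumably only safe when no MAKER process is still using the directory, so the sketch only lists files.

```shell
#!/bin/sh
# List leftover File::NFSLock files below a MAKER output directory.
# The '.NFSLock*' pattern is an assumption based on the one file name reported.
list_nfs_locks() {
    find "${1:-.}" -type f -name '.NFSLock*' -print
}

# OUT_DIR is a placeholder: point it at the real *.maker.output directory.
list_nfs_locks "${OUT_DIR:-.}"
```

For example, `OUT_DIR=/path/to/run.maker.output sh list_locks.sh` (both names are placeholders) prints one path per lock file found.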
If instead I run MAKER with the -nolock flag, it runs with no problems
at all.
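For reference, the workaround amounts to the same invocation with -nolock added. The sketch below only prints the command by default. `run` and `DRY_RUN` are hypothetical helpers introduced here, not part of MAKER or Open MPI. With locking turned off, it is probably wise to make sure only one MAKER job writes to a given output directory.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same environment setting as in the original command.
export THREADS_DAEMON_MODEL=1

# Original invocation plus -nolock; '^openib' is quoted only as a precaution.
run mpiexec -mca btl '^openib' -n 1 \
    maker -nolock -force -cpus 8 --fix_nucleotides \
    maker_opts.ctl maker_bopts.ctl maker_exe.ctl
```

Setting `DRY_RUN=0` runs the command for real.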
My filesystem is OneFS from Isilon, exported to a virtual server through the
NFSv4 protocol.
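To confirm which filesystem actually backs the working directory, a small check along these lines may help. It assumes a `stat` that supports the GNU coreutils options `-f -c %T`. `fs_type` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the filesystem type name reported for a directory (GNU stat assumed).
fs_type() {
    stat -f -c %T "${1:-.}"
}

# Check the current directory; pass the MAKER working directory instead if it differs.
fs_type .
```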
Looking at the code, MAKER uses the File::NFSLock Perl module for locking.
This module fails some tests when installed on my system with cpanm:
# Failed test at t/300_bl_sh.t line 115.
Shared locks not running simultaneously at t/300_bl_sh.t line 116, <$rd3>
line 18.
# Looks like your test exited with 4 just after 27.
t/300_bl_sh.t ..... Dubious, test returned 4 (wstat 1024, 0x400)
Failed 47/73 subtests
t/400_kill.t ...... ok
t/410_die.t ....... ok
t/420_crash.t ..... ok
t/430_taint.t ..... ok
Test Summary Report
-------------------
t/300_bl_sh.t (Wstat: 1024 Tests: 27 Failed: 1)
Failed test: 27
Non-zero exit status: 4
Parse errors: Bad plan. You planned 73 tests but ran 27.
Nevertheless, I was able to install it with the --notest flag.
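As far as its documentation describes, File::NFSLock builds its locks on hard-link creation being atomic over NFS. A rough probe of that primitive on the exported directory could look like the sketch below. `probe_hardlink` and the `.probe.*` file names are hypothetical. A passing probe does not rule out the shared-lock failure reported by t/300_bl_sh.t, since it checks only that a hard link can be created and is reflected in the link count.

```shell
#!/bin/sh
# Create a temp file, hard-link it, and check that the link count becomes 2.
probe_hardlink() {
    dir="${1:-.}"
    tmp="$dir/.probe.$$.tmp"
    lock="$dir/.probe.$$.lock"
    touch "$tmp" || return 1
    if ln "$tmp" "$lock" 2>/dev/null; then
        # Second field of 'ls -ld' is the link count.
        links=$(ls -ld "$tmp" | awk '{print $2}')
    else
        links=0
    fi
    rm -f "$tmp" "$lock"
    [ "$links" -eq 2 ]
}

# Probe the directory given as first argument (default: current directory).
if probe_hardlink "${1:-.}"; then
    echo "hard links OK"
else
    echo "hard link probe failed"
fi
```

Running it from inside the NFS-mounted MAKER working directory exercises the same filesystem MAKER would lock on.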
Do you have any idea on how I can overcome my problem and have MAKER run in
parallel with MPI?
Thanks in advance,
---
Michele Vidotto
mailto: michele.vidotto at gmail.com <michele.vidotto at studenti.unipd.it>