[maker-devel] Issues with locking in MPI mode

Michele Vidotto michele.vidotto at gmail.com
Thu Jan 21 06:30:21 MST 2021


Dear all,

As reported in the subject, I'm having issues with the locking mechanism of
MAKER when it is run in parallel mode through MPI.
I'm using MAKER version 3.01.03, but the same happens on my system when I
build and install version 2.31.11.
All prerequisites were installed in a conda environment. Perl 5.26.2 was
installed from the anaconda channel. Hard-coded paths to the compilers
were fixed. The necessary Perl modules were installed via cpanm:

"DBD::SQLite",
"DBI",
"Error",
"Error::Simple",
"File::NFSLock",
"File::Which",
"forks",
"forks::shared",
"Inline",
"Inline::C",
"IO::All",
"IO::Prompt",
"LWP::Simple",
"Perl::Unsafe::Signals",
"PerlIO::gzip",
"Proc::Simple",
"URI::Escape",
"DBD::Pg"

Additional libraries and components were installed via conda:

  - gcc_linux-64=7.3.0
  - gxx_linux-64=7.3.0
  - openmpi=4.1.0
  - zlib=1.2.11
  - libdb=6.1.26
  - expat=2.2.9
  - libxml2=2.9.10
  - exonerate=2.4.0
  - snoscan=1.0
  - rapsearch=2.24

Other components were installed manually. MAKER compiles and installs with
no errors, but when I execute the program via MPI with:

# to avoid an Open MPI segmentation fault
export THREADS_DAEMON_MODEL=1

mpiexec -mca btl ^openib -n 1 \
maker \
-force \
-cpus 8 \
--fix_nucleotides \
maker_opts.ctl \
maker_bopts.ctl \
maker_exe.ctl

it always ends with the following error:


STATUS: Parsing control files...
ERROR: The directory is locked.  Perhaps by an instance of MAKER.

--> rank=NA, hostname=april.corp.igatechnology.com
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[19321,1],0]
  Exit code:    10
--------------------------------------------------------------------------

If I look inside the *.maker.output directory, a lock file remains:

.NFSLock.gi_lock.NFSLock

If instead I run MAKER with the -nolock flag, it runs with no problems
at all.
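To check whether such leftover lock files come from an earlier crashed run or are created fresh by each attempt, a small listing helps. The following is a minimal Python sketch; the function name find_nfslock_files is hypothetical, and matching on "NFSLock" in the file name is an assumption based only on the lock file name shown above. It only lists files with their age and deletes nothing.

```python
import glob
import os
import time


def find_nfslock_files(root: str = ".") -> list:
    """Hypothetical helper: return sorted (path, age_in_seconds) pairs for
    every file whose name contains 'NFSLock' inside any *.maker.output
    directory directly under `root`. Nothing is deleted."""
    now = time.time()
    found = []
    for outdir in glob.glob(os.path.join(root, "*.maker.output")):
        # os.walk also reports hidden files such as .NFSLock.gi_lock.NFSLock
        for dirpath, _dirnames, filenames in os.walk(outdir):
            for name in filenames:
                if "NFSLock" in name:
                    path = os.path.join(dirpath, name)
                    age = now - os.lstat(path).st_mtime
                    found.append((path, age))
    return sorted(found)


if __name__ == "__main__":
    # Run from the directory where MAKER was launched.
    for path, age in find_nfslock_files("."):
        print(f"{age:10.0f} s  {path}")
```

A lock file whose age matches the time of the failed launch was created by that same run, rather than left behind by an older one.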

My filesystem is OneFS from Isilon, exported to a virtual server through
the NFSv4 protocol.
Looking at the code, MAKER uses the File::NFSLock Perl module for locking.
This module fails some tests when installed on my system with cpanm:

#   Failed test at t/300_bl_sh.t line 115.
Shared locks not running simultaneously at t/300_bl_sh.t line 116, <$rd3> line 18.
# Looks like your test exited with 4 just after 27.
t/300_bl_sh.t ..... Dubious, test returned 4 (wstat 1024, 0x400)
Failed 47/73 subtests
t/400_kill.t ...... ok
t/410_die.t ....... ok
t/420_crash.t ..... ok
t/430_taint.t ..... ok

Test Summary Report
-------------------
t/300_bl_sh.t   (Wstat: 1024 Tests: 27 Failed: 1)
  Failed test:  27
  Non-zero exit status: 4
  Parse errors: Bad plan.  You planned 73 tests but ran 27.



Nevertheless, I was able to install it with the --notest flag.
Do you have any idea how I can overcome this problem and run MAKER in
parallel with MPI?
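To separate a filesystem problem from a MAKER problem, the hard-link locking technique described in the Linux open(2) man page can be exercised directly on the mount. File::NFSLock's documentation describes its approach as relying on hard-link creation being atomic across NFS, so a failure here would point at the OneFS/NFSv4 mount rather than at MAKER. This is a minimal Python sketch under that assumption, not File::NFSLock's actual code; the helpers try_link_lock, release and probe are hypothetical names.

```python
import os
import socket
import sys
import tempfile


def try_link_lock(lock_path: str) -> bool:
    """Hypothetical helper: take lock_path via the open(2) hard-link recipe.
    Returns True if this process now owns the lock file."""
    directory = os.path.dirname(os.path.abspath(lock_path))
    # Unique file on the same filesystem, named after host and PID.
    unique = os.path.join(directory, f".probe.{socket.gethostname()}.{os.getpid()}")
    with open(unique, "w") as fh:
        fh.write(f"{socket.gethostname()} {os.getpid()}\n")
    try:
        try:
            os.link(unique, lock_path)
        except OSError:
            # EEXIST means the lock is held; over NFS link() may also report
            # an error after succeeding, so the link count decides below.
            pass
        return os.stat(unique).st_nlink == 2
    finally:
        # If the lock was taken, the lock file keeps the inode alive.
        os.unlink(unique)


def release(lock_path: str) -> None:
    os.unlink(lock_path)


def probe(directory: str) -> dict:
    """Acquire, try to re-acquire, release and acquire again in `directory`."""
    workdir = tempfile.mkdtemp(prefix="lockprobe.", dir=directory)
    lock = os.path.join(workdir, "probe.lock")
    try:
        first = try_link_lock(lock)    # should succeed
        second = try_link_lock(lock)   # should fail: lock already held
        if first:
            release(lock)
        third = try_link_lock(lock)    # should succeed again after release
        return {"acquire": first, "reacquire_blocked": not second,
                "acquire_after_release": third}
    finally:
        if os.path.lexists(lock):
            os.unlink(lock)
        os.rmdir(workdir)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    result = probe(target)
    print(target, result)
    if not all(result.values()):
        print("hard-link locking misbehaves on this filesystem", file=sys.stderr)
```

Running it (e.g. saved as lockprobe.py, a hypothetical name) with the MAKER working directory as argument tests the NFS mount; with no argument it probes the system temporary directory, which is usually local and serves as a baseline.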

Thanks in advance,




---
Michele Vidotto
mailto: michele.vidotto at gmail.com <michele.vidotto at studenti.unipd.it>