[maker-devel] Maker-Error when started with IMPI : CORRECTED MAIL : SEE THIS ONE

Rainer Rutka rainer.rutka at uni-konstanz.de
Fri Mar 24 03:10:45 MDT 2017


Hi!
First of all, thank you for your previous help.
Maker 2.31.9 with MPI (Intel) runs fine if we
use ONE node only.

But if we try to use more than one node (e.g. 2 nodes with 8
cores each), we get this error:

[...]
### Running Maker example

MOAB_PROCCOUNT: 16
slurmstepd: error: couldn't chdir to 
`/tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356': No such file 
or directory: going to /tmp instead
STATUS: Parsing control files...
Argument "ALRM" isn't numeric in exit at 
/pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm 
line 2184.
[...]

/tmp/kn_pop235844/maker-job.uc1.11658244.170324_043356
was created beforehand and exists for the entire duration of the
job.
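
Whether that directory exists on every node of the job, and not only on
the node where the job script runs, can be checked by running a test on
each node. The following is a minimal sketch, not a verified recipe: the
helper name dir_exists_via is hypothetical, and the srun options in the
comment are an assumption that may need adjusting to the site's
Slurm/MOAB setup.

```shell
#!/bin/bash
# Hypothetical helper: run "test -d DIR" through a launcher command.
# Usage: dir_exists_via DIR LAUNCHER [LAUNCHER_ARGS...]
dir_exists_via() {
  local dir="$1"
  shift
  # "$@" is the launcher; its exit status becomes the function's status.
  "$@" test -d "$dir"
}

# On the cluster, for the 2-node job (assumption: srun is usable inside
# the job and reports a non-zero status if the test fails on any node):
#   dir_exists_via "$TMP_WORK_DIR" srun --nodes=2 --ntasks=2 --ntasks-per-node=1 \
#     || echo "TMP_WORK_DIR is missing on at least one node"

# Local dry run, with "env" standing in for the launcher:
if dir_exists_via / env; then
  echo "directory found"
fi
```

If the check fails on the second node, the directory exists only on the
node that ran mkdir, which would match the slurmstepd message above.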

I attached the complete log to this e-mail.

Again: THANK YOU VERY MUCH.

All the best.

-- 
Rainer Rutka
Universität Konstanz
Kommunikations-, Informations-, Medienzentrum (KIM)
  * KIM Ausbildung
  * Wissenschaftliches Rechnen/bwHPC-C5
  * KIM Basisdienste, KIM Support
Raum: V511
78457 Konstanz
+49 7531 88-5413
-------------- next part --------------
#!/bin/bash
#MSUB -N maker-job
#MSUB -j oe
#MSUB -o $(JOBNAME).$(JOBID)
#MSUB -m ae
#     -M given_name.family_name at your-uni.de
#MSUB -l nodes=2:ppn=8
#MSUB -l mem=20gb
#MSUB -l walltime=01:00:00
#
start=$(date +%s)

echo " "
echo "### Setting up shell environment ..."
echo " "
# if test -e "/etc/profile"; then source "/etc/profile"; fi;
if test -e "$HOME/.bash_profile"; then source "$HOME/.bash_profile"; fi;
unset LANG; export LC_ALL="C"; export MKL_NUM_THREADS=1; export OMP_NUM_THREADS=1
export USER=${USER:=`logname`}
export MOAB_JOBID=${MOAB_JOBID:=`date +%s`}
export MOAB_SUBMITDIR=${MOAB_SUBMITDIR:=`pwd`}
export MOAB_JOBNAME=${MOAB_JOBNAME:=`basename "$0"`}
export MOAB_JOBNAME=$(echo "${MOAB_JOBNAME}" | sed 's/[^a-zA-Z0-9._-]/_/g')
export MOAB_NODECOUNT=${MOAB_NODECOUNT:=1}
export MOAB_PROCCOUNT=${MOAB_PROCCOUNT:=1}
ulimit -s 200000

echo " "
echo "### Printing basic job infos to stdout ..."
echo " "
echo "START_TIME           = `date +'%y-%m-%d %H:%M:%S %s'`"
echo "HOSTNAME             = ${HOSTNAME}"
echo "USER                 = ${USER}"
echo "MOAB_JOBNAME         = ${MOAB_JOBNAME}"
echo "MOAB_JOBID           = ${MOAB_JOBID}"
echo "MOAB_SUBMITDIR       = ${MOAB_SUBMITDIR}"
echo "MOAB_NODECOUNT       = ${MOAB_NODECOUNT}"
echo "MOAB_PROCCOUNT       = ${MOAB_PROCCOUNT}"
echo "SLURM_NODELIST       = ${SLURM_NODELIST}"
echo "PBS_NODEFILE         = ${PBS_NODEFILE}"

if test -f "${PBS_NODEFILE}"; then
  echo "PBS_NODEFILE (begin) ---------------------------------"
  NO_NODES=$(wc -l < "${PBS_NODEFILE}")
  cat "${PBS_NODEFILE}"
  echo "PBS_NODEFILE (end) -----------------------------------"
else
  NO_NODES=1
fi
# ##############################################################################
echo " "
echo "### Creating TMP_WORK_DIR directory and changing to it ..."
echo " "
# Using "/tmp/$USER" should be OK for single-node jobs. For multi-node jobs
# it might be necessary to modify TMP_BASE_DIR to point to SLURM_SUBMIT_DIR
# or to create (and delete) TMP_WORK_DIR on each node (job-type dependent).
# NEVER EVER calculate in your home directory.
JOB_WORK_DIR="${SLURM_JOB_NAME}.uc1.${SLURM_JOB_ID%%.*}.$(date +%y%m%d_%H%M%S)"
if test -z "$SLURM_NNODES" -o "$SLURM_NNODES" = "1"
then
  TMP_BASE_DIR="/tmp/${USER}"
else
# in case of 2 or more nodes, use a common scratch dir available on all nodes...
  TMP_BASE_DIR="$SLURM_SUBMIT_DIR"
fi
TMP_WORK_DIR="${TMP_BASE_DIR}/${JOB_WORK_DIR}"
echo "JOB_WORK_DIR      = ${JOB_WORK_DIR}"
echo "TMP_BASE_DIR      = ${TMP_BASE_DIR}"
echo "TMP_WORK_DIR cd   = ${TMP_WORK_DIR}"
mkdir -vp "${TMP_WORK_DIR}" && { cd "${TMP_WORK_DIR}"; pwd; } || { echo "ERROR: cd $TMP_WORK_DIR"; exit 1; }
# Remarks:
# * The job's temporary subdirectory JOB_WORK_DIR consists of SLURM_JOB_NAME
#   and SLURM_JOB_ID connected by ".uc1.". This is a little bit of magic since
#   the output file of your job follows the same rule. Therefore the
#   sorting of files belonging to one job will work nicely, when you
#   list the result files later in the submit directory (SLURM_SUBMIT_DIR).
# * Using TMP_BASE_DIR="/tmp/$USER" is ok, if the job requires less
#   than 3.6 TB of node local disk space (for details see "www.bwhpc-c5.de").
# ##############################################################################

echo " "
echo "### Loading MAKER module:"
echo " "
module load bio/maker/2.31.9
[ "$MAKER_VERSION" ] || { echo "ERROR: Failed to load module 'bio/maker/2.31.9'."; exit 1; }
echo "MAKER_VERSION       = $MAKER_VERSION"
module list

echo " "
echo "### Copying input examples files for job:"
echo " "
cp -v "${MAKER_EXA_DIR}"/*.{fasta,ctl} .
sleep 2

echo " "
echo "### Display internal Maker/bwHPC environments..."
echo " "
echo "MAKER_BIN_DIR  = ${MAKER_BIN_DIR}"
echo "MAKER_EXA_DIR  = ${MAKER_EXA_DIR}"
echo ""

echo " "
echo "### Runing Maker example"
echo " "
export OMPI_MCA_mpi_warn_on_fork=0

#
# Do NOT use mpiexec here. Unfortunately it crashes at
# "STATUS: Processing and indexing input FASTA files...".
# exec.hydra -n 2 maker -h
echo "MOAB_PROCCOUNT: ${MOAB_PROCCOUNT:=1}"
# Do NOT use mpiexec. Use mpiexec.hydra or mpirun.
# mpirun -n ${MOAB_PROCCOUNT} maker -h
# mpirun -n ${MOAB_PROCCOUNT} maker >maker_$(date +%Y-%m-%d_%H:%M:%S).out 2>&1
mpirun -n "${MOAB_PROCCOUNT}" maker

echo "### Cleaning up files ... removing unnecessary scratch files ..."
echo " "
# rm -fv 
sleep 3 # Sleep some time so potential stale nfs handles can disappear.

echo " "
echo "### Compressing results and copying back result archive ..."
echo " "
cd "${TMP_BASE_DIR}"
mkdir -vp "${MOAB_SUBMITDIR}" # if user has deleted or moved the submit dir
echo "Creating result tgz-file '${MOAB_SUBMITDIR}/${JOB_WORK_DIR}.tgz' ..."
tar -zcvf "${MOAB_SUBMITDIR}/${JOB_WORK_DIR}.tgz" "${JOB_WORK_DIR}" \
  || { echo "ERROR: Failed to create tgz-file. Please cleanup TMP_WORK_DIR '$TMP_WORK_DIR' on host '$HOSTNAME' manually (if not done automatically by queueing system)."; exit 102; }
# Remarks:
# * The resulting tgz file is copied back to the submit directory.
#   The name of the tgz file looks similar to
#          "bwunicluster-maker-example.moab.275.110528_101755.tgz"

echo " "
echo "### Final cleanup: Remove TMP_WORK_DIR ..."
echo " "
rm -rvf "${TMP_WORK_DIR}"
echo "END_TIME             = `date +'%y-%m-%d %H:%M:%S %s'`"


end=$(date +%s)
echo " "
echo "### Calculate duration ..."
echo " "
diff=$((end - start))
if [ "$diff" -lt 60 ]; then
   echo "Runtime (approx.): '$diff' secs"
else
   echo "Runtime (approx.): $((diff / 60)) min(s) $((diff % 60)) secs"
fi
-------------- next part --------------
 
### Setting up shell environment ...
 
 
### Printing basic job infos to stdout ...
 
START_TIME           = 17-03-24 04:35:21 1490326521
HOSTNAME             = uc1n385
USER                 = kn_pop235844
MOAB_JOBNAME         = maker-job
MOAB_JOBID           = 11658541
MOAB_SUBMITDIR       = /pfs/work2/workspace/scratch/kn_pop235844-wstest-0
MOAB_NODECOUNT       = 2
MOAB_PROCCOUNT       = 16
SLURM_NODELIST       = uc1n[385,397]
PBS_NODEFILE         = 
 
### Creating TMP_WORK_DIR directory and changing to it ...
 
JOB_WORK_DIR      = maker-job.uc1.11658541.170324_043521
TMP_BASE_DIR      = /tmp/kn_pop235844
TMP_WORK_DIR cd   = /tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521
mkdir: created directory '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521'
/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521
 
### Loading MAKER module:
 
MAKER_VERSION       = 2.31.9
Currently Loaded Modulefiles:
  1) compiler/intel/16.0(default)
  2) mpi/impi/5.1.3-intel-16.0(default)
  3) bio/maker/2.31.9
 
### Copying input examples files for job:
 
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/dpp_contig.fasta' -> './dpp_contig.fasta'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/dpp_est.fasta' -> './dpp_est.fasta'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/dpp_protein.fasta' -> './dpp_protein.fasta'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/maker_bopts.ctl' -> './maker_bopts.ctl'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/maker_exe.ctl' -> './maker_exe.ctl'
'/opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples/maker_opts.ctl' -> './maker_opts.ctl'
 
### Display internal Maker/bwHPC environments...
 
MAKER_BIN_DIR  = /opt/bwhpc/common/bio/maker/2.31.9/bin
MAKER_EXA_DIR  = /opt/bwhpc/common/bio/maker/2.31.9/bwhpc-examples

 
### Runing Maker example
 
MOAB_PROCCOUNT: 16
slurmstepd: error: couldn't chdir to `/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521': No such file or directory: going to /tmp instead
STATUS: Parsing control files...
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
Argument "ALRM" isn't numeric in exit at /pfs/data1/software_uc1/bwhpc/common/bio/maker/2.31.9/bin/../perl/lib/forks.pm line 2184.
### Cleaning up files ... removing unnecessary scratch files ...
 
 
### Compressing results and copying back result archive ...
 
Creating result tgz-file '/pfs/work2/workspace/scratch/kn_pop235844-wstest-0/maker-job.uc1.11658541.170324_043521.tgz' ...
maker-job.uc1.11658541.170324_043521/
maker-job.uc1.11658541.170324_043521/dpp_contig.fasta
maker-job.uc1.11658541.170324_043521/dpp_est.fasta
maker-job.uc1.11658541.170324_043521/dpp_protein.fasta
maker-job.uc1.11658541.170324_043521/maker_bopts.ctl
maker-job.uc1.11658541.170324_043521/maker_exe.ctl
maker-job.uc1.11658541.170324_043521/maker_opts.ctl
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/.NFSLock.gi_lock.NFSLock
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_opts.log
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_bopts.log
maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_exe.log
 
### Final cleanup: Remove TMP_WORK_DIR ...
 
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.fasta'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_est.fasta'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_protein.fasta'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/maker_bopts.ctl'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/maker_exe.ctl'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/maker_opts.ctl'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/.NFSLock.gi_lock.NFSLock'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_opts.log'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_bopts.log'
removed '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output/maker_exe.log'
removed directory: '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521/dpp_contig.maker.output'
removed directory: '/tmp/kn_pop235844/maker-job.uc1.11658541.170324_043521'
END_TIME             = 17-03-24 04:36:08 1490326568
 
### Calculate duration ...
 
Runtime (approx.): '47' secs
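
Note on the log above: it reports MOAB_NODECOUNT = 2 but
TMP_BASE_DIR = /tmp/kn_pop235844. If the attached script is the one that
ran, its SLURM_NNODES test apparently took the single-node branch. A
minimal sketch of a more defensive choice follows. It assumes that
MOAB_NODECOUNT is a usable fallback and that the submit directory lies
on a file system shared by all nodes; neither is confirmed here. The
function name pick_tmp_base_dir is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the base directory for the job's work dir.
# Arguments: NODE_COUNT USER_NAME SHARED_DIR
pick_tmp_base_dir() {
  local nodes="${1:-1}" user="$2" shared="$3"
  if [ "$nodes" -gt 1 ] 2>/dev/null; then
    # Multi-node job: node-local /tmp is not visible on the other nodes.
    printf '%s\n' "$shared"
  else
    # Single node, or an empty/non-numeric count: use node-local scratch.
    printf '%s\n' "/tmp/${user}"
  fi
}

# Prefer SLURM_NNODES, fall back to MOAB_NODECOUNT, then to 1.
nodes="${SLURM_NNODES:-${MOAB_NODECOUNT:-1}}"
TMP_BASE_DIR="$(pick_tmp_base_dir "$nodes" "${USER:-unknown}" "${MOAB_SUBMITDIR:-$PWD}")"
echo "TMP_BASE_DIR = ${TMP_BASE_DIR}"
```

With a node count of 2 the function prints the shared directory, so
mpirun would start every process in a directory that exists on both
nodes.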