[maker-devel] MPI MAKER hanging NFS

Heywood, Todd heywood at cshl.edu
Mon May 20 18:33:32 MDT 2013


All starter_with_limit.sh does is set a ulimit (disabling core dumps) for the job's top-level process, then exec it, passing along all parameters:

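#!/bin/sh
# Disable core dumps for the job's top-level process.
ulimit -c 0
# Replace this shell with the job command, forwarding each argument unchanged.
exec "$@"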
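For reference, here is the difference between forwarding parameters with an unquoted `$*` and with `"$@"` in a POSIX sh wrapper. The unquoted form re-splits any argument containing whitespace (and glob-expands it), while `"$@"` passes each argument through untouched. The helpers `count_args`, `via_star` and `via_at` are hypothetical and exist only for this illustration. Neither form removes literal quote characters embedded in an argument, so this alone does not account for the quoted `hydra_pmi_proxy` path in the error output below.

```sh
#!/bin/sh
# Print the number of arguments received.
count_args() {
    echo "$#"
}

# Forward arguments with an unquoted $* (subject to word splitting).
via_star() {
    count_args $*
}

# Forward arguments with "$@", preserving each one exactly.
via_at() {
    count_args "$@"
}

via_star "a b" c   # prints 3: "a b" was split into two words
via_at   "a b" c   # prints 2: arguments preserved
```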


From: Evan Ernst <eernst at cshl.edu>
Date: Monday, May 20, 2013 8:20 PM
To: Carson Holt <carsonhh at gmail.com>
Cc: Carson Holt <Carson.Holt at oicr.on.ca>, "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>, Todd Heywood <heywood at cshl.edu>
Subject: Re: [maker-devel] MPI MAKER hanging NFS

/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/opt/uge/default/common/starter_with_limit.sh: line 4: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": No such file or directory
/opt/uge/default/common/starter_with_limit.sh: line 4: exec: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": cannot execute: No such file or directory


Todd, are these errors from the starter_with_limit.sh wrapper harmless?

Thanks,
Evan


On Mon, May 20, 2013 at 7:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
Could you run the following command and share the output with me?

mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"'

Thanks,
Carson



From: Evan Ernst <eernst at cshl.edu>
Date: Monday, 20 May, 2013 4:36 PM
To: Carson Holt <carson.holt at oicr.on.ca>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>, "Heywood, Todd" <heywood at cshl.edu>
Subject: Re: [maker-devel] MPI MAKER hanging NFS

Hi Carson,

The SGE launch script looks like this (sans SGE args):

mpiexec -n 8 maker -TMP $TMPDIR maker_opts.ctl maker_bopts.ctl maker_exe.ctl >>logs/final.$SGE_TASK_ID.mpi.log 2>&1

Snooping on the running jobs (see attached image), it looks like $TMPDIR is expanded to a local directory by the shell on the MPI master node, as intended, so the expanded path, not the variable reference, is what gets passed to the MPI workers.

Despite this, the mpi*** files are still being created in the working directory.

If I understand correctly, these mpi*** files are meant to be written to the directory given by TMP= (in maker_opts.ctl) or -TMP (command-line argument), which should be equivalent, but that doesn't seem to be the case.
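Because the TMP location must be local (Carson's May 14 reply, quoted further down, explains why), a job script can refuse to launch when `$TMPDIR` is unset or sits on NFS. A minimal sketch, assuming GNU coreutils `stat`, whose `-f -c %T` prints the filesystem type name (for example `nfs`); `require_local_tmp` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical pre-flight check for a MAKER job script.
# Succeeds only if the given directory exists and is not an NFS mount.
require_local_tmp() {
    dir=$1
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "TMP directory '$dir' is unset or missing" >&2
        return 1
    fi
    # GNU stat: -f queries the filesystem, %T prints its type name.
    fstype=$(stat -f -c %T "$dir") || return 1
    case $fstype in
        nfs*)
            echo "TMP directory '$dir' is on $fstype, not local" >&2
            return 1 ;;
    esac
    return 0
}

# Example use before the launch line (arguments as in the thread):
# require_local_tmp "$TMPDIR" || exit 1
# mpiexec -n 8 maker -TMP "$TMPDIR" maker_opts.ctl maker_bopts.ctl maker_exe.ctl
```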

Thanks,
Evan




On Fri, May 17, 2013 at 9:40 AM, Carson Holt <Carson.Holt at oicr.on.ca> wrote:
I'm glad you're getting better results.

With respect to environment variables: one common error in MPI
execution is that environment variables are not always the same on
the other nodes, since only the root node is attached to a terminal, so
variables set in startup scripts (.bashrc etc.) may not be available on all
nodes. Many clusters that are part of the XSEDE network and use SGE, for
example, have scripts that wrap mpiexec to guarantee export of all
environment variables when using MPI, to avoid exactly this type of common
error. So, as with anything, you start with the most common causes of errors
and then work toward the less common ones. Kernel bugs usually rank low on the
list :-) But I'm glad it's working for you now.

Thanks,
Carson





On 13-05-17 9:25 AM, "Heywood, Todd" <heywood at cshl.edu> wrote:

>It appears that a kernel bug caused the NFS hang, at least in limited-scale
>testing (6 nodes, 192 tasks). I upgraded the kernel from
>2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and
>cannot reproduce the hangs.
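To audit which nodes still run the older kernel, a version comparison with GNU `sort -V` is enough. This is a sketch only: `kernel_at_least` is a hypothetical helper, and the minimum version used is simply the one Todd upgraded to, not a confirmed fix version for the NFS hang.

```sh
#!/bin/sh
# Succeed if version $1 is >= version $2, using GNU sort's version ordering.
kernel_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Compare the running kernel with the version that stopped the hangs here.
if kernel_at_least "$(uname -r)" 2.6.32-358.6.1.el6.x86_64; then
    echo "kernel at or above 2.6.32-358.6.1: $(uname -r)"
else
    echo "older kernel: $(uname -r)"
fi
```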
>
>As for TMPDIR, I'm not really sure I understand. We use SGE, and the
>TMPDIR we are referring to is set by SGE within a job to be
>/tmp/uge/JobID.TaskID.QueueName. Have you run MAKER via SGE?
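Given the naming scheme Todd describes (`JobID.TaskID.QueueName` under the queue's tmpdir), a path such as `/tmp/uge/1031236.1.primary.q`, printed by each rank at the top of this thread, can be split back into its parts. A small sketch: `split_sge_tmpdir` is a hypothetical helper, and it assumes the job and task IDs contain no dots while the queue name may (as `primary.q` does).

```sh
#!/bin/sh
# Split an SGE/UGE job TMPDIR of the form <tmpdir>/<job>.<task>.<queue>.
split_sge_tmpdir() {
    name=${1##*/}          # strip the leading directories
    job=${name%%.*}        # text before the first dot
    rest=${name#*.}
    task=${rest%%.*}       # text between the first and second dot
    queue=${rest#*.}       # everything after the second dot
    echo "job=$job task=$task queue=$queue"
}

split_sge_tmpdir /tmp/uge/1031236.1.primary.q
# prints: job=1031236 task=1 queue=primary.q
```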
>
>Todd
>
>
>
>
>From: Carson Holt <Carson.Holt at oicr.on.ca>
>Date: Wednesday, May 15, 2013 1:15 PM
>To: "Ernst, Evan" <eernst at cshl.edu>
>Cc: Todd Heywood <heywood at cshl.edu>,
>"maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>Subject: Re: [maker-devel] MPI MAKER hanging NFS
>
>The mpi**** files should be generated in the $TMPDIR or TMP= location.
>If they are appearing in the working directory, then there is a problem.
>If you are not setting TMP=, perhaps TMPDIR is not being exported when
>'mpiexec' is launched. You may have to explicitly tell mpiexec to export
>it to the other nodes using its command-line flags. OpenMPI, for example,
>does not export all environment variables to the other nodes by default.
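Forwarding `TMPDIR` explicitly looks different per launcher. With MPICH2's Hydra `mpiexec` (the `hydra_pmi_proxy` in the error log at the top of this thread suggests Hydra is in use), `-genv NAME VALUE` sets a variable for all ranks; with Open MPI, `-x NAME` exports the caller's value. A hedged sketch: `launch_maker`, `MPI_FLAVOR`, `NP` and `DRY_RUN` are hypothetical names, and the MAKER arguments are copied from the launch line quoted earlier.

```sh
#!/bin/sh
# Hypothetical launcher: pass TMPDIR explicitly to every MPI rank and hand
# the same directory to MAKER via -TMP.
#   MPI_FLAVOR: "mpich" (Hydra mpiexec, as in MPICH2 1.4) or "openmpi"
#   NP:         number of ranks (default 8, as in the thread)
#   DRY_RUN:    if non-empty, print the command instead of running it
launch_maker() {
    if [ -z "$TMPDIR" ]; then
        echo "TMPDIR is not set" >&2
        return 1
    fi
    case ${MPI_FLAVOR:-mpich} in
        mpich)   set -- -genv TMPDIR "$TMPDIR" ;;  # Hydra: -genv NAME VALUE
        openmpi) set -- -x TMPDIR ;;               # Open MPI: export caller's value
        *)       echo "unknown MPI_FLAVOR: $MPI_FLAVOR" >&2; return 1 ;;
    esac
    set -- mpiexec "$@" -n "${NP:-8}" maker -TMP "$TMPDIR" \
        maker_opts.ctl maker_bopts.ctl maker_exe.ctl
    if [ -n "$DRY_RUN" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```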
>
>Thanks,
>Carson
>
>
>
>From: Evan Ernst <eernst at cshl.edu>
>Date: Wednesday, 15 May, 2013 1:08 PM
>To: Carson Holt <carson.holt at oicr.on.ca>
>Cc: "Heywood, Todd" <heywood at cshl.edu>,
>"maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>Subject: Re: [maker-devel] MPI MAKER hanging NFS
>
>Hi Carson,
>
>For these runs, -TMP is set to $TMPDIR via the maker command-line argument
>in the cluster job script, so that the local disk on each node is used. We
>can see files being generated in those locations on each node, so it seems
>this is working as expected.
>
>In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is
>relevant, but I'm also setting mpi_blastdb= to consolidate the databases
>onto a different, faster NFS mount than the working directory where the
>mpi**** files are being written.
>
>Thanks,
>Evan
>
>
>
>On Tue, May 14, 2013 at 9:01 PM, Carson Holt
><Carson.Holt at oicr.on.ca> wrote:
>No it does not use ROMIO.
>
>The locking may be due to how your NFS is implemented. MAKER does a lot of
>small writes. Some NFS implementations do not handle that well and prefer
>large, infrequent writes and frequent reads.
>MAKER also uses a variant of the File::NFSLock module, which uses
>hard links to force a flush of the NFS I/O cache when asynchronous I/O is
>enabled (described here
>http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html).
>I know that the FhGFS filesystem has broken hard-link functionality.
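The hard-link approach mentioned above is the same one the Linux `open(2)` man page recommends for lock files on NFS: create a unique file on the same filesystem, `link()` it to the lock name, and treat the lock as taken if the unique file's link count is then 2, even if `link()` itself reported an error. A shell sketch of that technique only, not File::NFSLock's or MAKER's code; `try_lock` and `unlock` are hypothetical names and GNU `stat -c %h` is assumed.

```sh
#!/bin/sh
# Try to take an NFS-safe lock on $1 using the link(2) technique.
# Returns 0 if the lock was acquired, 1 otherwise.
try_lock() {
    lock=$1
    unique="$lock.$(uname -n).$$"     # unique name on the same filesystem
    : > "$unique" || return 1
    # ln fails if $lock already exists; ignore its exit status and trust
    # the link count instead, since the reply to link() can be lost on NFS.
    ln "$unique" "$lock" 2>/dev/null
    links=$(stat -c %h "$unique")
    rm -f "$unique"
    [ "$links" -eq 2 ]
}

# Release the lock by removing the lock name.
unlock() {
    rm -f "$1"
}
```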
>
>
>Also make sure you do not set TMP= in the maker_opts.ctl file to an
>NFS-mounted location. It must be local (/tmp, for example), because
>certain types of operations are not always NFS-safe and need a local
>location to work with (anything involving Berkeley DB or SQLite, for
>example). The mpi**** files are examples of short-lived files that
>should not be on NFS. They hold chunks of data from threads that are
>processing the genome and are created and deleted very rapidly. They are
>cleaned up automatically when MAKER finishes or is killed by standard
>signals, such as when you hit ^C or use kill -15.
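Carson's description of the mpi**** files being removed on normal exit, ^C or `kill -15` is the standard trap-on-exit pattern. Below is a generic POSIX sh sketch of that pattern, not MAKER's own (Perl) cleanup code. `with_scratch_file` is a hypothetical helper, and the scratch file is created under `${TMPDIR:-/tmp}` in line with the advice to keep such files local.

```sh
#!/bin/sh
# Run a command with a short-lived scratch file in ${TMPDIR:-/tmp}; the file
# is removed on normal exit, on ^C (INT) and on kill -15 (TERM).
# The body is a subshell ( ... ) so the traps do not leak into the caller.
with_scratch_file() (
    scratch=$(mktemp "${TMPDIR:-/tmp}/scratch.XXXXXX") || exit 1
    trap 'rm -f "$scratch"' EXIT
    trap 'exit 130' INT     # ^C: exit, which triggers the EXIT trap
    trap 'exit 143' TERM    # kill -15: likewise
    "$@" "$scratch"         # the command receives the scratch path
    status=$?
    exit "$status"          # explicit exit so the EXIT trap always runs
)
```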
>
>
>Thanks,
>Carson
>
>
>
>
>On 13-05-14 4:42 PM, "Heywood, Todd"
><heywood at cshl.edu> wrote:
>
>>We have been getting hung NFS mounts on some nodes when running MPI MAKER
>>(version 2.27). Processes go into a "D" state and cannot be killed. We
>>end up having to reboot nodes to recover them. We are running MPICH2
>>version 1.4.1p1
>>with RHEL 6.3. Questions:
>>
>>(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung
>>in the kernel's sync_page routine under NFS. That *might* imply some
>>locking issues.
>>
>>(2) Has anyone else seen this?
>>
>>(3) The root directory (parent of genome.maker.output directory) has lots
>>of mpi***** files, all of which have the first line
>>"pst0Process::MpiChunk". Is this expected?
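To check whether MAKER's mpi* chunk files are landing in the working directory rather than the TMP location, one can look for files whose first line carries the `Process::MpiChunk` marker Todd quotes. A sketch: `find_mpi_chunks` is a hypothetical helper, and matching on that substring is an assumption based only on the sample first line above.

```sh
#!/bin/sh
# List files named mpi* directly inside directory $1 (default: .) whose
# first line mentions Process::MpiChunk.
find_mpi_chunks() {
    dir=${1:-.}
    for f in "$dir"/mpi*; do
        [ -f "$f" ] || continue          # skip the literal pattern if no match
        if head -n 1 "$f" | grep -q 'Process::MpiChunk'; then
            echo "$f"
        fi
    done
}

# Example: count stray chunk files in the job's working directory.
# find_mpi_chunks . | wc -l
```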
>>
>>I'm able to reproducibly hang NFS on some nodes when using at least 4
>>32-core nodes and 128 running MPI tasks.
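A process stuck in uninterruptible sleep shows a `STAT` beginning with `D` in `ps`, and the `wchan` column names the kernel function it is waiting in (a `sync_page` wait, as Todd describes, would show up there). A sketch assuming procps `ps`; the filter `list_dstate` is hypothetical and reads from stdin so it can also be run on saved `ps` output.

```sh
#!/bin/sh
# Print the header plus processes whose state begins with D
# (uninterruptible sleep). Expects `ps -eo pid,stat,wchan:32,args`
# output on stdin, header line first.
list_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Typical use on a compute node (procps ps):
# ps -eo pid,stat,wchan:32,args | list_dstate
```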
>>
>>Thanks,
>>
>>Todd Heywood
>>CSHL
>>
>>
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

