[maker-devel] MPI MAKER hanging NFS

Carson Holt carsonhh at gmail.com
Mon May 20 18:38:41 MDT 2013


It may have just been a random failure.  Try launching it again.
Basically one instance failed to launch hydra_pmi_proxy which wraps the
command being called via mpiexec.  So you get 7 lines of output instead of
the 8 that should be there.
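
One way to check for this on a rerun, as a minimal sketch: count the lines
the test command prints and compare that against the number of ranks
requested. The helper name check_rank_output is made up for illustration,
and it assumes stderr is kept out of the pipe so that wrapper error
messages are not counted.

```shell
#!/bin/sh
# check_rank_output EXPECTED
# Hypothetical helper: reads launcher output on stdin, counts the
# non-empty lines, and reports whether every rank printed one.
check_rank_output() {
    expected=$1
    # grep -c . counts the lines that contain at least one character
    actual=$(grep -c .)
    if [ "$actual" -eq "$expected" ]; then
        echo "ok: $actual of $expected ranks reported"
    else
        echo "short: $actual of $expected ranks reported"
        return 1
    fi
}

# Usage (needs a working mpiexec, so it is left commented out here):
# mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"' \
#     2>/dev/null | check_rank_output 8
```

A "short" result on repeated runs would suggest something more than a
random launch failure.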

--Carson


On 13-05-20 8:33 PM, "Heywood, Todd" <heywood at cshl.edu> wrote:

>All starter_with_limit.sh does is set a ulimit for the top process for
>the job, then start it passing all parameters:
>
>#!/bin/sh
>ulimit -c 0
>exec $*
>
>
>From: Evan Ernst <eernst at cshl.edu>
>Date: Monday, May 20, 2013 8:20 PM
>To: Carson Holt <carsonhh at gmail.com>
>Cc: Carson Holt <Carson.Holt at oicr.on.ca>,
>"maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>,
>Todd Heywood <heywood at cshl.edu>
>Subject: Re: [maker-devel] MPI MAKER hanging NFS
>
>/tmp/uge/1031236.1.primary.q
>/tmp/uge/1031236.1.primary.q
>/tmp/uge/1031236.1.primary.q
>/tmp/uge/1031236.1.primary.q
>/tmp/uge/1031236.1.primary.q
>/tmp/uge/1031236.1.primary.q
>/tmp/uge/1031236.1.primary.q
>/opt/uge/default/common/starter_with_limit.sh: line 4: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": No such file or directory
>/opt/uge/default/common/starter_with_limit.sh: line 4: exec: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": cannot execute: No such file or directory
>
>
>Todd, are these errors from the starter_with_limit.sh wrapper harmless?
>
>Thanks,
>Evan
>
>
>On Mon, May 20, 2013 at 7:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>Could you run the following command for me and share the output with me?
>
>mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"'
>
>Thanks,
>Carson
>
>
>
>From: Evan Ernst <eernst at cshl.edu>
>Date: Monday, 20 May, 2013 4:36 PM
>To: Carson Holt <carson.holt at oicr.on.ca>
>Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>,
>"Heywood, Todd" <heywood at cshl.edu>
>Subject: Re: [maker-devel] MPI MAKER hanging NFS
>
>Hi Carson,
>
>The SGE launch script looks like this (sans SGE args):
>
>mpiexec -n 8 maker -TMP $TMPDIR maker_opts.ctl maker_bopts.ctl maker_exe.ctl >>logs/final.$SGE_TASK_ID.mpi.log 2>&1
>
>Snooping on the running jobs (see attached image), it looks like $TMPDIR
>is evaluated to a local directory by the shell of the MPI master node as
>intended, so the evaluated path, not the env var reference, is being
>passed to the MPI workers.
>
>Despite this, the mpi*** files are still being created in the working
>directory.
>
>If I understand correctly, these mpi*** files are meant to be written to
>the directory given by TMP= (maker_opts.ctl) or -TMP (command line arg),
>which should be equivalent, but this doesn't seem to be the case.
>
>Thanks,
>Evan
>
>
>
>
>On Fri, May 17, 2013 at 9:40 AM, Carson Holt <Carson.Holt at oicr.on.ca> wrote:
>I'm glad you're getting better results.
>
>With respect to environment variables: one common error in MPI execution
>is that the environment will not always be the same on the other nodes,
>since only the root node is attached to a terminal, so variables set in
>launch scripts (.bashrc etc.) may not be available on all nodes.  Many
>clusters that are part of the XSEDE network and use SGE, for example,
>have scripts that wrap mpiexec to guarantee export of all environment
>variables when using MPI, to avoid just this type of error.  So, like
>anything, you start with the most common cause of errors and then work
>toward the less common.  Kernel bugs usually rank low on the list :-)
>But I'm glad it's working for you now.
>
>Thanks,
>Carson
>
>
>
>
>
>On 13-05-17 9:25 AM, "Heywood, Todd" <heywood at cshl.edu> wrote:
>
>>It appears that a kernel bug caused the NFS hang, at least for
>>limited-scale testing (6 nodes, 192 tasks). I upgraded the kernel from
>>2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and
>>cannot reproduce the hangs.
>>
>>As far as TMPDIR goes, I'm not really sure I understand. We use SGE, and
>>the TMPDIR we are referring to is set by SGE within a job to be
>>/tmp/uge/JobID.TaskID.QueueName.  Have you run via SGE?
>>
>>Todd
>>
>>
>>
>>
>>From: Carson Holt <Carson.Holt at oicr.on.ca>
>>Date: Wednesday, May 15, 2013 1:15 PM
>>To: "Ernst, Evan" <eernst at cshl.edu>
>>Cc: Todd Heywood <heywood at cshl.edu>,
>>"maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>Subject: Re: [maker-devel] MPI MAKER hanging NFS
>>
>>The mpi**** files should be generated in the $TMPDIR or TMP= location.
>>If they are appearing in the working directory, then there is a problem.
>>If you are not setting TMP=, perhaps TMPDIR is not being exported when
>>'mpiexec' is launched.  You may have to specify manually, using the
>>mpiexec command line flags, that it needs to be exported to the other
>>nodes.  OpenMPI, for example, does not export all environment variables
>>to the other nodes by default.
>>
>>Thanks,
>>Carson
>>
>>
>>
>>From: Evan Ernst <eernst at cshl.edu>
>>Date: Wednesday, 15 May, 2013 1:08 PM
>>To: Carson Holt <carson.holt at oicr.on.ca>
>>Cc: "Heywood, Todd" <heywood at cshl.edu>,
>>"maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>Subject: Re: [maker-devel] MPI MAKER hanging NFS
>>
>>Hi Carson,
>>
>>For these runs, -TMP is set to the $TMPDIR environment variable via maker
>>command line argument in the cluster job script to use the local disk on
>>each node. We can see files being generated in those locations on each
>>node, so it seems this is working as expected.
>>
>>In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is
>>relevant, but I'm also setting mpi_blastdb= to consolidate the databases
>>onto a different, faster nfs mount than the working dir where the mpi****
>>files are being written.
>>
>>Thanks,
>>Evan
>>
>>
>>
>>On Tue, May 14, 2013 at 9:01 PM, Carson Holt <Carson.Holt at oicr.on.ca> wrote:
>>No it does not use ROMIO.
>>
>>The locking may be due to how your NFS is implemented.  MAKER does a lot
>>of small writes.  Some NFS implementations do not handle that well and
>>seem to prefer large, infrequent writes and frequent reads.
>>MAKER also uses a variant of the File::NFSLock module, which uses
>>hard links to force a flush of the NFS IO cache when asynchronous IO is
>>enabled (described here
>>http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html).
>>I know that the FhGFS implementation of NFS has broken hard link
>>functionality.
>>
>>
>>Also make sure you do not set TMP= in the maker_opts.ctl file to an NFS
>>mounted location.  It must be local (/tmp for example).  This is because
>>certain types of operations are not always NFS safe and need a local
>>location to work with (anything involving Berkeley DB or SQLite, for
>>example).  Make sure you are not setting that to an NFS mounted scratch
>>location.  The mpi**** files are examples of short-lived files that
>>should not be in NFS.  They hold chunks of data from threads that are
>>processing the genome and are very rapidly created and deleted.  They
>>will be cleaned up automatically when maker finishes or is killed by
>>standard signals, such as when you hit ^C or use kill -15 (SIGTERM).
>>
>>
>>Thanks,
>>Carson
>>
>>
>>
>>
>>On 13-05-14 4:42 PM, "Heywood, Todd" <heywood at cshl.edu> wrote:
>>
>>>We have been getting hung NFS mounts on some nodes when running MPI
>>>MAKER
>>>(version 2.27). Processes go into a "D" state and cannot be killed. We
>>>end up having to reboot nodes to recover them. We are running MPICH2
>>>version 1.4.1p1
>>>with RHEL 6.3. Questions:
>>>
>>>(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung
>>>on a sync_page system call under NFS. That *might* imply some locking
>>>issues.
>>>
>>>(2) Has anyone else seen this?
>>>
>>>(3) The root directory (parent of genome.maker.output directory) has
>>>lots
>>>of mpi***** files, all of which have the first line
>>>"pst0Process::MpiChunk". Is this expected?
>>>
>>>I'm able to reproducibly hang NFS on some nodes when using at least 4
>>>32-core nodes and 128 running MPI tasks.
>>>
>>>Thanks,
>>>
>>>Todd Heywood
>>>CSHL
>>>
>>>
>>
>>
>>_______________________________________________
>>maker-devel mailing list
>>maker-devel at box290.bluehost.com
>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>
>





