[maker-devel] MPI MAKER hanging NFS

Evan Ernst eernst at cshl.edu
Mon May 20 18:20:22 MDT 2013


/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/opt/uge/default/common/starter_with_limit.sh: line 4:
/sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy":
No such file or directory
/opt/uge/default/common/starter_with_limit.sh: line 4: exec:
/sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy":
cannot execute: No such file or directory


Todd, are these errors from the starter_with_limit.sh wrapper harmless?

Thanks,
Evan


On Mon, May 20, 2013 at 7:50 PM, Carson Holt <carsonhh at gmail.com> wrote:

> Could you run the following command and share the output with me?
>
> mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"'
>
> Thanks,
> Carson
>
>
>
> From: Evan Ernst <eernst at cshl.edu>
> Date: Monday, 20 May, 2013 4:36 PM
> To: Carson Holt <carson.holt at oicr.on.ca>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>,
> "Heywood, Todd" <heywood at cshl.edu>
> Subject: Re: [maker-devel] MPI MAKER hanging NFS
>
> Hi Carson,
>
> The SGE launch script looks like this (sans SGE args):
>
> mpiexec -n 8 maker -TMP $TMPDIR maker_opts.ctl maker_bopts.ctl
> maker_exe.ctl >>logs/final.$SGE_TASK_ID.mpi.log 2>&1
>
> Snooping on the running jobs (see attached image), it looks like $TMPDIR
> is evaluated to a local directory by the shell of the MPI master node as
> intended, so the evaluated path, not the env var reference, is being passed
> to the MPI workers.
>
> Despite this, the mpi*** files are still being created in the working
> directory.
>
> If I understand correctly, these mpi*** files are meant to be written to
> the directory given by TMP= (maker_opts.ctl) or -TMP (command line arg),
> which should be equivalent, but this doesn't seem to be the case.
>
> Thanks,
> Evan
>
>
>
>
> On Fri, May 17, 2013 at 9:40 AM, Carson Holt <Carson.Holt at oicr.on.ca> wrote:
> I'm glad you're getting better results.
>
> With respect to environment variables: one common error in MPI
> execution is that the environment variables will not always be the same
> on the other nodes, since only the root node is attached to a terminal,
> so variables set in launch scripts (.bashrc etc.) may not be available
> on all nodes.  Many clusters that are part of the XSEDE network and use
> SGE, for example, have scripts that wrap mpiexec to guarantee export of
> all environment variables when using MPI, to avoid just this type of
> common error.
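>
> For example (illustrative command lines only, using MPICH/Hydra's flags
> rather than any particular cluster's wrapper), the whole environment or
> a named list of variables can be forwarded with:
>
> mpiexec -genvall -n 8 maker ...
> mpiexec -genvlist TMPDIR,PATH -n 8 maker ...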
> So like anything, you start with the most common causes of errors and
> work toward the less common ones.  Kernel bugs usually rank low on the
> list :-) But I'm glad it's working for you now.
>
> Thanks,
> Carson
>
>
>
>
>
> On 13-05-17 9:25 AM, "Heywood, Todd" <heywood at cshl.edu> wrote:
>
> >It appears that a kernel bug caused the NFS hang, at least for limited
> >scale testing (6 nodes, 192 tasks). I upgraded the kernel from
> >2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and
> >cannot reproduce the hangs.
> >
> >As far as TMPDIR goes, I'm not really sure I understand. We use SGE,
> >and the TMPDIR we are referring to is set by SGE within a job to
> >/tmp/uge/JobID.TaskID.QueueName.  Have you run via SGE?
> >
> >Todd
> >
> >
> >
> >
> >From: Carson Holt <Carson.Holt at oicr.on.ca>
> >Date: Wednesday, May 15, 2013 1:15 PM
> >To: "Ernst, Evan" <eernst at cshl.edu>
> >Cc: Todd Heywood <heywood at cshl.edu>,
> >"maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> >Subject: Re: [maker-devel] MPI MAKER hanging NFS
> >
> >The mpi**** files should be generated in the $TMPDIR or TMP= location.
> >If they are appearing in the working directory, then there is a problem.
> >If you are not setting TMP=, perhaps TMPDIR is not being exported when
> >'mpiexec' is launched.  You may have to manually specify that it needs
> >to be exported to the other nodes using the mpiexec command line flags.
> >OpenMPI, for example, does not export all environment variables to the
> >other nodes by default.
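> >
> >For example (an illustrative command line, assuming OpenMPI as the
> >launcher), -x exports a named variable to the other nodes:
> >
> >mpirun -x TMPDIR -n 8 maker maker_opts.ctl maker_bopts.ctl maker_exe.ctl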
> >
> >Thanks,
> >Carson
> >
> >
> >
> >From: Evan Ernst <eernst at cshl.edu>
> >Date: Wednesday, 15 May, 2013 1:08 PM
> >To: Carson Holt <carson.holt at oicr.on.ca>
> >Cc: "Heywood, Todd" <heywood at cshl.edu>,
> >"maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> >Subject: Re: [maker-devel] MPI MAKER hanging NFS
> >
> >Hi Carson,
> >
> >For these runs, -TMP is set to the $TMPDIR environment variable via a
> >maker command-line argument in the cluster job script, so that the
> >local disk on each node is used. We can see files being generated in
> >those locations on each node, so that part seems to be working as
> >expected.
> >
> >In maker_opts.ctl, I commented out the TMP line. I'm not sure if this
> >is relevant, but I'm also setting mpi_blastdb= to consolidate the
> >databases onto a different, faster NFS mount than the working dir where
> >the mpi**** files are being written.
> >
> >Thanks,
> >Evan
> >
> >
> >
> >On Tue, May 14, 2013 at 9:01 PM, Carson Holt
> ><Carson.Holt at oicr.on.ca> wrote:
> >No, it does not use ROMIO.
> >
> >The locking may be due to how your NFS is implemented.  MAKER does a
> >lot of small writes, and some NFS implementations do not handle that
> >well; they prefer large, infrequent writes and frequent reads.
> >MAKER also uses a variant of the File::NFSLock module, which uses hard
> >links to force a flush of the NFS IO cache when asynchronous IO is
> >enabled (described here:
> >http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html).
> >I know that the FhGFS implementation of NFS has broken hard-link
> >functionality.
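> >
> >A minimal Perl sketch of the hard-link trick from that paper
> >(illustrative only; file names are made up, and this is not MAKER's
> >actual locking code):
> >
> >use strict; use warnings;
> >my $tmp  = "lock.$$";          # per-process scratch file
> >my $lock = "resource.lock";    # shared lock target
> >open(my $fh, '>', $tmp) or die "open: $!";
> >close $fh;
> ># link() hits the NFS server synchronously, bypassing the client-side
> ># attribute cache; per the paper, trust the resulting link count
> ># rather than link()'s return value, which NFS can misreport.
> >link($tmp, $lock);
> >my $got_lock = ((stat($tmp))[3] == 2);   # nlink == 2 means we won
> >unlink $tmp;
> >print $got_lock ? "lock acquired\n" : "lock busy\n";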
> >
> >
> >Also make sure you do not set TMP= in the maker_opts.ctl file to an
> >NFS-mounted location (an NFS scratch mount, for example).  It must be
> >local (/tmp, for example), because certain types of operations are not
> >always NFS-safe and need a local location to work with (anything
> >involving Berkeley DB or SQLite, for instance).  The mpi**** files are
> >examples of short-lived files that should not be on NFS.  They hold
> >chunks of data from threads that are processing the genome and are very
> >rapidly created and deleted.  They will be cleaned up automatically when
> >maker finishes or is killed by standard signals (e.g. when you hit ^C
> >or use kill -15).
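> >
> >For example (the path is illustrative), in maker_opts.ctl:
> >
> >TMP=/tmp #local, node-private directory for temporary files
> >
> >or equivalently on the command line: maker -TMP /tmp ...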
> >
> >
> >Thanks,
> >Carson
> >
> >
> >
> >
> >On 13-05-14 4:42 PM, "Heywood, Todd" <heywood at cshl.edu> wrote:
> >
> >>We have been getting hung NFS mounts on some nodes when running MPI
> >>MAKER (version 2.27). Processes go into a "D" state and cannot be
> >>killed. We end up having to reboot nodes to recover them. We are
> >>running MPICH2 version 1.4.1p1 with RHEL 6.3. Questions:
> >>
> >>(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are
> >>hung in sync_page calls under NFS. That *might* imply some locking
> >>issues.
> >>
> >>(2) Has anyone else seen this?
> >>
> >>(3) The root directory (parent of genome.maker.output directory) has lots
> >>of mpi***** files, all of which have the first line
> >>"pst0Process::MpiChunk". Is this expected?
> >>
> >>I'm able to reproducibly hang NFS on some nodes when using at least 4
> >>32-core nodes and 128 running MPI tasks.
> >>
> >>Thanks,
> >>
> >>Todd Heywood
> >>CSHL