[maker-devel] MPI MAKER hanging NFS

Heywood, Todd heywood at cshl.edu
Tue May 14 14:42:33 MDT 2013


We have been getting hung NFS mounts on some nodes when running MPI MAKER (version 2.27). Processes go into the "D" (uninterruptible sleep) state and cannot be killed, so we end up having to reboot the nodes to recover them. We are running MPICH2 version 1.4.1p1 on RHEL 6.3. Questions:

(1) Does MPI MAKER use MPI-IO (ROMIO)? The processes in state "D" are blocked in sync_page (a kernel function rather than a system call) under NFS. That *might* imply a locking issue.

(2) Has anyone else seen this?

(3) The root directory (the parent of the genome.maker.output directory) contains many mpi***** files, all of which have the first line "pst0Process::MpiChunk". Is this expected?
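
For anyone who wants to check their own run directory for the files described in (3), here is a minimal Python sketch. It counts files whose names start with "mpi" and reports how many carry "Process::MpiChunk" in their first line. The function name scan_mpi_files and the loose substring match are assumptions for illustration; the exact byte layout of these files is not known here, so the check does not require the line to match the quoted string exactly.

```python
import os
import sys


def scan_mpi_files(directory, prefix="mpi", marker=b"Process::MpiChunk"):
    """Return (total, matching) counts for regular files named prefix* in directory.

    A file counts as matching when its first line contains the marker bytes.
    The substring test is deliberately loose (an assumption), because
    non-printing bytes may surround the text a pager displays.
    """
    total = 0
    matching = 0
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not name.startswith(prefix) or not os.path.isfile(path):
            continue
        total += 1
        # Read in binary mode and cap the read so a file without any
        # newline cannot pull a large amount of data into memory.
        with open(path, "rb") as fh:
            first_line = fh.readline(4096)
        if marker in first_line:
            matching += 1
    return total, matching


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    total, matching = scan_mpi_files(target)
    print("%d mpi* files, %d with the MpiChunk marker" % (total, matching))
```

Run it with the parent of the genome.maker.output directory as the only argument.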

I can reproducibly hang NFS on some nodes when using at least four 32-core nodes with 128 running MPI tasks.
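
To see which nodes are affected during a reproduction, one option is to list the processes in state "D" on each node. The Python sketch below reads the Linux /proc filesystem directly and prints the PID, command name, and wait channel of each such process. The function names parse_stat_line and find_d_state are made up for this example, and it assumes /proc/<pid>/wchan is readable on the node.

```python
import os


def parse_stat_line(text):
    """Split the contents of /proc/<pid>/stat into (pid, command, state)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx].strip())
    comm = text[open_idx + 1:close_idx]
    state = text[close_idx + 1:].split()[0]
    return pid, comm, state


def find_d_state(proc_root="/proc"):
    """Return a list of (pid, command, wchan) for processes in state D."""
    hung = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            # The process exited while we were scanning, or the stat
            # line was not in the expected form; skip it.
            continue
        if state != "D":
            continue
        try:
            with open(os.path.join(proc_root, entry, "wchan")) as fh:
                wchan = fh.read().strip()
        except OSError:
            wchan = "?"
        hung.append((pid, comm, wchan))
    return hung


if __name__ == "__main__":
    for pid, comm, wchan in find_d_state():
        print(pid, comm, wchan)
```

Reading /proc does not touch the NFS mount, so the script itself should not block on the hung file system, provided it is started from a local disk.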

Thanks,

Todd Heywood
CSHL




