[maker-devel] NFSLock problem

Fields, Christopher J cjfields at illinois.edu
Thu Oct 3 20:45:29 MDT 2013


Carson,

It took a couple of restarts, but the job finally processed through.  I saw a prior email on the mailing list where you mentioned this could be due to failed jobs hanging things up.  For my next job I may set aside one processor so that I can log into the nodes if a similar problem comes up and try to track this down.

chris

On Oct 3, 2013, at 1:45 PM, Carson Holt <carsonhh at gmail.com> wrote:

> Stop the job and restart it.  No need to delete anything.  Just restart it.
> 
> Thanks,
> Carson
> 
> 
> On 10/3/13 2:44 PM, "Fields, Christopher J" <cjfields at illinois.edu> wrote:
> 
>> I have a MAKER job running that seems to be stalled on a failed scaffold.
>> It's running via MPI (MAKER v2.28, openMPI 1.6.3), which appears to have
>> worked successfully for the most part.  This is a run that only uses
>> transcriptome and protein information in order to get a decent dataset of
>> 
>> The failed scaffold seems to be holding the job from completion.  There
>> do seem to be changes, but mainly they are in the NFSLock files:
>> 
>> ./.NFSLock.gi_lock.NFSLock
>> ./Zalbi.unplaced.scaf_datastore/2C/0F/KB913038.1/theVoid.KB913038.1
>> ./Zalbi.unplaced.scaf_datastore/2C/0F/KB913038.1/theVoid.KB913038.1/.NFSLock.KB913038%2E1.281.282.junction.blastn.holdover.NFSLock
>> ./Zalbi.unplaced.scaf_datastore/2C/0F/KB913038.1/theVoid.KB913038.1/.NFSLock.KB913038%2E1.282.start.blastn.holdover.NFSLock
>> ./Zalbi.unplaced.scaf_datastore/2C/0F/KB913038.1/theVoid.KB913038.1/.NFSLock.KB913038%2E1.281.end.blastn.holdover.NFSLock
>> ./Zalbi.unplaced.scaf_datastore/2C/0F/KB913038.1/theVoid.KB913038.1/.NFSLock.KB913038%2E1.282.start.blastn.holdover.NFSLock.STACK
>> ./Zalbi.unplaced.scaf_datastore/2C/0F/KB913038.1/theVoid.KB913038.1/.NFSLock.KB913038%2E1.281.end.blastn.holdover.NFSLock.STACK
>> 
>> Everything else seems to have completed.
>> 
>> I have seen a few issues re: NFS locking problems; would this be related?
>> Should I stop the job?
>> 
>> We're running GPFS for our NFS.  Here's 'mount':
>> 
>> -system-specific-4.1$ mount
>> /dev/sda5 on / type ext4 (rw)
>> proc on /proc type proc (rw)
>> sysfs on /sys type sysfs (rw)
>> devpts on /dev/pts type devpts (rw,gid=5,mode=620)
>> tmpfs on /dev/shm type tmpfs (rw)
>> /dev/sda1 on /boot type ext4 (rw)
>> /dev/sdb1 on /export type ext4 (rw)
>> /dev/sda2 on /var type ext4 (rw)
>> tmpfs on /var/lib/ganglia/rrds type tmpfs (rw,size=6180842000,gid=99,uid=99)
>> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
>> sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
>> nfsd on /proc/fs/nfsd type nfsd (rw)
>> /dev/IGBHOME0 on /home type gpfs (rw,mtime,dev=IGBHOME0)
>> 128.174.124.79:/shares/group on /archive/group type nfs (rw,sync,hard,intr,retrans=10,timeo=300,rsize=65536,wsize=1048576,vers=3,proto=tcp,mountproto=tcp,addr=128.174.124.79)
>> 128.174.124.79:/shares/CBC on /archive/CBC type nfs (rw,sync,hard,intr,retrans=10,timeo=300,rsize=65536,wsize=1048576,vers=3,proto=tcp,mountproto=tcp,addr=128.174.124.79)
>> 
>> chris
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
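The lock files listed in the original message can be found and aged with a short script, which may help tell whether a stalled run is still touching its locks. This is only a sketch, not part of MAKER: the function name `find_lock_entries` is hypothetical, and matching on the substring `.NFSLock` is an assumption taken from the file names in the listing. It reports what it finds and deletes nothing (the advice in this thread is simply to restart the job).

```python
import os
import time


def find_lock_entries(root, now=None):
    """Return (path, age_in_seconds) for entries whose name contains '.NFSLock'.

    Both files and directories are checked, since the listing does not say
    which kind each lock entry is. Results are sorted oldest first.
    """
    now = time.time() if now is None else now
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if ".NFSLock" not in name:
                continue
            path = os.path.join(dirpath, name)
            try:
                # lstat so that a dangling symlink does not raise
                age = now - os.lstat(path).st_mtime
            except OSError:
                continue  # entry disappeared while we were walking
            found.append((path, age))
    found.sort(key=lambda item: item[1], reverse=True)
    return found
```

Calling `find_lock_entries(".")` from the run directory returns the lock entries with the time since each was last modified. Entries whose age keeps resetting are still being refreshed by some process.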
> 
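The `mount` output in the original message shows /home on gpfs and the /archive shares on nfs. To check which file system type a given run directory sits on, the same output can be parsed as sketched below. The helper names `parse_mounts` and `fstype_for` are hypothetical, and the parser assumes the Linux `mount` line format "DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)" seen above, with mount points that contain no spaces.

```python
import re

# Matches lines such as: /dev/IGBHOME0 on /home type gpfs (rw,mtime,dev=IGBHOME0)
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+)")


def parse_mounts(mount_output):
    """Return a list of (mount_point, fstype) pairs from `mount` output."""
    mounts = []
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match:  # wrapped continuation lines simply do not match
            mounts.append((match.group(2), match.group(3)))
    return mounts


def fstype_for(path, mounts):
    """Return the (mount_point, fstype) with the longest mount point containing path."""
    best = None
    for mount_point, fstype in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fstype)
    return best
```

For the mounts shown in this thread, any absolute path under /home would resolve to gpfs and any path under /archive/group to nfs. The thread does not say which of these the MAKER output directory was on.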




