[maker-devel] Processing contig output - MPI job fail
Lahcen Campbell
lcampbell at ebi.ac.uk
Wed Aug 30 03:41:48 MDT 2017
Hello folks,
Can anyone tell me whether MAKER is able to restart from a checkpoint
after annotation processing has completed?
I had an MPI MAKER job running successfully for 6 weeks on a de novo
fly genome I am working on. It was running with mpich3-3.1-icc on the LSF
batch system, using 96 CPUs and 140 GB RAM. MAKER had processed 91% of
the overall assembly length of my genome as MAKER "Finished" contigs.
The number of "Finished" contigs hadn't changed for ~10 days when it died.
I assume this is because MAKER was collecting annotated gene stats,
collecting contig statistics, clustering transcripts into fasta files,
etc., as follows:
............
clustering transcripts into genes for annotations
Processing transcripts into genes
adding statistics to annotations
Calculating annotation quality statistics
choosing best annotation set
Choosing best annotations
processing chunk output
processing contig output
However, this job exited after processing 26,767 of the 42,207 "MAKER
Finished" contigs. The job died with exit code 255, which I suspect
means someone in our systems team may have killed the job to maintain
system stability or something similar.
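For reference, here is a minimal sketch of how the per-contig progress
could be tallied from the run's datastore index log. It assumes the log
is tab-separated with the contig name in the first column and a status
such as STARTED or FINISHED in the third, and that the last entry for a
contig reflects its current state; the function names are hypothetical
and the column layout should be checked against the actual file.

```python
from collections import Counter


def latest_status(lines):
    """Map each contig name to the last status recorded for it."""
    status_by_contig = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Assumed layout: contig name, output path, status.
        contig, status = fields[0], fields[2]
        status_by_contig[contig] = status  # later entries overwrite earlier ones
    return status_by_contig


def summarize(lines):
    """Return (count per status, fraction of contigs last seen as FINISHED)."""
    statuses = latest_status(lines)
    counts = Counter(statuses.values())
    total = len(statuses)
    fraction = counts["FINISHED"] / total if total else 0.0
    return counts, fraction
```

Passing an open file handle for the index log to summarize() would give
the counts per status and the finished fraction, which can be compared
with the 26,767 of 42,207 figure above.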
The following error output was captured:
Calculating annotation quality statistics
choosing best annotation set
Choosing best annotations
processing chunk output
processing contig output
[proxy:0:0 at loom7.ebi.ac.uk] HYD_pmcd_pmip_control_cmd_cb
(./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:0 at loom7.ebi.ac.uk] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at loom7.ebi.ac.uk] main (./pm/pmiserv/pmip.c:206): demux engine
error waiting for event
[proxy:0:1 at loom15] HYD_pmcd_pmip_control_cmd_cb
(./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:5 at loom14] HYD_pmcd_pmip_control_cmd_cb
(./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:5 at loom14] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:4 at loom6.ebi.ac.uk] HYD_pmcd_pmip_control_cmd_cb
(./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:4 at loom6.ebi.ac.uk] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:4 at loom6.ebi.ac.uk] main (./pm/pmiserv/pmip.c:206): demux engine
error waiting for event
[proxy:0:5 at loom14] main (./pm/pmiserv/pmip.c:206): demux engine error
waiting for event
[proxy:0:1 at loom15] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1 at loom15] main (./pm/pmiserv/pmip.c:206): demux engine error
waiting for event
[mpiexec at loom7.ebi.ac.uk] HYDT_bscu_wait_for_completion
(./tools/bootstrap/utils/bscu_wait.c:76): one of the processes
terminated badly; aborting
[mpiexec at loom7.ebi.ac.uk] HYDT_bsci_wait_for_completion
(./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting
for completion
[mpiexec at loom7.ebi.ac.uk] HYD_pmci_wait_for_completion
(./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for
completion
[mpiexec at loom7.ebi.ac.uk] main (./ui/mpich/mpiexec.c:331): process
manager error waiting for completion
Unfortunately, since this process died, I have been unable to get the job
rescheduled on our system due to resource limitations and job
queuing. Can anyone tell me whether MAKER will be able to finish
processing contig stats and information to completion following this
early exit? I really can't afford another 6 weeks of computation, so I'm
worried, as you might expect. Would you recommend I submit this MAKER job
again with the same amount of resources to finalize contig information,
produce fasta files, etc., or might I be able to request fewer resources
without too much of a penalty in terms of compute time?
Any hints or insight on this would be greatly appreciated.
Thank you in advance,
Lahcen
EBI-Hinxton, UK.