[maker-devel] Processing contig output - MPI job fail

Lahcen Campbell lcampbell at ebi.ac.uk
Wed Aug 30 03:41:48 MDT 2017


Hello folks,

Can anyone tell me whether MAKER is able to restart from a checkpoint 
after annotation processing has completed?

I had an MPI MAKER job running successfully for 6 weeks on a de novo 
fly genome I am working on. It ran with mpich3-3.1-icc on an LSF batch 
system using 96 CPUs and 140 GB of RAM. Contigs marked as "Finished" by 
MAKER accounted for 91% of the total assembly length of my genome. The 
number of "Finished" contigs had not changed for ~10 days before the job 
died. I assume MAKER was collecting annotated gene statistics and contig 
statistics and clustering transcripts into FASTA files, as the log shows:

...

clustering transcripts into genes for annotations
Processing transcripts into genes
adding statistics to annotations
Calculating annotation quality statistics
choosing best annotation set
Choosing best annotations
processing chunk output
processing contig output

However, the job exited after processing 26,767 of the 42,207 "MAKER 
Finished" contigs. It died with exit code 255, which makes me suspect 
that someone on our systems team may have killed it to maintain system 
stability or something similar.
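Progress figures like the "Finished" contig count and the fraction of assembly length covered can be tallied from MAKER's datastore index log. The following is a minimal Python sketch under stated assumptions: it assumes each log line is tab-separated as contig ID, datastore path, status, and that a later line for the same contig supersedes an earlier one (for example STARTED followed by FINISHED). The function names `latest_status` and `summarize` and the toy data are hypothetical. Check the format against your own `*_master_datastore_index.log` before relying on it.

```python
from collections import Counter


def latest_status(lines):
    """Map each contig ID to the last status recorded for it.

    Assumed (hypothetical) line format: "<contig>\t<datastore_path>\t<STATUS>".
    """
    status = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        contig, state = fields[0], fields[2]
        status[contig] = state  # later entries overwrite earlier ones
    return status


def summarize(lines, contig_lengths):
    """Count contigs per final status and the fraction of total length FINISHED.

    contig_lengths maps contig ID -> length in bp (e.g. parsed from the assembly FASTA).
    """
    status = latest_status(lines)
    counts = Counter(status.values())
    total = sum(contig_lengths.values())
    finished = sum(
        length
        for contig, length in contig_lengths.items()
        if status.get(contig) == "FINISHED"
    )
    fraction = finished / total if total else 0.0
    return counts, fraction


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the run described above.
    log = [
        "ctg1\tout/ctg1/\tSTARTED",
        "ctg1\tout/ctg1/\tFINISHED",
        "ctg2\tout/ctg2/\tSTARTED",
        "ctg3\tout/ctg3/\tSTARTED",
        "ctg3\tout/ctg3/\tFINISHED",
    ]
    lengths = {"ctg1": 600, "ctg2": 100, "ctg3": 300}
    counts, fraction = summarize(log, lengths)
    print(dict(counts), round(fraction, 2))  # {'FINISHED': 2, 'STARTED': 1} 0.9
```

Keeping only the last status per contig means a contig that was STARTED but never FINISHED (like `ctg2` above) is counted as unfinished rather than double-counted.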

The following error output was captured:

Calculating annotation quality statistics
choosing best annotation set
Choosing best annotations
processing chunk output
processing contig output

[proxy:0:0@loom7.ebi.ac.uk] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:0@loom7.ebi.ac.uk] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@loom7.ebi.ac.uk] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:1@loom15] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:5@loom14] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:5@loom14] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:4@loom6.ebi.ac.uk] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:4@loom6.ebi.ac.uk] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:4@loom6.ebi.ac.uk] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:5@loom14] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:1@loom15] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1@loom15] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@loom7.ebi.ac.uk] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@loom7.ebi.ac.uk] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@loom7.ebi.ac.uk] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
[mpiexec@loom7.ebi.ac.uk] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion

Unfortunately, since this process died I have been unable to get the job 
rescheduled on our system because of resource limits and job queueing. 
Can anyone tell me whether MAKER will be able to finish processing the 
remaining contig statistics and output after this early exit? I really 
can't afford another 6 weeks of computation, so, as you might expect, I'm 
worried. Would you recommend resubmitting this MAKER job with the same 
resources to finalize the contig information and produce the FASTA files, 
or could I request fewer resources without too large a penalty in 
compute time?

Any hints or insight on this would be greatly appreciated.

Thank you in advance,

Lahcen

EBI-Hinxton, UK.

