[maker-devel] Processing contig output - MPI job fail

lahcen campbell lahcencampbell at gmail.com
Wed Aug 30 03:44:03 MDT 2017


*REPOSTED UNDER CORRECTED SUBSCRIBED MEMBER EMAIL ACCOUNT.*

Hello folks,

Can anyone tell me whether MAKER is able to restart from a checkpoint
after annotation processing has completed?

I had an MPI MAKER job running successfully for 6 weeks on a de novo fly
genome I am working on. It was running with mpich3-3.1-icc on the LSF batch
system using 96 CPUs and 140 GB of RAM. MAKER had processed 91% of the overall
assembly length of my genome under MAKER_Finished contigs. The number of
"Finished" contigs hadn't changed for ~10 days when it died; I assume
MAKER was collecting annotated gene stats, collecting contig statistics,
clustering transcripts into FASTA files, etc., as follows:

*............*

*clustering transcripts into genes for annotations*
*Processing transcripts into genes*
*adding statistics to annotations*
*Calculating annotation quality statistics*
*choosing best annotation set*
*Choosing best annotations*
*processing chunk output*
*processing contig output*

However, this job exited after processing 26,767 of the 42,207 "MAKER
Finished" contigs. The job died with a 255 exit code, which I suspect means
someone in our systems team may have killed the job to maintain system
stability or something.
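
For anyone wanting to check similar per-contig counts, here is a minimal
sketch of how they could be tallied. It assumes the run's
_master_datastore_index.log is a tab-separated file with three columns
(contig name, output path, status) and that a contig can appear more than
once, with the last line reflecting its current status. Both points are
assumptions and are not confirmed anywhere in this message; the function
name tally_statuses is purely illustrative.

```python
from collections import Counter


def tally_statuses(lines):
    """Count contigs by the last status recorded for each one.

    Assumes each line is: <contig name> TAB <path> TAB <status>.
    This layout is an assumption about the index log, not a confirmed spec.
    """
    last_status = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, status = fields[0], fields[2]
        last_status[name] = status  # a later line overrides an earlier one
    return Counter(last_status.values())


# Usage sketch (the file name below is a placeholder, not a real path):
#     with open("my_genome_master_datastore_index.log") as handle:
#         for status, count in tally_statuses(handle).most_common():
#             print(status, count, sep="\t")
```

Keeping only the last status per contig avoids counting a contig twice if
it is logged once when started and again when finished.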

*The following error output was captured:*

Calculating annotation quality statistics
choosing best annotation set
Choosing best annotations
processing chunk output
processing contig output

[proxy:0:0 at loom7.ebi.ac.uk] HYD_pmcd_pmip_control_cmd_cb
(./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:0 at loom7.ebi.ac.uk] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at loom7.ebi.ac.uk] main (./pm/pmiserv/pmip.c:206): demux engine
error waiting for event
[proxy:0:1 at loom15] HYD_pmcd_pmip_control_cmd_cb
(./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:5 at loom14] HYD_pmcd_pmip_control_cmd_cb
(./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:5 at loom14] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:4 at loom6.ebi.ac.uk] HYD_pmcd_pmip_control_cmd_cb
(./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:4 at loom6.ebi.ac.uk] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:4 at loom6.ebi.ac.uk] main (./pm/pmiserv/pmip.c:206): demux engine
error waiting for event
[proxy:0:5 at loom14] main (./pm/pmiserv/pmip.c:206): demux engine error
waiting for event
[proxy:0:1 at loom15] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1 at loom15] main (./pm/pmiserv/pmip.c:206): demux engine error
waiting for event
[mpiexec at loom7.ebi.ac.uk] HYDT_bscu_wait_for_completion
(./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated
badly; aborting
[mpiexec at loom7.ebi.ac.uk] HYDT_bsci_wait_for_completion
(./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for
completion
[mpiexec at loom7.ebi.ac.uk] HYD_pmci_wait_for_completion
(./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for
completion
[mpiexec at loom7.ebi.ac.uk] main (./ui/mpich/mpiexec.c:331): process manager
error waiting for completion

Unfortunately, since this process died, I have been unable to get the job
rescheduled on our system due to resource limitations and job queuing.
But can anyone tell me: will MAKER be able to finish processing contig
stats and information to completion following this early exit? I really
can't afford another 6 weeks of computation, so I'm worried, as you might
expect. Would you recommend I submit this MAKER job again to finalize
contig information, produce FASTA files, etc. with the same amount of
resources, or might I be able to request fewer resources without too much of
a penalty in terms of compute time?

Any hints or insight on this would be greatly appreciated.

Thank you in advance,

Lahcen

EBI-Hinxton, UK.

-- 
==========================================
> Dr. Lahcen Campbell                                                  <
> Contact: lahcencampbell at gmail.com                        <
> https://www.ebi.ac.uk/about/people/lahcen-campbell <
==========================================