[maker-devel] Processing contig output - MPI job fail
Carson Holt
carsonhh at gmail.com
Wed Aug 30 08:55:02 MDT 2017
MAKER will start up where it left off as long as the settings are identical between runs.
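One way to check the "identical settings" condition before resubmitting is to diff the control file from the first run against the one you are about to use. The sketch below is only an illustration, not part of MAKER: it assumes the control files use the usual `key=value #comment` layout, and the helper names `parse_ctl` and `diff_settings` as well as the sample values in the demo are made up for this example.

```python
def parse_ctl(text):
    """Parse control-file text of the form `key=value #comment` into a dict.

    Blank lines, comment-only lines, and lines without '=' are skipped.
    """
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop any trailing comment
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_settings(old, new):
    """Return {key: (old_value, new_value)} for every setting that differs.

    A key missing from one side is reported with None on that side.
    """
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


if __name__ == "__main__":
    # Illustrative contents only; real control files are much longer.
    first_run = "genome=fly.fa #genome sequence\nmax_dna_len=100000\n"
    second_run = "genome=fly.fa\nmax_dna_len=300000\n"
    changes = diff_settings(parse_ctl(first_run), parse_ctl(second_run))
    for key, (before, after) in changes.items():
        print(f"{key}: {before!r} -> {after!r}")
```

With real files you would pass the file contents instead, e.g. `parse_ctl(open(path).read())` for each copy of the control file (the paths depend on your setup). An empty result means the two sets of parsed settings match; it says nothing about which individual settings MAKER treats as significant for a restart.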
—Carson
> On Aug 30, 2017, at 3:41 AM, Lahcen Campbell <lcampbell at ebi.ac.uk> wrote:
>
> Hello folks,
> Can anyone tell me whether MAKER can restart from a checkpoint after annotation processing has completed?
> I had an MPI MAKER job running successfully for 6 weeks on a de novo fly genome I am working on. It was running with mpich3-3.1-icc on an LSF batch system using 96 CPUs and 140 GB of RAM. MAKER had processed 91% of the overall assembly length of my genome under MAKER_Finished contigs. The number of "Finished" contigs hadn't changed for ~10 days when the job died; I assume MAKER was by then collecting annotated gene stats, collecting contig statistics, clustering transcripts into FASTA files, etc., as follows:
> ............
>
> clustering transcripts into genes for annotations
> Processing transcripts into genes
> adding statistics to annotations
> Calculating annotation quality statistics
> choosing best annotation set
> Choosing best annotations
> processing chunk output
> processing contig output
>
> However, this job exited after processing 26,767 of the 42,207 "MAKER Finished" contigs. The job died with a 255 exit code, which I suspect means someone in our systems team may have killed the job to maintain system stability or something.
> The following error output was captured:
>
> Calculating annotation quality statistics
> choosing best annotation set
> Choosing best annotations
> processing chunk output
> processing contig output
>
> [proxy:0:0@loom7.ebi.ac.uk] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
> [proxy:0:0@loom7.ebi.ac.uk] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0@loom7.ebi.ac.uk] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [proxy:0:1@loom15] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
> [proxy:0:5@loom14] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
> [proxy:0:5@loom14] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:4@loom6.ebi.ac.uk] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
> [proxy:0:4@loom6.ebi.ac.uk] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:4@loom6.ebi.ac.uk] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [proxy:0:5@loom14] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [proxy:0:1@loom15] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:1@loom15] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [mpiexec@loom7.ebi.ac.uk] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
> [mpiexec@loom7.ebi.ac.uk] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
> [mpiexec@loom7.ebi.ac.uk] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
> [mpiexec@loom7.ebi.ac.uk] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
>
> Unfortunately, since this process died I have been unable to get the job rescheduled on our system due to resource limitations and job queuing. Can anyone tell me whether MAKER will be able to finish processing contig stats and information to completion following this early exit? I really can't afford another 6 weeks of computation, so I'm worried, as you might expect. Would you recommend I submit this MAKER job again with the same amount of resources to finalize contig information, produce FASTA files, etc., or might I be able to request fewer resources without too much of a penalty in compute time?
>
> Any hints or insight on this would be greatly appreciated.
>
> Thank you in advance,
>
> Lahcen
>
> EBI-Hinxton, UK.