[maker-devel] ERROR: MPI_Recv(186), dequeue_and_set_error(596)

Carson Holt carsonhh at gmail.com
Wed Jul 25 13:46:17 MDT 2012


MPI is notorious for inexplicable communication errors, so first I would
suggest just restarting and seeing if it happens again (MAKER will pick up
where it left off on restart, so there is no need to alter settings or files).

If it happens again, we can look into it, but no component of the MPI
communication framework changed between 2.25 and 2.26 (100% identical), so
my first instinct is that this was just what the message said, a
"Communication error with rank 18". If it happens again, I can try to add
some extra messages so we can see the hostname of rank 18. That way we can
identify whether it's consistently a specific node on your cluster.
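
In the meantime, a quick way to check whether the same rank keeps failing is
to scan the saved output of each run for the error line quoted in this thread.
The following is only a minimal sketch: the function name and the log file
names are hypothetical, and it assumes the error always appears in the
"Communication error with rank N" form shown in the report. A rank number is
not a hostname, so this only tells you whether the failures repeat on the same
rank.

```python
import re
import sys
from collections import Counter

# Matches the error line quoted in this thread, e.g.
# "dequeue_and_set_error(596): Communication error with rank 18"
RANK_ERROR = re.compile(r"Communication error with rank (\d+)")


def failing_ranks(log_text):
    """Count how often each rank is named in a communication error."""
    return Counter(int(m.group(1)) for m in RANK_ERROR.finditer(log_text))


if __name__ == "__main__":
    # Hypothetical usage: python failing_ranks.py run1.log run2.log
    totals = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            totals += failing_ranks(handle.read())
    for rank, count in totals.most_common():
        print(f"rank {rank}: {count} communication error(s)")
```

If one rank dominates across several runs, that is a hint (not proof) that a
single node is at fault. Ranks may be placed on different hosts from one job to
the next, so the hostname still has to be confirmed separately.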

Let me know if you see it again.

Thanks,
Carson



From:  Yunfei Guo <guoyunfei1989 at gmail.com>
Date:  Wednesday, 25 July, 2012 3:15 PM
To:  <maker-devel at yandell-lab.org>
Subject:  [maker-devel] ERROR: MPI_Recv(186), dequeue_and_set_error(596)

Hi everyone,

I ran maker2.25 without a problem, but with maker2.26 I encountered the
following error after running it for ~8 hr with 2 nodes and 24 CPUs. Do you
have any idea what's going on here? Some contigs did get finished, so maybe
this is not a big problem. My mpich2 version is 1.4.1p1, and the job
scheduling system is SGE. Thanks!

running  blast search.
#--------- command -------------#
Widget::blastx:
/home/yunfeiguo/Downloads/maker/bin/../exe/blast/bin/blastx -db
/tmp/6480.1.all.q/maker_PQOTIq/concatPro%2Etxt.mpi.10.4 -query
/tmp/6480.1.all.q/maker_PQOTIq/rank3/scaffold2602.0 -num_alignments 10000
-num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000
-num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out
/home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker
.output/Nigro-53k_part_datastore/7A/37/scaffold2602//theVoid.scaffold2602/sc
affold2602.0.concatPro%2Etxt.blastx.temp_dir/concatPro%2Etxt.mpi.10.4.blastx
#-------------------------------#
deleted:-1 hits
SIGCHLD handler "DEFAULT" not defined.
SIGCHLD handler "DEFAULT" not defined.
running  exonerate search.
#--------- command -------------#
Widget::exonerate::protein2genome:
/home/username/usr/bin/exonerate -q
/home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker
.output/Nigro-53k_part_datastore/F9/9B/scaffold2590//theVoid.scaffold2590/sp
%7CQ8N8A2%7CANR44_HUMAN.for.1-3712.8.fasta -t
/home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker
.output/Nigro-53k_part_datastore/F9/9B/scaffold2590//theVoid.scaffold2590/sc
affold2590.1-3712.8.fasta -Q protein -T dna -m protein2genome
--softmasktarget  --percent 20 --showcigar  >
/home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker
.output/Nigro-53k_part_datastore/F9/9B/scaffold2590//theVoid.scaffold2590/sc
affold2590.1-3712.sp%7CQ8N8A2%7CANR44_HUMAN.p_exonerate.8
#-------------------------------#
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............: MPI_Recv(buf=0x7fffa3a2e760, count=2, MPI_INT,
src=MPI_ANY_SOURCE, tag=1111, MPI_COMM_WORLD, status=0x7fffa3a2e740) failed
dequeue_and_set_error(596): Communication error with rank 18
running  blast search.
#--------- command -------------#
Widget::blastx:
/home/yunfeiguo/Downloads/maker/bin/../exe/blast/bin/blastx -db
/tmp/6480.1.all.q/maker_PQOTIq/concatPro%2Etxt.mpi.10.8 -query
/tmp/6480.1.all.q/maker_PQOTIq/rank11/scaffold2575.0 -num_alignments 10000
-num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000
-num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out
/home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker
.output/Nigro-53k_part_datastore/F0/AE/scaffold2575//theVoid.scaffold2575/sc
affold2575.0.concatPro%2Etxt.blastx.temp_dir/concatPro%2Etxt.mpi.10.8.blastx
#-------------------------------#
running  blast search.
#--------- command -------------#
Widget::tblastx:
/home/yunfeiguo/Downloads/maker/bin/../exe/blast/bin/tblastx -db
/tmp/6480.1.all.q/maker_PQOTIq/AllSebESTs_plus_Rubri%2Efasta.mpi.10.1 -query
/tmp/6480.1.all.q/maker_PQOTIq/rank7/scaffold2620.0 -num_alignments 10000
-num_descriptions 10000 -evalue 1e-10 -dbsize 1000 -searchsp 500000000
-num_threads 1 -lcase_masking -seg yes -soft_masking true -show_gis -out
/home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker
.output/Nigro-53k_part_datastore/6B/FB/scaffold2620//theVoid.scaffold2620/sc
affold2620.0.AllSebESTs_plus_Rubri%2Efasta.tblastx.temp_dir/AllSebESTs_plus_
Rubri%2Efasta.mpi.10.1.tblastx
#-------------------------------#
running  exonerate search.
#--------- command -------------#
Widget::exonerate::protein2genome:
/home/username/usr/bin/exonerate -q
/home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker
.output/Nigro-53k_part_datastore/F9/9B/scaffold2590//theVoid.scaffold2590/sp
%7CQ8NB46%7CANR52_HUMAN.for.1-3712.8.fasta -t
/home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker
.output/Nigro-53k_part_datastore/F9/9B/scaffold2590//theVoid.scaffold2590/sc
affold2590.1-3712.8.fasta -Q protein -T dna -m protein2genome
--softmasktarget  --percent 20 --showcigar  >
/home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker
.output/Nigro-53k_part_datastore/F9/9B/scaffold2590//theVoid.scaffold2590/sc
affold2590.1-3712.sp%7CQ8NB46%7CANR52_HUMAN.p_exonerate.8
#-------------------------------#
cleaning blastx...
in cluster::shadow_cluster...
...finished clustering.
cleaning clusters....
total clusters:1 now processing 0
 ...processing 0 of 2
deleted:0 hits
 ...processing 1 of 2
running  blast search.
#--------- command -------------#
Widget::tblastx:
/home/yunfeiguo/Downloads/maker/bin/../exe/blast/bin/tblastx -db
/tmp/6480.1.all.q/maker_PQOTIq/AllSebESTs_plus_Rubri%2Efasta.mpi.10.6 -query
/tmp/6480.1.all.q/maker_PQOTIq/rank9/scaffold2615.0 -num_alignments 10000
-num_descriptions 10000 -evalue 1e-10 -dbsize 1000 -searchsp 500000000
-num_threads 1 -lcase_masking -seg yes -soft_masking true -show_gis -out
/home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker
.output/Nigro-53k_part_datastore/E2/6E/scaffold2615//theVoid.scaffold2615/sc
affold2615.0.AllSebESTs_plus_Rubri%2Efasta.tblastx.temp_dir/AllSebESTs_plus_
Rubri%2Efasta.mpi.10.6.tblastx
#-------------------------------#
deleted:0 hits
running  exonerate search.
#--------- command -------------#
Widget::exonerate::protein2genome:
/home/username/usr/bin/exonerate -q
/home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker
.output/Nigro-53k_part_datastore/F9/9B/scaffold2590//theVoid.scaffold2590/tr
%7CE7F7S0%7CE7F7S0_DANRE.for.1-3712.9.fasta -t
/home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker
.output/Nigro-53k_part_datastore/F9/9B/scaffold2590//theVoid.scaffold2590/sc
affold2590.1-3712.9.fasta -Q protein -T dna -m protein2genome
--softmasktarget  --percent 20 --showcigar  >
/home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker
.output/Nigro-53k_part_datastore/F9/9B/scaffold2590//theVoid.scaffold2590/sc
affold2590.1-3712.tr%7CE7F7S0%7CE7F7S0_DANRE.p_exonerate.9
#-------------------------------#
deleted:0 hits
cleaning blastx...
cleaning clusters....
total clusters:1 now processing 0
cleaning clusters....
total clusters:1 now processing 0
deleted:-1 hits
deleted:-1 hits
deleted:-6 hits
deleted:-3 hits
deleted:-2 hits
Perl exited with active threads:
1 running and unjoined
0 finished and unjoined
0 running and detached
Perl exited with active threads:
1 running and unjoined
0 finished and unjoined
0 running and detached
Perl exited with active threads:
1 running and unjoined
0 finished and unjoined
0 running and detached
Perl exited with active threads:
1 running and unjoined
0 finished and unjoined
0 running and detached
Perl exited with active threads:
1 running and unjoined
0 finished and unjoined
0 running and detached
Perl exited with active threads:
1 running and unjoined
0 finished and unjoined
0 running and detached
Perl exited with active threads:
1 running and unjoined
0 finished and unjoined
0 running and detached
Perl exited with active threads:
1 running and unjoined
0 finished and unjoined
0 running and detached
Perl exited with active threads:
1 running and unjoined
0 finished and unjoined
0 running and detached
Perl exited with active threads:
1 running and unjoined
0 finished and unjoined
0 running and detached
Perl exited with active threads:
1 running and unjoined
0 finished and unjoined
0 running and detached

Yunfei

