<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; "><div>MPI is notorious for inexplicable communication errors, so first I would suggest just restarting and seeing if it happens again (MAKER will pick up where it left off on restart, so there is no need to alter settings or files).</div><div><br></div><div>If it happens again, we can look into it, but no component of the MPI communication framework changed between 2.25 and 2.26 (100% identical), so my first instinct is that this was just what the message said, a "Communication error with rank 18". If it happens again, I can try to add some extra messages so we can see the hostname of rank 18. That way we can identify whether it's consistently a specific node on your cluster.</div><div><br></div><div>Let me know if you see it again.</div><div><br></div><div>Thanks,</div><div>Carson</div><div><br></div><div><br></div><div><br></div><span id="OLK_SRC_BODY_SECTION"><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span> Yunfei Guo &lt;<a href="mailto:guoyunfei1989@gmail.com">guoyunfei1989@gmail.com</a>&gt;<br><span style="font-weight:bold">Date: </span> Wednesday, 25 July, 2012 3:15 PM<br><span style="font-weight:bold">To: </span> &lt;<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>&gt;<br><span style="font-weight:bold">Subject: </span> [maker-devel] ERROR: MPI_Recv(186), dequeue_and_set_error(596)<br></div><div><br></div><div><div>Hi everyone,</div><div><br></div><div>I ran maker2.25 without a problem, but with maker2.26, I encountered the following error after running it for ~8 hr with 2 nodes and 24 CPUs. Do you have any 
idea what's going on here? Some contigs did finish, so maybe this is not a big problem. My mpich2 version is 1.4.1p1, and the job scheduling system is SGE. Thanks!</div><div><br></div><div>running blast search.</div><div>#--------- command -------------#</div><div>Widget::blastx:</div><div>/home/yunfeiguo/Downloads/maker/bin/../exe/blast/bin/blastx -db /tmp/6480.1.all.q/maker_PQOTIq/concatPro%2Etxt.mpi.10.4 -query /tmp/6480.1.all.q/maker_PQOTIq/rank3/scaffold2602.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker.output/Nigro-53k_part_datastore/7A/37/scaffold2602//theVoid.scaffold2602/scaffold2602.0.concatPro%2Etxt.blastx.temp_dir/concatPro%2Etxt.mpi.10.4.blastx</div><div>#-------------------------------#</div><div>deleted:-1 hits</div><div>SIGCHLD handler "DEFAULT" not defined.</div><div>SIGCHLD handler "DEFAULT" not defined.</div><div>running exonerate search.</div><div>#--------- command -------------#</div><div>Widget::exonerate::protein2genome:</div><div>/home/username/usr/bin/exonerate -q /home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker.output/Nigro-53k_part_datastore/F9/9B/scaffold2590//theVoid.scaffold2590/sp%7CQ8N8A2%7CANR44_HUMAN.for.1-3712.8.fasta -t /home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker.output/Nigro-53k_part_datastore/F9/9B/scaffold2590//theVoid.scaffold2590/scaffold2590.1-3712.8.fasta -Q protein -T dna -m protein2genome --softmasktarget --percent 20 --showcigar > /home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker.output/Nigro-53k_part_datastore/F9/9B/scaffold2590//theVoid.scaffold2590/scaffold2590.1-3712.sp%7CQ8N8A2%7CANR44_HUMAN.p_exonerate.8</div><div>#-------------------------------#</div><div>Fatal error in MPI_Recv: Other MPI error, error 
stack:</div><div>MPI_Recv(186).............: MPI_Recv(buf=0x7fffa3a2e760, count=2, MPI_INT, src=MPI_ANY_SOURCE, tag=1111, MPI_COMM_WORLD, status=0x7fffa3a2e740) failed</div><div>dequeue_and_set_error(596): Communication error with rank 18</div><div>running blast search.</div><div>#--------- command -------------#</div><div>Widget::blastx:</div><div>/home/yunfeiguo/Downloads/maker/bin/../exe/blast/bin/blastx -db /tmp/6480.1.all.q/maker_PQOTIq/concatPro%2Etxt.mpi.10.8 -query /tmp/6480.1.all.q/maker_PQOTIq/rank11/scaffold2575.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker.output/Nigro-53k_part_datastore/F0/AE/scaffold2575//theVoid.scaffold2575/scaffold2575.0.concatPro%2Etxt.blastx.temp_dir/concatPro%2Etxt.mpi.10.8.blastx</div><div>#-------------------------------#</div><div>running blast search.</div><div>#--------- command -------------#</div><div>Widget::tblastx:</div><div>/home/yunfeiguo/Downloads/maker/bin/../exe/blast/bin/tblastx -db /tmp/6480.1.all.q/maker_PQOTIq/AllSebESTs_plus_Rubri%2Efasta.mpi.10.1 -query /tmp/6480.1.all.q/maker_PQOTIq/rank7/scaffold2620.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-10 -dbsize 1000 -searchsp 500000000 -num_threads 1 -lcase_masking -seg yes -soft_masking true -show_gis -out /home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker.output/Nigro-53k_part_datastore/6B/FB/scaffold2620//theVoid.scaffold2620/scaffold2620.0.AllSebESTs_plus_Rubri%2Efasta.tblastx.temp_dir/AllSebESTs_plus_Rubri%2Efasta.mpi.10.1.tblastx</div><div>#-------------------------------#</div><div>running exonerate search.</div><div>#--------- command -------------#</div><div>Widget::exonerate::protein2genome:</div><div>/home/username/usr/bin/exonerate -q 
/home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker.output/Nigro-53k_part_datastore/F9/9B/scaffold2590//theVoid.scaffold2590/sp%7CQ8NB46%7CANR52_HUMAN.for.1-3712.8.fasta -t /home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker.output/Nigro-53k_part_datastore/F9/9B/scaffold2590//theVoid.scaffold2590/scaffold2590.1-3712.8.fasta -Q protein -T dna -m protein2genome --softmasktarget --percent 20 --showcigar > /home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker.output/Nigro-53k_part_datastore/F9/9B/scaffold2590//theVoid.scaffold2590/scaffold2590.1-3712.sp%7CQ8NB46%7CANR52_HUMAN.p_exonerate.8</div><div>#-------------------------------#</div><div>cleaning blastx...</div><div>in cluster::shadow_cluster...</div><div>...finished clustering.</div><div>cleaning clusters....</div><div>total clusters:1 now processing 0</div><div> ...processing 0 of 2</div><div>deleted:0 hits</div><div> ...processing 1 of 2</div><div>running blast search.</div><div>#--------- command -------------#</div><div>Widget::tblastx:</div><div>/home/yunfeiguo/Downloads/maker/bin/../exe/blast/bin/tblastx -db /tmp/6480.1.all.q/maker_PQOTIq/AllSebESTs_plus_Rubri%2Efasta.mpi.10.6 -query /tmp/6480.1.all.q/maker_PQOTIq/rank9/scaffold2615.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-10 -dbsize 1000 -searchsp 500000000 -num_threads 1 -lcase_masking -seg yes -soft_masking true -show_gis -out /home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker.output/Nigro-53k_part_datastore/E2/6E/scaffold2615//theVoid.scaffold2615/scaffold2615.0.AllSebESTs_plus_Rubri%2Efasta.tblastx.temp_dir/AllSebESTs_plus_Rubri%2Efasta.mpi.10.6.tblastx</div><div>#-------------------------------#</div><div>deleted:0 hits</div><div>running exonerate search.</div><div>#--------- command -------------#</div><div>Widget::exonerate::protein2genome:</div><div>/home/username/usr/bin/exonerate -q 
/home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker.output/Nigro-53k_part_datastore/F9/9B/scaffold2590//theVoid.scaffold2590/tr%7CE7F7S0%7CE7F7S0_DANRE.for.1-3712.9.fasta -t /home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker.output/Nigro-53k_part_datastore/F9/9B/scaffold2590//theVoid.scaffold2590/scaffold2590.1-3712.9.fasta -Q protein -T dna -m protein2genome --softmasktarget --percent 20 --showcigar > /home/yunfeiguo/projects/fish/Nigro/run/dir_Nigro-53k00/Nigro-53k_part.maker.output/Nigro-53k_part_datastore/F9/9B/scaffold2590//theVoid.scaffold2590/scaffold2590.1-3712.tr%7CE7F7S0%7CE7F7S0_DANRE.p_exonerate.9</div><div>#-------------------------------#</div><div>deleted:0 hits</div><div>cleaning blastx...</div><div>cleaning clusters....</div><div>total clusters:1 now processing 0</div><div>cleaning clusters....</div><div>total clusters:1 now processing 0</div><div>deleted:-1 hits</div><div>deleted:-1 hits</div><div>deleted:-6 hits</div><div>deleted:-3 hits</div><div>deleted:-2 hits</div><div>Perl exited with active threads:</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>1 running and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 finished and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 running and detached</div><div>Perl exited with active threads:</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>1 running and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 finished and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 running and detached</div><div>Perl exited with active threads:</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>1 running and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 finished and unjoined</div><div><span 
class="Apple-tab-span" style="white-space:pre"> </span>0 running and detached</div><div>Perl exited with active threads:</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>1 running and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 finished and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 running and detached</div><div>Perl exited with active threads:</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>1 running and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 finished and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 running and detached</div><div>Perl exited with active threads:</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>1 running and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 finished and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 running and detached</div><div>Perl exited with active threads:</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>1 running and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 finished and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 running and detached</div><div>Perl exited with active threads:</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>1 running and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 finished and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 running and detached</div><div>Perl exited with active threads:</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>1 running and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 finished and unjoined</div><div><span class="Apple-tab-span" 
style="white-space:pre"> </span>0 running and detached</div><div>Perl exited with active threads:</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>1 running and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 finished and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 running and detached</div><div>Perl exited with active threads:</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>1 running and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 finished and unjoined</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>0 running and detached</div></div><div><br></div><div>Yunfei</div><div><br></div>
</span></body></html>