[maker-devel] Maker crash on increasingly small contigs
Marc Höppner
marc.hoeppner at imbim.uu.se
Wed Jan 28 00:01:48 MST 2015
Hi,
this is probably a long shot, but I was hoping that someone on the list may have some advice on how to debug an error that has been popping up when running Maker on our 10-node cluster. So, what is the issue?
Maker runs fine on several assemblies that we have processed in the past, but I recently started on a fairly fragmented (low N50) mammalian assembly, and the collaborator was keen to have all contigs annotated, down to 1 kb (I guess it is more about the repeats and BLAST matches in those small bits). Anyway, as the contigs get smaller, Maker starts crashing in MPI mode with the following error (no other message given prior to that):
perl:13424 terminated with signal 11 at PC=3d47095012 SP=7f8ac076e530. Backtrace:
/usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x22)[0x3d47095012]
/lib64/libpthread.so.0[0x358ae0f710]
/usr/lib64/perl5/CORE/libperl.so(Perl_csighandler+0x0)[0x3d47094ff0]
/lib64/libpthread.so.0[0x358ae0f710]
/lib64/libc.so.6(__poll+0x53)[0x358aadf343]
/sw/openmpi/1.8.3/lib/libopen-pal.so.6(+0x6af4a)[0x7f8ac0a29f4a]
/sw/openmpi/1.8.3/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x221)[0x7f8ac0a21961]
/sw/openmpi/1.8.3/lib/libopen-rte.so.7(+0x52f8e)[0x7f8ac0ce5f8e]
/lib64/libpthread.so.0[0x358ae079d1]
/lib64/libc.so.6(clone+0x6d)[0x358aae8b6d]
SIGTERM received
A few words about the setup:
We have 10 nodes with 160 cores in total, and the shared file system is exported via Infiniband from a ‘standard’ NFS server. The OS is Scientific Linux 6.5. Tests so far don’t point to congestion issues or anything like that; the bandwidth usage is actually fairly low.
So far I tried:
- running the MPI processes over both the Ethernet network and IPoIB: same problem
- installing a more recent version of Perl through perlbrew, with all the required modules, and recompiling Maker
- running some (albeit simple) network checks for retransmissions, lost packets, etc.: nothing popped up
- running Maker on a subset of nodes to eliminate the possibility of a bad node
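One further check that might narrow this down is to split the assembly into contig size classes and run each class on its own, to see at which size the crash starts. The following is only a rough sketch of that idea in Python, not something that has been run on this assembly; the function names, the output file prefix and the length cut-offs (1 kb, 5 kb, 10 kb) are all placeholders chosen for illustration.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def bin_by_length(records, edges=(1000, 5000, 10000)):
    """Group records into size classes; `edges` are assumed cut-offs in bp."""
    bins = defaultdict(list)
    for header, seq in records:
        # Default class: at least as long as the largest cut-off.
        label = "ge_%d" % edges[-1]
        for edge in edges:
            if len(seq) < edge:
                label = "lt_%d" % edge
                break
        bins[label].append((header, seq))
    return dict(bins)


def write_bins(bins, prefix="contigs"):
    """Write one FASTA file per size class, e.g. contigs.lt_1000.fasta."""
    for label, records in bins.items():
        with open("%s.%s.fasta" % (prefix, label), "w") as out:
            for header, seq in records:
                out.write(">%s\n%s\n" % (header, seq))
```

Each resulting file could then be used as the input assembly for a separate test run, so that a crash can be tied to one size class rather than to the whole data set.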
The error message is a bit cryptic to me and it would be very helpful to know if Maker has a problem with accessing a file, or whether OpenMPI has a communication problem etc - but I am not able to tell from the information I have been able to extract so far. Any ideas?
Cheers,
Marc
Marc P. Hoeppner, PhD
Team Leader
BILS Genome Annotation Platform
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at imbim.uu.se