[maker-devel] Maker on Amazon EC2 Using Starcluster

Jason Gallant jgallant at msu.edu
Wed Jan 21 06:56:02 MST 2015


Hi Everyone,


I’m attempting to run Maker on Amazon EC2 using MIT’s StarCluster. I’ve started a 200-node cluster and enabled MPICH2 (StarCluster uses OpenMPI by default). I plan on documenting this setup once I’ve figured out how to run things reliably.


I’m having a persistent issue where something fails on one of the nodes, and stderr is flooded with:


examining contents of the fasta file and run log
[67] ERROR: could not make datastore directory
[67] --> rank=67, hostname=node067
[67] ERROR: Failed while examining contents of the fasta file and run log
[67] ERROR: Chunk failed at level:0, tier_type:0
[67] FAILED CONTIG:Scaffold261


This error repeats for each “next” scaffold for some time. When I go back through the log to find the “source” of the error, the following is the first error message on that node:


[67] #-------------------------------#
[67] deleted:-60 hits
[67] collecting blastx reports
[67] ERROR: Could not colapse BLAST reports
[67]  at /root/maker/bin/../lib/GI.pm line 2524 thread 1.
[67] 	GI::combine_blast_report(FastaChunk=HASH(0x108e1a90), ARRAY(0x1b874938), ARRAY(0xf127ad8), runlog=HASH(0x4d54ed8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760 thread 1
[67] 	Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415 thread 1
[67] 	eval {...} called at /root/maker/bin/../lib/Error.pm line 407 thread 1
[67] 	Error::subs::try(CODE(0x1514eb00), HASH(0x9cbeb568)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215 thread 1
[67] 	Process::MpiChunk::_go(Process::MpiChunk=HASH(0x13976308), "run", HASH(0x12e04268), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341 thread 1
[67] 	Process::MpiChunk::run(Process::MpiChunk=HASH(0x13976308), 67) called at /root/maker/bin/maker line 1457 thread 1
[67] 	main::node_thread("/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1
[67] 	eval {...} called at /usr/local/lib/perl/5.14.2/forks.pm line 799 thread 1
[67] 	threads::new("threads", CODE(0x3dc5b38), "/mnt/data/paramormyrops_new_annotation/supercontigs.maker.out"...) called at /root/maker/bin/maker line 917 thread 1
[67] --> rank=67, hostname=node067
[67] ERROR: Failed while collecting blastx reports
[67] ERROR: Chunk failed at level:9, tier_type:3
[67] FAILED CONTIG:Scaffold66
[67] 
[67] ERROR: Chunk failed at level:4, tier_type:0
[67] FAILED CONTIG:Scaffold66
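
For anyone who wants to repeat this search on their own log, here is a minimal sketch of how the first error under each MPI tag could be pulled out of a captured stderr file. It assumes every relevant line starts with a bracketed tag such as "[67]" and that rank lines look like "--> rank=67, hostname=node067", as in the excerpts in this message. The function names are hypothetical and are not part of Maker.

```python
import re

# Assumed line shape: "[67] ERROR: ..." where the bracketed number is the MPI tag.
TAGGED = re.compile(r"^\[(\d+)\]\s?(.*)$")
# Assumed rank line shape: "--> rank=67, hostname=node067".
RANK = re.compile(r"-->\s*rank=(\d+),\s*hostname=(\S+)")


def first_errors(lines):
    """Map each MPI tag to (line number, text) of its first ERROR line."""
    first = {}
    for lineno, line in enumerate(lines, start=1):
        m = TAGGED.match(line.rstrip("\n"))
        if m and m.group(2).startswith("ERROR:"):
            # setdefault keeps only the earliest ERROR seen for this tag
            first.setdefault(int(m.group(1)), (lineno, m.group(2)))
    return first


def mismatched_ranks(lines):
    """Collect (tag, rank, hostname) where the reported rank differs from the tag."""
    found = set()
    for line in lines:
        m = TAGGED.match(line.rstrip("\n"))
        if not m:
            continue
        r = RANK.search(m.group(2))
        if r and int(r.group(1)) != int(m.group(1)):
            found.add((int(m.group(1)), int(r.group(1)), r.group(2)))
    return found
```

Both functions accept any iterable of lines, so an open file handle for the saved stderr output can be passed directly. Lines that have lost their bracketed tag are skipped rather than guessed at.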




I attempted to ignore the error to see whether things would proceed on the other 199 processors. When I returned to the “master” node after the evening, Maker was repeating the same error message over and over (same scaffold):
[67] examining contents of the fasta file and run log
[67] ERROR: could not make datastore directory
[67] --> rank=67, hostname=node067
[67] ERROR: Failed while examining contents of the fasta file and run log
[67] ERROR: Chunk failed at level:0, tier_type:0
[67] FAILED CONTIG:Scaffold1589


I stopped the job and restarted it, and after only a few minutes of running, the same error was reported, this time on a new scaffold. Strangely, the error here is reported under the MPI tag of node001, but it originates at node137:


ERROR: Could not colapse BLAST reports
[1]  at /root/maker/bin/../lib/GI.pm line 2524.
[1]     GI::combine_blast_report(FastaChunk=HASH(0xf4aa9b8), ARRAY(0xf628f90), ARRAY(0x325fea78), runlog=HASH(0x133cc8e8)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 2760
[1]     Process::MpiChunk::__ANON__() called at /root/maker/bin/../lib/Error.pm line 415
[1]     eval {...} called at /root/maker/bin/../lib/Error.pm line 407
[1]     Error::subs::try(CODE(0x352c9b8), HASH(0xdab3b690)) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 4215
[1]     Process::MpiChunk::_go(Process::MpiChunk=HASH(0x3545d90), "run", HASH(0x30aa710), 9, 3) called at /root/maker/bin/../lib/Process/MpiChunk.pm line 341
[1]     Process::MpiChunk::run(Process::MpiChunk=HASH(0x3545d90), 137) called at /root/maker/bin/maker line 979
[1] --> rank=137, hostname=node137
[1] ERROR: Failed while collecting blastx reports
[1] ERROR: Chunk failed at level:9, tier_type:3
[1] FAILED CONTIG:Scaffold249
[1]
[1] ERROR: Chunk failed at level:4, tier_type:0
[1] FAILED CONTIG:Scaffold249
[1]
[1] examining contents of the fasta file and run log
[1] ERROR: could not make datastore directory
[1] --> rank=1, hostname=node001
[1] ERROR: Failed while examining contents of the fasta file and run log
[1] ERROR: Chunk failed at level:0, tier_type:0
[1] FAILED CONTIG:Scaffold249
[1]
[... the preceding six-line error block repeats 18 more times ...]


I’d appreciate any guidance as to how best to diagnose this error!


Many thanks,
Jason Gallant











--
Dr. Jason R. Gallant
Assistant Professor
Room 38 Natural Sciences
Department of Zoology
Michigan State University
East Lansing, MI 48824
jgallant at msu.edu

office: 517-884-7756