[maker-devel] thread terminated, causing all processes to fail

Ramón Fallon ramonfallon at gmail.com
Sun Mar 10 08:45:38 MDT 2013


Hi Carson,

Regarding rev 995: on a simplified version of our data set, a sequential run
succeeded, and even an "mpiexec -n 4" run went to completion.

In any case, many thanks for the new revision, 996. I did have one problem
with the build, namely this new line:
'bin/TACC.PL' => ['bin/ibrun'],

I couldn't find TACC.PL anywhere, so I removed that line, and after that
everything compiled fine.

I have started one or two tests and will report back on them later. I must
admit I am using a rather large EST FASTA file, which is not practical for
testing; I will try to cut it down on Monday or Tuesday so that tests run
faster.

Many thanks / Ramón.


On Fri, Mar 8, 2013 at 9:28 PM, Carson Holt <carsonhh at gmail.com> wrote:

> Also delete mpi_blastdb before retrying with the new svn repository.
>
> Thanks,
> Carson
>
>
> From: Carson Holt <carsonhh at gmail.com>
> Date: Friday, 8 March, 2013 3:20 PM
> To: Ramón Fallon <ramonfallon at gmail.com>
>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] thread terminated, causing all processes to
> fail
>
> I think I've found the potential cause and committed the necessary changes
> to fix it.
>
> Thanks,
> Carson
>
>
> From: Ramón Fallon <ramonfallon at gmail.com>
> Date: Thursday, 7 March, 2013 12:47 PM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] thread terminated, causing all processes to
> fail
>
> This is a standalone machine with no NFS at all. "df" shows a healthy
> amount of free disk space, so there should be no problem there.
>
> Yes, that file does exist, although it has only the nominal size of 12288
> bytes, which appears to be the minimum for a DB_File tie.
>
> As I mentioned, the dpp_contig.fa example set does work, so part of my
> investigation is looking at why it succeeds where my data fails.
>
> I can do some trivial unit tests on the Bioperl stat-before-tied-hashes
> situation and see what comes up.
>
> So I'll try to clear that up and then get back to you.
>
> Many thanks! / Ramón.
>
>
> On Thu, Mar 7, 2013 at 5:44 PM, Carson Holt <carsonhh at gmail.com> wrote:
>
>> That is extremely odd. It fails even to generate the indexes. Could you
>> check the free space on the drive holding your working directory and on
>> your /tmp directory?
>>
>> It is odd because BioPerl uses stat to check the file right before
>> creating a tied hash. So the file was there for the stat but not for the
>> tie, which immediately follows it.
>>
>> If you check manually, does it exist now? -->
>>  /home/ramonf/makertrials/mgallocut7/sca29310_8.maker.output/mpi_blastdb/sca29310_8%2Efa.mpi.1/sca29310_8%2Efa.mpi.1.0.index
>>
>> Are you running in an NFS mounted directory?
>>
>> --Carson
>>
>>
>> From: Ramón Fallon <ramonfallon at gmail.com>
>> Date: Thursday, 7 March, 2013 9:40 AM
>>
>> To: Carson Holt <carson.holt at oicr.on.ca>
>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] thread terminated, causing all processes to
>> fail
>>
>> Hi Carson,
>>
>> I am sending you a zip of the text log from my repeated maker session,
>> this time after deleting the mpi_blastdb directory and with the -a flag
>> added to the "mpiexec -n 8 maker -debug" command line.
>>
>> Cheers / Ramón.
>>
>>
>> On Wed, Mar 6, 2013 at 7:49 PM, Ramón Fallon <ramonfallon at gmail.com>wrote:
>>
>>> OK, will do.
>>>
>>> Will get back to you tomorrow on it.
>>>
>>> Many thanks!
>>>
>>>
>>> On Wed, Mar 6, 2013 at 7:22 PM, Carson Holt <Carson.Holt at oicr.on.ca>wrote:
>>>
>>>> Could you delete your ../*maker.output/mpi_blastdb directory and then
>>>> rerun maker with the -a flag?
>>>>
>>>> Thanks,
>>>> Carson
>>>>
>>>>
>>>> From: Ramón Fallon <ramonfallon at gmail.com>
>>>> Date: Wednesday, 6 March, 2013 1:15 PM
>>>> To: Carson Holt <carson.holt at oicr.on.ca>
>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>
>>>> Subject: Re: thread terminated, causing all processes to fail
>>>>
>>>> OK, great, here goes... many thanks!
>>>>
>>>>
>>>>
>>>> On Wed, Mar 6, 2013 at 7:04 PM, Carson Holt <Carson.Holt at oicr.on.ca>wrote:
>>>>
>>>>> If you do reply all to this message, I should get the attachment.  It
>>>>> will be stripped from the one going to the list though.
>>>>>
>>>>> Thanks,
>>>>> Carson
>>>>>
>>>>>
>>>>>
>>>>> From: Ramón Fallon <ramonfallon at gmail.com>
>>>>> Date: Wednesday, 6 March, 2013 12:57 PM
>>>>> To: <maker-devel at yandell-lab.org>
>>>>> Subject: Re: thread terminated, causing all processes to fail
>>>>>
>>>>> Hi,
>>>>>
>>>>> Many thanks for your quick reply and hint.
>>>>>
>>>>> Yes, you're right; further up there is indeed:
>>>>>
>>>>> Calling FastaDB::new at /opt/src/maker_svn/bin/../lib/FastaSeq.pm line 148 thread 1.
>>>>> Thread 1 terminated abnormally: ERROR: Could not reestablish DB to thaw FastaSeq for Storable
>>>>> --> rank=5, hostname=fatnode, at /opt/src/maker_svn/bin/maker line 1457 thread 1.
>>>>>
>>>>> I ran the session under "script" with maker in -debug mode, so I have
>>>>> everything in one file. Would you prefer to have it attached to a post
>>>>> to this mailing list (if it accepts .txt attachments)?
>>>>>
>>>>> Cheers.
>>>>>
>>>>>
>>>>> On Wed, Mar 6, 2013 at 6:34 PM, Ramón Fallon <ramonfallon at gmail.com>wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm using maker_svn rev 995 and a hand-compiled MPICH2 on a single
>>>>>> multicore machine.
>>>>>>
>>>>>> I've successfully run the dpp_contig.fasta example (MPI, 8 processes)
>>>>>> but am having trouble with larger contig FASTA files of my own, which
>>>>>> are well formed.
>>>>>>
>>>>>> I've run into a problem whereby an 8-process mpiexec run stops with a
>>>>>> Perl-thread-related error:
>>>>>>
>>>>>> FATAL: Thread terminated, causing all processes to fail
>>>>>>
>>>>>> This corresponds to line 924 in the maker executable (in the code for
>>>>>> the secondary/worker threads) and is the result of a test on !$thr OR'd
>>>>>> with !$thr->is_running, so either the thread object is missing or the
>>>>>> thread has stopped running.
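The liveness test described above can be sketched in Python, where threading.Thread.is_alive() plays roughly the role of Perl's is_running. This is an illustration under assumptions, not maker's code; check_worker, FATAL_MSG and the two worker functions are hypothetical names.

```python
import threading

FATAL_MSG = "FATAL: Thread terminated, causing all processes to fail"


def check_worker(thr):
    """Return FATAL_MSG if the worker thread is missing or stopped, else None.

    Same shape as the `!$thr || !$thr->is_running` test: it fires when either
    the thread object does not exist or the thread is no longer running.
    """
    if thr is None or not thr.is_alive():
        return FATAL_MSG
    return None


def short_lived_worker():
    # Hypothetical worker that returns early, like a thread that died
    # while trying to open its index.
    return


def long_lived_worker(stop):
    # Hypothetical worker that keeps running until signalled to stop.
    stop.wait()


if __name__ == "__main__":
    dead = threading.Thread(target=short_lived_worker)
    dead.start()
    dead.join()
    print(check_worker(dead))   # fatal message: thread has finished

    stop = threading.Event()
    alive = threading.Thread(target=long_lived_worker, args=(stop,))
    alive.start()
    print(check_worker(alive))  # None: thread still running
    stop.set()
    alive.join()

    print(check_worker(None))   # fatal message: no thread object
```

The check cannot say why the thread stopped; the cause has to be found further up the log, as with the FastaDB error quoted earlier in this thread.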
>>>>>>
>>>>>> $thr itself is created by threads->new(\&$node_thread, $gdbfile).
>>>>>> Although I am a programmer, I've only recently started looking at the
>>>>>> code and haven't yet got the hang of the parallelisation setup. I gather
>>>>>> the master must use threads to generate the parallel instances, which
>>>>>> then communicate by message passing. Threads themselves don't do message
>>>>>> passing, so I guess something clever is going on that will take me some
>>>>>> time to understand.
>>>>>>
>>>>>> Clearly, however, it has worked on dpp_contig, so perhaps something is
>>>>>> wrong with my data file or with the way I am carrying out the analysis.
>>>>>>
>>>>>> Any clues that can be put my way are welcome.
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>> _______________________________________________ maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>
>