[maker-devel] master_datastore_index.log file shrinks.
Carson Holt
carsonhh at gmail.com
Tue Mar 19 08:27:16 MDT 2013
Yes. If at all possible, use MPI. It removes the overhead of the locks, which
occur per primary instance of MAKER. So one MAKER job using 1000 CPUs via
MPI will have one shared set of locks; 1000 serial instances of MAKER, on
the other hand, would have 1000x the locks.
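To make the contrast concrete, here is a sketch of the two launch patterns (the exact mpiexec flags depend on your MPI stack and scheduler, so treat this as illustrative only):

```shell
# One MAKER job spread over 1000 CPUs via MPI:
# all workers share a single set of datastore locks.
mpiexec -n 1000 maker

# By contrast, 1000 independent serial instances each create
# their own locks -- 1000x the locking overhead. Avoid this:
# for i in $(seq 1000); do maker & done
```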
Alternatively, if you do need to continue without MPI for some reason, I just
finished a devel version of MAKER that has a --no_locks option. You must
never start two instances using the same input fasta when --no_locks is
specified, but the splitting across different input fastas I mentioned
before in the example will still work fine.
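Assuming the devel version's --no_locks flag behaves as described, the earlier split-input recipe would look roughly like this (one distinct input fasta per instance, all sharing the same -base):

```shell
# Split the assembly into chunks, one per serial instance.
fasta_tool --chunks 5 maize_assembly.fasta

# Each instance must get its own chunk: with --no_locks, no two
# instances may ever run on the same input fasta.
maker --no_locks -g maize_assembly_0.fasta -base maize_assembly
maker --no_locks -g maize_assembly_1.fasta -base maize_assembly
# ... repeat for the remaining chunks ...

# Rebuild the master datastore index once all chunks finish.
maker -dsindex
```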
I have also updated the indexing/reindexing, so if indexing failures happen,
MAKER will switch between the current working directory and the TMP=
directory from the maker_opts.ctl file to try different IO locations
(i.e. NFS and non-NFS). Note that you should never set TMP= in the control
files to an NFS-mounted location (it not only makes things a lot slower, but
BerkeleyDB and SQLite will get frequent errors on NFS). TMP= defaults to
/tmp when not specified.
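In maker_opts.ctl that line would look something like the following (the exact inline comment text varies by MAKER version):

```
TMP=/tmp #local, non-NFS scratch directory; defaults to /tmp when left blank
```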
I'll send you download information in a separate e-mail. Try a regular
MAKER run to see if the indexing/reindexing changes are sufficient before
attempting the no_locks option.
Thanks,
Carson
From: Daniel Hughes <dsth at ebi.ac.uk>
Date: Tuesday, 19 March, 2013 9:33 AM
To: Michael Nuhn <mnuhn at ebi.ac.uk>, <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] master_datastore_index.log file shrinks.
Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
-------------------------------------------------------------------------------------
dsth at cantab.net
dsth at cpan.org
Hi Michael,
You're using the EBI cluster? I have to ask: is this all just a really
elaborate way of avoiding the use of MPI, which works perfectly well on both
the EBI and Sanger compute farms? If you carry on in the direction you seem
to be going, you're likely to end up with a considerable level of unnecessary
overhead, and should possibly consider adapting the Ensembl genebuild
pipeline to your specific needs.
Dan
> Hello Carson!
>
> On 03/14/2013 04:49 PM, Michael Nuhn wrote:
>> Try dialling back on the number of simultaneous instances you start and
>> instead use MPI or the -cpus option to get the parallelization boost.
>> Alternatively you can also split up the input file and use the -base
>> option so everything gets written to the same place (then you never have
>> to worry about locks affecting individual contigs - as no single instance
>> has access to all the contigs)
>>
>> Example:
>> fasta_tool --chunks 5 maize_assembly.fasta
>> maker -g maize_assembly_0.fasta -base maize_assembly
>> maker -g maize_assembly_1.fasta -base maize_assembly
>> maker -g maize_assembly_2.fasta -base maize_assembly
>> maker -g maize_assembly_3.fasta -base maize_assembly
>> maker -g maize_assembly_4.fasta -base maize_assembly
>> maker -dsindex
>>
>> Everything then gets written to maize_assembly.maker.output for all
>> results. The last call to maker with the -dsindex flag then rebuilds the
>> datastore_index.log file to match the original maize_assembly.fasta file
>
> I have tried this, split my genome into 50 files and run them as you
> suggested above.
>
> This worked well most of the time, but now I am getting locking issues
> again. The working directory gets flooded with STACK.STACK.STACK.STACK
> ... files.
>
> What I think is happening is that for some reason the maker instances
> decide that they want to rebuild the index. This takes a lot of time
> and this blocks even more instances wanting to lock the index files.
> In the end most of the maker instances end up waiting.
>
> I would like to try the following, but I don't know whether it might
> cause problems later on:
>
> I would like to run all of the split sequence files as separate maker
> projects as if they were independent genomes. In the end I'd merge all
> the individual gff files using the gff3_merge script.
>
> Do you see any reason why this wouldn't work?
>
> Cheers,
> Michael.
>
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
> ----- End forwarded message -----
>