[maker-devel] master_datastore_index.log file shrinks.

Tue Mar 19 07:33:13 MDT 2013

Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
-------------------------------------------------------------------------------------
dsth at cantab.net
dsth at cpan.org

Hi Michael,

Are you using the EBI cluster? I have to ask: is this all just a really
elaborate way of avoiding MPI, which works perfectly well on both the EBI
and Sanger compute farms? If you carry on in the direction you seem to be
going, you are likely to end up with a considerable amount of unnecessary
overhead, and should perhaps consider adapting the Ensembl genebuild
pipeline to your specific needs.

Dan

Hello Carson!
>
> On 03/14/2013 04:49 PM, Michael Nuhn wrote:
> >> Try dialling back on the number of simultaneous instances you start and
> >> instead use MPI or the -cpus option to get the parallelization boost.
> >> Alternatively you can also split up the input file and use the -base
> >> option so everything gets written to the same place (then you never have
> >> to worry about locks affecting individual contigs - as no single instance
> >> has access to all the contigs)
> >>
> >> Example:
> >> fasta_tool --chunks 5 maize_assembly.fasta
> >> maker -g maize_assembly_0.fasta -base maize_assembly
> >> maker -g maize_assembly_1.fasta -base maize_assembly
> >> maker -g maize_assembly_2.fasta -base maize_assembly
> >> maker -g maize_assembly_3.fasta -base maize_assembly
> >> maker -g maize_assembly_4.fasta -base maize_assembly
> >> maker -dsindex
> >>
> >> Everything then gets written to maize_assembly.maker.output for all
> >> results.  The last call to maker with the -dsindex flag then rebuilds the
> >> datastore_index.log file to match the original maize_assembly.fasta file
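
For a larger number of chunks, the quoted example can be generated rather than typed out. The following is only a dry-run sketch: it prints the commands instead of running them, uses only the flags that appear in the example above, and assumes the chunk files are named `<stem>_0.fasta`, `<stem>_1.fasta`, and so on, as they are there. The function name `maker_chunk_commands` is made up for illustration.

```shell
#!/bin/sh
# Print (do not run) the MAKER commands for a genome split into N chunks.
# Assumption: chunk files are named <stem>_0.fasta .. <stem>_(N-1).fasta,
# following the naming shown in the quoted example.
maker_chunk_commands() {
    stem=$1      # e.g. maize_assembly
    chunks=$2    # e.g. 5

    echo "fasta_tool --chunks $chunks $stem.fasta"

    i=0
    while [ "$i" -lt "$chunks" ]; do
        # Every chunk writes into the same <stem>.maker.output via -base.
        echo "maker -g ${stem}_$i.fasta -base $stem"
        i=$((i + 1))
    done

    # Final call, as in the example, to rebuild the datastore index.
    echo "maker -dsindex"
}

maker_chunk_commands maize_assembly 5
```

The printed lines can be reviewed and then submitted to a scheduler one per job; only the final `maker -dsindex` line needs to wait for all the others to finish.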
>
> I have tried this: I split my genome into 50 files and ran them as you
> suggested above.
>
> This worked well most of the time, but now I am getting locking issues
> again. The working directory gets flooded with STACK.STACK.STACK.STACK
> ... files.
>
> What I think is happening is that for some reason the maker instances
> decide that they want to rebuild the index. This takes a lot of time
> and this blocks even more instances wanting to lock the index files.
> In the end most of the maker instances end up waiting.
>
> I would like to try the following, but I don't know whether it might
> cause problems later on:
>
> I would like to run all of the split sequence files as separate maker
> projects as if they were independent genomes. In the end I'd merge all
> the individual gff files using the gff3_merge script.
>
> Do you see any reason why this wouldn't work?
>
> Cheers,
> Michael.
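
The independent-projects idea in the message above could look roughly like the dry-run sketch below, which prints the commands rather than running them. Several things here are assumptions and not taken from the thread: that each run writes to `<chunk>.maker.output/<chunk>_master_datastore_index.log`, that `gff3_merge` accepts `-d <index log>` and `-o <output file>`, and the output file names. The function name `independent_project_commands` is hypothetical. Whether the merged result is equivalent to a single run is exactly the open question being asked, so this only illustrates the mechanics.

```shell
#!/bin/sh
# Print (do not run) commands that treat each chunk as its own MAKER project
# and then merge the per-chunk GFF3 files.
# Assumptions (not confirmed in the thread):
#   - output lands in <chunk>.maker.output/<chunk>_master_datastore_index.log
#   - gff3_merge accepts -d <index log> and -o <output file>
independent_project_commands() {
    stem=$1      # e.g. maize_assembly
    chunks=$2    # number of split files

    merged_inputs=""
    i=0
    while [ "$i" -lt "$chunks" ]; do
        chunk="${stem}_$i"
        # Each chunk gets its own base name, so no index file is shared.
        echo "maker -g $chunk.fasta -base $chunk"
        # Collect this chunk's results into one GFF3 file.
        echo "gff3_merge -d $chunk.maker.output/${chunk}_master_datastore_index.log -o $chunk.gff"
        merged_inputs="$merged_inputs $chunk.gff"
        i=$((i + 1))
    done

    # Merge all per-chunk GFF3 files into a single file.
    echo "gff3_merge -o $stem.all.gff$merged_inputs"
}

independent_project_commands maize_assembly 2
```

Because every instance has its own output directory and index file, no two instances contend for the same lock, which is the point of the proposal.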