<br clear="all"><div>Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)<br>-------------------------------------------------------------------------------------<br><a href="mailto:dsth@cantab.net">dsth@cantab.net</a><br>


<a href="mailto:dsth@cpan.org">dsth@cpan.org</a></div>

<br>Hi Michael,<br><br>Are you using the EBI cluster? I have to ask: is this all just a really elaborate way of avoiding MPI, which works perfectly well on both the EBI and Sanger compute farms? If you carry on in the direction you seem to be going, you're likely to end up with a considerable amount of unnecessary overhead, and you should possibly consider adapting the Ensembl genebuild pipeline to your specific needs.<br>


<br>Dan<br><div class="gmail_quote"><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hello Carson!<br>

<br>

On 03/14/2013 04:49 PM, Michael Nuhn wrote:<br>

>> Try dialling back on the number of simultaneous instances you start and<br>

>> instead use MPI or the -cpus option to get the parallelization boost.<br>

>> Alternatively you can also split up the input file and use the -base<br>

>> option so everything gets written to the same place (then you never have<br>

>> to worry about locks affecting individual contigs - as no single instance<br>

>> has access to all the contigs)<br>

>><br>

>> Example:<br>

>> fasta_tool --chunks 5 maize_assembly.fasta<br>

>> maker -g maize_assembly_0.fasta -base maize_assembly<br>

>> maker -g maize_assembly_1.fasta -base maize_assembly<br>

>> maker -g maize_assembly_2.fasta -base maize_assembly<br>

>> maker -g maize_assembly_3.fasta -base maize_assembly<br>

>> maker -g maize_assembly_4.fasta -base maize_assembly<br>

>><br>

>> maker -dsindex<br>

>><br>

>> Everything then gets written to maize_assembly.maker.output for all<br>

>> results.  The last call to maker with the -dsindex flag then rebuilds the<br>

>> datastore_index.log file to match the original maize_assembly.fasta file<br>

<br>

I have tried this, split my genome into 50 files and run them as you<br>

suggested above.<br>

<br>

This worked well most of the time, but now I am getting locking issues<br>

again. The working directory gets flooded with STACK.STACK.STACK.STACK<br>

... files.<br>

<br>

What I think is happening is that for some reason the maker instances<br>

decide that they want to rebuild the index. This takes a lot of time<br>

and this blocks even more instances wanting to lock the index files.<br>

In the end most of the maker instances end up waiting.<br>

<br>

I would like to try the following, but I don't know if this might<br>

cause problems later on:<br>

<br>

I would like to run all of the split sequence files as separate maker<br>

projects as if they were independent genomes. In the end I'd merge all<br>

the individual gff files using the gff3_merge script.<br>

<br>

Do you see any reason why this wouldn't work?<br>
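<br>
A minimal sketch of that plan, written as a dry run that only prints the commands.<br>
It assumes the chunks follow the maize_assembly_N.fasta naming from the example above<br>
and that giving each chunk its own -base keeps the runs fully separate.<br>
print_maker_jobs is a hypothetical helper, not part of MAKER, and the final<br>
gff3_merge step is left out:<br>

```shell
#!/bin/sh
# Dry run: print one maker command per chunk, each with its own -base,
# so that no two instances write to the same output directory or index.
# Assumption: chunks are named PREFIX_0.fasta .. PREFIX_(N-1).fasta,
# as in the quoted fasta_tool example.
# print_maker_jobs is a hypothetical helper name, not a MAKER tool.
print_maker_jobs() {
    prefix=$1   # base name of the split assembly
    chunks=$2   # number of chunk files
    i=0
    while [ "$i" -lt "$chunks" ]; do
        echo "maker -g ${prefix}_${i}.fasta -base ${prefix}_${i}"
        i=$((i + 1))
    done
}

# Print the commands for the first three chunks.
print_maker_jobs maize_assembly 3
```

Each printed line could then be submitted as its own cluster job.<br>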

<br>

Cheers,<br>

Michael.<br>

<br>

<br>

<br>


<br>

----- End forwarded message -----<br>

<br>

</blockquote><br></div><br>