[maker-devel] master_datastore_index.log file shrinks.
Daniel Hughes
dsth at ebi.ac.uk
Tue Mar 19 09:26:02 MDT 2013
Oh, and (1) it will work as long as the evidence etc. is synchronous, and (2) it
will be really inefficient - be glad EBI doesn't use a by-group compute-time
fair-share policy ;)
Dan
from me phone...
On Mar 19, 2013 12:13 PM, "Daniel Hughes" <dsth at ebi.ac.uk> wrote:
> You really don't need to know anything about MPI. While MPI is itself
> pretty complex, I seem to recall MAKER uses the point-to-point subset alone,
> mainly to send serialised Perl objects as C strings etc. for IPC across ad hoc
> infrastructure - but none of that is relevant, as Carson has done all the
> IPC debugging for you and its use should be transparent. If it's failing,
> it's almost certainly because you've got discrepancies between the MPI
> libraries visible at compile time vs. run time, and you may need to force
> the dynamic linker to behave itself. The only other caveat on EBI
> infrastructure I can think of off the top of my head relates to cross-node
> MPI usage when going into the hundreds of processes, but I'm assuming you're
> not doing that? You need to be more specific about how it's failing.
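>
> A rough way to sanity-check which MPI library the loader actually resolves
> before launching MAKER (a minimal sketch; the mpiexec path below is a
> placeholder for wherever your MPI install lives):
>
>     # Minimal sketch: compare the MPI library resolved at run time against the
>     # one MAKER was compiled against. Paths here are placeholders.
>     import ctypes.util
>     import subprocess
>
>     print(ctypes.util.find_library("mpi") or ctypes.util.find_library("mpich"))
>     print(subprocess.run(["ldd", "/path/to/your/mpiexec"],
>                          capture_output=True, text=True).stdout)
>     # If these point at a different MPI than the build-time one, prepend the
>     # correct lib directory to LD_LIBRARY_PATH before running mpiexec.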
>
> dan
>
> from me phone...
> On Mar 19, 2013 11:55 AM, "Michael Nuhn" <mnuhn at ebi.ac.uk> wrote:
>
>> Hello Carson!
>>
>> On 03/19/2013 02:27 PM, Carson Holt wrote:
>>
>>> Yes. If at all possible, use MPI. It removes the overhead of locks,
>>> which happen per primary instance of MAKER. So one MAKER job using 1000
>>> CPUs via MPI will have one shared set of locks; 1000 serial instances
>>> of MAKER, on the other hand, would have 1000x the locks.
>>>
>>
>> I don't know a thing about MPI.
>>
>> I tried installing MAKER (2.2.7) with mpich-3.0.2, mpich2-1.4.1 and Open
>> MPI, and none of them worked for me. I also tried the automatic installation
>> that comes with MAKER, but it didn't work for me either.
>>
>> If need be, I could spend time getting to the bottom of this, but there
>> is no telling how long this would take me so I'd rather not, if there is an
>> alternative.
>>
>> Would the approach I outlined before work? (Treating the split files as
>> separate genomes to annotate and then combining the GFFs afterwards.)
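>>
>> Roughly what I have in mind for the splitting (a quick sketch; the chunk size
>> and file names are arbitrary):
>>
>>     # Quick sketch: split a multi-FASTA into fixed-size chunks so each chunk
>>     # can be annotated as its own "genome"; the per-chunk GFF3 outputs would
>>     # simply be concatenated afterwards.
>>     from pathlib import Path
>>
>>     def split_fasta(fasta, outdir, seqs_per_chunk=50):
>>         outdir = Path(outdir)
>>         outdir.mkdir(exist_ok=True)
>>         lines, chunk_idx, n_seqs = [], 0, 0
>>
>>         def flush():
>>             nonlocal lines, chunk_idx
>>             if lines:
>>                 out = outdir / f"chunk_{chunk_idx:03d}.fasta"
>>                 out.write_text("".join(lines))
>>                 lines, chunk_idx = [], chunk_idx + 1
>>
>>         with open(fasta) as fh:
>>             for line in fh:
>>                 if line.startswith(">"):
>>                     if n_seqs and n_seqs % seqs_per_chunk == 0:
>>                         flush()
>>                     n_seqs += 1
>>                 lines.append(line)
>>         flush()
>>
>>     split_fasta("genome.fasta", "chunks")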
>>
>> I also like this approach because I could select a few contigs at the
>> beginning and run them on their own. They would complete early, giving me a
>> preview of the results of the run instead of having to wait for everything
>> to complete.
>>
>> It might also be more robust, because file-locking issues would be
>> confined to the instances working on a given sequence chunk, while the rest
>> of the instances could continue working.
>>
>> Cheers,
>> Michael.
>>
>>> Alternatively, if you do need to continue without MPI for some reason, I
>>> just finished a devel version of MAKER that has a --no_locks option.
>>> You can never start two instances using the same input FASTA when
>>> --no_locks is specified, but the splitting to use different input FASTAs
>>> that I mentioned before in the example will still work fine.
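>>>
>>> For example, something along these lines to launch one non-MPI instance per
>>> chunk (a minimal sketch; the -g/-genome override, the chunk layout, and the
>>> control-file paths are assumptions you'd adapt to your own setup):
>>>
>>>     # Minimal sketch: one --no_locks MAKER instance per FASTA chunk, each in
>>>     # its own working directory so the datastore indexes never collide.
>>>     import pathlib
>>>     import subprocess
>>>
>>>     ctl = [str(pathlib.Path(f).resolve())
>>>            for f in ("maker_opts.ctl", "maker_bopts.ctl", "maker_exe.ctl")]
>>>     procs = []
>>>     for chunk in sorted(pathlib.Path("chunks").glob("chunk_*.fasta")):
>>>         workdir = pathlib.Path("runs") / chunk.stem
>>>         workdir.mkdir(parents=True, exist_ok=True)
>>>         procs.append(subprocess.Popen(
>>>             ["maker", "--no_locks", "-g", str(chunk.resolve())] + ctl,
>>>             cwd=workdir))
>>>     for p in procs:
>>>         p.wait()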
>>>
>>> I also have updated the indexing/reindexing, so if indexing failures
>>> happen, MAKER will switch between the current working directory and the
>>> TMP= directory from the maker_opts.ctl file so as to try different I/O
>>> locations (i.e. NFS and non-NFS). Note that you should never set TMP= in
>>> the control files to an NFS-mounted location (it not only makes things a
>>> lot slower, but BerkeleyDB and SQLite will get frequent errors on NFS).
>>> TMP= defaults to /tmp when not specified.
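>>>
>>> For example, the relevant line in maker_opts.ctl (assuming your compute
>>> nodes have local scratch mounted at /scratch/local):
>>>
>>>     TMP=/scratch/local #local, non-NFS disk for temporary files and databases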
>>>
>>> I'll send you download information in a separate e-mail. Try a regular
>>> MAKER run to see if the indexing/reindexing changes are sufficient
>>> before attempting the --no_locks option.
>>>
>>> Thanks,
>>> Carson
>>>
>>
>>