[maker-devel] Further split genome questions

Jeanne Wilbrandt j.wilbrandt at zfmk.de
Wed Aug 13 03:32:38 MDT 2014


Our admin counts processes. Do I understand you correctly that one CPU handles several
processes?

I'm still confused by the different directories (and I made a mistake when asking last
time: I meant to say 'If I do NOT start the jobs in the same directory...').
So, if I start each piece of a genome in its own directory (for example), then it gets a
unique basename (because its output will be separate from all other pieces anyway), and I
will not run dsindex but instead use gff3_merge on each piece's output, and then once
more to merge all the resulting GFF3 files?
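The per-directory workflow described above can be sketched in Python. This is a minimal sketch under stated assumptions, not MAKER's own tooling: the helper names `plan_piece_runs` and `plan_merges`, the `<workdir>/<base>/` layout, and the output paths `<base>.maker.output/<base>_master_datastore_index.log` and `<base>.all.gff` are illustrative. The `-d` (datastore index log) and `-o` (output file) options of `gff3_merge` are assumed here; check the usage message of `gff3_merge` on your installation. The code only builds command lists and runs nothing.

```python
from pathlib import Path
from typing import List, Tuple


def plan_piece_runs(pieces: List[str], workdir: str) -> List[Tuple[Path, List[str]]]:
    """Return (run_directory, maker_command) for each genome piece.

    Hypothetical layout: every piece gets its own directory and a basename
    taken from its file name, so no two jobs share a datastore or index log.
    Piece paths are assumed to be absolute, since maker is assumed to be
    started from inside run_directory (where its .ctl files live).
    """
    runs = []
    for piece in pieces:
        base = Path(piece).stem          # e.g. "/data/part01.fasta" -> "part01"
        run_dir = Path(workdir) / base   # e.g. <workdir>/part01
        cmd = ["maker", "-g", piece, "-base", base]
        runs.append((run_dir, cmd))
    return runs


def plan_merges(runs: List[Tuple[Path, List[str]]], merged_gff: str) -> List[List[str]]:
    """Build one gff3_merge call per piece, then one call merging the pieces.

    The output paths and the -d/-o options are assumptions (see lead-in).
    """
    cmds = []
    piece_gffs = []
    for run_dir, cmd in runs:
        base = cmd[cmd.index("-base") + 1]
        index_log = run_dir / f"{base}.maker.output" / f"{base}_master_datastore_index.log"
        piece_gff = run_dir / f"{base}.all.gff"
        cmds.append(["gff3_merge", "-d", str(index_log), "-o", str(piece_gff)])
        piece_gffs.append(str(piece_gff))
    # Final step: concatenate the per-piece GFF3 files into a single file.
    cmds.append(["gff3_merge", "-o", merged_gff] + piece_gffs)
    return cmds
```

Keeping each piece in its own directory means each job has its own index log, which avoids the many-jobs-in-one-directory situation discussed further down the thread.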

Hope I got you right :)

Thanks for your help!
Jeanne



On Wed, 6 Aug 2014 15:45:56 +0000
 Carson Holt <carsonhh at gmail.com> wrote:
>Is your admin counting processes or CPU usage? Each system call creates a separate
>process, so you can expect multiple processes per instance, but only a single CPU of
>usage. Use different directories if you are running that many jobs. You can concatenate
>the separate results when you're done. Use the gff3_merge script to help concatenate the
>separate GFF3 files generated by separate jobs.
>
>--Carson
>
>Sent from my iPhone
>
>> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" <j.wilbrandt at zfmk.de> wrote:
>> 
>> 
>> 
>> We are using MPI as well; each of the 20 parts gets assigned 4 threads. Our admin
>> reports, however, that the processes seem to use more threads than they are allowed. It
>> is not BLAST (which is set to 1 CPU in the opts.ctl). Do you have a suggestion why?
>> 
>> If I start the jobs in the same directory, how can I make sure they write to the same
>> directory (as I think is required to put the pieces together in the end)? Does
>> -basename take paths?
>> 
>> 
>> On Wed, 6 Aug 2014 15:12:50 +0000
>> Carson Holt <carsonhh at gmail.com> wrote:
>>> I think the freezing is because you are starting too many simultaneous jobs. You should
>>> try to use MPI to parallelize instead. The concurrent-job approach can start to cause
>>> problems if you are running 10 or more jobs in the same directory. You could try
>>> splitting them into different directories.
>>> 
>>> --Carson
>>> 
>>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" <j.wilbrandt at zfmk.de> wrote:
>>>> 
>>>> 
>>>> aha, so this explains that.
>>>> Daniel, the average is 5930.37 bp, but lengths range from ~50 to more than 60,000 bp,
>>>> with roughly half of the sequences shorter than 3,000 bp.
>>>> 
>>>> What do you think about this weird 'I am running but not really doing anything'
>>>> behavior?
>>>> 
>>>> 
>>>> Thanks a lot!
>>>> Jeanne
>>>> 
>>>> 
>>>> 
>>>> On Wed, 6 Aug 2014 14:16:52 +0000
>>>> Carson Holt <carsonhh at gmail.com> wrote:
>>>>> If you are starting and restarting, or running multiple jobs, then the log can be
>>>>> partially rebuilt. On rebuild, only the FINISHED entries are added. If there is a GFF3
>>>>> result file for the contig, then it is FINISHED. FASTA files will only exist for the
>>>>> contigs that have gene models. Small contigs will rarely contain models.
>>>>> 
>>>>> --Carson
>>>>> 
>>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" <j.wilbrandt at zfmk.de> wrote:
>>>>>> 
>>>>>> 
>>>>>> Hi Carson, 
>>>>>> 
>>>>>> I ran into more conspicuous behavior running maker 2.31 on a genome that is split
>>>>>> into 20 parts, using the -g flag and the same basename.
>>>>>> Most of the jobs ran simultaneously on the same node; 17 seemed to finish normally,
>>>>>> while the remaining three seemed to stall and produced 0 B of output. Do you have any
>>>>>> suggestion why this is happening?
>>>>>> 
>>>>>> After I stopped these stalled jobs, I checked the index.log and found that of the
>>>>>> 38,384 scaffolds mentioned, 154 appear only once in the log. The surprise is that
>>>>>> 2/3 of these appear only as FINISHED (the rest only as started). There are no models
>>>>>> for these 'finished' scaffolds stored in the .db, and they are distributed over all
>>>>>> parts of the genome (i.e., each of the 20 jobs contained scaffolds that 'did not
>>>>>> start' but 'finished').
>>>>>> Should this be a cause for concern?
>>>>>> It might be an NFS lock problem, as NFS is heavily loaded, but the NFS files look
>>>>>> good, so we suspect something fishy is going on...
>>>>>> 
>>>>>> Hope you can help,
>>>>>> best wishes,
>>>>>> Jeanne Wilbrandt
>>>>>> 
>>>>>> zmb // ZFMK // University of Bonn
>>>>>> 
>>>>>> 
>>>>>> 
>> 
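The index-log check Jeanne describes (scaffolds that appear only once, some of them only as FINISHED) can be reproduced with a short Python sketch. It assumes each log line is tab-separated as `<contig_id>`, `<datastore path>`, `<STATUS>` and it only looks at the STARTED and FINISHED statuses; the function names are hypothetical. Going by Carson's explanation, a contig with a FINISHED entry but no STARTED entry is expected after a log rebuild, because only FINISHED entries are re-added. Contigs with STARTED but no FINISHED are the ones that did not complete.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def status_by_contig(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect every status recorded for each contig, in log order."""
    statuses: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        contig, status = fields[0], fields[2]
        statuses[contig].append(status)
    return dict(statuses)


def classify(statuses: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Split contigs into 'started but never finished' and 'finished with no start'."""
    started_only = {c for c, s in statuses.items()
                    if "STARTED" in s and "FINISHED" not in s}
    finished_only = {c for c, s in statuses.items()
                     if "FINISHED" in s and "STARTED" not in s}
    return {"started_only": started_only, "finished_only": finished_only}
```

To use it on a real run, open the index log and pass the file handle to `status_by_contig`, then hand the result to `classify`.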




