[maker-devel] Further split genome questions
Carson Holt
carsonhh at gmail.com
Wed Aug 13 09:52:34 MDT 2014
Yes. One CPU will have several processes; most are helper processes that
will use 0% CPU almost all of the time (for example, there is a shared
variable manager process that launches with MAKER but will also appear as
'maker' under top, because it is technically a child of MAKER and not a
separate script). Also, system calls will launch a new process that uses
all of the CPU, while the calling process drops to 0% CPU until the call
finishes.
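For example, on a typical Linux node something like the following will show
the extra 'maker' helper processes and which one is actually busy (a rough
sketch; ps/top options can differ slightly between systems):

    # list all processes named 'maker' as a tree; helpers appear as children
    ps -C maker -f --forest

    # watch per-process CPU with full command lines; roughly one process per
    # MAKER instance should be busy, the helpers should sit near 0%
    top -c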
Yes. Your explanation is correct. You then use gff3_merge to merge the
GFF3 files.
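For example, if each piece ran in its own directory with its own basename,
the merge could look roughly like this (directory and file names are
placeholders based on MAKER's usual output layout; check gff3_merge --help
for the exact options):

    # build one GFF3 per piece from that piece's datastore index
    gff3_merge -d piece1.maker.output/piece1_master_datastore_index.log
    gff3_merge -d piece2.maker.output/piece2_master_datastore_index.log

    # then combine the per-piece GFF3 files into a single annotation file
    gff3_merge -o genome.all.gff piece1.all.gff piece2.all.gff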
--Carson
On 8/13/14, 3:32 AM, "Jeanne Wilbrandt" <j.wilbrandt at zfmk.de> wrote:
>
>Our admin counts processes. Do I understand you correctly that one CPU
>handles several processes?
>
>I'm still confused by the different directories (and I made a mistake
>when asking last time; I wanted to say 'If I do NOT start the jobs in the
>same directory...').
>So, if I start each piece of a genome in its own directory (for example),
>then it gets a unique basename (because the output will be separate from
>all other pieces anyway), and I will not run dsindex but instead use
>gff3_merge for each piece's output and then once again to merge all the
>resulting GFF3 files?
>
>Hope I got you right :)
>
>Thanks for your help!
>Jeanne
>
>
>
>On Wed, 6 Aug 2014 15:45:56 +0000
> Carson Holt <carsonhh at gmail.com> wrote:
>>Is your admin counting processes or CPU usage? Each system call creates
>>a separate process, so you can expect multiple processes, but only a
>>single CPU of usage per instance. Use different directories if you are
>>running that many jobs. You can concatenate the separate results when
>>you're done.
>> Use the gff3_merge script to help concatenate the separate GFF3 files
>>generated from the separate jobs.
>>
>>--Carson
>>
>>Sent from my iPhone
>>
>>> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" <j.wilbrandt at zfmk.de>
>>>wrote:
>>>
>>>
>>>
>>> We are using MPI as well; each of the 20 parts gets assigned 4
>>>threads. Our admin reports, however, that the processes seem to spawn
>>>more threads than they are allowed. It is not BLAST (which is set to
>>>1 cpu in the opts.ctl). Do you have a suggestion why?
>>>
>>> If I start the jobs in the same directory, how can I make sure they
>>>write to the same directory (as, I think, is required to put the pieces
>>>together in the end)? Does -basename take paths?
>>>
>>>
>>> On Wed, 6 Aug 2014 15:12:50 +0000
>>> Carson Holt <carsonhh at gmail.com> wrote:
>>>> I think the freezing is because you are starting too many
>>>>simultaneous jobs. You should try to use MPI to parallelize instead.
>>>>The concurrent-job way of doing things can start to cause problems if
>>>>you are running 10 or more jobs in the same directory. You could try
>>>>splitting them into different directories.
>>>>
>>>> --Carson
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" <j.wilbrandt at zfmk.de>
>>>>>wrote:
>>>>>
>>>>>
>>>>> Aha, so this explains that.
>>>>> Daniel, the average is 5930.37 bp, but it ranges from ~50 to more
>>>>>than 60,000 bp, with roughly half of the sequences being shorter than
>>>>>3,000 bp.
>>>>>
>>>>> What do you think about this weird 'I am running but not really
>>>>>doing anything' behavior?
>>>>>
>>>>>
>>>>> Thanks a lot!
>>>>> Jeanne
>>>>>
>>>>>
>>>>>
>>>>> On Wed, 6 Aug 2014 14:16:52 +0000
>>>>> Carson Holt <carsonhh at gmail.com> wrote:
>>>>>> If you are starting and restarting, or running multiple jobs, then
>>>>>>the log can be partially rebuilt. On rebuild, only the FINISHED
>>>>>>entries are added. If there is a GFF3 result file for the contig,
>>>>>>then it is FINISHED. FASTA files will only exist for the contigs
>>>>>>that have gene models. Small contigs will rarely contain models.
>>>>>>
>>>>>> --Carson
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt"
>>>>>>><j.wilbrandt at zfmk.de> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Carson,
>>>>>>>
>>>>>>> I ran into more strange behavior running MAKER 2.31 on a genome
>>>>>>>that is split into 20 parts, using the -g flag and the same
>>>>>>>basename.
>>>>>>> Most of the jobs ran simultaneously on the same node; 17 seemed to
>>>>>>>finish normally, while the remaining three seemed to be stalled and
>>>>>>>produced 0 B of output. Do you have any suggestion why this is
>>>>>>>happening?
>>>>>>>
>>>>>>> After I stopped these stalled jobs, I checked the index.log and
>>>>>>>found that of 38,384 mentioned scaffolds, 154 appear only once in
>>>>>>>the log. The surprise is that two thirds of these appear only as
>>>>>>>FINISHED (the rest only as started). There are no models for these
>>>>>>>'finished' scaffolds stored in the .db, and they are distributed
>>>>>>>over all parts of the genome (i.e., each of the 20 jobs contained
>>>>>>>scaffolds that 'did not start' but 'finished').
>>>>>>> Should this be an issue of concern?
>>>>>>> It might be an NFS lock problem, as NFS is heavily loaded, but the
>>>>>>>NFS files look good, so we suspect something fishy is going on...
>>>>>>>
>>>>>>> Hope you can help,
>>>>>>> best wishes,
>>>>>>> Jeanne Wilbrandt
>>>>>>>
>>>>>>> zmb // ZFMK // University of Bonn
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> maker-devel mailing list
>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>
>>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>
>
More information about the maker-devel mailing list