[maker-devel] Further split genome questions
Carson Holt
carsonhh at gmail.com
Wed Aug 13 09:52:34 MDT 2014
Yes. One CPU will have several processes; most are helper processes that
will use 0% CPU almost all of the time (for example, there is a shared
variable manager process that launches with MAKER but will also appear as
'maker' under top, because it is technically a child of MAKER and not a
separate script). Also, system calls will launch a new process that uses
all of the CPU, while the calling process drops to 0% CPU until the call
finishes.
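For example, on a typical Linux node something like the following will show
the extra 'maker' helper processes and which one is actually busy (a rough
sketch; ps/top options can differ slightly between systems):

    # list all processes named 'maker' as a tree; helpers appear as children
    ps -C maker -f --forest

    # watch per-process CPU with full command lines; roughly one process per
    # MAKER instance should be busy, the helpers should sit near 0%
    top -c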
Yes. Your explanation is correct. You then use gff3_merge to merge the
GFF3 files.
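For example, if each piece ran in its own directory with its own basename,
the merge could look roughly like this (directory and file names are
placeholders based on MAKER's usual output layout; check gff3_merge --help
for the exact options):

    # build one GFF3 per piece from that piece's datastore index
    gff3_merge -d piece1.maker.output/piece1_master_datastore_index.log
    gff3_merge -d piece2.maker.output/piece2_master_datastore_index.log

    # then combine the per-piece GFF3 files into a single annotation file
    gff3_merge -o genome.all.gff piece1.all.gff piece2.all.gff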
--Carson
On 8/13/14, 3:32 AM, "Jeanne Wilbrandt" <j.wilbrandt at zfmk.de> wrote:
>
>Our admin counts processes. Do I understand you correctly that one CPU
>handles several processes?
>
>I'm still confused by the different directories (and I made a mistake
>when asking last time; I wanted to say 'If I do NOT start the jobs in the
>same directory...').
>So, if I start each piece of a genome in its own directory (for example),
>then it gets a unique basename (because the output will be separate from
>all other pieces anyway), and I will not run dsindex but instead use
>gff3_merge for each piece's output and then once again to merge all the
>resulting GFF3 files?
>
>Hope I got you right :)
>
>Thanks for your help!
>Jeanne
>
>
>
>On Wed, 6 Aug 2014 15:45:56 +0000
> Carson Holt <carsonhh at gmail.com> wrote:
>>Is your admin counting processes or CPU usage? Each system call creates
>>a separate process, so you can expect multiple processes, but only a
>>single CPU of usage per instance. Use different directories if you are
>>running that many jobs. You can concatenate the separate results when
>>you're done.
>> Use the gff3_merge script to help concatenate the separate GFF3 files
>>generated from the separate jobs.
>>
>>--Carson
>>
>>Sent from my iPhone
>>
>>> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" <j.wilbrandt at zfmk.de>
>>>wrote:
>>>
>>>
>>>
>>> We are using MPI as well; each of the 20 parts gets assigned 4
>>>threads. Our admin reports, however, that the processes seem to spawn
>>>more threads than they are allowed. It is not BLAST (which is set to
>>>1 cpu in the opts.ctl). Do you have a suggestion why?
>>>
>>> If I start the jobs in the same directory, how can I make sure they
>>>write to the same directory (as, I think, is required to put the pieces
>>>together in the end)? Does -basename take paths?
>>>
>>>
>>> On Wed, 6 Aug 2014 15:12:50 +0000
>>> Carson Holt <carsonhh at gmail.com> wrote:
>>>> I think the freezing is because you are starting too many
>>>>simultaneous jobs. You should try to use MPI to parallelize instead.
>>>>The concurrent-job way of doing things can start to cause problems if
>>>>you are running 10 or more jobs in the same directory. You could try
>>>>splitting them into different directories.
>>>>
>>>> --Carson
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" <j.wilbrandt at zfmk.de>
>>>>>wrote:
>>>>>
>>>>>
>>>>> Aha, so this explains that.
>>>>> Daniel, the average is 5930.37 bp, but it ranges from ~50 to more
>>>>>than 60,000 bp, with roughly half of the sequences being shorter than
>>>>>3,000 bp.
>>>>>
>>>>> What do you think about this weird 'I am running but not really
>>>>>doing anything' behavior?
>>>>>
>>>>>
>>>>> Thanks a lot!
>>>>> Jeanne
>>>>>
>>>>>
>>>>>
>>>>> On Wed, 6 Aug 2014 14:16:52 +0000
>>>>> Carson Holt <carsonhh at gmail.com> wrote:
>>>>>> If you are starting and restarting, or running multiple jobs, then
>>>>>>the log can be partially rebuilt. On rebuild, only the FINISHED
>>>>>>entries are added. If there is a GFF3 result file for the contig,
>>>>>>then it is FINISHED. FASTA files will only exist for the contigs
>>>>>>that have gene models. Small contigs will rarely contain models.
>>>>>>
>>>>>> --Carson
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt"
>>>>>>><j.wilbrandt at zfmk.de> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Carson,
>>>>>>>
>>>>>>> I ran into more strange behavior running MAKER 2.31 on a genome
>>>>>>>that is split into 20 parts, using the -g flag and the same
>>>>>>>basename.
>>>>>>> Most of the jobs ran simultaneously on the same node; 17 seemed to
>>>>>>>finish normally, while the remaining three seemed to be stalled and
>>>>>>>produced 0 B of output. Do you have any suggestion why this is
>>>>>>>happening?
>>>>>>>
>>>>>>> After I stopped these stalled jobs, I checked the index.log and
>>>>>>>found that of 38,384 mentioned scaffolds, 154 appear only once in
>>>>>>>the log. The surprise is that two thirds of these appear only as
>>>>>>>FINISHED (the rest only as started). There are no models for these
>>>>>>>'finished' scaffolds stored in the .db, and they are distributed
>>>>>>>over all parts of the genome (i.e., each of the 20 jobs contained
>>>>>>>scaffolds that 'did not start' but 'finished').
>>>>>>> Should this be an issue of concern?
>>>>>>> It might be an NFS lock problem, as NFS is heavily loaded, but the
>>>>>>>NFS files look good, so we suspect something fishy is going on...
>>>>>>>
>>>>>>> Hope you can help,
>>>>>>> best wishes,
>>>>>>> Jeanne Wilbrandt
>>>>>>>
>>>>>>> zmb // ZFMK // University of Bonn
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> maker-devel mailing list
>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>
>>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>
>
More information about the maker-devel mailing list