[maker-devel] Further split genome questions
Carson Holt
carsonhh at gmail.com
Wed Aug 6 09:45:56 MDT 2014
Is your admin counting processes or CPU usage? Each system call creates a separate process, so you can expect multiple processes but only a single CPU of usage per instance. Use different directories if you are running that many jobs. You can concatenate the separate results when you're done. Use the gff3_merge script to help concatenate the separate GFF3 files generated by the separate jobs.
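As a rough illustration of the concatenation step, here is a minimal Python sketch that merges GFF3 output from separate jobs. It is not gff3_merge and does not reproduce its behavior. It assumes each input is plain GFF3, keeps a single `##gff-version 3` header, copies the remaining lines, and drops any embedded `##FASTA` section. The names `merge_gff3`, `read_lines` and the example paths are hypothetical.

```python
"""Hypothetical sketch: naively concatenate GFF3 files from separate jobs.

This is NOT MAKER's gff3_merge; it only illustrates the idea, assuming
plain GFF3 inputs that may end with an embedded ##FASTA section.
"""
import sys
from typing import Iterable, Iterator, List


def read_lines(path: str) -> Iterator[str]:
    """Yield the lines of one file, closing it when exhausted."""
    with open(path) as handle:
        yield from handle


def merge_gff3(sources: Iterable[Iterable[str]]) -> List[str]:
    """Merge several GFF3 line streams into one list of lines."""
    merged = ["##gff-version 3"]  # single header for the combined file
    for lines in sources:
        for line in lines:
            line = line.rstrip("\n")
            if line.startswith("##FASTA"):
                break  # the rest of this source is sequence, not features
            if not line or line.startswith("##gff-version"):
                continue  # skip blank lines and per-file version headers
            merged.append(line)
    return merged


if __name__ == "__main__":
    # Usage: python merge_sketch.py part_01/out.gff part_02/out.gff > all.gff
    for out_line in merge_gff3(read_lines(p) for p in sys.argv[1:]):
        print(out_line)
```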
--Carson
Sent from my iPhone
> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" <j.wilbrandt at zfmk.de> wrote:
>
>
>
> We are using MPI as well; each of the 20 parts gets assigned 4 threads. Our admin reports,
> however, that the processes seem to spawn more threads than they are allowed. It is
> not BLAST (which is set to 1 CPU in the opts.ctl). Do you have a suggestion why?
>
> If I start the jobs in the same directory, how can I make sure they write to the same
> output directory (which, I think, is required to put the pieces together in the end)? Does
> -basename take paths?
>
>
> On Wed, 6 Aug 2014 15:12:50 +0000
> Carson Holt <carsonhh at gmail.com> wrote:
>> I think the freezing is because you are starting too many simultaneous jobs. You should
>> try to use MPI to parallelize instead. Running concurrent jobs can start to cause
>> problems if you are running 10 or more jobs in the same directory. You could try
>> splitting them into different directories.
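One way to follow the separate-directories advice is to give each genome chunk its own working directory with its own copy of the control files. The Python sketch below assumes the three `maker_*.ctl` file names and that MAKER reads them from the directory it is launched in. The `part_NN` naming and `prepare_job_dirs` are hypothetical, and the commands are only built, not run.

```python
"""Hypothetical sketch: give each genome chunk its own working directory."""
import shutil
from pathlib import Path
from typing import List, Tuple

# Assumed control-file names; each job directory gets its own copy.
CTL_FILES = ("maker_opts.ctl", "maker_bopts.ctl", "maker_exe.ctl")


def prepare_job_dirs(
    chunks: List[Path], ctl_dir: Path, work_root: Path
) -> List[Tuple[Path, List[str]]]:
    """Create one directory per chunk and return (directory, command) pairs."""
    jobs = []
    for index, chunk in enumerate(chunks, start=1):
        job_dir = work_root / f"part_{index:02d}"
        job_dir.mkdir(parents=True, exist_ok=True)
        for name in CTL_FILES:
            source = ctl_dir / name
            if source.is_file():  # control files that are missing are skipped
                shutil.copy(source, job_dir / name)
        # Command meant to be launched from inside job_dir; -g selects this chunk.
        jobs.append((job_dir, ["maker", "-g", str(chunk.resolve())]))
    return jobs
```

Each returned command would then be started with `job_dir` as its working directory, so no two jobs share a datastore.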
>>
>> --Carson
>>
>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" <j.wilbrandt at zfmk.de> wrote:
>>>
>>>
>>> aha, so this explains that.
>>> Daniel, the average is 5,930.37 bp, ranging from ~50 to more than 60,000 bp, with roughly
>>> half of the sequences shorter than 3,000 bp.
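Summary figures like these can be computed directly from the assembly FASTA. A minimal Python sketch, assuming an ordinary (possibly multi-line) FASTA file with at least one record; the function names and the 3,000 bp default cutoff parameter are illustrative choices, not part of MAKER.

```python
"""Hypothetical sketch: per-sequence length statistics for a FASTA file."""
import statistics
import sys
from typing import Dict, Iterable


def sequence_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each record ID (first word of the header) to its sequence length."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # multi-line records are summed
    return lengths


def summarize(lengths: Dict[str, int], cutoff: int = 3000) -> Dict[str, float]:
    """Count, mean, median, min, max and the fraction shorter than cutoff.

    Assumes at least one sequence; an empty input raises an error.
    """
    values = sorted(lengths.values())
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": values[0],
        "max": values[-1],
        "frac_below_cutoff": sum(v < cutoff for v in values) / len(values),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fasta:
        print(summarize(sequence_lengths(fasta)))
```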
>>>
>>> What do you think about this weird 'I am running but not really doing anything' behavior?
>>>
>>>
>>> Thanks a lot!
>>> Jeanne
>>>
>>>
>>>
>>> On Wed, 6 Aug 2014 14:16:52 +0000
>>> Carson Holt <carsonhh at gmail.com> wrote:
>>>> If you are starting and restarting, or running multiple jobs, then the log can be
>>>> partially rebuilt. On rebuild, only the FINISHED entries are added. If there is a GFF3
>>>> result file for the contig, then it is FINISHED. FASTA files will only exist for the
>>>> contigs that have gene models. Small contigs will rarely contain models.
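The rebuild rule described here (a contig counts as FINISHED if its GFF3 result file exists, and only FINISHED entries are written) can be sketched as follows. The per-contig directory layout, the `<contig>.gff` file name, and the tab-separated `id, directory, status` line format are assumptions for illustration, not a description of MAKER's internals; `rebuild_index_entries` is a hypothetical name.

```python
"""Hypothetical sketch of rebuilding an index log from result files."""
from pathlib import Path
from typing import Iterable, List


def rebuild_index_entries(contig_dirs: Iterable[Path]) -> List[str]:
    """Return one FINISHED line per contig directory that holds a GFF3 result.

    Assumed layout: each contig has its own directory named after the contig,
    containing <contig>.gff once that contig is done. Contigs without the
    file are omitted, so no STARTED lines are ever written.
    """
    entries = []
    for contig_dir in sorted(contig_dirs):
        contig_id = contig_dir.name
        if (contig_dir / f"{contig_id}.gff").is_file():
            entries.append(f"{contig_id}\t{contig_dir}\tFINISHED")
    return entries
```

Under these assumptions, a contig that finished before the rebuild shows up only as FINISHED, with no matching STARTED line.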
>>>>
>>>> --Carson
>>>>
>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" <j.wilbrandt at zfmk.de> wrote:
>>>>>
>>>>>
>>>>> Hi Carson,
>>>>>
>>>>> I ran into more conspicuous behavior running MAKER 2.31 on a genome that is split into
>>>>> 20 parts, using the -g flag and the same basename.
>>>>> Most of the jobs ran simultaneously on the same node; 17 seemed to finish normally, while
>>>>> the remaining three seemed to be stalled and produced 0 B of output. Do you have any
>>>>> suggestion why this is happening?
>>>>>
>>>>> After I stopped these stalled jobs, I checked the index.log and found that of 38,384
>>>>> mentioned scaffolds, 154 appear only once in the log. The surprise is that 2/3 of these
>>>>> appear only as FINISHED (the rest only as started). There are no models for these
>>>>> 'finished' scaffolds stored in the .db, and they are distributed over all parts of the
>>>>> genome (i.e., each of the 20 jobs contained scaffolds that 'did not start' but 'finished').
>>>>> Should this be an issue of concern?
>>>>> It might be an NFS lock problem, as NFS is heavily loaded, but the NFS files look good, so
>>>>> we suspect something fishy is going on...
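The check described above (scaffolds that appear only once in the index log, grouped by their single status) can be reproduced with a short script. This Python sketch assumes tab-separated log lines of the form `scaffold<TAB>directory<TAB>STATUS`; that format and the function name `singleton_statuses` are assumptions, not taken from MAKER's documentation.

```python
"""Hypothetical sketch: find scaffolds that appear only once in an index log."""
import sys
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def singleton_statuses(log_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group scaffolds seen exactly once in the log by that single status."""
    counts: Counter = Counter()
    status_of: Dict[str, str] = {}
    for line in log_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        scaffold, status = fields[0], fields[2]
        counts[scaffold] += 1
        status_of[scaffold] = status
    grouped: Dict[str, List[str]] = defaultdict(list)
    for scaffold, seen in counts.items():
        if seen == 1:
            grouped[status_of[scaffold]].append(scaffold)
    return dict(grouped)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:
        for status, names in sorted(singleton_statuses(log).items()):
            print(f"{status}\t{len(names)}")
```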
>>>>>
>>>>> Hope you can help,
>>>>> best wishes,
>>>>> Jeanne Wilbrandt
>>>>>
>>>>> zmb // ZFMK // University of Bonn
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> maker-devel mailing list
>>>>> maker-devel at box290.bluehost.com
>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>