[maker-devel] High memory consumption

Kyungyong Seong s.kyungyong at berkeley.edu
Wed Dec 22 20:42:05 MST 2021


Hi Carson,

Looking at the progress more carefully, I learned that some query and
database combinations cause tblastx to run seemingly forever. Typically, a
tblastx search finishes in a reasonable time (a few hours at most), but for
these combinations it has taken days (and is still running) to search a
100 kb query against a 50 Mb database. All CPUs end up tied up by these
searches, so MAKER never finishes.

Would it be possible to skip the tblastx search for these query and database
combinations? I have intermediate files from a previous MAKER run that used
smaller databases, so I attempted to copy some of these files into the
current run folders. For instance, for
atg000169l.12.Solanacea%2Ecds%2Efa.tblastx.temp_dir, which causes the issue:

I first copied atg000169l.12.Solanacea%2Ecds%2Efa.tblastx from the previous
run into the proper directory and deleted
atg000169l.12.Solanacea%2Ecds%2Efa.tblastx.temp_dir.

Then I modified run.log.child.12 to include FINISHED
SH1353.alternative.noPlasmid.maker.output/SH1353.alternative.noPlasmid_datastore/42/CC/atg000169l//theVoid.atg000169l/1/atg000169l.12.Solanacea%2Ecds%2Efa.tblastx

However, it seems like MAKER still starts over from tblastx. I have a small
number of contigs left, so manually working around this is feasible. Would
there be a way to do this?
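
To find which searches are affected, a minimal Python sketch along these
lines could list the leftover *.tblastx.temp_dir directories under the output
folder. The function name and the age-based ordering are illustrative
assumptions, not part of MAKER; the script only reads the directory tree and
does not modify any MAKER files or logs.

```python
import os
import time
from urllib.parse import unquote


def find_tblastx_temp_dirs(root):
    """Return (path, decoded_name, age_hours) for each *.tblastx.temp_dir under root.

    age_hours is the time since the directory itself was last modified,
    which is only a rough hint of how long a search has been in progress.
    Results are sorted with the least recently modified directory first.
    """
    now = time.time()
    hits = []
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            if name.endswith(".tblastx.temp_dir"):
                full = os.path.join(dirpath, name)
                age_hours = (now - os.path.getmtime(full)) / 3600.0
                # Directory names are percent-encoded (e.g. %2E for "."),
                # so decode them for readability.
                hits.append((full, unquote(name), age_hours))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

Calling find_tblastx_temp_dirs("SH1353.alternative.noPlasmid.maker.output")
and printing the tuples would show each unfinished tblastx directory with its
decoded database name.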

Thank you for your help!
Kyungyong



On Sun, Dec 19, 2021 at 10:02 AM Kyungyong Seong <s.kyungyong at berkeley.edu>
wrote:

> Thank you for the tips! How about reducing the time for tblastx? My
> cluster has a 3-day run limit. I think what is happening is that MAKER is
> terminated because of out-of-memory issues or the runtime cap, and when
> MAKER is restarted, tblastx needs to start from scratch. Do you think it
> would be better not to use MPI and set cpus=30? Or would it be okay to use
> 3 MPI processes with cpus=10 if I have 30 cores?
>
>
> On Fri, Dec 17, 2021 at 9:29 AM Carson Holt <carsonhh at gmail.com> wrote:
>
>> 1. Make sure your system is not configured with an in-memory /tmp
>> directory. If it is, every file written to temporary storage will use RAM.
>> 2. If running under MPI, cpus= in maker_opts.ctl must be set to 1.
>> 3. max_dna_len= should be 100000 (the default).
>> 4. In maker_bopts.ctl, set all the depth_blast= options to something like
>> 10 or 20 (there are 3 depth values you will have to set). The default is to
>> keep everything, and if you have really deep alignments, that can use a lot
>> of RAM without any actual benefit for gene prediction.
>>
>> —Carson
>>
>>
>>
>> > On Dec 16, 2021, at 11:03 AM, Kyungyong Seong <s.kyungyong at berkeley.edu>
>> wrote:
>> >
>> > Hi!
>> >
>> > MAKER has been running fine on my genome (~1Gb; 800 contigs) but is now
>> stuck with ~30 contigs that keep failing because of high memory
>> consumption. I am using mpi, running 20-30 contigs for annotation in
>> parallel, depending on the machine. I started with 64 GB memory machines but
>> have moved up to 1.5 TB machines as the job kept failing. Unfortunately,
>> all the memory of this machine is also saturated. It looks like tblastx is
>> taking lots of time and resources. The databases I have are about 200 Mb
>> for the proteins and 570 Mb for cDNAs. max_dna_len is set to 100000 in
>> maker_opts.ctl. Would there be a way to improve this? Decreasing the number
>> of jobs for MPI slowed down memory saturation but eventually the same
>> happened.
>> >
>> > Thank you!
>> > Kyungyong
>> >
>> >
>> > _______________________________________________
>> > maker-devel mailing list
>> > maker-devel at yandell-lab.org
>> > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
>>
>>