[maker-devel] High memory consumption

Carson Holt carsonhh at gmail.com
Mon Jan 3 11:23:54 MST 2022


Really, the only reason to use the altest options is if you don’t have protein data but for some reason have transcript data from a different species that you want to use. If you have protein data, such as a previous annotation, use that instead, because TBLASTX takes at least 6 times longer than BLASTP and is less sensitive. Other than that, try setting depth_tblastx= in the maker_bopts.ctl file.
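
As a rough illustration of capping the depth values, here is a small Python sketch. The helper name set_depth and the sample text are made up for this example; only the option names depth_blastn, depth_blastx, and depth_tblastx are real maker_bopts.ctl settings. The value 10 follows the "something like 10 or 20" suggestion in the quoted message below and is not a tuned recommendation.

```python
import re

# The three depth options found in maker_bopts.ctl
DEPTH_KEYS = ("depth_blastn", "depth_blastx", "depth_tblastx")

def set_depth(ctl_text, depth):
    """Return ctl_text with every depth_* option set to `depth`.

    Trailing '#' comments on each line are preserved.
    """
    out = []
    for line in ctl_text.splitlines():
        m = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if m and m.group(1) in DEPTH_KEYS:
            comment = m.group(3) or ""
            line = f"{m.group(1)}={depth} {comment}".rstrip()
        out.append(line)
    return "\n".join(out)

# Hypothetical excerpt in the style of maker_bopts.ctl (comments are placeholders)
sample = """\
eval_tblastx=1e-10 #tblastx e-value cutoff
depth_blastn=0 #placeholder comment
depth_blastx=0
depth_tblastx=0 #placeholder comment
"""

print(set_depth(sample, 10))
```

Editing the three lines by hand in maker_bopts.ctl achieves the same thing; the script is only useful if many run directories need the same change.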

The tblastx.temp_dir holds partial results that get merged into a tblastx file. On failure or restart, if a tblastx.temp_dir exists, it is erased and the search is rerun. If a tblastx file exists, it is used instead of rerunning the search.
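
The restart rule above can be sketched in Python as follows. This is not MAKER's own code (MAKER is written in Perl); the function name plan_tblastx and the file names are hypothetical. Checking the temp_dir before the merged file is also an assumption, since the description does not say what happens when both exist.

```python
import shutil
import tempfile
from pathlib import Path

def plan_tblastx(result_file):
    """Decide what happens to one tblastx job on restart.

    Returns "reuse" if the merged result file can be used as-is,
    otherwise "rerun" (after deleting any leftover <result_file>.temp_dir).
    """
    result_file = Path(result_file)
    temp_dir = Path(str(result_file) + ".temp_dir")
    if temp_dir.exists():
        # Partial results are never resumed: erase them and search again.
        # Assumption: this check takes priority over an existing result file.
        shutil.rmtree(temp_dir)
        return "rerun"
    if result_file.exists():
        return "reuse"
    return "rerun"

# Small demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as work:
    job = Path(work) / "contig.12.db.tblastx"  # hypothetical file name
    Path(str(job) + ".temp_dir").mkdir()
    print(plan_tblastx(job))  # leftover temp_dir is erased first
    job.write_text("merged hits\n")
    print(plan_tblastx(job))  # merged file is now found
```

Under this sketch, a job killed partway through always starts its tblastx search over, which matches the behavior reported further down the thread when a run hits a wall-time limit.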

—Carson


> On Dec 22, 2021, at 8:42 PM, Kyungyong Seong <s.kyungyong at berkeley.edu> wrote:
> 
> Hi Carson,
> 
> Looking at the progress more carefully, I learned that some query and database combinations cause tblastx to run forever. Typically, the tblastx search ends in a reasonable time (a few hours at most), but for those, it takes days (and is still running) to search the 100 kb query against a 50 Mb database. All CPUs are tied up by these searches, so MAKER never finishes.
> 
> Would it be possible to skip tblastx search for these queries + databases? I have intermediate files from a previous MAKER run produced with a smaller size of databases, so I attempted to copy some of these files into the current run folders. For instance, for atg000169l.12.Solanacea%2Ecds%2Efa.tblastx.temp_dir that causes the issue,
> 
> I first copied atg000169l.12.Solanacea%2Ecds%2Efa.tblastx from the previous run into the proper directory and deleted atg000169l.12.Solanacea%2Ecds%2Efa.tblastx.temp_dir.
> 
> Then I modified run.log.child.12 to include FINISHED SH1353.alternative.noPlasmid.maker.output/SH1353.alternative.noPlasmid_datastore/42/CC/atg000169l//theVoid.atg000169l/1/atg000169l.12.Solanacea%2Ecds%2Efa.tblastx
> 
> However, it seems like MAKER still starts over from tblastx. I have a small number of contigs left, so manually working around this is feasible. Would there be a way to do this?
> 
> Thank you for your help!
> Kyungyong
> 
> 
> 
> On Sun, Dec 19, 2021 at 10:02 AM Kyungyong Seong <s.kyungyong at berkeley.edu> wrote:
> Thank you for the tips! How about reducing the time for tblastx? My cluster has a 3-day run limit. I think what is happening is that MAKER is terminated because of out-of-memory issues or the runtime cap, and when MAKER is restarted, tblastx needs to start from scratch. Do you think it would be better not to use MPI and set cpus=30? Or would it be okay to run 3 MPI processes with cpus=10 if I have 30 cores?
> 
> 
> On Fri, Dec 17, 2021 at 9:29 AM Carson Holt <carsonhh at gmail.com> wrote:
> 1. Make sure your system is not configured with an in-memory /tmp directory. If it is, every file written to temporary storage will use RAM.
> 2. If running under MPI, cpus= in maker_opts.ctl must be set to 1.
> 3. max_dna_len= should be 100000 (the default).
> 4. In maker_bopts.ctl, set all the depth_blast= options to something like 10 or 20 (there are 3 depth values you will have to set: depth_blastn, depth_blastx, and depth_tblastx). The default is to keep everything, and if you have really deep alignments, that can use a lot of RAM without any actual benefit for gene prediction.
> 
> —Carson
> 
> 
> 
> > On Dec 16, 2021, at 11:03 AM, Kyungyong Seong <s.kyungyong at berkeley.edu> wrote:
> > 
> > Hi!
> > 
> > MAKER has been running fine on my genome (~1 Gb; 800 contigs) but is now stuck with ~30 contigs that keep failing because of high memory consumption. I am using MPI, running 20-30 contigs for annotation in parallel, depending on the machine. I started with 64 GB memory machines but have moved up to 1.5 TB machines as the job kept failing. Unfortunately, all the memory on this machine is also saturated. It looks like tblastx is taking lots of time and resources. The databases I have are about 200 Mb for the proteins and 570 Mb for cDNAs. max_dna_len is set to 100000 in maker_opts.ctl. Would there be a way to improve this? Decreasing the number of MPI jobs slowed down memory saturation, but eventually the same thing happened.
> > 
> > Thank you!
> > Kyungyong
> 
