From carsonhh at gmail.com Mon Jan 3 11:23:54 2022
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 3 Jan 2022 11:23:54 -0700
Subject: [maker-devel] High memory consumption
References: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com>
Message-ID: <89CF2C58-9F93-4696-AA14-E2F25E663849@gmail.com>

Really the only reason to use the altest options is if you don't have protein data, but for some reason have transcript data you want to use from a different species. If you have protein data like a previous annotation, use that instead because TBLASTX takes at least 6 times longer than BLASTP and is less sensitive. Other than that, try setting depth_tblastx= in the maker_bopts.ctl file.

The tblastx.temp_dir holds partial results that get merged into a tblastx file. On failure or restart, if a tblastx.temp_dir exists, it gets erased and the search is rerun. If a tblastx file exists, it gets used instead of rerunning.

--Carson

> On Dec 22, 2021, at 8:42 PM, Kyungyong Seong wrote:
>
> Hi Carson,
>
> Looking at the progress more carefully, I learned that some query and database combinations cause tblastx to run forever. Typically, the tblastx search ends in a reasonable time (a few hours at most), but for those it takes days (and is still running) to search a 100 kb query against a 50 Mb database. All CPUs are trapped by these searches, so MAKER never finishes.
>
> Would it be possible to skip the tblastx search for these queries + databases? I have intermediate files from a previous MAKER run produced with smaller databases, so I attempted to copy some of these files into the current run folders. For instance, for atg000169l.12.Solanacea%2Ecds%2Efa.tblastx.temp_dir, which causes the issue:
>
> I first copied atg000169l.12.Solanacea%2Ecds%2Efa.tblastx from the previous run into the proper directory and deleted atg000169l.12.Solanacea%2Ecds%2Efa.tblastx.temp_dir.
>
> Then I modified run.log.child.12 to include
>
> FINISHED SH1353.alternative.noPlasmid.maker.output/SH1353.alternative.noPlasmid_datastore/42/CC/atg000169l//theVoid.atg000169l/1/atg000169l.12.Solanacea%2Ecds%2Efa.tblastx
>
> However, it seems like MAKER still starts over from tblastx. I have a small number of contigs left, so manually working around this is feasible. Would there be a way to do this?
>
> Thank you for your help!
> Kyungyong
>
> On Sun, Dec 19, 2021 at 10:02 AM Kyungyong Seong wrote:
>
> Thank you for the tips! How about reducing the time for tblastx? My cluster has a 3-day run limit. I think what is happening is that MAKER is terminated because of out-of-memory issues or the runtime cap, and when MAKER is restarted, tblastx needs to start from scratch. Do you think it would be better not to use MPI and set cpus=30? Or would it be okay to set up mpi = 3 and cpus=10 if I have 30 cores?
>
> On Fri, Dec 17, 2021 at 9:29 AM Carson Holt wrote:
>
> 1. Make sure your system is not configured with an in-memory /tmp directory. If it is, every file written to temporary storage will use RAM.
> 2. If running under MPI, cpus= in maker_opts.ctl must be set to 1.
> 3. max_dna_len= should be 100000 (the default).
> 4. In maker_bopts.ctl, set all the depth_blast= options to something like 10 or 20 (there are 3 depth values you will have to set). The default is to keep everything, and really deep alignments can use a lot of RAM without any actual benefit for gene prediction.
>
> --Carson
>
> > On Dec 16, 2021, at 11:03 AM, Kyungyong Seong wrote:
> >
> > Hi!
> >
> > MAKER has been running fine on my genome (~1Gb; 800 contigs) but is now stuck with ~30 contigs that keep failing because of high memory consumption. I am using MPI, running 20-30 contigs for annotation in parallel, depending on the machine. I started with 64Gb memory machines but have moved up to 1.5 Tb machines as the job kept failing.
> > Unfortunately, all memory of this machine is also saturated. It looks like tblastx is taking lots of time and resources. The databases I have are about 200 Mb for the proteins and 570 Mb for cDNAs. max_dna_len is set to 100000 in maker_opts.ctl. Would there be a way to improve this? Decreasing the number of jobs for MPI slowed down memory saturation, but eventually the same thing happened.
> >
> > Thank you!
> > Kyungyong
> >
> > _______________________________________________
> > maker-devel mailing list
> > maker-devel at yandell-lab.org
> > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Mon Jan 3 11:29:35 2022
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 3 Jan 2022 11:29:35 -0700
Subject: [maker-devel] part long scafflods finished，the others failed
In-Reply-To: <21663963.6927.17ddc5d4d68.Coremail.liu9827885@163.com>
References: <21663963.6927.17ddc5d4d68.Coremail.liu9827885@163.com>
Message-ID: <73C03865-3CA5-4728-86CA-C244415441C0@gmail.com>

You are out of RAM. If running on a scheduler like SLURM or PBS, depending on how you submit the job and how the scheduler has been configured, you may not have access to the whole machine's RAM (you may need to request an amount of RAM). Also make sure /tmp or the location you specify as TMP= is not an in-memory directory (not uncommon on some configurations). On the cluster where I work, for example, /tmp defaults to in-memory, and real disk temporary space is at /scratch/local, so I must specify that in the TMP= value in the maker_opts.ctl file.

Other things to look at:

1. If running under MPI, cpus= in maker_opts.ctl must be set to 1.
2. max_dna_len= should be 100000 (the default).
3. In maker_bopts.ctl, set all the depth_blast= options to something like 10 or 20 (there are 3 depth values you will have to set). The default is to keep everything, and really deep alignments can use a lot of RAM without any actual benefit for gene prediction.

--Carson

> On Dec 21, 2021, at 2:40 AM, ?? wrote:
>
> Hello,
>
> I apologize for not posting directly to the archived forum, but it appears that the option to enter new posts is disabled.
>
> Thank you for your continued support of Maker and your responses to the forum posts. I have been running Maker (V3.01.02) to annotate an apple genome that consists of 17 chromosome-length scaffolds and some small contigs.
>
> In my various test runs of Maker, the vast majority of the smaller contigs failed to annotate. I'm not sure why the long scaffolds finished while the smaller contigs did not.
>
> ```
> open3: fork failed: Cannot allocate memory at /data/liuyu/Software/maker/bin/../lib/File/NFSLock.pm line 1037 thread 1.
> --> rank=3, hostname=localhost.localdomain
> ERROR: Failed while collecting blastn reports
> ERROR: Chunk failed at level:1, tier_type:3
> FAILED CONTIG:scaffold1A
>
> deleted:0 hits
> ERROR: Chunk failed at level:4, tier_type:0
> FAILED CONTIG:scaffold1A
> ```
>
> The work in run.log.child.0 is FINISHED, while theVoid.scaffold1A/run.log.child.1 shows the following:
>
> ```
> STARTED CF_hap1_part3_rnd1.maker.output/CF_hap1_part3_rnd1_datastore/67/D7/scaffold1A//theVoid.scaffold1A/0/scaffold1A.1.66268.0.db%2E1-66268%2Efor_blastn%2Efasta.blastn
> DIED RANK 3:4:0:2
> DIED COUNT 1
> ```
>
> My command is "mpiexec -n 8 /data/liuyu/Software/maker/bin/maker -base CF_hap1_part3_test part3_round1_maker_opts.ctl maker_bopts.ctl maker_exe.ctl". The mpiexec comes from MPICH (v3.3.2). When I test with MPICH2, the error is the same. The task also fails when I use fewer processes.
>
> Thanks for any troubleshooting tips you can offer.
> Best wishes,
> Yu Liu

From carsonhh at gmail.com Mon Jan 3 12:53:29 2022
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 3 Jan 2022 12:53:29 -0700
Subject: [maker-devel] High memory consumption
References: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com>
Message-ID: <487214DE-B4E6-4F53-B38C-97438B791433@gmail.com>

You can try mpi = 3 and cpus=10. It might reduce memory usage.

--Carson

> On Dec 19, 2021, at 11:02 AM, Kyungyong Seong wrote:
>
> Thank you for the tips! How about reducing the time for tblastx? My cluster has a 3-day run limit. I think what is happening is that MAKER is terminated because of out-of-memory issues or the runtime cap, and when MAKER is restarted, tblastx needs to start from scratch. Do you think it would be better not to use MPI and set cpus=30? Or would it be okay to set up mpi = 3 and cpus=10 if I have 30 cores?

From carsonhh at gmail.com Mon Jan 3 12:59:13 2022
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 3 Jan 2022 12:59:13 -0700
Subject: [maker-devel] Maker predicts way too few genes/proteins

Make sure you are training either SNAP or Augustus and not just using protein2genome or est2genome. You can also rescue rejected Augustus/SNAP models by running them through InterProScan and then giving any model that contains an InterPro domain to the model_gff= option.
See BASIC PROTOCOL 5 from this protocols paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4286374/

--Carson

> On Dec 3, 2021, at 1:35 PM, Adam Stuckert wrote:
>
> Hello,
>
> I am working on annotating several different assemblies, but I am having difficulty getting a reasonable number of predicted genes/proteins. My annotations always predict way too few genes (thousands too few) in the final transcript/protein fasta files. So, I am seeking help.
>
> My approach is to annotate with EST evidence from the same species (either straight from transcriptome assemblers or predicted coding regions from TransDecoder) and use protein evidence from uniprot + related species. Simple repeats are softmasked within Maker. All repeats are masked in Maker, and I am supplying a repeat library that includes lineage-specific repeats as well as species-specific repeats that are modeled by RepeatModeler2. I am using Maker version 3.01.03.
>
> I'm attaching my options control file. Any help to troubleshoot this would be greatly appreciated.
>
> Thanks,
> Adam
>
> --
> Adam Stuckert

From robert.king at rothamsted.ac.uk Wed Jan 12 05:40:46 2022
From: robert.king at rothamsted.ac.uk (Robert King)
Date: Wed, 12 Jan 2022 12:40:46 +0000
Subject: [maker-devel] cds error

If you change the second-from-last column (the phase) to just "." then that usually fixes this kind of problem, or try the script indicated. You are using gffread with the -x parameter and not -w, right?

From: maker-devel On Behalf Of Jacques Dainat
Sent: 18 December 2021 10:13
To: zc y
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] cds error

Hi,

This might be related to a fragmented CDS where the beginning is missing. The offset (phase) might be different from 0. It is 0 when the complete start codon is present. I don't know how gffread deals with the offset. You can check within the GFF file whether the affected genes have this kind of CDS offset at the start.

I know that agat_sp_extract_sequences.pl from AGAT (https://github.com/NBISweden/AGAT) deals properly with incomplete CDS and a first codon with an offset. You might give it a try to see if it fixes the issue.

Best regards,
Jacques Dainat, Ph.D.

On 29 Nov 2021, at 05:20, zc y wrote:

Dear Maker developers,

I found a CDS error in my rice project. I ran maker (3.01.03) and it finished without error in master_datastore_index.log. But when I use gffread to translate the proteins from the maker gff, I find that almost none of the proteins start with 'M' and many contain stop codons. In fact, I checked the protein file (Chr12.maker.proteins.fasta) provided by maker, and it is correct. I used the same parameters and evidence for another rice genome, and it doesn't have the problem. What should I do?

thanks,
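
The check Jacques describes (does the first CDS segment of a transcript carry a non-zero phase?) can be sketched as below. This is only an illustration under stated assumptions: the input is standard 9-column GFF3 with `Parent=` attributes on CDS lines, and the function name `first_cds_with_offset` is hypothetical, not part of MAKER, gffread, or AGAT.

```python
from collections import defaultdict


def first_cds_with_offset(gff_text):
    """Return {parent_id: phase} for transcripts whose first CDS segment
    (in transcript order) has phase 1 or 2 in GFF3 column 8."""
    cds_by_parent = defaultdict(list)
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                cds_by_parent[parent].append(
                    (int(cols[3]), int(cols[4]), cols[6], cols[7])
                )

    flagged = {}
    for parent, segments in cds_by_parent.items():
        strand = segments[0][2]
        # First segment in transcript order: lowest start on '+',
        # highest end on '-'.
        if strand == "-":
            first = max(segments, key=lambda s: s[1])
        else:
            first = min(segments, key=lambda s: s[0])
        # A phase of "0" or "." is not reported as an offset.
        if first[3] in ("1", "2"):
            flagged[parent] = int(first[3])
    return flagged
```

Any transcript this reports starts with an incomplete codon, so a tool that ignores the phase column would translate it out of frame.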

From s.kyungyong at berkeley.edu Mon Jan 3 13:09:12 2022
From: s.kyungyong at berkeley.edu (Kyungyong Seong)
Date: Mon, 03 Jan 2022 20:09:12 -0000
Subject: [maker-devel] High memory consumption
In-Reply-To: <487214DE-B4E6-4F53-B38C-97438B791433@gmail.com>
References: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com> <487214DE-B4E6-4F53-B38C-97438B791433@gmail.com>

I found out that the high memory consumption was caused by BLAST or exonerate aligning incompletely masked repetitive contigs against a few annotated repetitive elements in the public annotation data. There were numerous possible matches, and BLAST or exonerate was trying to find all of them, increasing the running time and memory usage. I was eventually able to work around this. Thanks for your suggestions!

On Mon, Jan 3, 2022 at 11:53 AM Carson Holt wrote:

> You can try mpi = 3 and cpus=10. It might reduce memory usage.
>
> --Carson

From Sven.Winter at vetmeduni.ac.at Thu Jan 20 06:52:00 2022
From: Sven.Winter at vetmeduni.ac.at (Winter Sven)
Date: Thu, 20 Jan 2022 13:52:00 +0000
Subject: [maker-devel] maker conda installation

Dear yandell-lab team,

I found a conda version of maker and tried to install it.
However, I receive the following error that I cannot resolve:

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.23=0
  - feature:|@/linux-64::__glibc==2.23=0

Your installed version is: 2.23

Is the conda maker installation actually supported by your lab, and do you maybe know what the issue is? I find it very difficult to install maker from the source code, and conda would be so much easier if it would only work.

Thank you very much in advance!

Dr. Sven Winter
Postdoctoral Researcher
Research Institute of Wildlife Ecology
Vetmeduni
Savoyenstrasse 1
1160 Vienna, Austria
Member of the IUCN SSC Giraffe and Okapi Specialist Group

From barry.moore at genetics.utah.edu Sun Jan 23 16:24:47 2022
From: barry.moore at genetics.utah.edu (Marvin B Moore)
Date: Sun, 23 Jan 2022 23:24:47 +0000
Subject: [maker-devel] maker conda installation

Hi Sven,

Sorry, the conda package was not created and is not maintained by the Yandell lab.

Regards,
Barry

From: maker-devel on behalf of Winter Sven
Date: Friday, January 21, 2022 at 3:25 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] maker conda installation

Dear yandell-lab team,

I found a conda version of maker and tried to install it. However, I receive the following error that I cannot resolve:

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.23=0
  - feature:|@/linux-64::__glibc==2.23=0

Your installed version is: 2.23

Is the conda maker installation actually supported by your lab, and do you maybe know what the issue is?
I find it very difficult to install maker from the source code, and conda would be so much easier if it would only work.

Thank you very much in advance!

Dr. Sven Winter
Postdoctoral Researcher
Research Institute of Wildlife Ecology
Vetmeduni
Savoyenstrasse 1
1160 Vienna, Austria
Member of the IUCN SSC Giraffe and Okapi Specialist Group

From upendrarajbhattarai at gmail.com Sat Jan 22 14:03:02 2022
From: upendrarajbhattarai at gmail.com (Upendra Bhattarai)
Date: Sun, 23 Jan 2022 10:03:02 +1300
Subject: [maker-devel] How to solve Non-unique top level ID error in Maker annotation pipeline ?

Hi,

I am trying to run a second round of Maker annotation with a SNAP-trained file. However, when I pass maker_gff=maker.all.gff.file.from.the.first.round.gff, I get Non-unique top level ID errors for all the scaffolds. maker_round_1_master_datastore_index.log shows failed reports for all the scaffolds.

I tried gff3_merge with and without the -l flag; both gff3 files ended up giving the same error. I also tried gaas_maker_merge_outputs_from_datastore.pl and used the maker_mix.gff file for maker_gff. It again fails with the same error. When I grep for a non-unique ID, it shows two hits, one in `match` and the other in `match_part`.

I saw that some tutorials do not pass maker_gff in the second round of Maker. But when I do that, the number of gene models decreases.

I have posted my question on Biostar (https://www.biostars.org/p/9506971/), with the maker_opts.ctl and error messages. Can someone help me, please?

Thank you,
Upendra
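
The grep Upendra describes can be generalized into a small report of every `ID=` value that occurs on more than one line of a GFF3 file. The sketch below is an illustration only: the function name `duplicate_ids` is hypothetical and not part of MAKER, and it does not reproduce MAKER's own uniqueness check. Note also that GFF3 allows one discontinuous feature (for example a multi-segment CDS) to reuse its ID across lines, so not every repeat it reports is an error.

```python
from collections import defaultdict


def duplicate_ids(gff_text):
    """Map each ID= value found on more than one feature line to the list
    of feature types (GFF3 column 3) that carry it, in file order."""
    types_by_id = defaultdict(list)
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        for field in cols[8].split(";"):
            if field.startswith("ID="):
                types_by_id[field[3:]].append(cols[2])
    return {fid: types for fid, types in types_by_id.items() if len(types) > 1}
```

An ID reported with the types `match` and `match_part` corresponds to the pattern described in the message above.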