From stuckerta at gmail.com Fri Dec 3 13:35:43 2021
From: stuckerta at gmail.com (Adam Stuckert)
Date: Fri, 3 Dec 2021 13:35:43 -0700
Subject: [maker-devel] Maker predicts way too few genes/proteins
Message-ID:

Hello,

I am working on annotating several different assemblies, but I am having difficulty getting a reasonable number of predicted genes/proteins. My annotations always predict way too few genes (thousands too few) in the final transcript/protein fasta files. So, I am seeking help.

My approach is to annotate with EST evidence from the same species (either straight from transcriptome assemblers or predicted coding regions from TransDecoder) and use protein evidence from uniprot + related species. Simple repeats are softmasked within Maker. All repeats are masked in Maker, and I am supplying a repeat library that includes lineage-specific repeats as well as species-specific repeats that are modeled by RepeatModeler2. I am using Maker version 3.01.03. I'm attaching my options control file.

Any help to troubleshoot this would be greatly appreciated.

Thanks,
Adam

--
Adam Stuckert
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts
Type: application/octet-stream
Size: 4618 bytes
Desc: not available
URL:

From s.kyungyong at berkeley.edu Thu Dec 16 11:03:58 2021
From: s.kyungyong at berkeley.edu (Kyungyong Seong)
Date: Thu, 16 Dec 2021 10:03:58 -0800
Subject: [maker-devel] High memory consumption
Message-ID:

Hi!

MAKER has been running fine on my genome (~1 Gb; 800 contigs) but is now stuck with ~30 contigs that keep failing because of high memory consumption. I am using MPI, running 20-30 contigs for annotation in parallel, depending on the machine. I started with 64 GB memory machines but have moved up to 1.5 TB machines as the job kept failing. Unfortunately, all memory of this machine is also saturated.
It looks like tblastx is taking lots of time and resources. The databases I have are about 200 Mb for the proteins and 570 Mb for cDNAs. max_dna_len is set to 100000 in maker_opts.ctl. Would there be a way to improve this? Decreasing the number of jobs for MPI slowed down memory saturation, but eventually the same thing happened.

Thank you!
Kyungyong

From carsonhh at gmail.com Fri Dec 17 10:24:40 2021
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Dec 2021 10:24:40 -0700
Subject: [maker-devel] cds error
In-Reply-To:
References:
Message-ID: <821115C7-66B4-4FE6-929B-5FFA7252A3EE@gmail.com>

You can upload your GFF3 file here, and I can take a look -> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

My guess is that either gffread is not translating it correctly or you used your own GFF3 as input to maker during a run. If a GFF3 file submitted to MAKER has partially overlapping exons in a gene prediction, MAKER can't fix it, and you get back whatever translation came from the GFF3.

--Carson

> On Nov 28, 2021, at 9:20 PM, zc y wrote:
>
> Dear Maker developers,
> I found a CDS error in my rice project. I ran maker (3.01.03) and it finished without error in master_datastore_index.log. But when I use gffread to translate the proteins from the maker gff, I found that almost all of the proteins do not start with 'M' and many contain stop codons.
> In fact, I checked the protein file (Chr12.maker.proteins.fasta) provided by maker, and it is correct.
> I used the same parameters and evidence for another rice genome, and it doesn't have the problem.
> What should I do?
>
> thanks,
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1376 bytes
Desc: not available
URL:

From carsonhh at gmail.com Fri Dec 17 10:29:38 2021
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Dec 2021 10:29:38 -0700
Subject: [maker-devel] High memory consumption
In-Reply-To:
References:
Message-ID: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com>

1. Make sure your system is not configured with an in-memory /tmp directory. If it is, every file written to temporary storage will use RAM.
2. If running under MPI, cpus= in maker_opts.ctl must be set to 1.
3. max_dna_len= should be 100000 (the default).
4. In maker_bopts.ctl, set all the depth_blast= options to something like 10 or 20 (there are 3 depth values you will have to set). The default is to keep everything, and if you have really deep alignments, that can use a lot of RAM without any actual benefit for gene prediction.

--Carson

> On Dec 16, 2021, at 11:03 AM, Kyungyong Seong wrote:
>
> Hi!
>
> MAKER has been running fine on my genome (~1 Gb; 800 contigs) but is now stuck with ~30 contigs that keep failing because of high memory consumption. I am using MPI, running 20-30 contigs for annotation in parallel, depending on the machine. I started with 64 GB memory machines but have moved up to 1.5 TB machines as the job kept failing. Unfortunately, all memory of this machine is also saturated. It looks like tblastx is taking lots of time and resources. The databases I have are about 200 Mb for the proteins and 570 Mb for cDNAs. max_dna_len is set to 100000 in maker_opts.ctl. Would there be a way to improve this? Decreasing the number of jobs for MPI slowed down memory saturation, but eventually the same thing happened.
>
> Thank you!
> Kyungyong
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
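
[Editor's sketch] The checks in Carson's list could be automated before launching a run, roughly as below. This is a minimal sketch, not part of MAKER: it assumes a Linux host (it reads /proc/mounts), and it assumes the option names `cpus`, `max_dna_len`, `depth_blastn`, `depth_blastx` and `depth_tblastx` as they appear in stock control files, so please verify them against your own maker_opts.ctl and maker_bopts.ctl. All function names are illustrative.

```python
from pathlib import Path

# Filesystem types that keep their contents in RAM.
RAM_FS_TYPES = {"tmpfs", "ramfs"}


def fs_type_of(path, mounts_text):
    """Return the filesystem type of the most specific mount containing path."""
    target = path.rstrip("/") + "/"
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Later entries shadow earlier ones on the same mount point, hence >=.
        if target.startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def parse_ctl(text):
    """Parse 'key=value #comment' lines from a control file into a dict."""
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            opts[key.strip()] = value.strip()
    return opts


def check_settings(opts, bopts):
    """Return warnings for settings that differ from the advice in the thread."""
    warnings = []
    # Assumed option names; confirm against your own control files.
    if opts.get("cpus", "1") != "1":
        warnings.append("cpus should be 1 when running under MPI")
    if opts.get("max_dna_len", "100000") != "100000":
        warnings.append("max_dna_len differs from the default of 100000")
    for key in ("depth_blastn", "depth_blastx", "depth_tblastx"):
        if bopts.get(key, "0") == "0":
            warnings.append(f"{key} is unset or 0; consider 10 to 20")
    return warnings


if __name__ == "__main__":
    mounts = Path("/proc/mounts")
    if mounts.exists() and fs_type_of("/tmp", mounts.read_text()) in RAM_FS_TYPES:
        print("WARNING: /tmp is RAM-backed; temporary files will consume memory")
    opts_file, bopts_file = Path("maker_opts.ctl"), Path("maker_bopts.ctl")
    if opts_file.exists() and bopts_file.exists():
        for message in check_settings(parse_ctl(opts_file.read_text()),
                                      parse_ctl(bopts_file.read_text())):
            print("WARNING:", message)
```

Run it from the directory holding the control files; it prints nothing when none of the conditions is met.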

From jacques.dainat at nbis.se Sat Dec 18 03:13:02 2021
From: jacques.dainat at nbis.se (Jacques Dainat)
Date: Sat, 18 Dec 2021 11:13:02 +0100
Subject: [maker-devel] cds error
In-Reply-To:
References:
Message-ID:

Hi,

This might be related to fragmented CDSs where the beginning is missing. The offset (phase) might be different from 0. It is 0 when the complete start codon is present. I don't know how gffread deals with the offset. You can check within the GFF file whether the affected genes have this kind of CDS offset at the start.
I know that agat_sp_extract_sequences.pl from AGAT (https://github.com/NBISweden/AGAT) deals properly with incomplete CDSs and a first codon with an offset. You might give it a try to see if it fixes the issue.

Best regards,

Jacques Dainat, Ph.D.

> On 29 Nov 2021, at 05:20, zc y wrote:
>
> Dear Maker developers,
> I found a CDS error in my rice project. I ran maker (3.01.03) and it finished without error in master_datastore_index.log. But when I use gffread to translate the proteins from the maker gff, I found that almost all of the proteins do not start with 'M' and many contain stop codons.
> In fact, I checked the protein file (Chr12.maker.proteins.fasta) provided by maker, and it is correct.
> I used the same parameters and evidence for another rice genome, and it doesn't have the problem.
> What should I do?
>
> thanks,
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
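
[Editor's sketch] The check Jacques describes, looking for transcripts whose first CDS segment has a non-zero phase, could be done with a short script along these lines. It is a sketch under stated assumptions, not a tool from MAKER or AGAT: it assumes standard nine-column GFF3 with the phase in column 8 and a `Parent=` attribute on each CDS line, and the function names are made up for illustration.

```python
import sys
from collections import defaultdict


def first_cds_phases(gff_lines):
    """Map each parent transcript ID to the phase of its 5'-most CDS segment."""
    cds_by_parent = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        start, end, strand, phase = int(cols[3]), int(cols[4]), cols[6], cols[7]
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                cds_by_parent[parent].append((start, end, strand, phase))
    phases = {}
    for parent, segments in cds_by_parent.items():
        if segments[0][2] == "-":
            # On the minus strand translation starts at the highest coordinate.
            first = max(segments, key=lambda seg: seg[1])
        else:
            first = min(segments, key=lambda seg: seg[0])
        phases[parent] = first[3]
    return phases


def nonzero_phase_transcripts(gff_lines):
    """Return sorted transcript IDs whose first CDS segment has phase != 0."""
    return sorted(p for p, phase in first_cds_phases(gff_lines).items()
                  if phase != "0")


if __name__ == "__main__":
    # Usage: python check_phase.py annotations.gff
    with open(sys.argv[1]) as handle:
        for transcript in nonzero_phase_transcripts(handle):
            print(transcript)
```

Any transcript it lists starts mid-codon, so a translator that ignores the phase column would produce a frame-shifted protein for it.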

From s.kyungyong at berkeley.edu Sun Dec 19 11:02:25 2021
From: s.kyungyong at berkeley.edu (Kyungyong Seong)
Date: Sun, 19 Dec 2021 10:02:25 -0800
Subject: [maker-devel] High memory consumption
In-Reply-To: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com>
References: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com>
Message-ID:

Thank you for the tips! How about reducing the time for tblastx? My cluster has a 3-day run limit. I think what is happening is that MAKER is terminated because of out-of-memory issues or the runtime cap, and when MAKER is restarted, tblastx needs to start from scratch. Do you think it would be better not to use MPI and set cpus=30? Or would it be okay to set up mpi = 3 and cpus=10 if I have 30 cores?

On Fri, Dec 17, 2021 at 9:29 AM Carson Holt wrote:

> 1. Make sure your system is not configured with an in-memory /tmp directory. If it is, every file written to temporary storage will use RAM.
> 2. If running under MPI, cpus= in maker_opts.ctl must be set to 1.
> 3. max_dna_len= should be 100000 (the default).
> 4. In maker_bopts.ctl, set all the depth_blast= options to something like 10 or 20 (there are 3 depth values you will have to set). The default is to keep everything, and if you have really deep alignments, that can use a lot of RAM without any actual benefit for gene prediction.
>
> --Carson
>
> > On Dec 16, 2021, at 11:03 AM, Kyungyong Seong wrote:
> >
> > Hi!
> >
> > MAKER has been running fine on my genome (~1 Gb; 800 contigs) but is now stuck with ~30 contigs that keep failing because of high memory consumption. I am using MPI, running 20-30 contigs for annotation in parallel, depending on the machine. I started with 64 GB memory machines but have moved up to 1.5 TB machines as the job kept failing. Unfortunately, all memory of this machine is also saturated. It looks like tblastx is taking lots of time and resources. The databases I have are about 200 Mb for the proteins and 570 Mb for cDNAs.
> > max_dna_len is set to 100000 in maker_opts.ctl. Would there be a way to improve this? Decreasing the number of jobs for MPI slowed down memory saturation, but eventually the same thing happened.
> >
> > Thank you!
> > Kyungyong
> >
> > _______________________________________________
> > maker-devel mailing list
> > maker-devel at yandell-lab.org
> > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

From liu9827885 at 163.com Tue Dec 21 02:40:41 2021
From: liu9827885 at 163.com (Yu Liu)
Date: Tue, 21 Dec 2021 17:40:41 +0800 (CST)
Subject: [maker-devel] part long scaffolds finished, the others failed
Message-ID: <21663963.6927.17ddc5d4d68.Coremail.liu9827885@163.com>

Hello,

I apologize for not posting directly to the archived forum, but it appears that the option to enter new posts is disabled. Thank you for your continued support of Maker and your responses to the forum posts.

I have been running Maker (V3.01.02) to annotate an apple genome that consists of 17 chromosome-length scaffolds and some small contigs. In my various test runs of Maker, the vast majority of the smaller contigs failed to annotate. I'm not sure why the long scaffolds finished while the smaller contigs did not.

```
open3: fork failed: Cannot allocate memory at /data/liuyu/Software/maker/bin/../lib/File/NFSLock.pm line 1037 thread 1.
--> rank=3, hostname=localhost.localdomain
ERROR: Failed while collecting blastn reports
ERROR: Chunk failed at level:1, tier_type:3
FAILED CONTIG:scaffold1A
deleted:0 hits
ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scaffold1A
```

The work in run.log.child.0 is FINISHED.
In theVoid.scaffold1A/run.log.child.1, however, the log shows the following:

```
STARTED CF_hap1_part3_rnd1.maker.output/CF_hap1_part3_rnd1_datastore/67/D7/scaffold1A//theVoid.scaffold1A/0/scaffold1A.1.66268.0.db%2E1-66268%2Efor_blastn%2Efasta.blastn
DIED RANK 3:4:0:2
DIED COUNT 1
```

My command is "mpiexec -n 8 /data/liuyu/Software/maker/bin/maker -base CF_hap1_part3_test part3_round1_maker_opts.ctl maker_bopts.ctl maker_exe.ctl". The mpiexec I use is from MPICH (v3.3.2). When I tested with MPICH2, the error was the same. The run also fails when I use fewer processes.

Thanks for any troubleshooting tips you can offer.

Best wishes,
Yu Liu

From s.kyungyong at berkeley.edu Wed Dec 22 20:42:05 2021
From: s.kyungyong at berkeley.edu (Kyungyong Seong)
Date: Wed, 22 Dec 2021 19:42:05 -0800
Subject: [maker-devel] High memory consumption
In-Reply-To:
References: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com>
Message-ID:

Hi Carson,

Looking at the progress more carefully, I learned that some query and database combinations cause tblastx to run forever. Typically, the tblastx search ends in a reasonable time (a few hours at most), but for those, it takes days (and is still running) to search the 100 kb query against a 50 Mb database. All CPUs are tied up by these searches, so MAKER never finishes. Would it be possible to skip the tblastx search for these query + database combinations?

I have intermediate files from a previous MAKER run produced with smaller databases, so I attempted to copy some of these files into the current run folders. For instance, for atg000169l.12.Solanacea%2Ecds%2Efa.tblastx.temp_dir, which causes the issue, I first copied atg000169l.12.Solanacea%2Ecds%2Efa.tblastx from the previous run into the proper directory and deleted atg000169l.12.Solanacea%2Ecds%2Efa.tblastx.temp_dir.
Then I modified run.log.child.12 to include

FINISHED SH1353.alternative.noPlasmid.maker.output/SH1353.alternative.noPlasmid_datastore/42/CC/atg000169l//theVoid.atg000169l/1/atg000169l.12.Solanacea%2Ecds%2Efa.tblastx

However, it seems like MAKER still starts over from tblastx. I have a small number of contigs left, so manually working around this is feasible. Would there be a way to do this?

Thank you for your help!
Kyungyong

On Sun, Dec 19, 2021 at 10:02 AM Kyungyong Seong wrote:

> Thank you for the tips! How about reducing the time for tblastx? My cluster has a 3-day run limit. I think what is happening is that MAKER is terminated because of out-of-memory issues or the runtime cap, and when MAKER is restarted, tblastx needs to start from scratch. Do you think it would be better not to use MPI and set cpus=30? Or would it be okay to set up mpi = 3 and cpus=10 if I have 30 cores?
>
> On Fri, Dec 17, 2021 at 9:29 AM Carson Holt wrote:
>
>> 1. Make sure your system is not configured with an in-memory /tmp directory. If it is, every file written to temporary storage will use RAM.
>> 2. If running under MPI, cpus= in maker_opts.ctl must be set to 1.
>> 3. max_dna_len= should be 100000 (the default).
>> 4. In maker_bopts.ctl, set all the depth_blast= options to something like 10 or 20 (there are 3 depth values you will have to set). The default is to keep everything, and if you have really deep alignments, that can use a lot of RAM without any actual benefit for gene prediction.
>>
>> --Carson
>>
>> > On Dec 16, 2021, at 11:03 AM, Kyungyong Seong wrote:
>> >
>> > Hi!
>> >
>> > MAKER has been running fine on my genome (~1 Gb; 800 contigs) but is now stuck with ~30 contigs that keep failing because of high memory consumption. I am using MPI, running 20-30 contigs for annotation in parallel, depending on the machine. I started with 64 GB memory machines but have moved up to 1.5 TB machines as the job kept failing.
Unfortunately, >> all memory of this machine is also saturated. It looks like tblastx is >> taking lots of time and resources. The databases I have are about 200 Mb >> for the proteins and 570 Mb for cDNAs. max_dna_len is set as 100000 in >> maker_opt.ctl. Would there be a way to improve this? Decreasing the number >> of jobs for MPI slowed down memory saturation but eventually the same >> happened. >> > >> > Thank you! >> > Kyungyong >> > >> > >> > _______________________________________________ >> > maker-devel mailing list >> > maker-devel at yandell-lab.org >> > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuckerta at gmail.com Fri Dec 3 13:35:43 2021 From: stuckerta at gmail.com (Adam Stuckert) Date: Fri, 3 Dec 2021 13:35:43 -0700 Subject: [maker-devel] Maker predicts way too few genes/proteins Message-ID: Hello, I am working on annotating several different assemblies, but I am having difficulty getting a reasonable number of predicted genes/proteins. My annotations always predict way too few genes (thousands too few) in the final transcript/protein fasta files. So, I am seeking help. My approach is to annotate with EST evidence from the same species (either straight from transcriptome assemblers or predicted coding regions from TransDecoder) and use protein evidence from uniprot + related species. Simple repeats are softmasked within Maker. All repeats are masked in Maker, and I am supplying a repeat library that includes lineage-specific repeats as well as species specific repeats that are modeled by RepeatModeler2. I am using Maker version 3.01.03. I'm attaching my options control file. Any help to troubleshoot this would be greatly appreciated. Thanks, Adam -- Adam Stuckert -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: maker_opts Type: application/octet-stream Size: 4619 bytes Desc: not available URL: From s.kyungyong at berkeley.edu Thu Dec 16 11:03:58 2021 From: s.kyungyong at berkeley.edu (Kyungyong Seong) Date: Thu, 16 Dec 2021 10:03:58 -0800 Subject: [maker-devel] High memory consumption Message-ID: Hi! MAKER has been running fine on my genome (~1Gb; 800 contigs) but is now stuck with ~30 contigs that keep failing because of high memory consumption. I am using mpi, running 20-30 contigs for annotation in parallel, depending on the machine. I started with 64Gb memory machines but have moved up to 1.5 Tb machines as the job kept failing. Unfortunately, all memory of this machine is also saturated. It looks like tblastx is taking lots of time and resources. The databases I have are about 200 Mb for the proteins and 570 Mb for cDNAs. max_dna_len is set as 100000 in maker_opt.ctl. Would there be a way to improve this? Decreasing the number of jobs for MPI slowed down memory saturation but eventually the same happened. Thank you! Kyungyong -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Dec 17 10:24:40 2021 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 17 Dec 2021 10:24:40 -0700 Subject: [maker-devel] cds error In-Reply-To: References: Message-ID: <821115C7-66B4-4FE6-929B-5FFA7252A3EE@gmail.com> You can upload your GFF3 file here, and I can take a look ?> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi My guess is that either gffread is not translating it correctly or you used your own GFF3 as input to maker during a run. If a gff3 submited data to MAKER has partially overlapping exons in a gene prediction, MAKER can?t fix it, and you get back whatever translation came from the gff3. ?Carson > On Nov 28, 2021, at 9:20 PM, zc y wrote: > > Dear Maker developers, > I found a CDS error in my rice project. I ran the maker (3.01.03) and it finished without error in master_datastore_index.log. 
But when I use gffread to translate the protein from maker gff, I found that almost all of proteins are not start with 'M' and many stop codons in it. > In fact, I checked the protein file (Chr12.maker.proteins.fasta) provided by the maker, it is correct. > I used the same parameter and evidence in another rice, it don't have the problem. > What should I do? > > > thanks, > > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1376 bytes Desc: not available URL: From carsonhh at gmail.com Fri Dec 17 10:29:38 2021 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 17 Dec 2021 10:29:38 -0700 Subject: [maker-devel] High memory consumption In-Reply-To: References: Message-ID: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com> 1. Make sure your system is not configured with an in memory /tmp directory. If it is, every file written to temporary storage will use RAM. 2. If running under MPI, cpu= in maker_opts.ctl must be set to 1. 3. max_dna_len= should be 100000 (the default) 4. In maker_bopts.ctl, set all the depth_blast= options to something like 10 or 20 (there are 3 depth values you will have to set). The default is to keep everything, and if you have really deep alignments that can use a lot of RAM with out any actual benefit for gene prediction. ?Carson > On Dec 16, 2021, at 11:03 AM, Kyungyong Seong wrote: > > Hi! > > MAKER has been running fine on my genome (~1Gb; 800 contigs) but is now stuck with ~30 contigs that keep failing because of high memory consumption. I am using mpi, running 20-30 contigs for annotation in parallel, depending on the machine. I started with 64Gb memory machines but have moved up to 1.5 Tb machines as the job kept failing. 
Unfortunately, all memory of this machine is also saturated. It looks like tblastx is taking lots of time and resources. The databases I have are about 200 Mb for the proteins and 570 Mb for cDNAs. max_dna_len is set as 100000 in maker_opt.ctl. Would there be a way to improve this? Decreasing the number of jobs for MPI slowed down memory saturation but eventually the same happened. > > Thank you! > Kyungyong > > > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1376 bytes Desc: not available URL: From jacques.dainat at nbis.se Sat Dec 18 03:13:02 2021 From: jacques.dainat at nbis.se (Jacques Dainat) Date: Sat, 18 Dec 2021 11:13:02 +0100 Subject: [maker-devel] cds error In-Reply-To: References: Message-ID: Hi, Might be related to fragmented CDS where the beginning is missing. The offset (phase) might be different than 0. It is 0 when there is the complete start codon. I don?t know how deals gffread with the offset. You can check within the GFF file if the incriminated genes have this kind of CDS offset start. I know that agat_sp_extract_sequences.pl from AGAT (https://github.com/NBISweden/AGAT) is dealing properly with incomplete CDS and first codon with offset. You might give a try to see if it fix the issue. Best regards, Jacques Dainat, Ph.D. > On 29 Nov 2021, at 05:20, zc y wrote: > > Dear Maker developers, > I found a CDS error in my rice project. I ran the maker (3.01.03) and it finished without error in master_datastore_index.log. But when I use gffread to translate the protein from maker gff, I found that almost all of proteins are not start with 'M' and many stop codons in it. > In fact, I checked the protein file (Chr12.maker.proteins.fasta) provided by the maker, it is correct. 
> I used the same parameter and evidence in another rice, it don't have the problem. > What should I do? > > > thanks, > > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.kyungyong at berkeley.edu Sun Dec 19 11:02:25 2021 From: s.kyungyong at berkeley.edu (Kyungyong Seong) Date: Sun, 19 Dec 2021 10:02:25 -0800 Subject: [maker-devel] High memory consumption In-Reply-To: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com> References: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com> Message-ID: Thank you for the tips! How about reducing the time for tblastx? My cluster has a 3 days run limit. I think what is happening is that MAKER is terminated because of out-of-memory issues or runtime cap, and when MAKER is restarted, tblastx needs to start from scratch. Do you think it would be better not to use MPI and set cpus=30? Or would it be okay to set up mpi = 3 and cpus=10 if I have 30 cores? On Fri, Dec 17, 2021 at 9:29 AM Carson Holt wrote: > 1. Make sure your system is not configured with an in memory /tmp > directory. If it is, every file written to temporary storage will use RAM. > 2. If running under MPI, cpu= in maker_opts.ctl must be set to 1. > 3. max_dna_len= should be 100000 (the default) > 4. In maker_bopts.ctl, set all the depth_blast= options to something like > 10 or 20 (there are 3 depth values you will have to set). The default is to > keep everything, and if you have really deep alignments that can use a lot > of RAM with out any actual benefit for gene prediction. > > ?Carson > > > > > On Dec 16, 2021, at 11:03 AM, Kyungyong Seong > wrote: > > > > Hi! > > > > MAKER has been running fine on my genome (~1Gb; 800 contigs) but is now > stuck with ~30 contigs that keep failing because of high memory > consumption. 
I am using mpi, running 20-30 contigs for annotation in > parallel, depending on the machine. I started with 64Gb memory machines but > have moved up to 1.5 Tb machines as the job kept failing. Unfortunately, > all memory of this machine is also saturated. It looks like tblastx is > taking lots of time and resources. The databases I have are about 200 Mb > for the proteins and 570 Mb for cDNAs. max_dna_len is set as 100000 in > maker_opt.ctl. Would there be a way to improve this? Decreasing the number > of jobs for MPI slowed down memory saturation but eventually the same > happened. > > > > Thank you! > > Kyungyong > > > > > > _______________________________________________ > > maker-devel mailing list > > maker-devel at yandell-lab.org > > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From liu9827885 at 163.com Tue Dec 21 02:40:41 2021 From: liu9827885 at 163.com (=?GBK?B?wfXqxQ==?=) Date: Tue, 21 Dec 2021 17:40:41 +0800 (CST) Subject: [maker-devel] =?gbk?q?part_long_scafflods_finished=A3=ACthe_othe?= =?gbk?q?rs_failed?= Message-ID: <21663963.6927.17ddc5d4d68.Coremail.liu9827885@163.com> Hello, I apologize for not posting directly to the archived forum but it appears that the option to enter new posts is disabled. Thank you for your continued support of Maker and your responses to the forum posts. I have been running Maker (V3.01.02) to annotate a apple genome that consists of 17 chromosome-length scaffolds and some small contigs. In my various tests in running Maker, the vast majority of the smaller contigs were annotated failed. I'm not sure the long scaffolds finished rather than smaller contigs. ``` open3: fork failed: Cannot allocate memory at /data/liuyu/Software/maker/bin/../lib/File/NFSLock.pm line 1037 thread 1. 
--> rank=3, hostname=localhost.localdomain ERROR: Failed while collecting blastn reports ERROR: Chunk failed at level:1, tier_type:3 FAILED CONTIG:scaffold1A deleted:0 hits ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:scaffold1A ```` The work in run.log.child.0 is FINISHED. While in the theVoid.scaffold1A/run.log.child.1, the error code showed below: ``` STARTED CF_hap1_part3_rnd1.maker.output/CF_hap1_part3_rnd1_datastore/67/D7/scaffold1A//theVoid.scaffold1A/0/scaffold1A.1.66268.0.db%2E1-66268%2Efor_blastn%2Efasta.blastn DIED RANK 3:4:0:2 DIED COUNT 1 ```` My command is "mpiexec -n 8 /data/liuyu/Software/maker/bin/maker -base CF_hap1_part3_test part3_round1_maker_opts.ctl maker_bopts.ctl maker_exe.ctl". The mpiexec is used by MPICH(v3.3.2). And When I test and use the MPICH2, the error is same. Meanwhile, when I use less number of processes, the task is failed too. Thanks for any troubleshooting tips you can offer. Best wishes, Yu Liu -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.kyungyong at berkeley.edu Wed Dec 22 20:42:05 2021 From: s.kyungyong at berkeley.edu (Kyungyong Seong) Date: Wed, 22 Dec 2021 19:42:05 -0800 Subject: [maker-devel] High memory consumption In-Reply-To: References: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com> Message-ID: Hi Carson, Looking at the progress more carefully, I learned that some query and database combinations cause tblastx to run forever. Typically, the tblastx search ends in reasonable times (a few hours maximum), but for those, it takes days ( and still running ) to search the 100 kb query against a 50 Mb database. And all CPUs are trapped by these searches, making MAKER to never finish. Would it be possible to skip tblastx search for these queries + databases? I have intermediate files from a previous MAKER run produced with a smaller size of databases, so I attempted to copy some of these files into the current run folders. 
For instance, for atg000169l.12.Solanacea%2Ecds%2Efa.tblastx.temp_dir that causes the issue, I first copied atg000169l.12.Solanacea%2Ecds%2Efa.tblastx from the previous run into the proper directory and deleted atg000169l.12.Solanacea%2Ecds%2Efa.tblastx.temp_dir. Then I modified run.log.child.12 to include FINISHED SH1353.alternative.noPlasmid.maker.output/SH1353.alternative.noPlasmid_datastore/42/CC/atg000169l//theVoid.atg000169l/1/atg000169l.12.Solanacea%2Ecds%2Efa.tblastx However, it seems like MAKER still starts over from tblastx. I have a small number of contigs left, so manually working around this is feasible. Would there be a way to do this? Thank you for your help! Kyungyong On Sun, Dec 19, 2021 at 10:02 AM Kyungyong Seong wrote: > Thank you for the tips! How about reducing the time for tblastx? My > cluster has a 3 days run limit. I think what is happening is that MAKER is > terminated because of out-of-memory issues or runtime cap, and when MAKER > is restarted, tblastx needs to start from scratch. Do you think it would be > better not to use MPI and set cpus=30? Or would it be okay to set up mpi = > 3 and cpus=10 if I have 30 cores? > > > On Fri, Dec 17, 2021 at 9:29 AM Carson Holt wrote: > >> 1. Make sure your system is not configured with an in memory /tmp >> directory. If it is, every file written to temporary storage will use RAM. >> 2. If running under MPI, cpu= in maker_opts.ctl must be set to 1. >> 3. max_dna_len= should be 100000 (the default) >> 4. In maker_bopts.ctl, set all the depth_blast= options to something like >> 10 or 20 (there are 3 depth values you will have to set). The default is to >> keep everything, and if you have really deep alignments that can use a lot >> of RAM with out any actual benefit for gene prediction. >> >> ?Carson >> >> >> >> > On Dec 16, 2021, at 11:03 AM, Kyungyong Seong >> wrote: >> > >> > Hi! 
>> >
>> > MAKER has been running fine on my genome (~1Gb; 800 contigs) but is now
>> > stuck with ~30 contigs that keep failing because of high memory
>> > consumption. I am using mpi, running 20-30 contigs for annotation in
>> > parallel, depending on the machine. I started with 64Gb memory machines but
>> > have moved up to 1.5 Tb machines as the job kept failing. Unfortunately,
>> > all memory of this machine is also saturated. It looks like tblastx is
>> > taking lots of time and resources. The databases I have are about 200 Mb
>> > for the proteins and 570 Mb for cDNAs. max_dna_len is set as 100000 in
>> > maker_opts.ctl. Would there be a way to improve this? Decreasing the number
>> > of jobs for MPI slowed down memory saturation but eventually the same
>> > happened.
>> >
>> > Thank you!
>> > Kyungyong
From carsonhh at gmail.com Fri Dec 17 10:24:40 2021
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Dec 2021 10:24:40 -0700
Subject: [maker-devel] cds error
In-Reply-To:
References:
Message-ID: <821115C7-66B4-4FE6-929B-5FFA7252A3EE@gmail.com>

You can upload your GFF3 file here, and I can take a look -> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

My guess is that either gffread is not translating it correctly or you used your own GFF3 as input to MAKER during a run. If GFF3 data submitted to MAKER has partially overlapping exons in a gene prediction, MAKER can't fix it, and you get back whatever translation came from the GFF3.

--Carson

> On Nov 28, 2021, at 9:20 PM, zc y wrote:
>
> Dear Maker developers,
> I found a CDS error in my rice project. I ran MAKER (3.01.03) and it finished without error in master_datastore_index.log. But when I use gffread to translate the proteins from the MAKER GFF, I found that almost all of the proteins do not start with 'M' and contain many stop codons.
> In fact, I checked the protein file (Chr12.maker.proteins.fasta) provided by MAKER, and it is correct.
> I used the same parameters and evidence in another rice, and it doesn't have the problem.
> What should I do?
>
> thanks,

From carsonhh at gmail.com Fri Dec 17 10:29:38 2021
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Dec 2021 10:29:38 -0700
Subject: [maker-devel] High memory consumption
In-Reply-To:
References:
Message-ID: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com>

1. Make sure your system is not configured with an in-memory /tmp directory. If it is, every file written to temporary storage will use RAM.
2. If running under MPI, cpus= in maker_opts.ctl must be set to 1.
3. max_dna_len= should be 100000 (the default)
4. In maker_bopts.ctl, set all the depth_blast= options to something like 10 or 20 (there are 3 depth values you will have to set). The default is to keep everything, and if you have really deep alignments, that can use a lot of RAM without any actual benefit for gene prediction.

--Carson

> On Dec 16, 2021, at 11:03 AM, Kyungyong Seong wrote:
>
> Hi!
>
> MAKER has been running fine on my genome (~1Gb; 800 contigs) but is now stuck with ~30 contigs that keep failing because of high memory consumption. I am using mpi, running 20-30 contigs for annotation in parallel, depending on the machine. I started with 64Gb memory machines but have moved up to 1.5 Tb machines as the job kept failing. Unfortunately, all memory of this machine is also saturated. It looks like tblastx is taking lots of time and resources. The databases I have are about 200 Mb for the proteins and 570 Mb for cDNAs. max_dna_len is set as 100000 in maker_opts.ctl. Would there be a way to improve this? Decreasing the number of jobs for MPI slowed down memory saturation but eventually the same happened.
>
> Thank you!
> Kyungyong

From jacques.dainat at nbis.se Sat Dec 18 03:13:02 2021
From: jacques.dainat at nbis.se (Jacques Dainat)
Date: Sat, 18 Dec 2021 11:13:02 +0100
Subject: [maker-devel] cds error
In-Reply-To:
References:
Message-ID:

Hi,

This might be related to a fragmented CDS where the beginning is missing. The offset (phase) might be different from 0. It is 0 when the complete start codon is present. I don't know how gffread deals with the offset.
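The offset mentioned here is the phase column of a GFF3 CDS row: the number of bases to skip at the start of that CDS segment before the first complete codon. The sketch below shows a phase-aware translation in Python. The function name `translate_cds` and the toy sequence are illustrative assumptions; this is not how gffread, MAKER, or AGAT are implemented.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_cds(spliced_cds: str, phase: int = 0) -> str:
    """Translate a spliced, strand-corrected CDS, skipping `phase` leading bases.

    `phase` is the GFF3 phase of the first CDS segment (0, 1 or 2).
    Codons containing ambiguous bases are translated as 'X'.
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    seq = spliced_cds.upper()[phase:]
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


# Toy example: a 5'-truncated CDS whose first two bases belong to a partial codon.
fragment = "TGGCTAAATGA"
print(translate_cds(fragment, phase=2))  # frame that honors the phase
print(translate_cds(fragment, phase=0))  # frame that ignores the phase
```

Translating a 5'-truncated model from its first base instead of from its phase shifts the reading frame. A shifted frame is one way to get proteins that do not start with 'M' and that contain internal stop codons.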
You can check within the GFF file whether the affected genes have this kind of CDS offset at the start. I know that agat_sp_extract_sequences.pl from AGAT (https://github.com/NBISweden/AGAT) deals properly with incomplete CDS and a first codon with an offset. You might give it a try to see if it fixes the issue.

Best regards,

Jacques Dainat, Ph.D.

> On 29 Nov 2021, at 05:20, zc y wrote:
>
> Dear Maker developers,
> I found a CDS error in my rice project. I ran MAKER (3.01.03) and it finished without error in master_datastore_index.log. But when I use gffread to translate the proteins from the MAKER GFF, I found that almost all of the proteins do not start with 'M' and contain many stop codons.
> In fact, I checked the protein file (Chr12.maker.proteins.fasta) provided by MAKER, and it is correct.
> I used the same parameters and evidence in another rice, and it doesn't have the problem.
> What should I do?
>
> thanks,

From s.kyungyong at berkeley.edu Sun Dec 19 11:02:25 2021
From: s.kyungyong at berkeley.edu (Kyungyong Seong)
Date: Sun, 19 Dec 2021 10:02:25 -0800
Subject: [maker-devel] High memory consumption
In-Reply-To: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com>
References: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com>
Message-ID:

Thank you for the tips! How about reducing the time for tblastx? My cluster has a 3-day run limit. I think what is happening is that MAKER is terminated because of out-of-memory issues or the runtime cap, and when MAKER is restarted, tblastx needs to start from scratch. Do you think it would be better not to use MPI and set cpus=30? Or would it be okay to set up mpi = 3 and cpus=10 if I have 30 cores?

On Fri, Dec 17, 2021 at 9:29 AM Carson Holt wrote:

> 1. Make sure your system is not configured with an in-memory /tmp
> directory. If it is, every file written to temporary storage will use RAM.
> 2. If running under MPI, cpus= in maker_opts.ctl must be set to 1.
> 3. max_dna_len= should be 100000 (the default)
> 4. In maker_bopts.ctl, set all the depth_blast= options to something like
> 10 or 20 (there are 3 depth values you will have to set). The default is to
> keep everything, and if you have really deep alignments, that can use a lot
> of RAM without any actual benefit for gene prediction.
>
> --Carson
>
> > On Dec 16, 2021, at 11:03 AM, Kyungyong Seong wrote:
> >
> > Hi!
> >
> > MAKER has been running fine on my genome (~1Gb; 800 contigs) but is now
> > stuck with ~30 contigs that keep failing because of high memory
> > consumption. I am using mpi, running 20-30 contigs for annotation in
> > parallel, depending on the machine. I started with 64Gb memory machines but
> > have moved up to 1.5 Tb machines as the job kept failing. Unfortunately,
> > all memory of this machine is also saturated. It looks like tblastx is
> > taking lots of time and resources. The databases I have are about 200 Mb
> > for the proteins and 570 Mb for cDNAs. max_dna_len is set as 100000 in
> > maker_opts.ctl. Would there be a way to improve this? Decreasing the number
> > of jobs for MPI slowed down memory saturation but eventually the same
> > happened.
> >
> > Thank you!
> > Kyungyong
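Points 2 to 4 of Carson's list can be checked mechanically before a long run. The following is a minimal Python sketch, not part of MAKER. It assumes the control files use `key=value #comment` lines and that the relevant option names are `cpus` and `max_dna_len` in maker_opts.ctl and `depth_blastn`, `depth_blastx`, `depth_tblastx` in maker_bopts.ctl, with 0 meaning "no depth cutoff". Please verify those names against your own control files. The helper names `parse_ctl` and `check_memory_settings` are hypothetical.

```python
def parse_ctl(text: str) -> dict:
    """Parse MAKER-style 'key=value #comment' lines into a dict of strings."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and section headers
        if "=" in line:
            key, value = line.split("=", 1)
            options[key.strip()] = value.strip()
    return options


def check_memory_settings(opts: dict, bopts: dict, using_mpi: bool = True) -> list:
    """Return warnings for settings that differ from the advice in this thread."""
    warnings = []
    if using_mpi and opts.get("cpus", "1") != "1":
        warnings.append("cpus should be 1 when running under MPI")
    if opts.get("max_dna_len", "100000") != "100000":
        warnings.append("max_dna_len differs from the default of 100000")
    # Assumed option names for the three BLAST depth cutoffs.
    for key in ("depth_blastn", "depth_blastx", "depth_tblastx"):
        if int(bopts.get(key, "0") or 0) == 0:
            warnings.append(f"{key} is 0 (keep everything); consider 10 to 20")
    return warnings


# Example with inline text; in practice, read the two .ctl files from disk.
opts = parse_ctl("cpus=10 #example value\nmax_dna_len=100000\n")
bopts = parse_ctl("depth_blastn=0\ndepth_blastx=20\ndepth_tblastx=0\n")
for message in check_memory_settings(opts, bopts):
    print(message)
```

The check only reports deviations from the advice quoted above. It does not estimate memory use or decide how many MPI processes are safe.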
From liu9827885 at 163.com Tue Dec 21 02:40:41 2021
From: liu9827885 at 163.com (Yu Liu)
Date: Tue, 21 Dec 2021 17:40:41 +0800 (CST)
Subject: [maker-devel] part long scaffolds finished, the others failed
Message-ID: <21663963.6927.17ddc5d4d68.Coremail.liu9827885@163.com>

Hello,

I apologize for not posting directly to the archived forum, but it appears that the option to enter new posts is disabled. Thank you for your continued support of MAKER and your responses to the forum posts.

I have been running MAKER (v3.01.02) to annotate an apple genome that consists of 17 chromosome-length scaffolds and some small contigs. In my various test runs, the vast majority of the smaller contigs failed to annotate. I'm not sure why the long scaffolds finished while the smaller contigs did not.

```
open3: fork failed: Cannot allocate memory at /data/liuyu/Software/maker/bin/../lib/File/NFSLock.pm line 1037 thread 1.
--> rank=3, hostname=localhost.localdomain
ERROR: Failed while collecting blastn reports
ERROR: Chunk failed at level:1, tier_type:3
FAILED CONTIG:scaffold1A
deleted:0 hits
ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scaffold1A
```

The work in run.log.child.0 is FINISHED, while theVoid.scaffold1A/run.log.child.1 shows the following:

```
STARTED CF_hap1_part3_rnd1.maker.output/CF_hap1_part3_rnd1_datastore/67/D7/scaffold1A//theVoid.scaffold1A/0/scaffold1A.1.66268.0.db%2E1-66268%2Efor_blastn%2Efasta.blastn
DIED RANK 3:4:0:2
DIED COUNT 1
```

My command is "mpiexec -n 8 /data/liuyu/Software/maker/bin/maker -base CF_hap1_part3_test part3_round1_maker_opts.ctl maker_bopts.ctl maker_exe.ctl". mpiexec is provided by MPICH (v3.3.2). When I tested with MPICH2, the error was the same. When I use fewer processes, the task still fails.

Thanks for any troubleshooting tips you can offer.

Best wishes,
Yu Liu
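The run.log.child excerpts in this thread contain lines that begin with STARTED, FINISHED, or DIED. The sketch below lists jobs that were started but never finished. It assumes each entry is a keyword followed by whitespace and a path, as in the excerpts; the real log format may contain other entry types. The function name `unfinished_jobs` is hypothetical, and the sketch only reads the log. It says nothing about how MAKER decides what to rerun.

```python
def unfinished_jobs(log_text: str) -> list:
    """Return paths that have a STARTED entry but no matching FINISHED entry."""
    started, finished = [], set()
    for line in log_text.splitlines():
        parts = line.split(None, 1)  # keyword, then the rest of the line
        if len(parts) != 2:
            continue
        keyword, rest = parts
        if keyword == "STARTED":
            started.append(rest.strip())
        elif keyword == "FINISHED":
            finished.add(rest.strip())
    return [job for job in started if job not in finished]


# Shortened, made-up paths for illustration only.
sample_log = (
    "STARTED theVoid.example/0/example.0.blastn\n"
    "FINISHED theVoid.example/0/example.0.blastn\n"
    "STARTED theVoid.example/0/example.1.blastn\n"
    "DIED RANK 3:4:0:2\n"
    "DIED COUNT 1\n"
)
for job in unfinished_jobs(sample_log):
    print(job)
```

Running this over each run.log.child.N file under a contig's theVoid directory gives a quick list of the searches that were interrupted.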
From s.kyungyong at berkeley.edu Wed Dec 22 20:42:05 2021
From: s.kyungyong at berkeley.edu (Kyungyong Seong)
Date: Wed, 22 Dec 2021 19:42:05 -0800
Subject: [maker-devel] High memory consumption
In-Reply-To:
References: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com>
Message-ID:

Hi Carson,

Looking at the progress more carefully, I learned that some query and database combinations cause tblastx to run forever. Typically, the tblastx search ends in a reasonable time (a few hours at most), but for those, it takes days (and is still running) to search the 100 kb query against a 50 Mb database. All CPUs are trapped by these searches, so MAKER never finishes.

Would it be possible to skip the tblastx search for these queries + databases? I have intermediate files from a previous MAKER run produced with smaller databases, so I attempted to copy some of these files into the current run folders.

For instance, for atg000169l.12.Solanacea%2Ecds%2Efa.tblastx.temp_dir that causes the issue, I first copied atg000169l.12.Solanacea%2Ecds%2Efa.tblastx from the previous run into the proper directory and deleted atg000169l.12.Solanacea%2Ecds%2Efa.tblastx.temp_dir. Then I modified run.log.child.12 to include

FINISHED SH1353.alternative.noPlasmid.maker.output/SH1353.alternative.noPlasmid_datastore/42/CC/atg000169l//theVoid.atg000169l/1/atg000169l.12.Solanacea%2Ecds%2Efa.tblastx

However, it seems like MAKER still starts over from tblastx. I have a small number of contigs left, so manually working around this is feasible. Would there be a way to do this?

Thank you for your help!

Kyungyong

On Sun, Dec 19, 2021 at 10:02 AM Kyungyong Seong wrote:

> Thank you for the tips! How about reducing the time for tblastx? My
> cluster has a 3-day run limit. I think what is happening is that MAKER is
> terminated because of out-of-memory issues or the runtime cap, and when MAKER
> is restarted, tblastx needs to start from scratch. Do you think it would be
> better not to use MPI and set cpus=30? Or would it be okay to set up mpi =
> 3 and cpus=10 if I have 30 cores?
>
> On Fri, Dec 17, 2021 at 9:29 AM Carson Holt wrote:
>
>> 1. Make sure your system is not configured with an in-memory /tmp
>> directory. If it is, every file written to temporary storage will use RAM.
>> 2. If running under MPI, cpus= in maker_opts.ctl must be set to 1.
>> 3. max_dna_len= should be 100000 (the default)
>> 4. In maker_bopts.ctl, set all the depth_blast= options to something like
>> 10 or 20 (there are 3 depth values you will have to set). The default is to
>> keep everything, and if you have really deep alignments, that can use a lot
>> of RAM without any actual benefit for gene prediction.
>>
>> --Carson
>>
>> > On Dec 16, 2021, at 11:03 AM, Kyungyong Seong wrote:
>> >
>> > Hi!
>> >
>> > MAKER has been running fine on my genome (~1Gb; 800 contigs) but is now
>> > stuck with ~30 contigs that keep failing because of high memory
>> > consumption. I am using mpi, running 20-30 contigs for annotation in
>> > parallel, depending on the machine. I started with 64Gb memory machines but
>> > have moved up to 1.5 Tb machines as the job kept failing. Unfortunately,
>> > all memory of this machine is also saturated. It looks like tblastx is
>> > taking lots of time and resources. The databases I have are about 200 Mb
>> > for the proteins and 570 Mb for cDNAs. max_dna_len is set as 100000 in
>> > maker_opts.ctl. Would there be a way to improve this? Decreasing the number
>> > of jobs for MPI slowed down memory saturation but eventually the same
>> > happened.
>> >
>> > Thank you!
>> > Kyungyong
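Point 1 of Carson's list (an in-memory /tmp) can be checked on a Linux node by looking up which filesystem holds /tmp. The Python sketch below parses text in the /proc/mounts layout (device, mount point, filesystem type, options). It is Linux-specific, the function name `fs_type_for` is hypothetical, and a result of `tmpfs` is only a hint that temporary files may be held in RAM.

```python
def fs_type_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the mount that contains `path`."""
    path = path.rstrip("/") or "/"
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        is_parent = (
            mount_point == "/"
            or path == mount_point
            or path.startswith(mount_point.rstrip("/") + "/")
        )
        # Prefer the longest matching mount point; later entries win ties
        # because a later mount shadows an earlier one at the same location.
        if is_parent and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


# Made-up mount table for illustration.
example_mounts = (
    "/dev/sda1 / ext4 rw 0 0\n"
    "tmpfs /tmp tmpfs rw 0 0\n"
)
print(fs_type_for("/tmp", example_mounts))
```

On a real node, pass the contents of /proc/mounts as `mounts_text`. If /tmp turns out to be memory-backed, the stock maker_opts.ctl has a `TMP=` option for pointing temporary files at a disk-backed directory; check the comment next to that option in your own control file.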