From stuckerta at gmail.com  Fri Dec  3 13:35:43 2021
From: stuckerta at gmail.com (Adam Stuckert)
Date: Fri, 3 Dec 2021 13:35:43 -0700
Subject: [maker-devel] Maker predicts way too few genes/proteins
Message-ID: <CAG9aC4tibu-WbyTpzspGviqZzJHp5xmn3g-1vKENaE+Tvt-pJQ@mail.gmail.com>

Hello,

I am working on annotating several different assemblies, but I am having
difficulty getting a reasonable number of predicted genes/proteins. My
annotations always predict way too few genes (thousands too few) in the
final transcript/protein fasta files. So, I am seeking help.

My approach is to annotate with EST evidence from the same species (either
straight from transcriptome assemblers or predicted coding regions from
TransDecoder) and use protein evidence from uniprot + related species.
Simple repeats are softmasked within Maker. All repeats are masked in
Maker, and I am supplying a repeat library that includes lineage-specific
repeats as well as species-specific repeats that are modeled by
RepeatModeler2. I am using Maker version 3.01.03.

I'm attaching my options control file. Any help to troubleshoot this
would be greatly appreciated.

Thanks,
Adam

-- 
Adam Stuckert
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts
Type: application/octet-stream
Size: 4618 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20211203/8c2d89e8/attachment.obj>

From s.kyungyong at berkeley.edu  Thu Dec 16 11:03:58 2021
From: s.kyungyong at berkeley.edu (Kyungyong Seong)
Date: Thu, 16 Dec 2021 10:03:58 -0800
Subject: [maker-devel] High memory consumption
Message-ID: <CAH7WYR2JS1qNk+C7aUpU+jqUwYOPtnieOrZTuqF3oEZ+3KSwVQ@mail.gmail.com>

Hi!

MAKER has been running fine on my genome (~1Gb; 800 contigs) but is now
stuck with ~30 contigs that keep failing because of high memory
consumption. I am using mpi, running 20-30 contigs for annotation in
parallel, depending on the machine. I started with 64Gb memory machines but
have moved up to 1.5 Tb machines as the job kept failing. Unfortunately,
all memory of this machine is also saturated. It looks like tblastx is
taking lots of time and resources. The databases I have are about 200 Mb
for the proteins and 570 Mb for cDNAs. max_dna_len is set to 100000 in
maker_opts.ctl. Would there be a way to improve this? Decreasing the number
of jobs for MPI slowed down memory saturation but eventually the same
happened.

Thank you!
Kyungyong

From carsonhh at gmail.com  Fri Dec 17 10:24:40 2021
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Dec 2021 10:24:40 -0700
Subject: [maker-devel] cds error
In-Reply-To: <CAN70h3cy1nxhih+Giu71-BDaQ_JEeDN5Vmu3rFYgd2pqJj1yDA@mail.gmail.com>
References: <CAN70h3cy1nxhih+Giu71-BDaQ_JEeDN5Vmu3rFYgd2pqJj1yDA@mail.gmail.com>
Message-ID: <821115C7-66B4-4FE6-929B-5FFA7252A3EE@gmail.com>

You can upload your GFF3 file here, and I can take a look -> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

My guess is that either gffread is not translating it correctly or you used your own GFF3 as input to MAKER during a run. If GFF3 data submitted to MAKER has partially overlapping exons in a gene prediction, MAKER can't fix it, and you get back whatever translation came from the GFF3.
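
One way to check for the partially overlapping exons mentioned above is sketched below in Python. It scans GFF3 text and reports, per transcript, neighbouring exons whose coordinates overlap. The function name find_overlapping_exons is hypothetical, and the sketch assumes standard nine-column GFF3 with a Parent attribute on exon features; it is not part of MAKER or gffread.

```python
from collections import defaultdict


def find_overlapping_exons(gff3_text):
    """Return {transcript_id: [(exon_a, exon_b), ...]} for overlapping exons.

    Hypothetical helper: assumes tab-separated, nine-column GFF3 where each
    exon feature carries a Parent attribute naming its transcript.
    """
    exons = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # An exon may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((start, end))

    overlaps = {}
    for parent, spans in exons.items():
        spans.sort()
        # After sorting by start, any overlap shows up between neighbours.
        bad = [(a, b) for a, b in zip(spans, spans[1:]) if b[0] <= a[1]]
        if bad:
            overlaps[parent] = bad
    return overlaps
```

Transcripts that appear in the result are the ones worth inspecting by hand before blaming the translation tool.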

--Carson


> On Nov 28, 2021, at 9:20 PM, zc y <prometheus07.06 at gmail.com> wrote:
> 
> Dear Maker developers,
> I found a CDS error in my rice project. I ran MAKER (3.01.03), and it finished without errors in master_datastore_index.log. But when I used gffread to translate the proteins from the MAKER GFF, I found that almost all of the proteins do not start with 'M' and many contain stop codons.
> However, the protein file provided by MAKER (Chr12.maker.proteins.fasta) is correct.
> I used the same parameters and evidence for another rice genome, and it does not have the problem.
> What should I do?
> 
> 
> thanks,
> <QQ screenshot 20211129121755.png><QQ screenshot 20211129121917.png>
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org


From carsonhh at gmail.com  Fri Dec 17 10:29:38 2021
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Dec 2021 10:29:38 -0700
Subject: [maker-devel] High memory consumption
In-Reply-To: <CAH7WYR2JS1qNk+C7aUpU+jqUwYOPtnieOrZTuqF3oEZ+3KSwVQ@mail.gmail.com>
References: <CAH7WYR2JS1qNk+C7aUpU+jqUwYOPtnieOrZTuqF3oEZ+3KSwVQ@mail.gmail.com>
Message-ID: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com>

1. Make sure your system is not configured with an in-memory /tmp directory. If it is, every file written to temporary storage will use RAM.
2. If running under MPI, cpu= in maker_opts.ctl must be set to 1.
3. max_dna_len= should be 100000 (the default)
4. In maker_bopts.ctl, set all the depth_blast= options to something like 10 or 20 (there are 3 depth values you will have to set). The default is to keep everything, and really deep alignments can use a lot of RAM without any actual benefit for gene prediction.
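
The four checks above could be scripted roughly as follows. This is a minimal Python sketch, not a MAKER utility: the function names are hypothetical, and the option names cpus, max_dna_len, depth_blastn, depth_blastx and depth_tblastx are assumptions based on the usual contents of maker_opts.ctl and maker_bopts.ctl (with 0 assumed to mean "no depth cutoff"), so verify them against your own control files.

```python
def parse_ctl(text):
    """Parse 'key=value #comment' lines of a control file into a dict."""
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" in line:
            key, value = line.split("=", 1)
            opts[key.strip()] = value.strip()
    return opts


def tmp_is_in_memory(mounts_text):
    """True if /tmp is mounted as tmpfs or ramfs (Linux /proc/mounts format)."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "/tmp" and fields[2] in ("tmpfs", "ramfs"):
            return True
    return False


def memory_warnings(opts_text, bopts_text, using_mpi=True):
    """Return warnings for settings that may inflate memory use.

    Option names are assumptions (see the note above the code).
    """
    opts, bopts = parse_ctl(opts_text), parse_ctl(bopts_text)
    warnings = []
    if using_mpi and opts.get("cpus", "1") != "1":
        warnings.append("cpus: should be 1 when running under MPI")
    if opts.get("max_dna_len", "100000") != "100000":
        warnings.append("max_dna_len: differs from the default of 100000")
    for key in ("depth_blastn", "depth_blastx", "depth_tblastx"):
        if bopts.get(key, "0") in ("", "0"):
            warnings.append(key + ": no depth cutoff set; consider 10 or 20")
    return warnings
```

On Linux, tmp_is_in_memory(open("/proc/mounts").read()) covers the first point, and memory_warnings() can be fed the text of the two control files.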

--Carson


> On Dec 16, 2021, at 11:03 AM, Kyungyong Seong <s.kyungyong at berkeley.edu> wrote:
> 
> Hi!
> 
> MAKER has been running fine on my genome (~1Gb; 800 contigs) but is now stuck with ~30 contigs that keep failing because of high memory consumption. I am using mpi, running 20-30 contigs for annotation in parallel, depending on the machine. I started with 64Gb memory machines but have moved up to 1.5 Tb machines as the job kept failing. Unfortunately, all memory of this machine is also saturated. It looks like tblastx is taking lots of time and resources. The databases I have are about 200 Mb for the proteins and 570 Mb for cDNAs. max_dna_len is set as 100000 in maker_opt.ctl. Would there be a way to improve this? Decreasing the number of jobs for MPI slowed down memory saturation but eventually the same happened. 
> 
> Thank you!
> Kyungyong
> 
> 

From jacques.dainat at nbis.se  Sat Dec 18 03:13:02 2021
From: jacques.dainat at nbis.se (Jacques Dainat)
Date: Sat, 18 Dec 2021 11:13:02 +0100
Subject: [maker-devel] cds error
In-Reply-To: <CAN70h3cy1nxhih+Giu71-BDaQ_JEeDN5Vmu3rFYgd2pqJj1yDA@mail.gmail.com>
References: <CAN70h3cy1nxhih+Giu71-BDaQ_JEeDN5Vmu3rFYgd2pqJj1yDA@mail.gmail.com>
Message-ID: <B93E69FD-AF4C-43B8-B913-6B02AC9EFE4A@nbis.se>

Hi,

This might be related to a fragmented CDS where the beginning is missing. The offset (phase) might then be different from 0; it is 0 when the complete start codon is present. I don't know how gffread deals with the offset. You can check in the GFF file whether the affected genes have this kind of CDS start offset. I know that agat_sp_extract_sequences.pl <https://github.com/NBISweden/AGAT/blob/master/bin/agat_sp_extract_sequences.pl> from AGAT (https://github.com/NBISweden/AGAT) properly handles incomplete CDS and a first codon with an offset. You might give it a try to see if it fixes the issue.
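
The phase check described above could look something like this in Python. It is only a sketch under stated assumptions: the function name first_cds_nonzero_phase is hypothetical, the input is assumed to be standard nine-column GFF3 with Parent attributes on CDS features, and the "first" CDS segment is taken to be the leftmost one on the + strand and the rightmost one on the - strand.

```python
from collections import defaultdict


def first_cds_nonzero_phase(gff3_text):
    """Return {transcript_id: phase} where the first CDS segment has phase 1 or 2.

    Hypothetical helper; it does not translate anything, it only reads the
    phase column (column 8) of the 5'-most CDS segment of each transcript.
    """
    cds = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        start, end, strand, phase = int(cols[3]), int(cols[4]), cols[6], cols[7]
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                cds[parent].append((start, end, strand, phase))

    flagged = {}
    for parent, segments in cds.items():
        if segments[0][2] == "-":
            first = max(segments, key=lambda s: s[1])  # rightmost on minus strand
        else:
            first = min(segments, key=lambda s: s[0])  # leftmost on plus strand
        if first[3] in ("1", "2"):
            flagged[parent] = int(first[3])
    return flagged
```

If most of the affected transcripts show up here, the offset explanation becomes more plausible; if none do, the cause is probably elsewhere.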

Best regards, 

Jacques Dainat, Ph.D.


> On 29 Nov 2021, at 05:20, zc y <prometheus07.06 at gmail.com> wrote:
> 
> Dear Maker developers,
> I found a CDS error in my rice project. I ran MAKER (3.01.03), and it finished without errors in master_datastore_index.log. But when I used gffread to translate the proteins from the MAKER GFF, I found that almost all of the proteins do not start with 'M' and many contain stop codons.
> However, the protein file provided by MAKER (Chr12.maker.proteins.fasta) is correct.
> I used the same parameters and evidence for another rice genome, and it does not have the problem.
> What should I do?
> 
> 
> thanks,
> <QQ screenshot 20211129121755.png><QQ screenshot 20211129121917.png>

From s.kyungyong at berkeley.edu  Sun Dec 19 11:02:25 2021
From: s.kyungyong at berkeley.edu (Kyungyong Seong)
Date: Sun, 19 Dec 2021 10:02:25 -0800
Subject: [maker-devel] High memory consumption
In-Reply-To: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com>
References: <CAH7WYR2JS1qNk+C7aUpU+jqUwYOPtnieOrZTuqF3oEZ+3KSwVQ@mail.gmail.com>
	<576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com>
Message-ID: <CAH7WYR1f_OSHrhH4Ak7aYwt3TGqTnNonDEAXVLT7+Me6muD1=g@mail.gmail.com>

Thank you for the tips! How about reducing the time for tblastx? My cluster
has a 3-day run limit. I think what is happening is that MAKER is
terminated because of out-of-memory issues or the runtime cap, and when MAKER
is restarted, tblastx needs to start from scratch. Do you think it would be
better not to use MPI and set cpus=30? Or would it be okay to run MPI with
3 processes and cpus=10 if I have 30 cores?


On Fri, Dec 17, 2021 at 9:29 AM Carson Holt <carsonhh at gmail.com> wrote:

> 1. Make sure your system is not configured with an in memory /tmp
> directory. If it is, every file written to temporary storage will use RAM.
> 2. If running under MPI, cpu= in maker_opts.ctl must be set to 1.
> 3. max_dna_len= should be 100000 (the default)
> 4. In maker_bopts.ctl, set all the depth_blast= options to something like
> 10 or 20 (there are 3 depth values you will have to set). The default is to
> keep everything, and if you have really deep alignments that can use a lot
> of RAM without any actual benefit for gene prediction.
>
> --Carson
>
>
>
> > On Dec 16, 2021, at 11:03 AM, Kyungyong Seong <s.kyungyong at berkeley.edu>
> wrote:
> >
> > Hi!
> >
> > MAKER has been running fine on my genome (~1Gb; 800 contigs) but is now
> stuck with ~30 contigs that keep failing because of high memory
> consumption. I am using mpi, running 20-30 contigs for annotation in
> parallel, depending on the machine. I started with 64Gb memory machines but
> have moved up to 1.5 Tb machines as the job kept failing. Unfortunately,
> all memory of this machine is also saturated. It looks like tblastx is
> taking lots of time and resources. The databases I have are about 200 Mb
> for the proteins and 570 Mb for cDNAs. max_dna_len is set as 100000 in
> maker_opt.ctl. Would there be a way to improve this? Decreasing the number
> of jobs for MPI slowed down memory saturation but eventually the same
> happened.
> >
> > Thank you!
> > Kyungyong
> >
> >

From liu9827885 at 163.com  Tue Dec 21 02:40:41 2021
From: liu9827885 at 163.com (Yu Liu)
Date: Tue, 21 Dec 2021 17:40:41 +0800 (CST)
Subject: [maker-devel] part long scaffolds finished, the others failed
Message-ID: <21663963.6927.17ddc5d4d68.Coremail.liu9827885@163.com>

Hello,

I apologize for not posting directly to the archived forum, but it appears that the option to enter new posts is disabled.

Thank you for your continued support of Maker and your responses to the forum posts. I have been running Maker (v3.01.02) to annotate an apple genome that consists of 17 chromosome-length scaffolds and some small contigs.

In my various test runs, the vast majority of the smaller contigs failed to annotate. I am not sure why the long scaffolds finished while the smaller contigs did not. The error output is:
```
open3: fork failed: Cannot allocate memory at /data/liuyu/Software/maker/bin/../lib/File/NFSLock.pm line 1037 thread 1.
--> rank=3, hostname=localhost.localdomain
ERROR: Failed while collecting blastn reports
ERROR: Chunk failed at level:1, tier_type:3
FAILED CONTIG:scaffold1A


deleted:0 hits
ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scaffold1A
```
The work recorded in run.log.child.0 is FINISHED, while theVoid.scaffold1A/run.log.child.1 shows the following:
```
STARTED CF_hap1_part3_rnd1.maker.output/CF_hap1_part3_rnd1_datastore/67/D7/scaffold1A//theVoid.scaffold1A/0/scaffold1A.1.66268.0.db%2E1-66268%2Efor_blastn%2Efasta.blastn
DIED    RANK    3:4:0:2
DIED    COUNT   1
```
My command is "mpiexec -n 8 /data/liuyu/Software/maker/bin/maker -base CF_hap1_part3_test part3_round1_maker_opts.ctl maker_bopts.ctl maker_exe.ctl". mpiexec is provided by MPICH (v3.3.2). When I tested with MPICH2, the error was the same. The task also fails when I use fewer processes.

Thanks for any troubleshooting tips you can offer.

Best wishes,
Yu Liu

From s.kyungyong at berkeley.edu  Wed Dec 22 20:42:05 2021
From: s.kyungyong at berkeley.edu (Kyungyong Seong)
Date: Wed, 22 Dec 2021 19:42:05 -0800
Subject: [maker-devel] High memory consumption
In-Reply-To: <CAH7WYR1f_OSHrhH4Ak7aYwt3TGqTnNonDEAXVLT7+Me6muD1=g@mail.gmail.com>
References: <CAH7WYR2JS1qNk+C7aUpU+jqUwYOPtnieOrZTuqF3oEZ+3KSwVQ@mail.gmail.com>
	<576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com>
	<CAH7WYR1f_OSHrhH4Ak7aYwt3TGqTnNonDEAXVLT7+Me6muD1=g@mail.gmail.com>
Message-ID: <CAH7WYR2NqCnzQ_WTwm2xScyMSuPrd4DXe_5TPU6zZK8xdk2Y_g@mail.gmail.com>

Hi Carson,

Looking at the progress more carefully, I learned that some query and
database combinations cause tblastx to run seemingly forever. Typically, the
tblastx search ends in a reasonable time (a few hours at most), but for those
combinations it takes days (and is still running) to search a 100 kb query
against a 50 Mb database. All CPUs are tied up by these searches, so MAKER
never finishes.

Would it be possible to skip tblastx search for these queries + databases?
I have intermediate files from a previous MAKER run produced with a smaller
size of databases, so I attempted to copy some of these files into the
current run folders. For instance,
for atg000169l.12.Solanacea%2Ecds%2Efa.tblastx.temp_dir that causes the
issue,

I first copied atg000169l.12.Solanacea%2Ecds%2Efa.tblastx from the previous
run into the proper directory and deleted
atg000169l.12.Solanacea%2Ecds%2Efa.tblastx.temp_dir.

Then I modified run.log.child.12 to include FINISHED
SH1353.alternative.noPlasmid.maker.output/SH1353.alternative.noPlasmid_datastore/42/CC/atg000169l//theVoid.atg000169l/1/atg000169l.12.Solanacea%2Ecds%2Efa.tblastx

However, it seems like MAKER still starts over from tblastx. I have a small
number of contigs left, so manually working around this is feasible. Would
there be a way to do this?
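
To see which searches are still outstanding in a given log, something like the following Python sketch could help. It only reads a run.log.child.* file and lists entries that have a STARTED line but no matching FINISHED line; the STARTED/FINISHED line format is inferred solely from the excerpts quoted in this thread, the function name unfinished_entries is hypothetical, and the sketch does not change how MAKER decides what to rerun.

```python
def unfinished_entries(run_log_text):
    """List paths that were STARTED but never marked FINISHED, in log order.

    Assumes each relevant line is '<TAG><whitespace><path>', as in the
    log excerpts quoted in this thread.
    """
    started = []
    finished = set()
    for line in run_log_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        tag, path = parts[0], parts[1].strip()
        if tag == "STARTED" and path not in started:
            started.append(path)
        elif tag == "FINISHED":
            finished.add(path)
    return [path for path in started if path not in finished]
```

For example, unfinished_entries(open("run.log.child.12").read()) would list the tblastx jobs that were started in that log but have no FINISHED entry.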

Thank you for your help!
Kyungyong


On Sun, Dec 19, 2021 at 10:02 AM Kyungyong Seong <s.kyungyong at berkeley.edu>
wrote:

> Thank you for the tips! How about reducing the time for tblastx? My
> cluster has a 3 days run limit. I think what is happening is that MAKER is
> terminated because of out-of-memory issues or runtime cap, and when MAKER
> is restarted, tblastx needs to start from scratch. Do you think it would be
> better not to use MPI and set cpus=30? Or would it be okay to set up mpi =
> 3 and cpus=10 if I have 30 cores?
>
>
> On Fri, Dec 17, 2021 at 9:29 AM Carson Holt <carsonhh at gmail.com> wrote:
>
>> 1. Make sure your system is not configured with an in memory /tmp
>> directory. If it is, every file written to temporary storage will use RAM.
>> 2. If running under MPI, cpu= in maker_opts.ctl must be set to 1.
>> 3. max_dna_len= should be 100000 (the default)
>> 4. In maker_bopts.ctl, set all the depth_blast= options to something like
>> 10 or 20 (there are 3 depth values you will have to set). The default is to
>> keep everything, and if you have really deep alignments that can use a lot
>> of RAM without any actual benefit for gene prediction.
>>
>> --Carson
>>
>>
>>
>> > On Dec 16, 2021, at 11:03 AM, Kyungyong Seong <s.kyungyong at berkeley.edu>
>> wrote:
>> >
>> > Hi!
>> >
>> > MAKER has been running fine on my genome (~1Gb; 800 contigs) but is now
>> stuck with ~30 contigs that keep failing because of high memory
>> consumption. I am using mpi, running 20-30 contigs for annotation in
>> parallel, depending on the machine. I started with 64Gb memory machines but
>> have moved up to 1.5 Tb machines as the job kept failing. Unfortunately,
>> all memory of this machine is also saturated. It looks like tblastx is
>> taking lots of time and resources. The databases I have are about 200 Mb
>> for the proteins and 570 Mb for cDNAs. max_dna_len is set as 100000 in
>> maker_opt.ctl. Would there be a way to improve this? Decreasing the number
>> of jobs for MPI slowed down memory saturation but eventually the same
>> happened.
>> >
>> > Thank you!
>> > Kyungyong
>> >
>> >

From stuckerta at gmail.com  Fri Dec  3 13:35:43 2021
From: stuckerta at gmail.com (Adam Stuckert)
Date: Fri, 3 Dec 2021 13:35:43 -0700
Subject: [maker-devel] Maker predicts way too few genes/proteins
Message-ID: <CAG9aC4tibu-WbyTpzspGviqZzJHp5xmn3g-1vKENaE+Tvt-pJQ@mail.gmail.com>

Hello,

I am working on annotating several different assemblies, but I am having
difficulty getting a reasonable number of predicted genes/proteins. My
annotations always predict way too few genes (thousands too few) in the
final transcript/protein fasta files. So, I am seeking help.

My approach is to annotate with EST evidence from the same species (either
straight from transcriptome assemblers or predicted coding regions from
TransDecoder) and use protein evidence from uniprot + related species.
Simple repeats are softmasked within Maker. All repeats are masked in
Maker, and I am supplying a repeat library that includes lineage-specific
repeats as well as species specific repeats that are modeled by
RepeatModeler2. I am using Maker version 3.01.03.

I'm attaching my options control file. Any help to troubleshoot this
would be greatly appreciated.

Thanks,
Adam

-- 
Adam Stuckert
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20211203/8c2d89e8/attachment-0002.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts
Type: application/octet-stream
Size: 4619 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20211203/8c2d89e8/attachment-0002.obj>

From s.kyungyong at berkeley.edu  Thu Dec 16 11:03:58 2021
From: s.kyungyong at berkeley.edu (Kyungyong Seong)
Date: Thu, 16 Dec 2021 10:03:58 -0800
Subject: [maker-devel] High memory consumption
Message-ID: <CAH7WYR2JS1qNk+C7aUpU+jqUwYOPtnieOrZTuqF3oEZ+3KSwVQ@mail.gmail.com>

Hi!

MAKER has been running fine on my genome (~1Gb; 800 contigs) but is now
stuck with ~30 contigs that keep failing because of high memory
consumption. I am using mpi, running 20-30 contigs for annotation in
parallel, depending on the machine. I started with 64Gb memory machines but
have moved up to 1.5 Tb machines as the job kept failing. Unfortunately,
all memory of this machine is also saturated. It looks like tblastx is
taking lots of time and resources. The databases I have are about 200 Mb
for the proteins and 570 Mb for cDNAs. max_dna_len is set as 100000 in
maker_opt.ctl. Would there be a way to improve this? Decreasing the number
of jobs for MPI slowed down memory saturation but eventually the same
happened.

Thank you!
Kyungyong
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20211216/8c2a2e68/attachment-0002.html>

From carsonhh at gmail.com  Fri Dec 17 10:24:40 2021
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Dec 2021 10:24:40 -0700
Subject: [maker-devel] cds error
In-Reply-To: <CAN70h3cy1nxhih+Giu71-BDaQ_JEeDN5Vmu3rFYgd2pqJj1yDA@mail.gmail.com>
References: <CAN70h3cy1nxhih+Giu71-BDaQ_JEeDN5Vmu3rFYgd2pqJj1yDA@mail.gmail.com>
Message-ID: <821115C7-66B4-4FE6-929B-5FFA7252A3EE@gmail.com>

You can upload your GFF3 file here, and I can take a look ?> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi

My guess is that either gffread is not translating it correctly or you used your own GFF3 as input to maker during a run. If a gff3 submited data to MAKER has partially overlapping exons in a gene prediction, MAKER can?t fix it, and you get back whatever translation came from the gff3.

?Carson


> On Nov 28, 2021, at 9:20 PM, zc y <prometheus07.06 at gmail.com> wrote:
> 
> Dear  Maker developers,
> I found a CDS error in my rice project. I ran the maker (3.01.03) and it finished without error in master_datastore_index.log. But when I use gffread to translate the protein from maker gff, I found that almost all of proteins are not start with 'M' and many stop codons in it. 
> In fact, I checked the protein file (Chr12.maker.proteins.fasta) provided by the maker, it is correct. 
> I used the same parameter and evidence in another rice, it don't have the problem.
> What should I do?
> 
> 
> thanks,
> <QQ??20211129121755.png><QQ??20211129121917.png>
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1376 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20211217/cbd90bf5/attachment-0002.p7s>

From carsonhh at gmail.com  Fri Dec 17 10:29:38 2021
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 17 Dec 2021 10:29:38 -0700
Subject: [maker-devel] High memory consumption
In-Reply-To: <CAH7WYR2JS1qNk+C7aUpU+jqUwYOPtnieOrZTuqF3oEZ+3KSwVQ@mail.gmail.com>
References: <CAH7WYR2JS1qNk+C7aUpU+jqUwYOPtnieOrZTuqF3oEZ+3KSwVQ@mail.gmail.com>
Message-ID: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com>

1. Make sure your system is not configured with an in memory /tmp directory. If it is, every file written to temporary storage will use RAM.
2. If running under MPI, cpu= in maker_opts.ctl must be set to 1.
3. max_dna_len= should be 100000 (the default)
4. In maker_bopts.ctl, set all the depth_blast= options to something like 10 or 20 (there are 3 depth values you will have to set). The default is to keep everything, and if you have really deep alignments that can use a lot of RAM with out any actual benefit for gene prediction.

?Carson


> On Dec 16, 2021, at 11:03 AM, Kyungyong Seong <s.kyungyong at berkeley.edu> wrote:
> 
> Hi!
> 
> MAKER has been running fine on my genome (~1Gb; 800 contigs) but is now stuck with ~30 contigs that keep failing because of high memory consumption. I am using mpi, running 20-30 contigs for annotation in parallel, depending on the machine. I started with 64Gb memory machines but have moved up to 1.5 Tb machines as the job kept failing. Unfortunately, all memory of this machine is also saturated. It looks like tblastx is taking lots of time and resources. The databases I have are about 200 Mb for the proteins and 570 Mb for cDNAs. max_dna_len is set as 100000 in maker_opt.ctl. Would there be a way to improve this? Decreasing the number of jobs for MPI slowed down memory saturation but eventually the same happened. 
> 
> Thank you!
> Kyungyong
> 
> 
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1376 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20211217/6507173a/attachment-0002.p7s>

From jacques.dainat at nbis.se  Sat Dec 18 03:13:02 2021
From: jacques.dainat at nbis.se (Jacques Dainat)
Date: Sat, 18 Dec 2021 11:13:02 +0100
Subject: [maker-devel] cds error
In-Reply-To: <CAN70h3cy1nxhih+Giu71-BDaQ_JEeDN5Vmu3rFYgd2pqJj1yDA@mail.gmail.com>
References: <CAN70h3cy1nxhih+Giu71-BDaQ_JEeDN5Vmu3rFYgd2pqJj1yDA@mail.gmail.com>
Message-ID: <B93E69FD-AF4C-43B8-B913-6B02AC9EFE4A@nbis.se>

Hi,

Might be related to fragmented CDS where the beginning is missing. The offset (phase) might be different than 0. It is 0 when there is the complete start codon. I don?t know how deals gffread with the offset. You can check within the GFF file if the incriminated genes have this kind of CDS offset start. I know that  agat_sp_extract_sequences.pl <https://github.com/NBISweden/AGAT/blob/master/bin/agat_sp_extract_sequences.pl> from AGAT (https://github.com/NBISweden/AGAT) is dealing properly with incomplete CDS and first codon with offset. You might give a try to see if it fix the issue. 

Best regards, 

Jacques Dainat, Ph.D.


> On 29 Nov 2021, at 05:20, zc y <prometheus07.06 at gmail.com> wrote:
> 
> Dear  Maker developers,
> I found a CDS error in my rice project. I ran the maker (3.01.03) and it finished without error in master_datastore_index.log. But when I use gffread to translate the protein from maker gff, I found that almost all of proteins are not start with 'M' and many stop codons in it. 
> In fact, I checked the protein file (Chr12.maker.proteins.fasta) provided by the maker, it is correct. 
> I used the same parameter and evidence in another rice, it don't have the problem.
> What should I do?
> 
> 
> thanks,
> <QQ??20211129121755.png><QQ??20211129121917.png>
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20211218/48e15e33/attachment-0002.html>

From s.kyungyong at berkeley.edu  Sun Dec 19 11:02:25 2021
From: s.kyungyong at berkeley.edu (Kyungyong Seong)
Date: Sun, 19 Dec 2021 10:02:25 -0800
Subject: [maker-devel] High memory consumption
In-Reply-To: <576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com>
References: <CAH7WYR2JS1qNk+C7aUpU+jqUwYOPtnieOrZTuqF3oEZ+3KSwVQ@mail.gmail.com>
	<576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com>
Message-ID: <CAH7WYR1f_OSHrhH4Ak7aYwt3TGqTnNonDEAXVLT7+Me6muD1=g@mail.gmail.com>

Thank you for the tips! How about reducing the time for tblastx? My cluster
has a 3 days run limit. I think what is happening is that MAKER is
terminated because of out-of-memory issues or runtime cap, and when MAKER
is restarted, tblastx needs to start from scratch. Do you think it would be
better not to use MPI and set cpus=30? Or would it be okay to set up mpi =
3 and cpus=10 if I have 30 cores?


On Fri, Dec 17, 2021 at 9:29 AM Carson Holt <carsonhh at gmail.com> wrote:

> 1. Make sure your system is not configured with an in memory /tmp
> directory. If it is, every file written to temporary storage will use RAM.
> 2. If running under MPI, cpu= in maker_opts.ctl must be set to 1.
> 3. max_dna_len= should be 100000 (the default)
> 4. In maker_bopts.ctl, set all the depth_blast= options to something like
> 10 or 20 (there are 3 depth values you will have to set). The default is to
> keep everything, and if you have really deep alignments that can use a lot
> of RAM with out any actual benefit for gene prediction.
>
> ?Carson
>
>
>
> > On Dec 16, 2021, at 11:03 AM, Kyungyong Seong <s.kyungyong at berkeley.edu>
> wrote:
> >
> > Hi!
> >
> > MAKER has been running fine on my genome (~1Gb; 800 contigs) but is now
> stuck with ~30 contigs that keep failing because of high memory
> consumption. I am using mpi, running 20-30 contigs for annotation in
> parallel, depending on the machine. I started with 64Gb memory machines but
> have moved up to 1.5 Tb machines as the job kept failing. Unfortunately,
> all memory of this machine is also saturated. It looks like tblastx is
> taking lots of time and resources. The databases I have are about 200 Mb
> for the proteins and 570 Mb for cDNAs. max_dna_len is set as 100000 in
> maker_opt.ctl. Would there be a way to improve this? Decreasing the number
> of jobs for MPI slowed down memory saturation but eventually the same
> happened.
> >
> > Thank you!
> > Kyungyong
> >
> >
> > _______________________________________________
> > maker-devel mailing list
> > maker-devel at yandell-lab.org
> > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20211219/987d9b40/attachment-0002.html>

From liu9827885 at 163.com  Tue Dec 21 02:40:41 2021
From: liu9827885 at 163.com (=?GBK?B?wfXqxQ==?=)
Date: Tue, 21 Dec 2021 17:40:41 +0800 (CST)
Subject: [maker-devel] =?gbk?q?part_long_scafflods_finished=A3=ACthe_othe?=
 =?gbk?q?rs_failed?=
Message-ID: <21663963.6927.17ddc5d4d68.Coremail.liu9827885@163.com>

Hello,
        I apologize for not posting directly to the archived forum but it appears that the option to enter new posts is disabled.
        Thank you for your continued support of Maker and your responses to the forum posts. I have been running Maker (V3.01.02) to annotate a apple genome that consists of 17 chromosome-length scaffolds and some small contigs.
        In my various tests in running Maker, the vast majority of the smaller contigs were annotated failed. I'm not sure the long scaffolds finished rather than smaller contigs.
        ```
        open3: fork failed: Cannot allocate memory at /data/liuyu/Software/maker/bin/../lib/File/NFSLock.pm line 1037 thread 1.
        --> rank=3, hostname=localhost.localdomain
       ERROR: Failed while collecting blastn reports
       ERROR: Chunk failed at level:1, tier_type:3
       FAILED CONTIG:scaffold1A


       deleted:0 hits
       ERROR: Chunk failed at level:4, tier_type:0
       FAILED CONTIG:scaffold1A
       ````
       The work in run.log.child.0 is FINISHED. While in the theVoid.scaffold1A/run.log.child.1, the error code showed below:
       ```
       STARTED CF_hap1_part3_rnd1.maker.output/CF_hap1_part3_rnd1_datastore/67/D7/scaffold1A//theVoid.scaffold1A/0/scaffold1A.1.66268.0.db%2E1-66268%2Efor_blastn%2Efasta.blastn
       DIED    RANK    3:4:0:2
       DIED    COUNT   1
       ```
      My command is "mpiexec -n 8 /data/liuyu/Software/maker/bin/maker -base CF_hap1_part3_test part3_round1_maker_opts.ctl maker_bopts.ctl maker_exe.ctl". The mpiexec comes from MPICH (v3.3.2). When I tested with MPICH2, the error was the same. The task also fails when I use fewer processes.
      Thanks for any troubleshooting tips you can offer.
Best wishes,
Yu Liu
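
A quick way to see which tasks died is to summarize the run.log.child.* files. The sketch below is only an illustration: it assumes each log line is whitespace-separated with a leading tag (STARTED, FINISHED, or DIED), as in the excerpt above, and the function name summarize_run_log is hypothetical, not part of MAKER.

```python
def summarize_run_log(text):
    """Summarize a MAKER-style run log given as a string.

    Assumes lines look like 'STARTED <task>', 'FINISHED <task>' and
    'DIED COUNT <n>' (format inferred from the excerpt, not from MAKER docs).
    """
    started, finished = [], set()
    died_count = 0
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or tag-only lines
        tag = fields[0]
        if tag == "STARTED":
            started.append(fields[1])
        elif tag == "FINISHED":
            finished.add(fields[1])
        elif tag == "DIED" and fields[1] == "COUNT" and len(fields) > 2:
            died_count = max(died_count, int(fields[2]))
    # Tasks that were started but have no matching FINISHED line
    unfinished = [task for task in started if task not in finished]
    return {"unfinished": unfinished, "died_count": died_count}
```

For example, summarize_run_log(open("run.log.child.1").read()) would list the blastn task above as unfinished. This only reports what the log says; the "Cannot allocate memory" message itself still points to the machine running out of RAM when forking.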

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20211221/8bc057f2/attachment-0002.html>

From s.kyungyong at berkeley.edu  Wed Dec 22 20:42:05 2021
From: s.kyungyong at berkeley.edu (Kyungyong Seong)
Date: Wed, 22 Dec 2021 19:42:05 -0800
Subject: [maker-devel] High memory consumption
In-Reply-To: <CAH7WYR1f_OSHrhH4Ak7aYwt3TGqTnNonDEAXVLT7+Me6muD1=g@mail.gmail.com>
References: <CAH7WYR2JS1qNk+C7aUpU+jqUwYOPtnieOrZTuqF3oEZ+3KSwVQ@mail.gmail.com>
	<576A238A-603D-40FB-A210-CB8476C4E7FF@gmail.com>
	<CAH7WYR1f_OSHrhH4Ak7aYwt3TGqTnNonDEAXVLT7+Me6muD1=g@mail.gmail.com>
Message-ID: <CAH7WYR2NqCnzQ_WTwm2xScyMSuPrd4DXe_5TPU6zZK8xdk2Y_g@mail.gmail.com>

Hi Carson,

Looking at the progress more carefully, I learned that some query and
database combinations cause tblastx to run forever. Typically, the tblastx
search ends in a reasonable time (a few hours at most), but for those, it
takes days (and is still running) to search the 100 kb query against a 50 Mb
database. All CPUs are tied up by these searches, so MAKER never
finishes.

Would it be possible to skip the tblastx search for these query + database
combinations? I have intermediate files from a previous MAKER run produced
with smaller databases, so I attempted to copy some of these files into the
current run folders. For instance, for
atg000169l.12.Solanacea%2Ecds%2Efa.tblastx.temp_dir, which causes the
issue,

I first copied atg000169l.12.Solanacea%2Ecds%2Efa.tblastx from the previous
run into the proper directory and deleted
atg000169l.12.Solanacea%2Ecds%2Efa.tblastx.temp_dir.

Then I modified run.log.child.12 to include FINISHED
SH1353.alternative.noPlasmid.maker.output/SH1353.alternative.noPlasmid_datastore/42/CC/atg000169l//theVoid.atg000169l/1/atg000169l.12.Solanacea%2Ecds%2Efa.tblastx

However, it seems like MAKER still starts over from tblastx. I have a small
number of contigs left, so manually working around this is feasible. Would
there be a way to do this?

Thank you for your help!
Kyungyong
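
To find which searches are still incomplete, one could scan the datastore for leftover temporary directories. This is a rough sketch that assumes an unfinished search leaves behind a directory whose name ends in ".temp_dir", as in the example above; the helper name find_temp_dirs is hypothetical. It only locates candidates and does not make MAKER skip them.

```python
from pathlib import Path


def find_temp_dirs(root):
    """Return a sorted list of directories under root named '*.temp_dir'.

    Assumption: such directories mark BLAST searches that have not
    completed (inferred from the thread, not from MAKER documentation).
    """
    root = Path(root)
    # rglob walks the whole tree; keep directories only
    return sorted(p for p in root.rglob("*.temp_dir") if p.is_dir())
```

For example, find_temp_dirs("SH1353.alternative.noPlasmid.maker.output") would return the paths of the remaining .temp_dir directories, which shows which contig and database pairs are still pending.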


On Sun, Dec 19, 2021 at 10:02 AM Kyungyong Seong <s.kyungyong at berkeley.edu>
wrote:

> Thank you for the tips! How about reducing the time for tblastx? My
> cluster has a 3 days run limit. I think what is happening is that MAKER is
> terminated because of out-of-memory issues or runtime cap, and when MAKER
> is restarted, tblastx needs to start from scratch. Do you think it would be
> better not to use MPI and set cpus=30? Or would it be okay to set up mpi =
> 3 and cpus=10 if I have 30 cores?
>
>
> On Fri, Dec 17, 2021 at 9:29 AM Carson Holt <carsonhh at gmail.com> wrote:
>
>> 1. Make sure your system is not configured with an in-memory /tmp
>> directory. If it is, every file written to temporary storage will use RAM.
>> 2. If running under MPI, cpus= in maker_opts.ctl must be set to 1.
>> 3. max_dna_len= should be 100000 (the default)
>> 4. In maker_bopts.ctl, set all the depth_blast= options to something like
>> 10 or 20 (there are 3 depth values you will have to set). The default is to
>> keep everything, and if you have really deep alignments, that can use a lot
>> of RAM without any actual benefit for gene prediction.
>>
>> --Carson
>>
>>
>>
>> > On Dec 16, 2021, at 11:03 AM, Kyungyong Seong <s.kyungyong at berkeley.edu>
>> wrote:
>> >
>> > Hi!
>> >
>> > MAKER has been running fine on my genome (~1Gb; 800 contigs) but is now
>> stuck with ~30 contigs that keep failing because of high memory
>> consumption. I am using mpi, running 20-30 contigs for annotation in
>> parallel, depending on the machine. I started with 64Gb memory machines but
>> have moved up to 1.5 Tb machines as the job kept failing. Unfortunately,
>> all memory of this machine is also saturated. It looks like tblastx is
>> taking lots of time and resources. The databases I have are about 200 Mb
>> for the proteins and 570 Mb for cDNAs. max_dna_len is set as 100000 in
>> maker_opts.ctl. Would there be a way to improve this? Decreasing the number
>> of jobs for MPI slowed down memory saturation but eventually the same
>> happened.
>> >
>> > Thank you!
>> > Kyungyong
>> >
>> >
>>
>>
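
Tip 4 in the quoted reply (capping the three depth values in maker_bopts.ctl) can be applied with a small script rather than by hand. The sketch below assumes the three options are named depth_blastn, depth_blastx and depth_tblastx and that each sits on its own "key=value #comment" line; the function name set_blast_depths is hypothetical.

```python
import re

# Assumed names of the three depth options in maker_bopts.ctl
DEPTH_KEYS = ("depth_blastn", "depth_blastx", "depth_tblastx")


def set_blast_depths(ctl_text, depth):
    """Return ctl_text with every depth option set to depth.

    Trailing '#comment' text and all other options are left untouched.
    """
    keys = "|".join(re.escape(k) for k in DEPTH_KEYS)
    pattern = re.compile(rf"^({keys})=[^\s#]*", re.MULTILINE)
    return pattern.sub(lambda m: f"{m.group(1)}={depth}", ctl_text)
```

Usage would be along the lines of reading maker_bopts.ctl, calling set_blast_depths(text, 10), and writing the result back. The value 10 is just the lower end of the range suggested in the reply.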
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20211222/af562e08/attachment-0002.html>
