[maker-devel] Some errors reported by Maker2
Quanwei Zhang
qwzhang0601 at gmail.com
Wed Sep 6 09:51:54 MDT 2017
Dear Carson:
(1) Thank you for your explanation. I will try setting max_dna_len to 400kb
for our rodent species, which is a little higher than the suggested
value for large vertebrate genomes (the MAKER manual says
"300,000 is a good max_dna_len on large vertebrate genomes if memory is not
a limiting factor").
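For scale, here is a back-of-the-envelope sketch of how many analysis windows a 100Mb scaffold would need at different max_dna_len values (this is only simple ceiling division, not MAKER's actual chunking code):

```python
import math

# Rough sketch: number of analysis windows for a scaffold, treating
# max_dna_len as a plain window size (an approximation, not MAKER's
# actual chunking logic).
def n_windows(scaffold_len, max_dna_len):
    return math.ceil(scaffold_len / max_dna_len)

print(n_windows(100_000_000, 400_000))    # 250 windows at 400kb
print(n_windows(100_000_000, 1_000_000))  # 100 windows at 1Mb
```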
(2) Reading some of your replies in the MAKER Google group, I noticed that
setting depth_blast to a fixed number can reduce memory use and save
annotation time, so I changed the parameters below. Will this decrease the
quality of the annotation? If it won't affect the quality, could I use an
even smaller number (e.g., 20) to save more memory and time?
depth_blastn=30 #Blastn depth cutoff (0 to disable cutoff)
depth_blastx=30 #Blastx depth cutoff (0 to disable cutoff)
depth_tblastx=30 #tBlastx depth cutoff (0 to disable cutoff)
bit_rm_blastx=30 #Blastx bit cutoff for transposable element masking
(3) I also have some concerns about speed, especially for the long
scaffolds (around 100Mb). Which part of genome annotation is the most
time consuming (repeat masking, BLAST, or polishing)? In particular, will
the blastx of protein evidence take the majority of the time? I have
prepared 99k mammalian Swiss-Prot protein sequences and 340k rodent TrEMBL
protein sequences as protein evidence. I am wondering whether I could save
much time by using only the 99k mammalian Swiss-Prot sequences as evidence.
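To quantify the trade-off, under the simplifying assumption that blastx time scales roughly with the number of evidence sequences (actual timing also depends on sequence lengths, hit density, and the depth cutoffs):

```python
# Relative size of the protein evidence set with and without TrEMBL
# (sequence counts are from this thread; linear time scaling is only
# an assumption, not a measured result).
swissprot = 99_000
trembl = 340_000
ratio = (swissprot + trembl) / swissprot
print(round(ratio, 1))  # ~4.4x more sequences when TrEMBL is included
```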
(4) For some reasons, I cannot run MAKER through MPI on our cluster, so I
can only start multiple MAKER instances. Is it possible to have several
MAKER instances annotate the same long scaffold (i.e., start multiple
MAKER processes on a single sequence, without splitting the long sequence
into shorter ones)?
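To make concrete what I mean by "start multiple maker", here is a hypothetical SGE array-job sketch; the instance count, job name, and file names are placeholders, not something from this thread:

```shell
#!/bin/bash
# Hypothetical sketch: four independent MAKER instances started in the
# same working directory as an SGE array job. All values are placeholders.
#$ -cwd
#$ -S /bin/bash
#$ -j y
#$ -N makerMulti
#$ -l h_vmem=8g
#$ -t 1-4

module load MAKER/2.31.9/perl.5.22.1
# All tasks share one directory; MAKER's lock files should keep them
# from working on the same contig at the same time.
maker --q 2> maker_task${SGE_TASK_ID}.error
```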
(5) Still on the speed issue: I read some of your comments about the "cpus"
parameter in the maker_opts file (
http://gmod.827538.n3.nabble.com/open3-fork-failed-Cannot-allocate-memory-td4025117.html),
and I understand it indicates the number of CPUs for a single chunk. So if
I set "cpus=2" in the maker_opts file, can I use the following command to
submit the job?
**************** the bash file used to submit the maker job
#!/bin/bash
#$ -cwd
#$ -S /bin/bash
#$ -j y
#$ -N makerT2
#$ -l h_vmem=8g
#$ -pe smp 2
module load MAKER/2.31.9/perl.5.22.1
maker --q 2> maker_test.error
Many thanks
Best
Quanwei
2017-09-05 18:08 GMT-04:00 Carson Holt <carsonhh at gmail.com>:
> max_dna_len is the window size for keeping data in RAM. Smaller values do
> not split genes, but values lower than 100kb can create issues (if a single
> gene model spans 3 or more windows, it creates a weird failure).
>
> —Carson
>
>
>
>
> On Sep 5, 2017, at 4:04 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>
> Dear Carson:
>
> Thanks. I wonder whether a smaller "max_dna_len" will split longer
> scaffolds. I set max_dna_len to 1Mb because there are quite a few long
> scaffolds (e.g., the longest is about 100Mb). Could you explain whether a
> smaller "max_dna_len" will decrease the quality of the annotation (e.g.,
> split some genes in the same scaffold)?
>
>
> Best
> Quanwei
>
> 2017-09-05 17:48 GMT-04:00 Carson Holt <carsonhh at gmail.com>:
>
>> You ran out of memory. You probably set max_dna_len too high for the
>> machines you are using. There is a note in the maker_opts.ctl file that
>> tells you that this value affects memory usage.
>>
>> So you can either set it lower, or if running under MPI, use fewer CPUs
>> per node (how you do this is MPI flavor dependent, but some flavors let you
>> do this by setting process count lower combined with the round robin
>> option).
>>
>> —Carson
>>
>>
>>
>> On Sep 5, 2017, at 2:24 PM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>>
>> Hello:
>>
>> We are doing genome annotation for a new rodent species. We have finished
>> training the ab initio gene predictors successfully with the following
>> parameters (split_hit=40000, max_dna_len=1000000, and 99k mammalian
>> Swiss-Prot protein sequences as evidence).
>>
>> But when I used the trained models to do the genome annotation, I got the
>> following kinds of errors (shown in red). I used the same parameters as
>> for training, except that I added 340k rodent TrEMBL protein sequences as
>> protein evidence (i.e., I used both the 99k mammalian Swiss-Prot sequences
>> and the 340k rodent TrEMBL sequences).
>>
>> I am doing the annotation on a cluster and started multiple MAKER
>> instances in the same directory (I had tried to use MPI but ran into some
>> problems).
>>
>> Do you have any suggestions? Many thanks
>> #some kinds of errors
>> open3: fork failed: Cannot allocate memory at
>> /gs/gsfs0/hpc01/apps/MAKER/2.31.9/bin/../lib/Widget/blastx.pm line 40.
>> --> rank=NA, hostname=n520
>> ERROR: Failed while doing blastx of proteins
>> ERROR: Chunk failed at level:8, tier_type:3
>> FAILED CONTIG:Contig2
>>
>>
>> setting up GFF3 output and fasta chunks
>> doing repeat masking
>> Can't kill a non-numeric process ID at /gs/gsfs0/hpc01/apps/MAKER/2.31.9/bin/../lib/File/NFSLock.pm
>> line 1050.
>> --> rank=NA, hostname=n513
>> ERROR: Failed while doing repeat masking
>> ERROR: Chunk failed at level:0, tier_type:1
>> FAILED CONTIG:Contig12378
>>
>>
>> Best
>> Quanwei
>>
>>
>>
>
>