[maker-devel] Some errors reported by Maker2

Quanwei Zhang qwzhang0601 at gmail.com
Mon Sep 11 11:12:29 MDT 2017


Dear Carson:

I only run 5 MAKER instances in each directory (and set cpus=2). If this is
a memory or I/O issue, I am not sure why scaffolds much longer than the
failed ones were all annotated successfully while the relatively shorter
ones failed.

I have set "tries=5" (#number of times to try a contig if there is a
failure for some reason). I will try "clean_try=1" and test on the failed
scaffolds individually with larger memory to see whether they can be
annotated.
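
For reference, here is roughly how those settings look in my maker_opts.ctl
(values are the ones I plan to use; comments paraphrased from the control
file):

tries=5     #number of times to try a contig if there is a failure
clean_try=1 #remove all data from previous run before retrying, 1=yes, 0=no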

Thank you!

Best
Quanwei

2017-09-11 13:07 GMT-04:00 Carson Holt <carsonhh at gmail.com>:

> I think the cause of the error may have been a little further upstream
> from what you pasted in the e-mail. One thing that may be happening is that
> you are taxing resources (like IO) by running MAKER multiple times or on
> too many CPUs. That can lead to failures because of truncated BLAST reports
> etc., in which case you can just retry, and that will get around those
> types of IO-derived errors. MAKER can generate a lot of IO, and if you are
> working on network-mounted locations (i.e. the storage being used is
> actually across the network), they can be less robust than local storage
> (under heavy load NFS can falsely report success on read/write operations
> that actually failed). It's the reason we built the retry capabilities
> into MAKER.
>
> For contigs that continuously fail, you may need to set clean_try=1. That
> will cause failures to start from scratch (i.e. delete all old reports on
> failure rather than just those suspected of being truncated).
>
> —Carson
>
>
>
> On Sep 11, 2017, at 10:19 AM, Quanwei Zhang <qwzhang0601 at gmail.com> wrote:
>
> Dear Carson:
>
> About the error in my above email, I found the contig was correctly
> annotated on the second RETRY, so please ignore my last email. But now,
> for a small number of scaffolds, I am running into problems processing the
> repeats (as shown below). I used both the Mammalia repeat library and a
> species-specific repeat library (generated by your pipeline
> "http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic").
> There were no such problems when I only used the Mammalia repeat library.
> Do you have any ideas about this? What could be the reason? Or do you have
> any suggestions for how to find the reason? Many thanks
>
> Here are some parameters I used
>
> #-----Repeat Masking (leave values blank to skip repeat masking)
> model_org=Mammalia #select a model organism for RepBase masking in RepeatMasker
> rmlib=../consensi.fa.classifiednoProtFinal #provide an organism specific repeat library in fasta format for RepeatMasker
>
> max_dna_len=300000
> split_hit=40000
> depth_blastn=30 #Blastn depth cutoff (0 to disable cutoff)
> depth_blastx=30 #Blastx depth cutoff (0 to disable cutoff)
> depth_tblastx=30 #tBlastx depth cutoff (0 to disable cutoff)
> bit_rm_blastx=30 #Blastx bit cutoff for transposable element masking
>
>
> Died at /gs/gsfs0/hpc01/apps/MAKER/2.31.9/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
> 33708 --> rank=NA, hostname=n409
> 33709 ERROR: Failed while processing all repeats
> 33710 ERROR: Chunk failed at level:3, tier_type:1
> 33711 FAILED CONTIG:Contig31
>
>
> Best
> Quanwei
>
> 2017-09-08 23:25 GMT-04:00 Quanwei Zhang <qwzhang0601 at gmail.com>:
>
>> Dear Carson:
>>
>> I got the following error again. Is this still related to memory issues?
>> I wonder whether there can be other reasons for this error. This time, I
>> got the error during training of the SNAP model. Previously, even when I
>> set max_dna_len=1Mb, I could train the model successfully. In the current
>> training (where I get the following error), I have decreased max_dna_len
>> to 300kb and requested the same amount of memory as before. The only
>> difference is that I am now using both the mammalian repeat library and a
>> species-specific repeat library, while previously I only used the
>> mammalian repeat library. Will using both repeat libraries greatly
>> increase the memory requirement (even with max_dna_len decreased from 1Mb
>> to 300kb)? I have also set depth_blast to 30 in the current training.
>>
>> Thank you! Have a nice weekend!
>>
>>
>>
>> #---------------------------------------------------------------------
>> Now starting the contig!!
>> SeqID: Contig10
>> Length: 18773588
>> #---------------------------------------------------------------------
>>
>>
>> setting up GFF3 output and fasta chunks
>> doing repeat masking
>> doing blastx repeats
>> doing blastx repeats
>> doing blastx repeats
>> doing blastx repeats
>> doing blastx repeats
>> doing blastx repeats
>> doing blastx repeats
>> doing blastx repeats
>> doing blastx repeats
>> doing blastx repeats
>> collecting blastx repeatmasking
>> processing all repeats
>> doing repeat masking
>> Can't kill a non-numeric process ID at /gs/gsfs0/hpc01/apps/MAKER/2.31.9/bin/../lib/File/NFSLock.pm line 1050.
>> --> rank=NA, hostname=n224
>> ERROR: Failed while doing repeat masking
>> ERROR: Chunk failed at level:0, tier_type:1
>> FAILED CONTIG:Contig10
>>
>> ERROR: Chunk failed at level:2, tier_type:0
>> FAILED CONTIG:Contig10
>>
>> Best
>> Quanwei
>>
>> 2017-09-06 12:06 GMT-04:00 Carson Holt <carsonhh at gmail.com>:
>>
>>>
>>> (2) Reading some of your replies in the MAKER Google group, I noticed
>>> that setting depth_blast to a certain number can reduce memory use and
>>> save annotation time, so I changed the following parameters. But I
>>> wonder whether it will decrease the quality of the annotation? If it
>>> won't affect the quality, can I use an even smaller number (e.g., 20) to
>>> save more memory and time?
>>>
>>> depth_blastn=30 #Blastn depth cutoff (0 to disable cutoff)
>>> depth_blastx=30 #Blastx depth cutoff (0 to disable cutoff)
>>> depth_tblastx=30 #tBlastx depth cutoff (0 to disable cutoff)
>>> bit_rm_blastx=30 #Blastx bit cutoff for transposable element masking
>>>
>>>
>>> These values really only affect the final evidence kept in the GFF3 when
>>> you look at it in a browser. They have no effect on the annotation. This
>>> is because internally MAKER already collapses evidence down to the 10
>>> best non-redundant features per evidence set per locus. The rest are put
>>> in the GFF3 just for reference. By setting the value lower, you are just
>>> letting MAKER know it can throw things away even sooner, since you don't
>>> want them in the GFF3. It provides a minor improvement for memory use,
>>> but max_dna_len is the big one with the greatest effect.
>>>
>>>
>>> (3) I also have some concerns about speed, especially for the long
>>> scaffolds (around 100Mb). I wonder which part of genome annotation is
>>> the most time consuming (repeat masking, BLAST, or polishing?). In
>>> particular, I wonder whether the BLASTX searches of the protein evidence
>>> will take the majority of the time. I have prepared 99k mammalian
>>> Swiss-Prot protein sequences and 340k rodent TrEMBL protein sequences as
>>> protein evidence. I am considering whether I can save much time by only
>>> using the 99k mammalian Swiss-Prot sequences as evidence.
>>>
>>>
>>> BLASTN (ESTs) -> fastest, as it is searching nucleotide space
>>> BLASTX (proteins) -> must search 6 reading frames, so it will be at
>>> least 6 times slower than BLASTN
>>> TBLASTX (alt-ESTs) -> must search 12 reading frames, so it will be at
>>> least 12 times slower than BLASTN and twice as slow as BLASTX
>>>
>>> Also, doubling the dataset size doubles the runtime. Larger window sizes
>>> via max_dna_len will also increase runtimes.
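>>>
>>> As a rough back-of-envelope example of that scaling (illustrative
>>> numbers, assuming runtime grows linearly with dataset size as described
>>> above): searching only the 99k Swiss-Prot sequences instead of all
>>> 99k + 340k = 439k proteins shrinks the BLASTX input about 4.4-fold, so
>>> the BLASTX step alone could be expected to run roughly 4x faster, all
>>> else being equal.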
>>>
>>>
>>> (4) For some reason, I cannot run MAKER through MPI on our cluster, so I
>>> can only start multiple MAKER instances. I wonder if it is possible to
>>> let multiple MAKER instances annotate the same long scaffold (i.e., for
>>> a single sequence, start multiple MAKER processes without splitting the
>>> long sequence into shorter ones).
>>>
>>>
>>> Without MPI you won't be able to split up large contigs. At the very
>>> least you can try running on a single node and setting MPI to use all
>>> CPUs on that node. That is less difficult to set up than cross-node jobs
>>> via MPI.
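>>>
>>> For example, a minimal single-node run (assuming an MPI-enabled MAKER
>>> build and a 16-core node; adjust -n to match your core count) would be
>>> launched from the directory holding the maker_opts.ctl, maker_bopts.ctl,
>>> and maker_exe.ctl files:
>>>
>>> mpiexec -n 16 maker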
>>>
>>>
>>> (5) Still about the speed issue: I read some of your comments about the
>>> "cpus" parameter in the maker_opts file
>>> (http://gmod.827538.n3.nabble.com/open3-fork-failed-Cannot-allocate-memory-td4025117.html),
>>> and I understand it indicates the number of CPUs for a single chunk. So
>>> if I set "cpus=2" in the maker_opts file, then I can use the following
>>> command to submit the job, right?
>>>
>>>
>>> The cpus parameter only affects how many CPUs are given to the BLAST
>>> command line, so only the BLAST step will speed up. I recommend using
>>> MPI instead to get all steps to speed up. Even if you are only running
>>> on a single node, you can give all CPUs to the mpiexec command.
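>>>
>>> For reference, the corresponding line in maker_opts.ctl looks like this
>>> (comment paraphrased from the control file; leave cpus=1 when you do run
>>> under MPI):
>>>
>>> cpus=2 #max number of cpus to use in BLAST searches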
>>>
>>>
>>> —Carson
>>>
>>
>>
>
>