<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">I think the cause of the error may have been a little further upstream from what you pasted in the e-mail. One thing that may be happening is that you are taxing resources (like IO) by running MAKER multiple times or on too many CPUs. That can lead to failures because of truncated BLAST reports, etc., in which case you can just retry, and that will get around those types of IO-derived errors. MAKER can generate a lot of IO, and if you are working on network-mounted locations (i.e. the storage being used is actually across the network), they can be less robust than local storage (under heavy load, NFS can falsely report success on read/write operations that actually failed). That is the reason we built the retry capabilities into MAKER.<div class=""><br class=""></div><div class="">For contigs that repeatedly fail, you may need to set clean_try=1. That will cause failures to start from scratch (i.e. delete all old reports on failure rather than just those suspected of being truncated).<br class=""><div class=""><br class=""></div><div class="">--Carson<br class=""><div class=""><br class=""></div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Sep 11, 2017, at 10:19 AM, Quanwei Zhang <<a href="mailto:qwzhang0601@gmail.com" class="">qwzhang0601@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class=""><div class="">Dear Carson:<br class=""><br class="">About the error in my previous email, I found that the contig was annotated correctly on the second retry, so please ignore my last email. But now, for a small number of scaffolds, I have run into problems processing the repeats (as shown below in red). 
I used both the Mammalia repeat library and a species-specific repeat library (generated with your pipeline "<a href="http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic" class="">http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic</a>"). There were no such problems when I used only the Mammalia repeat library. Do you have any idea what could be causing this, or any suggestions for how I can track down the cause? Many thanks.<br class=""></div><div class=""><br class=""></div><div class="">Here are some of the parameters I used:</div><div class=""><br class=""></div><div class="">#-----Repeat Masking (leave values blank to skip repeat masking)<br class="">model_org=Mammalia #select a model organism for RepBase masking in RepeatMasker<br class="">rmlib=../consensi.fa.classifiednoProtFinal #provide an organism specific repeat library in fasta format for Repe<br class=""><br class=""></div><div class="">max_dna_len=300000</div><div class="">split_hit=40000<br class=""></div><div class="">depth_blastn=30 #Blastn depth cutoff (0 to disable cutoff)<br class="">depth_blastx=30 #Blastx depth cutoff (0 to disable cutoff)<br class="">depth_tblastx=30 #tBlastx depth cutoff (0 to disable cutoff)<br class="">bit_rm_blastx=30 #Blastx bit cutoff for transposable element masking</div><div class=""><br class=""></div><div class=""><br class=""><span style="color:rgb(255,0,0)" class="">Died at /gs/gsfs0/hpc01/apps/MAKER/2.31.9/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm line 188.<br class="">33708 --> rank=NA, hostname=n409<br class="">33709 ERROR: Failed while processing all repeats<br class="">33710 ERROR: Chunk failed at level:3, tier_type:1<br class="">33711 FAILED CONTIG:Contig31</span><br class=""><br class=""><br class=""></div>Best<br class=""></div>Quanwei<br class=""></div><div class="gmail_extra"><br class=""><div class="gmail_quote">2017-09-08 23:25 GMT-04:00 Quanwei Zhang <span dir="ltr" 
class=""><<a href="mailto:qwzhang0601@gmail.com" target="_blank" class="">qwzhang0601@gmail.com</a>></span>:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class=""><div class=""><div class="">Dear Carson:</div><div class=""><br class=""></div><div class="">I got the following error again. Is this still related to memory issues, or could there be other causes? This time, the error occurred while training the SNAP model. Previously, even with max_dna_len set to 1Mb, I could train the model successfully. In the current training run (where I get the following error), I have decreased max_dna_len to 300kb and requested the same amount of memory as before. The only difference is that I am now using both the mammalian repeat library and a species-specific repeat library, whereas previously I used only the mammalian repeat library. Does using both repeat libraries greatly increase the memory requirement (even when I decrease max_dna_len from 1Mb to 300kb)? I have also set the <span class="m_-3040512414793419403gmail-im">depth_blast parameters to 30 in the current training run.</span></div><div class=""><span class="m_-3040512414793419403gmail-im"><br class=""></span></div><div class=""><span class="m_-3040512414793419403gmail-im">Thank you! 
Have a nice weekend!</span> </div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class="">#-----------------------------<wbr class="">------------------------------<wbr class="">----------<br class="">Now starting the contig!!<br class="">SeqID: Contig10<br class="">Length: 18773588<br class="">#-----------------------------<wbr class="">------------------------------<wbr class="">----------<span class=""><br class=""><br class=""><br class="">setting up GFF3 output and fasta chunks<br class="">doing repeat masking<br class=""></span>doing blastx repeats<br class="">doing blastx repeats<br class="">doing blastx repeats<br class="">doing blastx repeats<br class="">doing blastx repeats<br class="">doing blastx repeats<br class="">doing blastx repeats<br class="">doing blastx repeats<br class="">doing blastx repeats<br class="">doing blastx repeats<br class="">collecting blastx repeatmasking<br class="">processing all repeats<span class=""><br class="">doing repeat masking<br class=""><span style="color:rgb(255,0,0)" class="">Can't kill a non-numeric process ID at /gs/gsfs0/hpc01/apps/MAKER/2.<wbr class="">31.9/bin/../lib/File/NFSLock.<wbr class="">pm line 1050.</span><br class=""></span>--> rank=NA, hostname=n224<span class=""><br class="">ERROR: Failed while doing repeat masking<br class="">ERROR: Chunk failed at level:0, tier_type:1<br class=""></span>FAILED CONTIG:Contig10<br class=""><br class="">ERROR: Chunk failed at level:2, tier_type:0<br class="">FAILED CONTIG:Contig10<br class=""><br class=""></div>Best<span class="HOEnZb"><font color="#888888" class=""><br class=""></font></span></div><span class="HOEnZb"><font color="#888888" class="">Quanwei<br class=""></font></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br class=""><div class="gmail_quote">2017-09-06 12:06 GMT-04:00 Carson Holt <span dir="ltr" class=""><<a href="mailto:carsonhh@gmail.com" target="_blank" 
class="">carsonhh@gmail.com</a>></span>:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><span class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class="">(2) From reading some of your replies in the MAKER Google group, I noticed that setting the depth_blast parameters to a specific number can reduce memory use and save annotation time, so I changed the following parameters. But I wonder whether this will decrease the quality of the annotation. If it won't affect the quality, can I use an even smaller number (e.g., 20) to save more memory and time?<br class=""><div class=""><br class="">depth_blastn=30 #Blastn depth cutoff (0 to disable cutoff)<br class="">depth_blastx=30 #Blastx depth cutoff (0 to disable cutoff)<br class="">depth_tblastx=30 #tBlastx depth cutoff (0 to disable cutoff)<br class="">bit_rm_blastx=30 #Blastx bit cutoff for transposable element masking</div></div></div></blockquote><div class=""><br class=""></div></span><div class="">This value really only affects the final evidence kept in the GFF3 when you look at it in a browser. It has no effect on the annotation. This is because internally MAKER already collapses evidence down to the 10 best non-redundant features per evidence set per locus. The rest are put in the GFF3 just for reference. By setting it lower, you are just letting MAKER know it can throw things away even sooner, since you don’t want them in the GFF3. It provides a minor improvement in memory use, but max_dna_len is the setting that has the greatest effect.</div><span class=""><div class=""><br class=""></div><div class=""><br class=""></div><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="">(3) I also have some concerns about speed, especially for the long scaffolds (around 100Mb). 
I wonder which step of genome annotation is the most time-consuming (repeat masking, BLAST, or polishing). In particular, I wonder whether the BLASTX of protein evidence takes the majority of the time. I have prepared 99k mammalian Swiss-Prot protein sequences and 340k rodent TrEMBL protein sequences as protein evidence, and I am considering whether I could save much time by using only the 99k mammalian Swiss-Prot sequences as evidence.</div></div></div></blockquote><div class=""><br class=""></div></span><div class="">BLASTN (ESTs) -> fastest as it is searching nucleotide space</div><div class="">BLASTX (proteins) -> must search 6 reading frames so will be at least 6 times slower than BLASTN</div><div class="">TBLASTX (alt-ESTs) -> must search 12 reading frames so will be at least 12 times slower than BLASTN and twice as slow as BLASTX</div><div class=""><br class=""></div><div class="">Also, if you double the dataset size, you double the runtime. Larger window sizes via max_dna_len will also increase runtimes.</div><span class=""><div class=""><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="">(4) For some reason, I cannot run MAKER through MPI on our cluster. So I
can only start multiple MAKER instances. I wonder whether it is possible to let
multiple MAKER instances annotate the same long scaffold (i.e., for a single
sequence I start multiple MAKER instances, without splitting the long sequence
into shorter ones).</div></div></div></blockquote><div class=""><br class=""></div></span><div class="">Without MPI you won’t be able to split up large contigs. At the very least, you can try running on a single node and setting MPI to use all CPUs on that node. That is less difficult to set up than cross-node jobs via MPI.</div><span class=""><div class=""><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="">(5) Still on the speed issue: I read some of your comments about the "cpus" parameter in the maker_opts file (<a href="http://gmod.827538.n3.nabble.com/open3-fork-failed-Cannot-allocate-memory-td4025117.html" target="_blank" class="">http://gmod.827538.n3.nabble.<wbr class="">com/open3-fork-failed-Cannot-a<wbr class="">llocate-memory-td4025117.html</a>)<wbr class="">. I understand that it indicates the number of CPUs for a single chunk. So if I set "cpus=2" in the maker_opts file, then I can use the following command to submit the job, right? </div></div></div></blockquote><div class=""><br class=""></div></span><div class="">The cpus parameter only affects how many CPUs are given to the BLAST command line, so only the BLAST step will speed up. I recommend using MPI to speed up all steps. Even if you are only running on a single node, you can give all CPUs to the mpiexec command.</div><span class="m_-3040512414793419403HOEnZb"><font color="#888888" class=""><div class=""><br class=""></div><div class=""><br class=""></div><div class="">--Carson</div></font></span></div></div></blockquote></div><br class=""></div>
</div></div></blockquote></div><br class=""></div>
</div></blockquote></div><br class=""></div></div></div></body></html>