[maker-devel] Private message regarding: MAKER run error
Carson Holt
carsonhh at gmail.com
Tue Apr 17 10:12:32 MDT 2018
The datasets do not look too large. The failure you are seeing is happening outside of MAKER, so something is wrong on the system itself. You will probably have to reinstall Perl against your local libraries, especially if you reinstalled Berkeley DB. Alternatively, try downloading the latest stable release of Perl (it comes precompiled against static libraries, i.e. Berkeley DB version 1.x, which can help avoid some issues). You will then have to reinstall MAKER to use that version of Perl (MAKER uses whichever perl was used to call Build.PL during the install).
If you are running on something like FreeBSD, the platform itself may just break Perl’s DB_File.
Also, note this from the DB_File documentation on CPAN:
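To see which Perl an installed MAKER will actually run under, one option is to read the interpreter line at the top of the installed maker script and ask that interpreter whether it can load DB_File. This is only a sketch: it assumes the install step left the script's first line pointing at the Perl that ran Build.PL, and the helper name script_interpreter is made up for illustration.

```shell
# Hypothetical helper: print the interpreter named on a script's "#!" line.
# For "#!/usr/bin/env perl" it prints the word after env instead of env itself.
script_interpreter() {
    head -n 1 "$1" \
        | sed -e 's/^#![[:space:]]*//' \
        | awk '{ if ($1 ~ /\/env$/) print $2; else print $1 }'
}

# Locate maker on the PATH (adjust if you run it as ./maker from its bin directory).
maker_path=$(command -v maker || true)
if [ -n "$maker_path" ]; then
    maker_perl=$(script_interpreter "$maker_path")
    echo "maker runs under: $maker_perl"
    # perl exits non-zero if it cannot load DB_File, which triggers the fallback message.
    "$maker_perl" -MDB_File -e 'print "DB_File loads OK\n"' \
        || echo "DB_File failed to load under $maker_perl"
else
    echo "maker not found on PATH"
fi
```

If the path printed is not the Perl you just reinstalled, MAKER is still using the old one and needs to be reinstalled with the new Perl.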
Although DB_File is intended to be used with Berkeley DB version 1, it can also be used with version 2, 3 or 4. In this case the interface is limited to the functionality provided by Berkeley DB 1.x.
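The DB_File and Berkeley DB versions behind a given Perl can be printed directly, which helps confirm what a reinstall actually changed. A minimal sketch, assuming the perl on your PATH is the one of interest; $DB_File::VERSION and $DB_File::db_ver are the variables the DB_File distribution suggests for this, and the function name db_file_versions is hypothetical.

```shell
# Hypothetical helper: report the DB_File module version and the Berkeley DB
# library version it was built against, for the perl given as $1 (default: perl).
db_file_versions() {
    "${1:-perl}" -MDB_File -e \
        'print "DB_File $DB_File::VERSION / Berkeley DB $DB_File::db_ver\n"' 2>/dev/null \
        || echo "DB_File could not be loaded by ${1:-perl}"
}

# Check the perl that is first on the PATH.
db_file_versions
```

Pass a full path (for example the interpreter MAKER was installed with) to check a specific Perl instead of the default one.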
If reinstalling these tools does not resolve the issue, you may just have to run on a different system.
—Carson
> On Apr 15, 2018, at 8:34 AM, ohon Kin <ohon.kin at gmail.com> wrote:
>
>
>
> grep -c ">" Ca_kacst.fna
> 32572
>
>
> The ESTs I have are assembled into contigs:
> grep -c ">" Ca_EST
> 23602
>
>
> grep -c ">" Ca__protein.faa
> 26729
>
> These are my input data. I have reinstalled Perl as you instructed; please have a look. The run still stops partway through, and 1 TB of disk space is still not enough.
>
> I get this error:
> ad$ ./maker
> STATUS: Parsing control files...
> WARNING: 'max_dna_len' is set too low. The minimum value permited is 50,000.
> max_dna_len will be reset to 50,000
>
> STATUS: Processing and indexing input FASTA files...
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> Filesize limit exceeded: 25
>
>
>
> My maker_opts file:
>
>
> #-----Genome (these are always required)
> genome=/Users/mohanad/Documents/maker/data/Ca_dromedarius_kacst.fna #genome sequence (fasta file or fasta embeded in GFF3 file)
> organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
>
> #-----Re-annotation Using MAKER Derived GFF3
> maker_gff= #MAKER derived GFF3 file
> est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no
>
> #-----EST Evidence (for best results provide a file for at least one)
> est=/Users/mohanad/Documents/maker/data/Ca_dromedarius_EST #set of ESTs or assembled mRNA-seq in fasta format
> altest= #EST/cDNA sequence file in fasta format from an alternate organism
> est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
> altest_gff= #aligned ESTs from a closly relate species in GFF3 format
>
> #-----Protein Homology Evidence (for best results provide a file for at least one)
> protein=/Users/mohanad/Documents/maker/data/Ca_dromedarius_V1.0_protein.faa #protein sequence file in fasta format (i.e. from mutiple oransisms)
> protein_gff= #aligned protein homology evidence from an external GFF3 file
>
> #-----Repeat Masking (leave values blank to skip repeat masking)
> model_org=all #select a model organism for RepBase masking in RepeatMasker
> rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
> repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
> rm_gff= #pre-identified repeat elements from an external GFF3 file
> prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
> softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)
>
> #-----Gene Prediction
> snaphmm= #SNAP HMM file
> gmhmm= #GeneMark HMM file
> augustus_species= #Augustus gene prediction species model
> fgenesh_par_file= #FGENESH parameter file
> pred_gff= #ab-initio predictions from an external GFF3 file
> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
> est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
> protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
> trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
> snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
> unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
>
> #-----Other Annotation Feature Types (features MAKER doesn't recognize)
> other_gff= #extra features to pass-through to final MAKER generated GFF3 file
>
> #-----External Application Behavior Options
> alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
> cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)
>
> #-----MAKER Behavior Options
> max_dna_len=10000 #length for dividing up contigs into chunks (increases/decreases memory usage)
> min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
>
> pred_flank=200 #flank for extending evidence clusters sent to gene predictors
> pred_stats=0 #report AED and QI statistics for all predictions as well as models
> AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
> min_protein=0 #require at least this many amino acids in predicted proteins
> alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
> always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
> map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
> keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
>
> split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
> single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
> single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
> correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
>
> tries=2 #number of times to try a contig if there is a failure for some reason
> clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
> clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
> TMP= #specify a directory other than the system default temporary directory for temporary files
>
>
> On 11 April 2018 at 20:57, Carson Holt <carsonhh at gmail.com> wrote:
> The issue is with Berkeley DB. BioPerl is using Perl’s DB_File module to index the FASTA files.
>
> 1. Make sure you do not have an extremely large number of reads in the FASTA files (e.g. raw mRNA-seq data, which cannot be used directly as input to MAKER; you must first assemble it into transcriptome contigs).
> 2. Reinstall Perl and compile it against the newly installed Berkeley DB libraries.
> 3. Remove the brew-installed Berkeley DB and use Perl’s precompiled DB_File module.
>
> You can count the sequences in your FASTA input using this command (replace file.fasta):
>
> grep -c ">" file.fasta
>
> If your counts are really high (i.e. more than a few hundred thousand at most), then you have a data issue. You are either giving too much data or the wrong data as input.
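The count above can be repeated over all the input files at once. The following is a rough sketch only: the function name count_fasta_records and the limit of 300000 are illustrative assumptions standing in for "a few hundred thousand", not MAKER settings, and the pattern is anchored to the start of the line so that only header lines are counted.

```shell
# Hypothetical helper: count FASTA records (lines that begin with ">").
count_fasta_records() {
    # grep -c exits non-zero when the count is 0, so keep the status from aborting a script.
    grep -c '^>' "$1" || true
}

# Illustrative limit only; chosen to stand in for "a few hundred thousand".
limit=300000

# Report the record count for every file given on the command line.
for f in "$@"; do
    n=$(count_fasta_records "$f")
    if [ "$n" -gt "$limit" ]; then
        echo "$f: $n records (unusually many; possibly raw reads)"
    else
        echo "$f: $n records"
    fi
done
```

Saved as, say, count_fasta.sh, it could be run as `sh count_fasta.sh genome.fna est.fasta protein.faa`, substituting your own file names.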
>
> —Carson
>
>
>
>> On Apr 11, 2018, at 11:39 AM, ohon Kin <ohon.kin at gmail.com> wrote:
>>
>>
>> Hello Carson,
>>
>> I would really appreciate your help; I am having much the same issue.
>> I get this error when I run MAKER. I assumed that it required a lot of memory:
>>
>> STATUS: Processing and indexing input FASTA files...
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> Filesize limit exceeded: 25
>>
>> While it runs, even 1 TB of hard disk capacity does not seem to be enough for the MAKER annotation.
>> I think something is wrong with my input data or the dependencies.
>> Could you please advise on the matter and suggest solutions?
>>
>> I installed Berkeley DB using brew.
>>
>> The input given to MAKER is as follows:
>> Genome, EST, and protein, all in FASTA format, downloaded from NCBI and then passed directly to MAKER for annotation.
>>
>> Do I have to pre-process these data before giving them to MAKER?
>>
>>
>>
>>
>>
>>
>>
>>
>> On Thursday, 7 December 2017 19:00:52 UTC+3, Carson Holt wrote:
>> The FASTA file gets indexed by BioPerl using Berkeley DB.
>>
>> I’m guessing there is something odd about your input file and the database has run out of HASHes for indexing.
>>
>> You can google whether there is a setting you can configure in Berkeley DB on Mac.
>>
>> But I suspect you are doing something like giving MAKER the raw reads from an mRNA-seq experiment or DNA sequencing (resulting in billions of entries to be indexed), which would be incorrect. MAKER can’t handle raw data. You must first assemble it, for example using Trinity for mRNA.
>>
>> Thanks,
>> Carson
>>
>>> On Dec 7, 2017, at 8:53 AM, Scott Cain <sc...@scottcain.net> wrote:
>>>
>>> Hi Gulnara,
>>>
>>> I don't know (though my guess would be that you're running out of memory). I'm cc'ing the MAKER developer's mailing list to see if anybody on that list knows.
>>>
>>> Scott
>>>
>>>
>>> On Wed, Dec 6, 2017 at 8:36 PM, Gulnara Tagirdzhanova <tagi...@ualberta.ca> wrote:
>>> Hello,
>>>
>>> I got this error running maker on mac:
>>>
>>> STATUS: Parsing control files...
>>> STATUS: Processing and indexing input FASTA files...
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> Filesize limit exceeded: 25
>>>
>>> Is there anything that could solve it?
>>>
>>> Thank you,
>>> Gulnara
>>>
>>>
>>>
>>>
>>>
>>> --
>>> ------------------------------------------------------------------------
>>> Scott Cain, Ph. D. scott at scottcain dot net
>>> GMOD Coordinator (http://gmod.org/) 216-392-3087
>>> Ontario Institute for Cancer Research
>
>
>