[maker-devel] Private message regarding: MAKER run error

Sun Apr 15 08:34:52 MDT 2018

grep -c ">" Ca_kacst.fna

32572

The ESTs I have are assembled into contigs:

grep -c ">" Ca_EST

23602

grep -c ">" Ca__protein.faa

26729

These are my input data. I have reinstalled Perl following your instructions. Please
have a look: the run still stops partway through, and even 1 TB of disk space is not enough.

I get this error:

ad$ ./maker

STATUS: Parsing control files...

WARNING: 'max_dna_len' is set too low.  The minimum value permited is 50,000.

max_dna_len will be reset to 50,000

STATUS: Processing and indexing input FASTA files...

HASH: Out of overflow pages.  Increase page size

Filesize limit exceeded: 25

*My maker_opts.ctl:*

#-----Genome (these are always required)
genome=/Users/mohanad/Documents/maker/data/Ca_dromedarius_kacst.fna #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff= #MAKER derived GFF3 file
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=/Users/mohanad/Documents/maker/data/Ca_dromedarius_EST #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/Users/mohanad/Documents/maker/data/Ca_dromedarius_V1.0_protein.faa #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff=  #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm= #GeneMark HMM file
augustus_species= #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=10000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files
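The max_dna_len value in this file (10000) is what triggers the "set too low" warning at the start of the run. A quick way to catch that before launching MAKER is to read the value back out of the control file. This is a minimal bash sketch, assuming the file is maker_opts.ctl and each option sits on a single line as `name=value #comment`; the function name check_max_dna_len is hypothetical and not part of MAKER, and the 50,000 floor is taken from the warning MAKER printed.

```bash
#!/usr/bin/env bash
# Sketch: report max_dna_len from a MAKER control file and flag values below
# the 50,000 minimum MAKER enforces. check_max_dna_len is a hypothetical name.
check_max_dna_len() {
  local ctl="$1"
  awk '
    /^max_dna_len=/ {
      v = substr($0, length("max_dna_len=") + 1)  # text after the "="
      sub(/[ \t]*#.*/, "", v)                      # strip the trailing comment
      if (v + 0 < 50000) {
        print "max_dna_len=" v " is below 50000; MAKER will reset it to 50000"
      } else {
        print "max_dna_len=" v " OK"
      }
    }
  ' "$ctl"
}
```

Run as `check_max_dna_len maker_opts.ctl`; with max_dna_len=10000 as above it prints `max_dna_len=10000 is below 50000; MAKER will reset it to 50000`.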

On 11 April 2018 at 20:57, Carson Holt <carsonhh at gmail.com> wrote:

> The issue is with Berkeley DB. BioPerl uses Perl's DB_File module to
> index the FASTA files.
>
> 1. Make sure you do not have an extremely large number of reads in the
> FASTA files (e.g. mRNA-seq reads, which cannot be used directly as input to
> MAKER; you must first assemble them into transcriptome contigs).
> 2. Reinstall Perl and compile it against the newly installed Berkeley DB
> libraries.
> 3. Remove the Homebrew-installed Berkeley DB and use Perl's precompiled
> DB_File module.
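For steps 2 and 3 it helps to know which DB_File module, and which Berkeley DB version, the Perl on your PATH actually loads. A minimal bash sketch, assuming that `perl` is the interpreter MAKER runs under; `$DB_File::db_version` is the variable DB_File sets to the Berkeley DB version it was compiled against, and the function name show_db_file_info is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: show which perl, which DB_File.pm, and which Berkeley DB version
# DB_File was built against. show_db_file_info is a hypothetical name.
show_db_file_info() {
  if perl -MDB_File -e 1 2>/dev/null; then
    perl -MDB_File -e '
      print "perl:        ", $^X, "\n";
      print "DB_File:     ", $DB_File::VERSION, " (", $INC{q{DB_File.pm}}, ")\n";
      print "Berkeley DB: ", $DB_File::db_version, "\n";
    '
  else
    # DB_File missing, or no perl at all on PATH
    echo "DB_File cannot be loaded by: $(command -v perl || echo 'no perl on PATH')"
  fi
}
```

If the reported perl or DB_File path is not the one you just reinstalled, MAKER is probably still using the old build.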
>
> You can count the reads in your FASTA input using this command (replace
> file.fasta with your file name):
>
> grep -c ">" file.fasta
>
> If your counts are really high (i.e. higher than a few hundred thousand),
> then you have a data issue: you are giving either too much data or the
> wrong data as input.
>
> —Carson
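The same count can be run over all input files at once, with an explicit cutoff. A minimal bash sketch: it anchors the pattern as `^>` so only header lines are counted, the default cutoff of 300000 (overridable through a hypothetical FASTA_LIMIT variable) is only an illustrative stand-in for "a few hundred thousand", and the function name check_fasta_counts is also hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: count FASTA records per file and flag counts above a cutoff.
# FASTA_LIMIT and check_fasta_counts are hypothetical names; 300000 is an
# assumed stand-in for "a few hundred thousand".
check_fasta_counts() {
  local limit="${FASTA_LIMIT:-300000}"
  local f n
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      printf '%s\tnot readable\n' "$f"
      continue
    fi
    # Count only lines starting with ">" (FASTA headers).
    # grep -c prints 0 but exits 1 when nothing matches; "|| true" ignores that.
    n=$(grep -c '^>' "$f" || true)
    if [ "$n" -gt "$limit" ]; then
      printf '%s\t%s\tTOO MANY (raw reads?)\n' "$f" "$n"
    else
      printf '%s\t%s\tok\n' "$f" "$n"
    fi
  done
}
```

For example `check_fasta_counts Ca_kacst.fna Ca_EST Ca__protein.faa`; with the counts posted at the top of this message (32572, 23602 and 26729) all three files would be reported as ok.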
>
>
>
> On Apr 11, 2018, at 11:39 AM, ohon Kin <ohon.kin at gmail.com> wrote:
>
>
> Hello Carson,
>
> I would really appreciate your help; I am having the same issue.
> I get this error when I run MAKER. I assumed that it required a lot of
> memory.
>
> STATUS: Processing and indexing input FASTA files...
> HASH: Out of overflow pages.  Increase page size
> HASH: Out of overflow pages.  Increase page size
> HASH: Out of overflow pages.  Increase page size
> HASH: Out of overflow pages.  Increase page size
> HASH: Out of overflow pages.  Increase page size
> HASH: Out of overflow pages.  Increase page size
> HASH: Out of overflow pages.  Increase page size
> HASH: Out of overflow pages.  Increase page size
> HASH: Out of overflow pages.  Increase page size
> HASH: Out of overflow pages.  Increase page size
> Filesize limit exceeded: 25
>
> While it runs, even the 1 TB capacity of my hard disk seems not to be
> enough for MAKER annotation.
> I think something is wrong with my input data or the dependencies.
> Would you please advise on the matter and suggest solutions?
>
> I installed BerkeleyDB using brew.
>
> The input given to MAKER is as follows:
> genome, EST, and protein, all in FASTA format, downloaded from NCBI and
> then given directly to MAKER for annotation.
>
> Do I have to preprocess these data before giving them to MAKER?
>
> On Thursday, 7 December 2017 19:00:52 UTC+3, Carson Holt wrote:
>>
>> The FASTA file gets indexed by BioPerl using Berkeley DB.
>>
>> I’m guessing there is something odd about your input file and the
>> database has run out of HASHes for indexing.
>>
>> You can google whether there is a setting you can configure in Berkeley DB
>> on Mac.
>>
>> But I suspect you are doing something like giving the raw reads from an
>> mRNA-seq experiment or DNA sequencing to MAKER (resulting in billions of
>> entries to be indexed), which would be incorrect. MAKER can't handle raw
>> data. You must first assemble it, for example with a tool like Trinity
>> for mRNA.
>>
>> Thanks,
>> Carson
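Besides the record count, sequence lengths show whether a file holds raw reads or assembled sequences: raw short reads are typically a few hundred bases or less, while assembled transcripts and genome contigs are usually much longer. A minimal bash/awk sketch that summarises one FASTA file; the function name fasta_length_summary is hypothetical, and what counts as "too short" is left to your judgment.

```bash
#!/usr/bin/env bash
# Sketch: print record count, total sequence length and mean length of one
# FASTA file. fasta_length_summary is a hypothetical helper name.
fasta_length_summary() {
  awk '
    /^>/ { n++; next }                  # header line: start of a new record
    {
      gsub(/[ \t\r]/, "")               # drop spaces, tabs, Windows CRs
      total += length($0)               # sequence line: add its residues
    }
    END {
      mean = (n > 0) ? total / n : 0
      # %.0f rather than %d so totals above 2^31 (whole genomes) print correctly
      printf "records=%.0f total=%.0f mean=%.1f\n", n, total, mean
    }
  ' "$1"
}
```

For a file with two 10-base records this prints `records=2 total=20 mean=10.0`; millions of records with a mean near typical read lengths point to unassembled reads.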
>>
>> On Dec 7, 2017, at 8:53 AM, Scott Cain <sc... at scottcain.net> wrote:
>>
>> Hi Gulnara,
>>
>> I don't know (though my guess would be that you're running out of
>> memory).  I'm cc'ing the MAKER developers' mailing list to see if anybody
>> on that list knows.
>>
>> Scott
>>
>>
>> On Wed, Dec 6, 2017 at 8:36 PM, Gulnara Tagirdzhanova <tagi... at ualberta.ca> wrote:
>>
>>> Hello,
>>>
>>> I got this error running MAKER on a Mac:
>>>
>>> STATUS: Parsing control files...
>>> STATUS: Processing and indexing input FASTA files...
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> Filesize limit exceeded: 25
>>>
>>> Is there anything that could solve it?
>>>
>>> Thank you,
>>> Gulnara
>>>
>>>
>>>
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research
>> _______________________________________________
>> maker-devel mailing list
>> maker... at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>>
>

-- 

*Warning:* This message and its attachment, if any, are confidential and
may contain information protected by law. If you are not the intended
recipient, please contact the sender immediately and delete the message and
its attachment, if any. You should not copy the message and its attachment,
if any, or disclose its contents to any other person or use it for any
purpose. Statements and opinions expressed in this e-mail and its
attachment, if any, are those of the sender, and do not necessarily reflect
those of KACST. KACST accepts no liability for any damage caused by this email.
