From qlian003 at ucr.edu Tue Apr 3 13:49:28 2018
From: qlian003 at ucr.edu (Qihua Liang)
Date: Tue, 3 Apr 2018 11:49:28 -0700
Subject: [maker-devel] exon names in gff file
Message-ID: <9F179C86-11B2-469D-B13E-E858700ACD3C@ucr.edu>

Dear Maker development team,

In the GFF file, an exon annotation looks like this:

ctg631 maker exon 16239 16243 . - . ID=pInfestans_00016306-RA:exon:96;Parent=pInfestans_00016306-RA;

What does "96" mean in ID=pInfestans_00016306-RA:exon:96? It does not look like exon numbering, because not every transcript has an exon numbered from 0.

Thank you
Qihua

From carsonhh at gmail.com Tue Apr 3 14:24:56 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 3 Apr 2018 13:24:56 -0600
Subject: [maker-devel] exon names in gff file
In-Reply-To: <9F179C86-11B2-469D-B13E-E858700ACD3C@ucr.edu>
References: <9F179C86-11B2-469D-B13E-E858700ACD3C@ucr.edu>
Message-ID: <9A0C4A3C-DE99-4599-B867-A4A7854EA5AB@gmail.com>

It's just an iterator that keeps each ID attribute unique so that parent/child features can be reconstructed properly. It is required on the computational side and has no biological meaning.

--Carson
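As a minimal sketch of the point above (not part of MAKER itself): since the trailing number in an exon ID is only a uniqueness counter, biological exon order has to come from coordinates and strand. The Python below assumes standard tab-separated, 9-column GFF3. The function names and the second example exon are hypothetical, made up for illustration.

```python
# Example rows: the first is the exon from this thread; the second is a
# made-up neighbour on the same transcript, added only for illustration.
EXAMPLE_GFF = [
    "ctg631\tmaker\texon\t16239\t16243\t.\t-\t.\t"
    "ID=pInfestans_00016306-RA:exon:96;Parent=pInfestans_00016306-RA;",
    "ctg631\tmaker\texon\t16300\t16400\t.\t-\t.\t"
    "ID=pInfestans_00016306-RA:exon:97;Parent=pInfestans_00016306-RA;",
]


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of key -> value."""
    attrs = {}
    for field in column9.strip().strip(";").split(";"):
        if field:
            key, _, value = field.partition("=")
            attrs[key] = value
    return attrs


def exon_order(gff_lines, transcript_id):
    """Return the exon IDs of one transcript in 5'->3' order.

    Order is derived from start coordinates and strand only; the numeric
    suffix of the ID is never interpreted.
    """
    exons = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = parse_attributes(cols[8])
        if transcript_id in attrs.get("Parent", "").split(","):
            exons.append((int(cols[3]), cols[6], attrs["ID"]))
    # On the minus strand the 5' end has the larger coordinate.
    minus_strand = bool(exons) and exons[0][1] == "-"
    exons.sort(key=lambda exon: exon[0], reverse=minus_strand)
    return [exon[2] for exon in exons]
```

On the minus strand, the exon with the larger start coordinate comes first, regardless of the counter in its ID.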
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Fri Apr 6 10:40:14 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 6 Apr 2018 09:40:14 -0600
Subject: [maker-devel] SNAP bootstrap training
Message-ID:

More than two total training rounds can lead to what is known as the overtraining trap, so I rarely do more than one round of bootstrapping with SNAP.

To evaluate the models, look at them in a browser. If the raw models are similar to the final hint-based models, then SNAP is well trained. If not, SNAP is poorly trained. Don't use the final models directly to evaluate training; look at the raw models instead. They are what comes directly from the HMM. A well-trained predictor will perform similarly even outside of MAKER. If it is overpredicting on its own, you may need to filter or even manually curate a subset of models from the initial training round to get better bootstrap training.

Also, if you did not build a species-specific repeat library, you may be undermasking and essentially training SNAP to find transposons with the bootstrapping.

--Carson

Sent from my iPhone
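One rough way to check for the undermasking mentioned above is to measure how much of a masked genome FASTA is actually masked. This is only a sketch, assuming the sequence is soft-masked (lowercase) or hard-masked (N). The function name is hypothetical, and what fraction counts as "too low" depends on the genome.

```python
import io


def masked_fraction(fasta_handle):
    """Return the fraction of bases that are lowercase or N in a FASTA stream."""
    masked = 0
    total = 0
    for line in fasta_handle:
        if line.startswith(">"):
            continue  # header line, not sequence
        seq = line.strip()
        total += len(seq)
        masked += sum(1 for base in seq if base.islower() or base == "N")
    return masked / total if total else 0.0


# Tiny in-memory example; a real run would pass an open file instead.
example = io.StringIO(">contig1\nACGTacgtNN\n>contig2\nACGTACGTAC\n")
fraction = masked_fraction(example)
```

A value far below what is expected for related genomes would be a hint, not proof, that repeats are slipping through into the training set.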
From timo.metz at googlemail.com Fri Apr 6 08:23:29 2018
From: timo.metz at googlemail.com (Timo Metz)
Date: Fri, 6 Apr 2018 15:23:29 +0200
Subject: [maker-devel] SNAP bootstrap training
Message-ID:

Hello,

I am using MAKER for a non-model organism, and I am currently trying to do the bootstrap training for SNAP as outlined in the tutorial and the MAKER paper.

For the training I am using a set of ~300 conserved, very high-quality sequences (no gold-standard genes are available), and I stop after the third round of bootstrap training.

However, the training does not seem to work properly: when I check the AEDs for each round of bootstrap training, they actually get worse each round. Also, the performance of SNAP after training is practically the same as before training, and significantly worse than with a training file for a model organism.

Are there any suggestions as to what could be wrong? Is there anything special to check or look at that is not mentioned in the tutorial?

Thanks in advance

Kind regards
Timo Metz

From ngrundma at uni-muenster.de Sat Apr 7 05:38:46 2018
From: ngrundma at uni-muenster.de (Norbert Grundmann)
Date: Sat, 7 Apr 2018 12:38:46 +0200
Subject: [maker-devel] problems running maker 2.31.9
Message-ID:

Hello,

I successfully installed maker version 2.31.9 on my FreeBSD 10.3 server. So far only minor things had to be done. But what does the following mean?

# maker
Thread server rejected connection: 192.168.1.3:29786 does not match allowed IP mask

The maker process is running in a "container" (jail) with the IP address mentioned above, which is NATed to the outside. Is there any chance to run it?

Thank you,
Norbert Grundmann

--
Norbert Grundmann
Inst. of Bioinformatics Muenster
Niels Stensen Strasse 14
48149 Muenster / Germany
Tel. 0251 - 83 53 007
(Use *BSD, because Linux is a patch for Linux)

From dandence at gmail.com Tue Apr 10 10:15:51 2018
From: dandence at gmail.com (Daniel Ence)
Date: Tue, 10 Apr 2018 11:15:51 -0400
Subject: [maker-devel] SNAP bootstrap training
In-Reply-To:
References:
Message-ID:

Hi, what evidence are you using to get AEDs for the results of your bootstrap training? I don't find it surprising that the AEDs get worse in subsequent rounds of bootstrap training, since overtraining is a real possibility when training ab initio predictors. 300 genes also might not be enough, since I think the tutorials and protocols here and here use 1000 genes for training SNAP.

I do find it surprising that a training file from a different organism gives models that match evidence from your organism of interest. Is that correct?

~Daniel
From carsonhh at gmail.com Tue Apr 10 10:23:02 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 10 Apr 2018 09:23:02 -0600
Subject: [maker-devel] SNAP bootstrap training
In-Reply-To:
References:
Message-ID:

If there is something wrong in the assembly (a broken ORF, an altered splice site, or a small string of N's, which is very common in new assemblies), the gene predictor will alter splicing and intron/exon patterns to get around it. The issue is almost always in the assembly. Also, if you are not masking repeats (i.e. you did not build a species-specific library), ORFs from transposons will be introduced and will confuse gene predictors. Finally, some predictors don't work well on some organisms. SNAP has trouble with many vertebrate species, for example.

A high-quality dataset of ~300 is good enough for training. If you have more (500-1000), most protocols have you split the dataset into a training set and a test set to evaluate sensitivity/specificity using tools like Eval from WashU (i.e. you train on one half, then predict on the other half to see if the predictions match the models).

--Carson

> On Apr 9, 2018, at 5:55 AM, Timo Metz wrote:
>
> Hey Carson,
>
> thanks for your advice. Would you rather feed MAKER a small set of high-quality genes, or a larger set of genes, for the training?
>
> I have another question, which is not directly related to this topic, but I hope you might still answer it: sometimes the hint-based prediction does not seem to work well enough. I can clearly find examples where MAKER infers gene models directly from a prediction even though the evidence clearly indicates something different, so the gene model is probably wrong (I even find such cases in highly conserved regions where I actually know the structure the gene should have).
>
> best
> Timo
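The train-on-half, test-on-half idea above can be sketched in a few lines. This is not how Eval works internally; it is a simplified, nucleotide-level illustration with hypothetical function names. Note that in gene-prediction usage "specificity" usually means the fraction of predicted bases that are correct (what other fields call precision), and that is the definition assumed here.

```python
import random


def split_genes(gene_ids, seed=0):
    """Shuffle gene IDs reproducibly and split them into two halves."""
    ids = sorted(gene_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]  # (training set, test set)


def covered(intervals):
    """Return the set of positions covered by inclusive (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def sensitivity_specificity(reference_exons, predicted_exons):
    """Nucleotide-level sensitivity and specificity for one locus.

    sensitivity = shared bases / reference bases
    specificity = shared bases / predicted bases
    """
    ref = covered(reference_exons)
    pred = covered(predicted_exons)
    shared = len(ref & pred)
    sensitivity = shared / len(ref) if ref else 0.0
    specificity = shared / len(pred) if pred else 0.0
    return sensitivity, specificity
```

In practice the predictor would be trained on the first half only, run on the loci of the second half, and the two numbers averaged or pooled over all test loci.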
From carsonhh at gmail.com Tue Apr 10 11:38:59 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 10 Apr 2018 10:38:59 -0600
Subject: [maker-devel] problems running maker 2.31.9
In-Reply-To:
References:
Message-ID:

If you are running with MPI, you may need to test different MPI configurations and settings. For example, if it is running on a single machine (not cross-machine MPI), you can manually specify the host as localhost.

--Carson

From carson.holt at genetics.utah.edu Wed Apr 11 11:36:29 2018
From: carson.holt at genetics.utah.edu (Carson Hinton Holt)
Date: Wed, 11 Apr 2018 16:36:29 +0000
Subject: [maker-devel] MAKER start with masked genome
In-Reply-To: <7EEA2C96-C177-4B43-979D-F105DCDAA1CA@umail.utah.edu>
References: <900B3D58-5EF1-4410-B666-B68E479F8BB8@uni-muenster.de> <7EEA2C96-C177-4B43-979D-F105DCDAA1CA@umail.utah.edu>
Message-ID: <597DA35A-0D01-45CD-A992-FD7B95D85B54@genetics.utah.edu>

There are two ways. First, rerun in the same working directory. MAKER will reuse the previous repeat masking files as long as the repeat masking settings did not change between runs. Second, if you have a genome-wide GFF3 from the previous run, you can pass it in as maker_gff and set the appropriate pass option underneath it for repeats (rm_pass=1).

--Carson

Sent from my iPhone

> On Apr 11, 2018, at 6:22 AM, Mark Yandell wrote:
>
> On 4/11/18, 5:08 AM, "Jonas Bohn" wrote:
>
> Dear MAKER developers,
>
> I'm a master's student in a bioinformatics group at the University of Muenster, and I want to use MAKER to annotate an ant genome. I ran RepeatMasker beforehand, and it took several days to get a masked genome, so I am trying to save some time for my master's thesis. My question: is there an option to run MAKER2 without running RepeatMasker again (i.e. to skip the RepeatMasker step)?
>
> I'm looking forward to hearing from you.
>
> Best regards,
>
> Jonas Bohn
>
> MSc. Student
> Evolutionary Bioinformatics
> University of Muenster, Germany

From carsonhh at gmail.com Wed Apr 11 12:57:36 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 11 Apr 2018 11:57:36 -0600
Subject: [maker-devel] Private message regarding: MAKER run error
In-Reply-To:
References:
 <7DAB19A4-573C-4E9E-A208-7352228A502B@gmail.com>
Message-ID: <6BB1016E-378F-48A3-B535-3286767216A8@gmail.com>

The issue is with Berkeley DB. BioPerl uses Perl's DB_File module to index the FASTA files.

1. Make sure you do not have an extremely large number of reads in the FASTA files (e.g. mRNA-seq data, which cannot be used directly as input to MAKER; you must first assemble it into transcriptome contigs).
2. Reinstall Perl and compile it against the newly installed Berkeley DB libraries.
3. Remove the brew-installed Berkeley DB and use Perl's precompiled DB_File module.

You can count the entries in your FASTA input using this command (replace file.fasta):

    grep -c ">" file.fasta

If your counts are really high (i.e. more than a few hundred thousand at most), then you have a data issue. You are giving either too much data or the wrong data as input.

--Carson

> On Apr 11, 2018, at 11:39 AM, ohon Kin wrote:
>
> Hello Carson,
>
> I would really appreciate your help; I am having the same kind of issue. I get this error when I run MAKER. I assumed that it required a lot of memory.
>
> STATUS: Processing and indexing input FASTA files...
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> Filesize limit exceeded: 25
>
> Even with 1 TB of hard disk capacity, there does not seem to be enough space for the MAKER annotation. I think something is wrong with my input data or with the dependencies. Could you please advise and suggest possible solutions?
>
> I installed Berkeley DB using brew.
>
> The input given to MAKER was as follows:
> genome, EST, protein, all in FASTA format, downloaded from NCBI and then passed directly to MAKER for annotation.
>
> Do I have to pre-process these data before giving them to MAKER?
>
> On Thursday, 7 December 2017 19:00:52 UTC+3, Carson Holt wrote:
> The FASTA file gets indexed by BioPerl using Berkeley DB.
>
> I'm guessing there is something odd about your input file and the database has run out of hashes for indexing.
>
> You can google whether there is a setting you can configure in Berkeley DB on Mac.
>
> But I suspect you are doing something like giving the raw reads from an mRNA-seq experiment or DNA sequencing to MAKER (resulting in billions of entries to be indexed), which would be incorrect. MAKER can't handle raw data. You must first assemble it, for example using Trinity for mRNA.
>
> Thanks,
> Carson
>
>> On Dec 7, 2017, at 8:53 AM, Scott Cain wrote:
>>
>> Hi Gulnara,
>>
>> I don't know (though my guess would be that you're running out of memory). I'm cc'ing the MAKER developers' mailing list to see if anybody on that list knows.
>>
>> Scott
>>
>> On Wed, Dec 6, 2017 at 8:36 PM, Gulnara Tagirdzhanova wrote:
>> Hello,
>>
>> I got this error running maker on mac:
>>
>> STATUS: Parsing control files...
>> STATUS: Processing and indexing input FASTA files...
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> Filesize limit exceeded: 25
>>
>> Is there anything that could solve it?
>>
>> Thank you,
>> Gulnara
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D. scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/ ) 216-392-3087
>> Ontario Institute for Cancer Research

From nellerk at yorku.ca Thu Apr 12 12:12:12 2018
From: nellerk at yorku.ca (nellerk at yorku.ca)
Date: Thu, 12 Apr 2018 17:12:12 +0000
Subject: [maker-devel] evidence-only gene annotation
Message-ID: <4e2545f.d2e0683b515ef71cd3e05d9a5e9e83a4@mymail.yorku.ca>

Hello,

I am using MAKER to annotate a novel, non-model plant genome.

Following the published protocol, I have run one evidence-only round (est2genome=1, protein2genome=1) followed by two iterative rounds, retraining SNAP and Augustus each time.

I have a curious result: the gene predictors do not seem to be finding many new genes, but are instead creating gene fusions. My evidence-only round resulted in 29,773 genes (mean length = 5071 bp), and my final round yielded 29,845 genes (mean length = 6530 bp). If I am interpreting this correctly, the predictors found only 72 new genes but greatly increased the mean length of all genes. I have inspected the results visually in a genome viewer, and it seems that the predictors often create fusions with nearby pseudogenes. I attempted to reduce this by changing pred_flank from 200 (the default) to 100, but it didn't seem to make a difference (at least for the genes I was looking at).

So although my final MAKER round looks good (~30,000 genes, 95% of genes have AED < 0.5), I have greater confidence in the models created by the evidence-only round.
I have two questions:

1) In this case, would it be acceptable to use the evidence-only gene models (from Round 1) rather than those from Round 3 (which incorporated trained gene predictors)? I ask because I haven't seen reports of MAKER being used in this way.
2) Do you have any suggestions to improve my ab initio training or prediction? Please note that I have already repeat-masked the genome with a species-specific repeat library.

Thank you for any assistance!

Kira

From aejysselansie at gmail.com Fri Apr 13 01:35:08 2018
From: aejysselansie at gmail.com (Ansie Yssel)
Date: Fri, 13 Apr 2018 08:35:08 +0200
Subject: [maker-devel] Maker, no fasta files in output
In-Reply-To:
References:
Message-ID:

Dear Carson

I am subscribed to the Maker list, but for some reason I cannot post a new topic when I view the Forum page. I hope it is OK if I email you directly?

I am trying to annotate a newly sequenced genome. I have RNA-seq data, a species-specific repeat library that was generated with REPET, the unmasked genome, and proteins from a "closely" related species (actually not that close, but my species is the only one in its genus, so I took proteins from another genus in the same family).

I started by following Support Protocol 1, on page 10 of the article "Genome annotation and Curation using MAKER and MAKER-P" published in Curr Protoc Bioinformatics 48.

The input that I used for generating the gene models (before training SNAP) was the EST data, my genome, and the protein data. I set est2genome=1 and protein2genome=1. I also used repeat masking and included my species-specific repeat library. softmasking was set to 1. That output was used to train SNAP.

Then I ran MAKER in "Gene prediction mode" as outlined on page 11, using the hmm file as input (and setting est2genome=0 and protein2genome=0). Repeat masking was enabled, again using my species-specific library.

I trained SNAP a second time. That output was used as input for Basic Protocol 1 on page 3 of the aforementioned article. Input for Basic Protocol 1 was:

The snap hmm file
my unmasked genome
the species-specific repeat library
RNA-seq evidence
the protein evidence from a close relative
softmasking was set to 1
est2genome=0
protein2genome=0

I collected the results as outlined on page 5 of the article. However, I noticed that there were no FASTA files. Do you have any idea what could have gone wrong? Can I send my log files to you? Thanks in advance for any assistance.

Kind Regards
Anna Yssel

--
Kind Regards
A Yssel

Centre of Microbial and Plant Genetics
KU Leuven
Faculteit Bio-ingenieurswetenschappen
Kasteelpark Arenberg 20, bus 2460
B-3001 Heverlee
Belgium

From carsonhh at gmail.com Mon Apr 16 13:21:22 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 16 Apr 2018 12:21:22 -0600
Subject: [maker-devel] Maker, no fasta files in output
In-Reply-To:
References:
Message-ID: <89921910-7CB1-4481-8973-F3E7DF4E688D@gmail.com>

Hi Anna,

The lack of results means that you had either no results from SNAP or no evidence supporting the results in your run. You can check for SNAP results just by looking for snap_masked features in the GFF3. For evidence, make sure you still provided the protein= and est= files even though you turned off est2genome/protein2genome.

--Carson
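The check suggested above (looking for snap_masked features) can be done by tallying the source column of the GFF3. This is a minimal sketch, assuming standard tab-separated GFF3 with the source in column 2; the function name and the example rows are hypothetical.

```python
from collections import Counter


def count_sources(gff_lines):
    """Count GFF3 features per source (column 2), ignoring comments."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature rows
        if line.startswith("#") or not line.strip():
            continue
        cols = line.split("\t")
        if len(cols) >= 9:
            counts[cols[1]] += 1
    return counts


# Made-up rows for illustration only.
example_rows = [
    "##gff-version 3\n",
    "ctg1\tsnap_masked\tmatch\t100\t900\t.\t+\t.\tID=hit1\n",
    "ctg1\tsnap_masked\tmatch_part\t100\t300\t.\t+\t.\tID=hsp1;Parent=hit1\n",
    "ctg1\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene1\n",
]
source_counts = count_sources(example_rows)
```

If the tally has no snap_masked entry at all, SNAP produced nothing for that run; if snap_masked is present but no maker features appear, the predictions most likely lacked supporting evidence.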
From carsonhh at gmail.com Mon Apr 16 13:26:06 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 16 Apr 2018 12:26:06 -0600
Subject: [maker-devel] evidence-only gene annotation
In-Reply-To: <4e2545f.d2e0683b515ef71cd3e05d9a5e9e83a4@mymail.yorku.ca>
References: <4e2545f.d2e0683b515ef71cd3e05d9a5e9e83a4@mymail.yorku.ca>
Message-ID:

Fusions are generated by the evidence alignments. Either transcript assemblies were falsely fused, or proteins are bridging neighboring paralogs.

For transcript data, you can try building the assembly with Trinity and the jaccard_clip option, which will reduce the occurrence of transcript assembly fusion. Also set correct_est_fusion=1 in the options file.

For protein-evidence-driven fusions, you can try DeFusion, a post-process you run on the MAKER output that searches for and attempts to correct paralog-driven fusions.

--Carson
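When comparing rounds for signs of fusion, the gene count and mean gene length quoted in this thread are easy to recompute from each round's GFF3. The sketch below is an illustration only, assuming tab-separated GFF3 in which final annotations carry the source "maker" and the type "gene"; the function name is hypothetical.

```python
def gene_stats(gff_lines, source="maker"):
    """Return (gene count, mean gene length in bp) for one GFF3 file."""
    lengths = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature rows
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[1] == source and cols[2] == "gene":
            # GFF3 coordinates are 1-based and inclusive.
            lengths.append(int(cols[4]) - int(cols[3]) + 1)
    count = len(lengths)
    mean_length = sum(lengths) / count if count else 0.0
    return count, mean_length
```

A roughly constant gene count combined with a sharply rising mean length between rounds, as described earlier in the thread, is the pattern one would expect if neighboring models are being merged.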
> If I am interpreting this correctly, the predictors found only 72 new genes but greatly increased the mean length of all genes. I have inspected the results visually in a genome viewer and it seems that the predictors often create fusions with nearby pseudogenes. I attempted to reduce this by changing pred_flank from 200 (default) to 100, but it didn't seem to make a difference (at least for the genes I was looking at).
>
> So although my final Maker round looks good (~30,000 genes, 95% of genes have AED < 0.5), I have greater confidence in the models created by the evidence-only round.
>
> I have two questions:
> 1) In this case, would it be acceptable to use evidence-only gene models (from Round 1), rather than those from Round 3 (which incorporated trained gene predictors)? I ask because I haven't seen reports of Maker being used in this way.
> 2) Do you have any suggestions to improve my ab initio training or prediction? Please note, I have already repeat-masked the genome with a species-specific repeat library.
>
> Thank you for any assistance!
>
> Kira

From carsonhh at gmail.com Tue Apr 17 10:58:16 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Apr 2018 09:58:16 -0600
Subject: [maker-devel] substr outside of string in PhatHits_utils.pm
In-Reply-To:
References: <5E5CA836-91B1-4AA8-8DC3-68FB9885EB43@gmail.com> <182CDDD3-A108-4095-9AC4-A2C198D34107@ibv.uio.no> <381F5EAB-2C0B-4DED-BD32-573D1B1C2B47@bils.se>

<7DAB19A4-573C-4E9E-A208-7352228A502B@gmail.com> <6BB1016E-378F-48A3-B535-3286767216A8@gmail.com>
Message-ID:

grep -c ">" Ca_kacst.fna
32572

The ESTs I have are assembled into contigs:
grep -c ">" Ca_EST
23602

grep -c ">" Ca__protein.faa
26729

These are my input data. I have reinstalled Perl following your instructions; please have a look. The tool still stops partway through the run, and 1 TB of disk space is still not enough.

I get this error:

ad$ ./maker
STATUS: Parsing control files...
WARNING: 'max_dna_len' is set too low. The minimum value permited is 50,000.
max_dna_len will be reset to 50,000

STATUS: Processing and indexing input FASTA files...
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
Filesize limit exceeded: 25

My maker_opt:

#-----Genome (these are always required)
genome=/Users/mohanad/Documents/maker/data/Ca_dromedarius_kacst.fna #genome sequence (fasta file or fasta embeded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff= #MAKER derived GFF3 file
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=/Users/mohanad/Documents/maker/data/Ca_dromedarius_EST #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/Users/mohanad/Documents/maker/data/Ca_dromedarius_V1.0_protein.faa #protein sequence file in fasta format (i.e. from mutiple oransisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm= #GeneMark HMM file
augustus_species= #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=10000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

On 11 April 2018 at 20:57, Carson Holt wrote:

> The issue is with Berkeley DB. BioPerl is using Perl's DB_File module to index the fastas.
>
> 1. Make sure you do not have an extremely large number of reads in the fasta files (i.e. mRNA-seq data which cannot be used directly as input to MAKER, you must assemble it first into transcriptome contigs)
> 2. Reinstall Perl and compile against the newly installed Berkeley DB libraries.
> 3. Remove the brew installed Berkeley DB and use Perl's precompiled DB_File module.
>
> You can count reads in your fasta input using this command (replace file.fasta)
>
> grep -c ">" file.fasta
>
> If your counts are really high (i.e. higher than a few hundred thousand maximum), then you have a data issue. You are either giving too much data or the wrong data as input.
>
> --Carson
>
> On Apr 11, 2018, at 11:39 AM, ohon Kin wrote:
>
> hello ; Carson
>
> i really would appreciate your help im kind of having same issue
> i get this Error when i run maker i assumed that it required big memory space
>
> STATUS: Processing and indexing input FASTA files...
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> Filesize limit exceeded: 25
>
> while working 1T of my Hard-disc capacity seems not enough for maker annotation
> i think something wrong in my input data or the dependencies
> would you please advice on the matter and elaborate solutions please
>
> i have install BerkleyDB using brew
>
> The input giving to Maker as followed :
> Genome , EST , Protein. all in Fasta format, downloaded from NCBI ---> then added it directly to maker for annotation
>
> do i have to apply these data pre-process before it applied to maker
>
> On Thursday, 7 December 2017 19:00:52 UTC+3, Carson Holt wrote:
>>
>> The FASTA file gets indexed by BioPerl using Berkeley DB.
>>
>> I'm guessing there is something odd about your input file and the database has run out of HASHes for indexing.
>>
>> You can google if there is a setting you can configure in Berkeley DB on Mac.
>>
>> But I suspect you are doing something like giving the raw reads from an mRNA-seq experiment or DNA sequencing to MAKER (resulting in billions of entries to be indexed), which would be incorrect. MAKER can't handle raw data. You must first assemble it using something like Trinity, for example, for mRNA.
>>
>> Thanks,
>> Carson
>>
>> On Dec 7, 2017, at 8:53 AM, Scott Cain wrote:
>>
>> Hi Guinara,
>>
>> I don't know (though my guess would be that you're running out of memory). I'm cc'ing the MAKER developer's mailing list to see if anybody on that list knows.
>>
>> Scott
>>
>> On Wed, Dec 6, 2017 at 8:36 PM, Gulnara Tagirdzhanova wrote:
>>
>>> Hello,
>>>
>>> I got this error running maker on mac:
>>>
>>> STATUS: Parsing control files...
>>> STATUS: Processing and indexing input FASTA files...
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> Filesize limit exceeded: 25
>>>
>>> Is there anything that could solve it?
>>>
>>> Thank you,
>>> Gulnara
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D. scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/) 216-392-3087
>> Ontario Institute for Cancer Research

From carsonhh at gmail.com Tue Apr 17 11:12:32 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Apr 2018 10:12:32 -0600
Subject: [maker-devel] Private message regarding: MAKER run error
In-Reply-To:
References:

<7DAB19A4-573C-4E9E-A208-7352228A502B@gmail.com> <6BB1016E-378F-48A3-B535-3286767216A8@gmail.com>
Message-ID: <2FAEC751-C8DF-48B9-8E8E-083E593A030D@gmail.com>

The datasets do not look too large. The failure you are seeing is happening outside of MAKER, so there is something wrong on the system itself. You will probably have to reinstall Perl against your local libraries, especially if you reinstalled Berkeley DB. Or try downloading the latest stable release of Perl (it comes precompiled against static libraries - Berkeley DB version 1.x, which can help avoid some issues). You will have to reinstall MAKER to use that version of Perl (MAKER uses the perl version used to call Build.PL during the install).

If you are running on something like FreeBSD, it may just break Perl's DB_File.

Also this note from CPAN:

> Although DB_File is intended to be used with Berkeley DB version 1, it can also be used with version 2, 3 or 4. In this case the interface is limited to the functionality provided by Berkeley DB 1.x.

If reinstalling tools does not work around your issue, you may just have to run on a different system.

--Carson

From timo.metz at googlemail.com Wed Apr 18 07:35:34 2018
From: timo.metz at googlemail.com (Timo Metz)
Date: Wed, 18 Apr 2018 14:35:34 +0200
Subject: [maker-devel] Using PacBio and Illumina in MAKER
Message-ID:

Hey guys,

I was wondering what would be the best way to incorporate PacBio long reads and assembled Illumina short reads into MAKER. PacBio reads have a higher confidence to find correct gene models as they do not need to be assembled, but I do not have enough PacBio reads available to construct an annotation solely based on PacBio reads. I also have tons of short reads available, but those are really short (30-40bp) so they are not very reliable.
Is it a good idea to first do an annotation only with PacBio reads and protein data and then do a "re-annotation" with Illumina reads in order to only identify "new" models that could be introduced by Illumina reads but leave the old models intact? Are there any other suggestions or experiences?

best
Timo

From dandence at gmail.com Wed Apr 18 10:25:10 2018
From: dandence at gmail.com (Daniel Ence)
Date: Wed, 18 Apr 2018 11:25:10 -0400
Subject: [maker-devel] Using PacBio and Illumina in MAKER
In-Reply-To:
References:
Message-ID: <20DBFB4E-4BAB-4E0A-9FF7-54CC1C6621F4@gmail.com>

Hi Timo, first of all, are these RNA or DNAseq reads? If they are DNA, then the best use would be to improve your reference assembly as much as possible. If they are RNAseq, then you want to do whatever kind of assembly you can (Trinity, for example, for Illumina) with the Illumina reads and the PacBio reads separately.

You can also use Evidence Modeler, which is compatible with more recent versions of MAKER, to assign weights to different datasets, so you can reflect the different confidence you have in your different datasets.

~Daniel

From carsonhh at gmail.com Wed Apr 18 10:29:17 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 18 Apr 2018 09:29:17 -0600
Subject: [maker-devel] Using PacBio and Illumina in MAKER
In-Reply-To: <20DBFB4E-4BAB-4E0A-9FF7-54CC1C6621F4@gmail.com>
References: <20DBFB4E-4BAB-4E0A-9FF7-54CC1C6621F4@gmail.com>
Message-ID: <9A873862-252F-4A16-AD20-C85A389F616F@gmail.com>

I would add that if you have a PacBio assembly, you can use programs like Pilon to polish the long read PacBio assembly (it uses the high accuracy Illumina reads to correct errors in the PacBio assembly). I would at least do that rather than using the PacBio assembly directly.

--Carson

From carsonhh at gmail.com Wed Apr 18 10:31:38 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 18 Apr 2018 09:31:38 -0600
Subject: [maker-devel] Using PacBio and Illumina in MAKER
In-Reply-To: <9A873862-252F-4A16-AD20-C85A389F616F@gmail.com>
References: <20DBFB4E-4BAB-4E0A-9FF7-54CC1C6621F4@gmail.com> <9A873862-252F-4A16-AD20-C85A389F616F@gmail.com>
Message-ID:

Relevant when using Pilon on mRNA-seq assemblies (you need to modify some command line options) --> https://github.com/broadinstitute/pilon/issues/50

--Carson

From jacques.dainat at nbis.se  Thu Apr 19 04:15:21 2018
From: jacques.dainat at nbis.se (Jacques Dainat)
Date: Thu, 19 Apr 2018 11:15:21 +0200
Subject: [maker-devel] substr outside of string in PhatHits_utils.pm
In-Reply-To: <6322C869-8807-48DC-AD20-7233C91AD68C@gmail.com>
References: <5E5CA836-91B1-4AA8-8DC3-68FB9885EB43@gmail.com> <182CDDD3-A108-4095-9AC4-A2C198D34107@ibv.uio.no> <381F5EAB-2C0B-4DED-BD32-573D1B1C2B47@bils.se>
 <02FBEB9A-E9E9-41F2-AC4E-E79A0CF56341@nbis.se> <76A072A1-B7F9-4133-8182-3E9A8DF94252@gmail.com> <1FB43157-6F2D-4B17-9A37-36DCB73830BA@nbis.se> <1948D4D5-C859-4D41-9428-C281BE0694AC@gmail.com> <6322C869-8807-48DC-AD20-7233C91AD68C@gmail.com>
Message-ID: <8D16BED9-B28B-4C56-B698-310EFECDE877@nbis.se>

Moving from Perl 5.10.1 to 5.16.3 seems to have fixed the "substr outside of string at PhatHit_utils.pm line 850" issue. Thank you again for your help.

/Jacques

> On 17 Apr 2018, at 17:58, Carson Holt wrote:
>
> It runs fine to completion for me (on both 3.01.01 and 3.01.02). Since I'm using your output, no external tools are called; it just parses the reports already written in the directory and finishes.
>
> This suggests that any issue is either with your version of Perl or with a component MAKER is using (such as BioPerl). I am using Perl 5.16.3 and BioPerl 1.007002 (the CPAN version).
>
> Note that if you are using BioPerl live or the GitHub release, it still shows version 1.007002 but will not necessarily match the CPAN version, as the GitHub version counter does not get incremented with each commit. So make sure you are not accidentally using BioPerl live from GitHub (only use CPAN, or let MAKER do the install of BioPerl if it's not installed system-wide). Also, you are using Exonerate 2.4 instead of the stable 2.2 release. That shouldn't make any difference, since I am just parsing your output that is already in the folder and not running Exonerate, but it may be worth a look on the off chance.
>
> Finally, what you may want to do is download a new version of Perl, install it, and run MAKER with that version, just to make sure that Perl or something installed inside your Perl is not generating the issue.
>
> I would try the current stable release of perl (since it comes all ready to go - no compiling needed). Alternatively, you can also try perlbrew to get a specific version (but it will have to compile against local libraries).
>
> --Carson
>
>> On Apr 17, 2018, at 6:54 AM, Jacques Dainat wrote:
>>
>> I tried with the latest version, 3.01.02-beta, and still get the same error.
>>
>> I agree that it could be something wrong with the input GFF3 file, but I can't find what it could be.
>>
>> I have uploaded the whole folder as user guest_5602.
>>
>> I'm looking forward to hearing from you.
>>
>> /Jacques
>>
>>> On 16 Apr 2018, at 22:11, Carson Holt wrote:
>>>
>>> I can't replicate your failure. Normally this error indicates there is something wrong with the input GFF3 file.
>>>
>>> If you want, you can run this on your own machine to generate the failure, then tarball up the complete maker directory for the failure and upload it here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>
>>> Also try the most current version of MAKER (3.01.02 - October 2017). See if it happens for you there.
>>>
>>> --Carson
>>>
>>>> On Apr 13, 2018, at 6:52 AM, Jacques Dainat wrote:
>>>>
>>>> Dear Carson,
>>>>
>>>> I am coming back to you with the same problem: substr outside of string at /sw/bioinfo/maker/3.01.1-beta-OMPI/bin/../lib/PhatHit_utils.pm line 850.
>>>> Since our last conversation in January I have seen on the MAKER mailing list that one more person (seoanezonjic) had this issue.
>>>> In January, the only way I found to avoid the problem was to remove the GFF files that were related to the issue.
>>>>
>>>> I have the problem again in a new annotation project. Of the 6 `EST` GFF files I'm using (all produced in the same way, with StringTie, and converted to GFF3 alignment style), 2 raise the error.
>>>> To better understand where the problem comes from, I have minimised the tools used within MAKER: no repeat masking and no ab initio tools activated, only proteins in FASTA format and one EST file in GFF format.
>>>> Using only the GFF EST file with est2genome=1 works.
>>>> Using only the proteins in FASTA format with protein2genome works.
>>>> Using the GFF EST file and the proteins in FASTA format, with protein2genome and est2genome or with protein2genome only, doesn't work.
>>>>
>>>> The problem occurs when MAKER tries to extend the protein alignments using the EST information.
>>>> I tried using the same tool versions as you (BioPerl 1.007002, BLAST+ 2.7.1, Exonerate 2.2.0) but still get the same problem.
>>>> One interesting thing is that the problem does not occur when I use the proteins in GFF format.
>>>>
>>>> Here is one GFF file that raises the error. If you want to try under the same conditions you will have to download the Swiss-Prot database (UniProt, reviewed entries only).
>>>>
>>>> <9529.1.136329.CGATGT.gff3>
>>>>
>>>> I hope we will eventually find a solution to this problem.
>>>>
>>>> Best regards,
>>>>
>>>> Jacques
>>>> -------------------------------------------------
>>>> Jacques Dainat, Ph.D.
>>>> NBIS (National Bioinformatics Infrastructure Sweden)
>>>> Genome Annotation Service
>>>> http://nbis.se/about/staff/jacques-dainat/
>>>> http://nbis.se
>>>>
>>>> Address:
>>>> Uppsala University, Biomedicinska Centrum
>>>> Department of Medical Biochemistry Microbiology, Genomics
>>>> Husargatan 3, box 582
>>>> S-75123 Uppsala Sweden
>>>> Phone: +46 18 471 46 25

From ar834 at cornell.edu  Fri Apr 20 09:47:21 2018
From: ar834 at cornell.edu (Aditi Rambani)
Date: Fri, 20 Apr 2018 14:47:21 +0000
Subject: [maker-devel] August failing for one contig only during Maker run
In-Reply-To: <0bb33aae-a9a8-4573-a743-3591de9b4208@Spark>
References: <15BF15D1-835F-4E8E-9B52-A958EF19BCD9@gmail.com>, <0bb33aae-a9a8-4573-a743-3591de9b4208@Spark>
Message-ID:

Hello,

MAKER successfully annotated the genome but keeps failing for one contig with the error message attached below. Can you please help me troubleshoot this?
There is no issue with Augustus for the rest of the contigs.

Thank you

Aditi

#--------- command -------------#
Widget::augustus:
~/augustus-3.2/bin/augustus --species=Cxxx --strand=forward --UTR=off --hintsfile=/tmp/maker_dSn0ps/0/71_0.26253049-26257472.Cxxx.auto_annotator.xdef.augustus --extrinsicCfgFile=/data/home/srs57/programs/augustus-3.2/config/extrinsic/extrinsic.MPE.cfg /tmp/maker_dSn0ps/0/71_0.26253049-26257472.Cxxx.auto_annotator.augustus.fasta > /tmp/maker_dSn0ps/0/71_0.26253049-26257472.Cxxx.auto_annotator.augustus
#-------------------------------#
Sampling error in intron model. state=14 base=4422

~/augustus-3.2/bin/augustus: ERROR
Tried to sample from empty list.

Sampling error in intron model. state=14 base=4422
~/augustus-3.2/bin/augustus: ERROR
Tried to sample from empty list.

ERROR: Augustus failed
--> rank=NA, hostname=xxx
ERROR: Failed while annotating transcripts
ERROR: Chunk failed at level:1, tier_type:4
FAILED CONTIG:Contigxxx

ERROR: Chunk failed at level:6, tier_type:0
FAILED CONTIG:Contigxxx

examining contents of the fasta file and run log

From carsonhh at gmail.com  Tue Apr 24 14:21:40 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 24 Apr 2018 13:21:40 -0600
Subject: [maker-devel] August failing for one contig only during Maker run
In-Reply-To:
References: <15BF15D1-835F-4E8E-9B52-A958EF19BCD9@gmail.com> <0bb33aae-a9a8-4573-a743-3591de9b4208@Spark>
Message-ID:

The failure is internal to Augustus. Try updating Augustus to 3.3. If it still fails after that, let me know. We will then have to isolate the command and the files used, so that you can send them as a test dataset to the Augustus developers.

--Carson

From jiapeng.chen at sydney.edu.au  Sat Apr 21 23:24:50 2018
From: jiapeng.chen at sydney.edu.au (Jiapeng Chen)
Date: Sun, 22 Apr 2018 04:24:50 +0000
Subject: [maker-devel] Error in maker perl module
Message-ID:

Hi,

Thank you so much for developing MAKER, which is such a huge project.

I ran into an error message:
Can't locate object method "first" via package "B::COP" at /usr/local/perl/5.20.1/lib/5.20.1/B/Deparse.pm line 3228.
--> rank=NA, hostname=hpc182, at /usr/local/maker/2.31.8/maker/bin/../lib/Process/MpiChunk.pm line 4509.
What would be the simplest solution for this bug?

Cheers,
Jiapeng

From carsonhh at gmail.com  Tue Apr 24 14:28:56 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 24 Apr 2018 13:28:56 -0600
Subject: [maker-devel] Error in maker perl module
In-Reply-To:
References:
Message-ID:

The failure comes from Perl's Storable module. You may need to reinstall it, as well as potentially broken dependencies (B::COP and B::Deparse). Do that via CPAN. You appear to be using a non-system Perl (here --> /usr/local/perl/5.20.1). That install may be broken, or you may have a mix of older system Perl modules together with modules from your newer non-system installation.

--Carson

From timo.metz at googlemail.com  Fri Apr  6 07:23:29 2018
From: timo.metz at googlemail.com (Timo Metz)
Date: Fri, 6 Apr 2018 15:23:29 +0200
Subject: [maker-devel] SNAP bootstrap training
Message-ID:

Hello,

I am using MAKER for a non-model organism, and I am currently trying to do the bootstrap training for SNAP as outlined in the tutorial and the MAKER paper.

For the training I am using a set of ~300 conserved sequences of very high quality (no gold-standard genes are available), and I stop training after the third round of bootstrap training.

However, it seems that the training does not work properly: when I check the AEDs after each round of bootstrap training, they actually get worse each round. Also, the performance of SNAP after training is practically the same as before training, and significantly worse than when using a training file for a model organism.

Are there any suggestions as to what could be wrong? Is there anything special to check or look at that is not mentioned in the tutorial?

thanks in advance

kind regards
Timo Metz

From ngrundma at uni-muenster.de  Sat Apr  7 04:38:46 2018
From: ngrundma at uni-muenster.de (Norbert Grundmann)
Date: Sat, 7 Apr 2018 12:38:46 +0200
Subject: [maker-devel] problems running maker 2.31.9
Message-ID:

Hello,

I successfully installed MAKER version 2.31.9 on my FreeBSD 10.3 server. So far only minor things had to be done. But what does the following mean?

# maker
Thread server rejected connection: 192.168.1.3:29786 does not match allowed IP mask

The thing is that the maker process is running in a "container" (jail) with the IP address mentioned above, which is "natted" to the outside. Is there any chance to run it?

Thank you, Norbert Grundmann

--
Norbert Grundmann
Inst. of Bioinformatics Muenster
Niels Stensen Strasse 14
48149 Muenster / Germany
Tel. 0251 - 83 53 007
(Use *BSD, because Linux is a patch for Linux)

From dandence at gmail.com  Tue Apr 10 09:15:51 2018
From: dandence at gmail.com (Daniel Ence)
Date: Tue, 10 Apr 2018 11:15:51 -0400
Subject: [maker-devel] SNAP bootstrap training
In-Reply-To:
References:
Message-ID:

Hi, what evidence are you using to get AEDs for the results of your bootstrap training? I don't find it surprising that the AEDs get worse in subsequent rounds of bootstrap training, since overtraining is a real possibility when training ab initio predictors.
300 genes also might not be enough, since I think the tutorials and protocols here and here use 1000 genes for training SNAP. I do find it surprising that a training file from a different organism gives models that match the evidence from your organism of interest. Is that correct?

~Daniel
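
One way to compare AEDs between training rounds, as discussed in this thread, is to tally the `_AED` values that MAKER writes on the mRNA features of its GFF3 output. The following is only a minimal Python sketch and not part of MAKER: the function names and the sample records are hypothetical, and it assumes tab-separated GFF3 lines with `maker` in the source column and an `_AED=` attribute on each mRNA line.

```python
import re
from typing import Iterable, List

# Matches the _AED attribute only (not _eAED) in the GFF3 attribute column.
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def mrna_aeds(gff_lines: Iterable[str]) -> List[float]:
    """Collect the _AED value of every maker mRNA feature in a GFF3 stream."""
    aeds = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the sequence section follows; no more features
        if line.startswith("#"):
            continue  # comment or directive
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[1] != "maker" or cols[2] != "mRNA":
            continue
        match = AED_RE.search(cols[8])
        if match:
            aeds.append(float(match.group(1)))
    return aeds


def fraction_at_or_below(aeds: List[float], cutoff: float) -> float:
    """Fraction of transcripts whose AED is <= cutoff (0.0 for an empty list)."""
    return sum(a <= cutoff for a in aeds) / len(aeds) if aeds else 0.0


def _row(feature_type: str, attributes: str) -> str:
    # Helper that builds one made-up GFF3 record for the demo below.
    return "\t".join(["ctg1", "maker", feature_type, "1", "100", ".", "+", ".", attributes])


# Made-up records, for illustration only.
SAMPLE = "\n".join([
    "##gff-version 3",
    _row("gene", "ID=g1"),
    _row("mRNA", "ID=g1-RA;Parent=g1;_AED=0.10;_eAED=0.12"),
    _row("mRNA", "ID=g2-RA;Parent=g2;_AED=0.60;_eAED=0.60"),
    _row("mRNA", "ID=g3-RA;Parent=g3;_AED=0.40;_eAED=0.45"),
    "##FASTA",
    ">ctg1",
    "ACGT",
])

aeds = mrna_aeds(SAMPLE.splitlines())
print(f"{fraction_at_or_below(aeds, 0.5):.2f}")  # 2 of 3 transcripts -> 0.67
```

Running the same tally on the GFF3 from each round gives a like-for-like number to compare, although, as noted in the thread, the result depends on which evidence the AED was computed against.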

From carsonhh at gmail.com  Tue Apr 10 09:23:02 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 10 Apr 2018 09:23:02 -0600
Subject: [maker-devel] SNAP bootstrap training
In-Reply-To:
References:
Message-ID:

If there is something wrong in the assembly (a broken ORF, an altered splice site, or a small string of N's, which is very common in new assemblies), the gene predictor will alter splicing and intron/exon patterns to get around it. The issue is almost always in the assembly. Also, if you are not masking repeats (i.e. you did not build a species-specific library), ORFs from transposons will be introduced and will confuse the gene predictors. Finally, some predictors don't work well on some organisms. SNAP has trouble with many vertebrate species, for example.

A higher-quality dataset of ~300 is good enough for training. If you have more (500-1000), most protocols have you split the dataset into a training set and a test set to evaluate sensitivity/specificity using tools like Eval from WashU (i.e. you train on half, then predict on the other half to see if the predictions match the models).

--Carson

> On Apr 9, 2018, at 5:55 AM, Timo Metz wrote:
>
> Hey Carson,
>
> thanks for your advice. Would you then rather feed a small set of high-quality genes into MAKER for the training, or a larger set of genes?
>
> And I have another question, which is not directly related to this topic, but I hope you might still answer it: it sometimes seems as if the hint-based prediction does not work sufficiently well. I can clearly find examples where MAKER infers gene models directly from a prediction even though the evidence clearly indicates something different, and the gene model is then probably wrong (I even find such cases in highly conserved regions, where I actually know the structure the gene should have).
>
> best
> Timo
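
The training/test split mentioned in the reply above (train on half of the gene set, then predict on the other half) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a MAKER or SNAP utility: the function name, the 50/50 default, and the gene identifiers are hypothetical, and the split is done on gene IDs only.

```python
import random
from typing import List, Sequence, Tuple


def split_gene_ids(
    gene_ids: Sequence[str],
    train_fraction: float = 0.5,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Randomly split gene IDs into disjoint training and test sets.

    A fixed seed makes the split reproducible between runs.
    """
    ids = sorted(set(gene_ids))      # de-duplicate and fix the starting order
    rng = random.Random(seed)        # private generator; global state untouched
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)
    return sorted(ids[:n_train]), sorted(ids[n_train:])


# Made-up identifiers, for illustration only.
all_ids = [f"gene{i:04d}" for i in range(1000)]
train_ids, test_ids = split_gene_ids(all_ids)
print(len(train_ids), len(test_ids))  # 500 500
```

The training half would then be used to build the predictor's parameter file, and the held-out half kept only for comparing predictions against the known models.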

From carsonhh at gmail.com  Tue Apr 10 10:38:59 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 10 Apr 2018 10:38:59 -0600
Subject: [maker-devel] problems running maker 2.31.9
In-Reply-To:
References:
Message-ID:

If you are running with MPI, you may need to test different MPI configurations and settings. For example, if it is running on a single machine (not cross-machine MPI), you can manually specify the host as localhost.

--Carson

From carson.holt at genetics.utah.edu  Wed Apr 11 10:36:29 2018
From: carson.holt at genetics.utah.edu (Carson Hinton Holt)
Date: Wed, 11 Apr 2018 16:36:29 +0000
Subject: [maker-devel] MAKER start with masked genome
In-Reply-To: <7EEA2C96-C177-4B43-979D-F105DCDAA1CA@umail.utah.edu>
References: <900B3D58-5EF1-4410-B666-B68E479F8BB8@uni-muenster.de> <7EEA2C96-C177-4B43-979D-F105DCDAA1CA@umail.utah.edu>
Message-ID: <597DA35A-0D01-45CD-A992-FD7B95D85B54@genetics.utah.edu>

There are two ways. First, rerun in the same working directory.
MAKER will reuse the previous repeat masking files as long as the repeat masking settings did not change between runs. Second, if you have a genome-wide GFF3 from the previous run, you can pass it in as maker_gff and set the appropriate pass option for repeats underneath it (rm_pass=1).

--Carson

Sent from my iPhone

> On Apr 11, 2018, at 6:22 AM, Mark Yandell wrote:
>
> On 4/11/18, 5:08 AM, "Jonas Bohn" wrote:
>
> Dear MAKER developers,
>
> I'm a master's student in a bioinformatics group at the University of Muenster and I want to use MAKER for genome annotation of an ant genome. I ran RepeatMasker before and it took some days to get a masked genome, so I am trying to save some time for my master's thesis. My question is: is there an option to run MAKER2 without running RepeatMasker again (i.e. to skip the RepeatMasker step)?
>
> I'm looking forward to hearing from you.
>
> Best regards,
>
> Jonas Bohn
>
> MSc. Student
> Evolutionary Bioinformatics
> University of Muenster, Germany

From carsonhh at gmail.com  Wed Apr 11 11:57:36 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 11 Apr 2018 11:57:36 -0600
Subject: [maker-devel] Private message regarding: MAKER run error
In-Reply-To:
References: <7DAB19A4-573C-4E9E-A208-7352228A502B@gmail.com>
Message-ID: <6BB1016E-378F-48A3-B535-3286767216A8@gmail.com>

The issue is with Berkeley DB. BioPerl is using Perl's DB_File module to index the FASTA files.

1. Make sure you do not have an extremely large number of reads in the FASTA files (i.e. mRNA-seq data, which cannot be used directly as input to MAKER; you must first assemble it into transcriptome contigs).
2. Reinstall Perl and compile it against the newly installed Berkeley DB libraries.
3. Remove the brew-installed Berkeley DB and use Perl's precompiled DB_File module.

You can count the reads in your FASTA input using this command (replace file.fasta):

grep -c ">" file.fasta

If your counts are really high (i.e. higher than a few hundred thousand at most), then you have a data issue. You are giving either too much data or the wrong data as input.

--Carson

> On Apr 11, 2018, at 11:39 AM, ohon Kin wrote:
>
> Hello Carson,
>
> I would really appreciate your help; I am having much the same issue.
> I get this error when I run MAKER. I assumed that it required a large amount of memory.
>
> STATUS: Processing and indexing input FASTA files...
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> Filesize limit exceeded: 25
>
> While it is working, 1 TB of hard disk capacity does not seem to be enough for the MAKER annotation.
> I think something is wrong in my input data or in the dependencies.
> Could you please advise on the matter and elaborate on possible solutions?
>
> I have installed Berkeley DB using brew.
>
> The input given to MAKER is as follows:
> Genome, EST, Protein.
> All are in FASTA format, downloaded from NCBI and then given directly to MAKER for annotation.
>
> Do I have to pre-process these data before giving them to MAKER?
>
> On Thursday, 7 December 2017 19:00:52 UTC+3, Carson Holt wrote:
> The FASTA file gets indexed by BioPerl using Berkeley DB.
>
> I'm guessing there is something odd about your input file and the database has run out of HASHes for indexing.
>
> You can google whether there is a setting you can configure in Berkeley DB on Mac.
>
> But I suspect you are doing something like giving the raw reads from an mRNA-seq experiment or DNA sequencing to MAKER (resulting in billions of entries to be indexed), which would be incorrect. MAKER can't handle raw data. You must first assemble it, using something like Trinity, for example, for mRNA.
>
> Thanks,
> Carson
>
>> On Dec 7, 2017, at 8:53 AM, Scott Cain wrote:
>>
>> Hi Gulnara,
>>
>> I don't know (though my guess would be that you're running out of memory). I'm cc'ing the MAKER developers' mailing list to see if anybody on that list knows.
>>
>> Scott
>>
>>
>> On Wed, Dec 6, 2017 at 8:36 PM, Gulnara Tagirdzhanova wrote:
>> Hello,
>>
>> I got this error running maker on mac:
>>
>> STATUS: Parsing control files...
>> STATUS: Processing and indexing input FASTA files...
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> Filesize limit exceeded: 25
>>
>> Is there anything that could solve it?
>>
>> Thank you,
>> Gulnara
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D. scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/ ) 216-392-3087
>> Ontario Institute for Cancer Research

From nellerk at yorku.ca  Thu Apr 12 11:12:12 2018
From: nellerk at yorku.ca (nellerk at yorku.ca)
Date: Thu, 12 Apr 2018 17:12:12 +0000
Subject: [maker-devel] evidence-only gene annotation
Message-ID: <4e2545f.d2e0683b515ef71cd3e05d9a5e9e83a4@mymail.yorku.ca>

Hello,

I am using MAKER to annotate a novel, non-model plant genome. Following the published protocol, I have run one evidence-only round (est2genome=1, protein2genome=1) followed by two iterative rounds, re-training SNAP and Augustus each time.

I have a curious result in that the gene predictors do not seem to be finding many new genes, but are instead creating gene fusions. My evidence-only round resulted in 29,773 genes (mean length = 5071 bp), and my final round yielded 29,845 genes (mean length = 6530 bp). If I am interpreting this correctly, the predictors found only 72 new genes but greatly increased the mean length of all genes. I have inspected the results visually in a genome viewer, and it seems that the predictors often create fusions with nearby pseudogenes. I attempted to reduce this by changing pred_flank from 200 (the default) to 100, but it didn't seem to make a difference (at least for the genes I was looking at).

So although my final MAKER round looks good (~30,000 genes, 95% of genes have AED < 0.5), I have greater confidence in the models created by the evidence-only round.
I have two questions:
1) In this case, would it be acceptable to use evidence-only gene models (from Round 1), rather than those from Round 3 (which incorporated trained gene predictors)? I ask because I haven't seen reports of Maker being used in this way.
2) Do you have any suggestions to improve my ab initio training or prediction? Please note, I have already repeat-masked the genome with a species-specific repeat library.

Thank you for any assistance!

Kira

From aejysselansie at gmail.com Fri Apr 13 00:35:08 2018
From: aejysselansie at gmail.com (Ansie Yssel)
Date: Fri, 13 Apr 2018 08:35:08 +0200
Subject: [maker-devel] Maker, no fasta files in output
In-Reply-To:
References:
Message-ID:

Dear Carson

I am subscribed to the Maker list, but for some reason I cannot post a new topic when I view the Forum page. I hope it is OK if I email you directly?

I am trying to annotate a newly sequenced genome.
I have RNAseq data, a species specific repeat library that was generated with REPET, the unmasked genome, and proteins from a "closely" related species (actually not that close, but my species is the only one in its genus, so I took proteins from another genus in the same family).

I started by following Support Protocol 1, on page 10 of the article "Genome annotation and Curation using MAKER and MAKER-P" published in Curr Protoc Bioinformatics 48.

The input that I used for generating the gene models (before training SNAP) was:
the est data, my genome and the protein data. I also set est2genome=1 and protein2genome=1. I also used Repeat masking and included my species specific repeat library.
softmasking was set to 1
That output was used to train snap.

Then I ran maker in "Gene prediction mode" as outlined on page 11 using the hmm file as input (and setting est2genome=0 and protein2genome=0). Repeat masking was enabled, again using my species specific library.

I trained snap for a second time.
That output was used as input for Basic Protocol 1 on page 3 of the aforementioned article.
Input for Basic Protocol 1 was:
The snap hmm file
my unmasked genome
the species specific repeat library
RNAseq evidence
the protein evidence from a close relative
softmasking was set to 1
est2genome=0
protein2genome=0

I collected the results as outlined on page 5 of the article.
However I noticed that there were no Fasta files.
Do you have any idea what could have gone wrong?
Can I send my log files to you? Thanks in advance for any assistance.

Kind Regards
Anna Yssel

--
Kind Regards
A Yssel

Centre of Microbial and Plant Genetics
KU Leuven
Faculteit Bio-ingenieurswetenschappen
Kasteelpark Arenberg 20, bus 2460
B-3001 Heverlee
Belgium

From carsonhh at gmail.com Mon Apr 16 12:21:22 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 16 Apr 2018 12:21:22 -0600
Subject: [maker-devel] Maker, no fasta files in output
In-Reply-To:
References:
Message-ID: <89921910-7CB1-4481-8973-F3E7DF4E688D@gmail.com>

Hi Anna,

The lack of FASTA output means you either had no results from SNAP or no evidence supporting the SNAP results in your run. You can check for SNAP results by looking for snap_masked features in the GFF3. For evidence, make sure you still provided the protein= and est= files even though you turned off est2genome/protein2genome.

--Carson

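The check described in the reply above (looking for snap_masked features in the GFF3) can be sketched roughly as follows. This is only an illustration and not part of MAKER: the helper name count_gff3_sources and the example records are made up, and it assumes a standard tab-separated GFF3 in which the second column holds the source (for example snap_masked).

```python
from collections import Counter


def count_gff3_sources(lines):
    """Count GFF3 feature lines by their source column (column 2).

    Hypothetical helper for illustration; it is not a MAKER function.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) >= 9:  # a feature line has nine columns
            counts[cols[1]] += 1
    return counts


# Made-up records, only to show the expected input shape.
example = [
    "##gff-version 3",
    "ctg1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1",
    "ctg1\tsnap_masked\tmatch\t100\t900\t.\t+\t.\tID=m1",
    "ctg1\tsnap_masked\tmatch_part\t100\t400\t.\t+\t.\tID=m1:p1;Parent=m1",
    "ctg1\test2genome\texpressed_sequence_match\t120\t880\t.\t+\t.\tID=e1",
]
counts = count_gff3_sources(example)
print(counts["snap_masked"])
```

On a real run you would pass an open file handle for the merged GFF3 instead of the example list. A count of zero for snap_masked would point to the first case in the reply (SNAP produced nothing); a non-zero count with no final models would point to the second (no supporting evidence).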
From carsonhh at gmail.com Mon Apr 16 12:26:06 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 16 Apr 2018 12:26:06 -0600
Subject: [maker-devel] evidence-only gene annotation
In-Reply-To: <4e2545f.d2e0683b515ef71cd3e05d9a5e9e83a4@mymail.yorku.ca>
References: <4e2545f.d2e0683b515ef71cd3e05d9a5e9e83a4@mymail.yorku.ca>
Message-ID:

Fusions are generated by the evidence alignments. Either transcript assemblies were falsely fused or proteins are bridging neighboring paralogs. For transcript data, you can try building the assembly with Trinity and its --jaccard_clip option, which reduces the occurrence of transcript assembly fusions. Also set correct_est_fusion=1 in the options file. For fusions driven by protein evidence, you can try DeFusion, a post-processing tool that you run on the MAKER output; it searches for paralog-driven fusions and attempts to correct them.

--Carson

From carsonhh at gmail.com Tue Apr 17 09:58:16 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Apr 2018 09:58:16 -0600
Subject: [maker-devel] substr outside of string in PhatHits_utils.pm
In-Reply-To:
References: <5E5CA836-91B1-4AA8-8DC3-68FB9885EB43@gmail.com> <182CDDD3-A108-4095-9AC4-A2C198D34107@ibv.uio.no> <381F5EAB-2C0B-4DED-BD32-573D1B1C2B47@bils.se>

<7DAB19A4-573C-4E9E-A208-7352228A502B@gmail.com> <6BB1016E-378F-48A3-B535-3286767216A8@gmail.com>
Message-ID:

grep -c ">" Ca_kacst.fna
32572

The ESTs I have are assembled into contigs:
grep -c ">" Ca_EST
23602

grep -c ">" Ca__protein.faa
26729

These are my input data. I have reinstalled Perl as per your instructions; please have a look. The run still stops partway through, and 1T still seems not to be enough.

I get this error:
ad$ ./maker
STATUS: Parsing control files...
WARNING: 'max_dna_len' is set too low. The minimum value permited is 50,000.
max_dna_len will be reset to 50,000

STATUS: Processing and indexing input FASTA files...
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
Filesize limit exceeded: 25

my maker_opt:

#-----Genome (these are always required)
genome=/Users/mohanad/Documents/maker/data/Ca_dromedarius_kacst.fna #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff= #MAKER derived GFF3 file
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=/Users/mohanad/Documents/maker/data/Ca_dromedarius_EST #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/Users/mohanad/Documents/maker/data/Ca_dromedarius_V1.0_protein.faa #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm= #GeneMark HMM file
augustus_species= #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=10000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

From carsonhh at gmail.com Tue Apr 17 10:12:32 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Apr 2018 10:12:32 -0600
Subject: [maker-devel] Private message regarding: MAKER run error
In-Reply-To:
References:

<7DAB19A4-573C-4E9E-A208-7352228A502B@gmail.com> <6BB1016E-378F-48A3-B535-3286767216A8@gmail.com>
Message-ID: <2FAEC751-C8DF-48B9-8E8E-083E593A030D@gmail.com>

The datasets do not look too large. The failure you are seeing is happening outside of MAKER, so there is something wrong on the system itself. You will probably have to reinstall Perl against your local libraries, especially if you reinstalled Berkeley DB. Or try downloading the latest stable release of Perl (it comes precompiled against static libraries, Berkeley DB version 1.x, which can help avoid some issues). You will have to reinstall MAKER to use that version of Perl (MAKER uses the Perl version used to call Build.PL during the install). If you are running on something like FreeBSD, the platform itself may simply break Perl's DB_File.

Also note this from CPAN:

> Although DB_File is intended to be used with Berkeley DB version 1, it can also be used with version 2, 3 or 4. In this case the interface is limited to the functionality provided by Berkeley DB 1.x.

If reinstalling tools does not work around your issue, you may just have to run on a different system.

--Carson

> On Apr 15, 2018, at 8:34 AM, ohon Kin wrote:
>
> grep -c ">" Ca_kacst.fna
> 32572
>
> The ESTs I have are assembled into contigs:
> grep -c ">" Ca_EST
> 23602
>
> grep -c ">" Ca__protein.faa
> 26729
>
> These are my input data. I have reinstalled Perl as per your instructions; please have a look. The run still stops partway through, and 1T still seems not to be enough.
>
> I get this error:
> ad$ ./maker
> STATUS: Parsing control files...
> WARNING: 'max_dna_len' is set too low. The minimum value permited is 50,000.
> max_dna_len will be reset to 50,000
>
> STATUS: Processing and indexing input FASTA files...
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages.
Increase page size > HASH: Out of overflow pages. Increase page size > HASH: Out of overflow pages. Increase page size > HASH: Out of overflow pages. Increase page size > HASH: Out of overflow pages. Increase page size > HASH: Out of overflow pages. Increase page size > Filesize limit exceeded: 25 > > > > my maker_opt > > > #-----Genome (these are always required) > genome=/Users/mohanad/Documents/maker/data/Ca_dromedarius_kacst.fna #genome sequence (fasta file or fasta embeded in GFF3 file) > organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic > > #-----Re-annotation Using MAKER Derived GFF3 > maker_gff= #MAKER derived GFF3 file > est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no > altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no > protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no > rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no > model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no > pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no > other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no > > #-----EST Evidence (for best results provide a file for at least one) > est=/Users/mohanad/Documents/maker/data/Ca_dromedarius_EST #set of ESTs or assembled mRNA-seq in fasta format > altest= #EST/cDNA sequence file in fasta format from an alternate organism > est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file > altest_gff= #aligned ESTs from a closly relate species in GFF3 format > > #-----Protein Homology Evidence (for best results provide a file for at least one) > protein=/Users/mohanad/Documents/maker/data/Ca_dromedarius_V1.0_protein.faa #protein sequence file in fasta format (i.e. 
from mutiple oransisms) > protein_gff= #aligned protein homology evidence from an external GFF3 file > > #-----Repeat Masking (leave values blank to skip repeat masking) > model_org=all #select a model organism for RepBase masking in RepeatMasker > rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker > repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner > rm_gff= #pre-identified repeat elements from an external GFF3 file > prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no > softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering) > > #-----Gene Prediction > snaphmm= #SNAP HMM file > gmhmm= #GeneMark HMM file > augustus_species= #Augustus gene prediction species model > fgenesh_par_file= #FGENESH parameter file > pred_gff= #ab-initio predictions from an external GFF3 file > model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) > est2genome=1#infer gene predictions directly from ESTs, 1 = yes, 0 = no > protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no > trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no > snoscan_rrna= #rRNA file to have Snoscan find snoRNAs > unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no > > #-----Other Annotation Feature Types (features MAKER doesn't recognize) > other_gff= #extra features to pass-through to final MAKER generated GFF3 file > > #-----External Application Behavior Options > alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases > cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) > > #-----MAKER Behavior Options > max_dna_len=10000 #length for dividing up contigs into chunks (increases/decreases memory usage) > min_contig=1 #skip genome contigs below this length (under 10kb are often useless) > > pred_flank=200 
#flank for extending evidence clusters sent to gene predictors
> pred_stats=0 #report AED and QI statistics for all predictions as well as models
> AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
> min_protein=0 #require at least this many amino acids in predicted proteins
> alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
> always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
> map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
> keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
>
> split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
> single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
> single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
> correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
>
> tries=2 #number of times to try a contig if there is a failure for some reason
> clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
> clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
> TMP= #specify a directory other than the system default temporary directory for temporary files
>
>
> On 11 April 2018 at 20:57, Carson Holt wrote:
> The issue is with Berkeley DB. BioPerl is using Perl's DB_File module to index the FASTA files.
>
> 1. Make sure you do not have an extremely large number of reads in the FASTA files (i.e. mRNA-seq data, which cannot be used directly as input to MAKER; you must first assemble it into transcriptome contigs).
> 2. Reinstall Perl and compile it against the newly installed Berkeley DB libraries.
> 3. Remove the brew-installed Berkeley DB and use Perl's precompiled DB_File module.
>
> You can count the reads in your FASTA input using this command (replace file.fasta):
>
> grep -c ">" file.fasta
>
> If your counts are really high (i.e. higher than a few hundred thousand at most), then you have a data issue. You are giving either too much data or the wrong data as input.
>
> --Carson
>
>
>
>> On Apr 11, 2018, at 11:39 AM, ohon Kin wrote:
>>
>>
>> Hello Carson,
>>
>> I would really appreciate your help; I am having much the same issue.
>> I get this error when I run MAKER. I assumed that it required a lot of memory.
>>
>> STATUS: Processing and indexing input FASTA files...
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> Filesize limit exceeded: 25
>>
>> While it runs, even 1 TB of hard-disk capacity seems not to be enough for the MAKER annotation.
>> I think something is wrong with my input data or the dependencies.
>> Could you please advise on the matter and elaborate on possible solutions?
>>
>> I have installed Berkeley DB using brew.
>>
>> The input given to MAKER is as follows:
>> genome, EST and protein, all in FASTA format, downloaded from NCBI and then passed directly to MAKER for annotation.
>>
>> Do I have to pre-process these data before giving them to MAKER?
>>
>>
>> On Thursday, 7 December 2017 19:00:52 UTC+3, Carson Holt wrote:
>> The FASTA file gets indexed by BioPerl using Berkeley DB.
>>
>> I'm guessing there is something odd about your input file and the database has run out of hashes for indexing.
>>
>> You can google whether there is a setting you can configure in Berkeley DB on Mac.
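The record count suggested with grep earlier in this thread can also be sketched in Python. This is only an illustration under stated assumptions: the function names and the 300,000 cutoff (standing in for "a few hundred thousand") are made up for this example and are not part of MAKER or BioPerl. Note that it counts header lines that start with ">", whereas grep counts any line containing ">".

```python
# Hypothetical cutoff standing in for "a few hundred thousand" records.
MAX_REASONABLE_RECORDS = 300_000


def count_fasta_records(path):
    """Count FASTA records by counting lines that start with '>'."""
    count = 0
    with open(path, "r") as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count


def looks_like_too_many_records(path, limit=MAX_REASONABLE_RECORDS):
    """Return True when the file has more records than the chosen limit."""
    return count_fasta_records(path) > limit
```

A file that trips this check is probably raw reads rather than assembled contigs, which matches the advice above to assemble mRNA-seq data first.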
>>
>> But I suspect you are doing something like giving the raw reads from an mRNA-seq experiment or DNA sequencing to MAKER (resulting in billions of entries to be indexed), which would be incorrect. MAKER can't handle raw data. You must first assemble it, for example using Trinity for mRNA-seq.
>>
>> Thanks,
>> Carson
>>
>>> On Dec 7, 2017, at 8:53 AM, Scott Cain wrote:
>>>
>>> Hi Gulnara,
>>>
>>> I don't know (though my guess would be that you're running out of memory). I'm cc'ing the MAKER developers' mailing list to see if anybody on that list knows.
>>>
>>> Scott
>>>
>>>
>>> On Wed, Dec 6, 2017 at 8:36 PM, Gulnara Tagirdzhanova wrote:
>>> Hello,
>>>
>>> I got this error running maker on mac:
>>>
>>> STATUS: Parsing control files...
>>> STATUS: Processing and indexing input FASTA files...
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> Filesize limit exceeded: 25
>>>
>>> Is there anything that could solve it?
>>>
>>> Thank you,
>>> Gulnara
>>>
>>> --
>>> ------------------------------------------------------------------------
>>> Scott Cain, Ph. D. scott at scottcain dot net
>>> GMOD Coordinator (http://gmod.org/) 216-392-3087
>>> Ontario Institute for Cancer Research

From timo.metz at googlemail.com Wed Apr 18 06:35:34 2018
From: timo.metz at googlemail.com (Timo Metz)
Date: Wed, 18 Apr 2018 14:35:34 +0200
Subject: [maker-devel] Using PacBio and Illumina in MAKER
Message-ID:

Hey guys,

I was wondering what would be the best way to bring PacBio long reads and assembled Illumina short reads into MAKER. PacBio reads give higher confidence in finding correct gene models because they do not need to be assembled, but I do not have enough PacBio reads available to construct an annotation based solely on them. I also have tons of short reads available, but those are really short (30-40 bp), so they are not very reliable.
Is it a good idea to first do an annotation with only the PacBio reads and protein data, and then do a "re-annotation" with the Illumina reads, so that only "new" models supported by the Illumina reads are added while the old models are left intact? Are there any other suggestions or experiences?

best
Timo

From dandence at gmail.com Wed Apr 18 09:25:10 2018
From: dandence at gmail.com (Daniel Ence)
Date: Wed, 18 Apr 2018 11:25:10 -0400
Subject: [maker-devel] Using PacBio and Illumina in MAKER
In-Reply-To:
References:
Message-ID: <20DBFB4E-4BAB-4E0A-9FF7-54CC1C6621F4@gmail.com>

Hi Timo, first of all, are these RNA-seq or DNA-seq reads? If they are DNA, then the best use would be to improve your reference assembly as much as possible. If they are RNA-seq, then you want to do whatever kind of assembly you can (Trinity, for example, for Illumina) with the Illumina reads and the PacBio reads separately.

You can also use Evidence Modeler, which is compatible with more recent versions of MAKER, to assign weights to different datasets, so you can reflect the different confidence you have in each of them.

~Daniel

From carsonhh at gmail.com Wed Apr 18 09:29:17 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 18 Apr 2018 09:29:17 -0600
Subject: [maker-devel] Using PacBio and Illumina in MAKER
In-Reply-To: <20DBFB4E-4BAB-4E0A-9FF7-54CC1C6621F4@gmail.com>
References: <20DBFB4E-4BAB-4E0A-9FF7-54CC1C6621F4@gmail.com>
Message-ID: <9A873862-252F-4A16-AD20-C85A389F616F@gmail.com>

I would add that if you have a PacBio assembly, you can use programs like Pilon to polish the long-read PacBio assembly (it uses the high-accuracy Illumina reads to correct errors in the PacBio assembly). I would at least do that rather than using the PacBio assembly directly.

--Carson

From carsonhh at gmail.com Wed Apr 18 09:31:38 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 18 Apr 2018 09:31:38 -0600
Subject: [maker-devel] Using PacBio and Illumina in MAKER
In-Reply-To: <9A873862-252F-4A16-AD20-C85A389F616F@gmail.com>
References: <20DBFB4E-4BAB-4E0A-9FF7-54CC1C6621F4@gmail.com> <9A873862-252F-4A16-AD20-C85A389F616F@gmail.com>
Message-ID:

Relevant when using Pilon on mRNA-seq assemblies (you need to modify some command line options) --> https://github.com/broadinstitute/pilon/issues/50

--Carson
From jacques.dainat at nbis.se Thu Apr 19 03:15:21 2018
From: jacques.dainat at nbis.se (Jacques Dainat)
Date: Thu, 19 Apr 2018 11:15:21 +0200
Subject: [maker-devel] substr outside of string in PhatHits_utils.pm
In-Reply-To: <6322C869-8807-48DC-AD20-7233C91AD68C@gmail.com>
References: <5E5CA836-91B1-4AA8-8DC3-68FB9885EB43@gmail.com> <182CDDD3-A108-4095-9AC4-A2C198D34107@ibv.uio.no> <381F5EAB-2C0B-4DED-BD32-573D1B1C2B47@bils.se>
 <02FBEB9A-E9E9-41F2-AC4E-E79A0CF56341@nbis.se> <76A072A1-B7F9-4133-8182-3E9A8DF94252@gmail.com> <1FB43157-6F2D-4B17-9A37-36DCB73830BA@nbis.se>
 <1948D4D5-C859-4D41-9428-C281BE0694AC@gmail.com> <6322C869-8807-48DC-AD20-7233C91AD68C@gmail.com>
Message-ID: <8D16BED9-B28B-4C56-B698-310EFECDE877@nbis.se>

Moving from Perl 5.10.1 to 5.16.3 seems to have fixed the "substr outside of string at PhatHit_utils.pm line 850" issue.

Thank you again for your help.

/Jacques

> On 17 Apr 2018, at 17:58, Carson Holt wrote:
>
> It runs fine to completion for me (on both 3.01.01 and 3.01.02). Since I'm using your output, no external tools are called; it just parses the reports already written in the directory and finishes.
>
> This suggests that any issue is either with your version of Perl or with a component MAKER is using (such as BioPerl). I am using Perl 5.16.3 and BioPerl 1.007002 (the CPAN version).
>
> Note that if you are using BioPerl live or the GitHub release, it still shows version 1.007002 but will not necessarily match the CPAN version, because the GitHub version counter is not incremented with each commit. So make sure you are not accidentally using BioPerl live from GitHub (only use CPAN, or let MAKER install BioPerl if it's not installed system wide). Also, you are using Exonerate 2.4 instead of the stable 2.2 release. That shouldn't make any difference, since I am just parsing your output that is already in the folder and not running Exonerate, but it may be worth a look on the off chance.
>
> Finally, what you may want to do is download a new version of Perl, install it, and run MAKER with that version, just to make sure that Perl or something installed inside your Perl is not causing the issue.
>
> I would try the current stable release of Perl (since it comes ready to go, with no compiling needed). Alternatively, you can also try perlbrew to get a specific version (but it will have to compile against local libraries).
>
> --Carson
>
>> On Apr 17, 2018, at 6:54 AM, Jacques Dainat wrote:
>>
>> I tried with the latest version, 3.01.02-beta; still the same error.
>>
>> I agree that it could be something wrong with the input GFF3 file, but I can't find what it could be.
>>
>> I have uploaded the whole folder as user guest_5602.
>>
>> I'm looking forward to hearing from you.
>>
>> /Jacques
>>
>>> On 16 Apr 2018, at 22:11, Carson Holt wrote:
>>>
>>> I can't replicate your failure. Normally this error indicates there is something wrong with the input GFF3 file.
>>>
>>> If you want, you can run this on your own machine to generate the failure, then tarball up the complete maker directory for the failure and upload it here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>>>
>>> Also try the most current version of MAKER (3.01.02 - October 2017). See if it happens for you there.
>>>
>>> --Carson
>>>
>>>
>>>> On Apr 13, 2018, at 6:52 AM, Jacques Dainat wrote:
>>>>
>>>> Dear Carson,
>>>>
>>>> I come back to you still with the same problem: substr outside of string at /sw/bioinfo/maker/3.01.1-beta-OMPI/bin/../lib/PhatHit_utils.pm line 850.
>>>> Since our last conversation in January, I have seen on the MAKER mailing list that one more person (seoanezonjic) had this issue.
>>>> In January, the only way I found to avoid the problem was to remove the GFF files that were related to the issue.
>>>>
>>>> I have the problem again for a new annotation project. Of the 6 `EST` GFF files I'm using (all produced in the same way, with StringTie, and converted to GFF3 alignment style), 2 raise the error.
>>>> To better understand where the problem comes from, I have minimised the tools used within MAKER: no repeat masking and no ab initio tools activated, only protein in FASTA format and one EST file in GFF format.
>>>> - Using only the GFF EST file with est2genome=1 works.
>>>> - Using only protein in FASTA with protein2genome works.
>>>> - Using the GFF EST file and the protein in FASTA format, with protein2genome and est2genome or with only protein2genome, doesn't work.
>>>>
>>>> The problem occurs when MAKER tries to extend the protein alignments with the EST information.
>>>> I tried using the same tool versions as you (BioPerl 1.007002, BLAST+ 2.7.1, Exonerate 2.2.0) but still have the same problem.
>>>> One interesting thing is that the problem does not occur when I use the protein evidence in GFF format.
>>>>
>>>> Here is one GFF file raising the error. If you want to try under the same conditions, you will have to download and use the Swiss-Prot database (UniProt reviewed entries only).
>>>>
>>>>
>>>> <9529.1.136329.CGATGT.gff3>
>>>>
>>>>
>>>> I hope we will eventually find a solution to this problem.
>>>>
>>>> Best regards,
>>>>
>>>> Jacques
>>>> -------------------------------------------------
>>>> Jacques Dainat, Ph.D.
>>>> NBIS (National Bioinformatics Infrastructure Sweden)
>>>> Genome Annotation Service
>>>> http://nbis.se/about/staff/jacques-dainat/
>>>> http://nbis.se
>>>>
>>>> Address:
>>>> Uppsala University, Biomedicinska Centrum
>>>> Department of Medical Biochemistry Microbiology, Genomics
>>>> Husargatan 3, box 582
>>>> S-75123 Uppsala Sweden
>>>> Phone: +46 18 471 46 25

From ar834 at cornell.edu Fri Apr 20 08:47:21 2018
From: ar834 at cornell.edu (Aditi Rambani)
Date: Fri, 20 Apr 2018 14:47:21 +0000
Subject: [maker-devel] August failing for one contig only during Maker run
In-Reply-To: <0bb33aae-a9a8-4573-a743-3591de9b4208@Spark>
References: <15BF15D1-835F-4E8E-9B52-A958EF19BCD9@gmail.com>, <0bb33aae-a9a8-4573-a743-3591de9b4208@Spark>
Message-ID:

Hello,

MAKER successfully annotated the genome but keeps failing for one contig with the error message attached below. Can you please help me troubleshoot this?
There is no issue with Augustus for the rest of the contigs.

Thank you

Aditi

#--------- command -------------#
Widget::augustus:
~/augustus-3.2/bin/augustus --species=Cxxx --strand=forward --UTR=off --hintsfile=/tmp/maker_dSn0ps/0/71_0.26253049-26257472.Cxxx.auto_annotator.xdef.augustus --extrinsicCfgFile=/data/home/srs57/programs/augustus-3.2/config/extrinsic/extrinsic.MPE.cfg /tmp/maker_dSn0ps/0/71_0.26253049-26257472.Cxxx.auto_annotator.augustus.fasta > /tmp/maker_dSn0ps/0/71_0.26253049-26257472.Cxxx.auto_annotator.augustus
#-------------------------------#
Sampling error in intron model. state=14 base=4422

~/augustus-3.2/bin/augustus: ERROR
Tried to sample from empty list.

Sampling error in intron model. state=14 base=4422
~/augustus-3.2/bin/augustus: ERROR
Tried to sample from empty list.

ERROR: Augustus failed
--> rank=NA, hostname=xxx
ERROR: Failed while annotating transcripts
ERROR: Chunk failed at level:1, tier_type:4
FAILED CONTIG:Contigxxx

ERROR: Chunk failed at level:6, tier_type:0
FAILED CONTIG:Contigxxx

examining contents of the fasta file and run log

From carsonhh at gmail.com Tue Apr 24 13:21:40 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 24 Apr 2018 13:21:40 -0600
Subject: [maker-devel] August failing for one contig only during Maker run
In-Reply-To:
References: <15BF15D1-835F-4E8E-9B52-A958EF19BCD9@gmail.com> <0bb33aae-a9a8-4573-a743-3591de9b4208@Spark>
Message-ID:

The failure is internal to Augustus. Try updating Augustus to 3.3. If it still fails after that, let me know. We will then have to isolate the command and the files used, so you can send them as a test dataset to the Augustus developers.

--Carson

From jiapeng.chen at sydney.edu.au Sat Apr 21 22:24:50 2018
From: jiapeng.chen at sydney.edu.au (Jiapeng Chen)
Date: Sun, 22 Apr 2018 04:24:50 +0000
Subject: [maker-devel] Error in maker perl module
Message-ID:

Hi,

Thank you so much for developing MAKER, which is such a huge project.

I ran into an error message:
Can't locate object method "first" via package "B::COP" at /usr/local/perl/5.20.1/lib/5.20.1/B/Deparse.pm line 3228.
--> rank=NA, hostname=hpc182, at /usr/local/maker/2.31.8/maker/bin/../lib/Process/MpiChunk.pm line 4509.
What would be the simplest solution for this bug?

Cheers,
Jiapeng

From carsonhh at gmail.com Tue Apr 24 13:28:56 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 24 Apr 2018 13:28:56 -0600
Subject: [maker-devel] Error in maker perl module
In-Reply-To:
References:
Message-ID:

The failure is from Perl's Storable module. You may need to reinstall it, as well as potentially broken dependencies (B::COP and B::Deparse). Do that via CPAN.

You appear to be using a non-system Perl (here --> /usr/local/perl/5.20.1). That install may be broken, or you may have a mix of older system Perl modules together with modules from your newer non-system installation.

--Carson

From qlian003 at ucr.edu Tue Apr 3 12:49:28 2018
From: qlian003 at ucr.edu (Qihua Liang)
Date: Tue, 3 Apr 2018 11:49:28 -0700
Subject: [maker-devel] exon names in gff file
Message-ID: <9F179C86-11B2-469D-B13E-E858700ACD3C@ucr.edu>

Dear Maker development team,

In the GFF file, the exon annotation looks like this:
ctg631 maker exon 16239 16243 . - . ID=pInfestans_00016306-RA:exon:96;Parent=pInfestans_00016306-RA;

I am wondering what "96" means in ID=pInfestans_00016306-RA:exon:96. It does not look like exon numbering, because not all transcripts have an exon that starts from 0.
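As a rough illustration of how the ID and Parent attributes in a line like the one above can be pulled apart, here is a minimal Python sketch. The helper name `parse_gff3_attributes` is made up for this example, and the code assumes tab-separated GFF3 columns. It makes no claim about what the trailing number means; it only splits the string.

```python
def parse_gff3_attributes(column9):
    """Split a GFF3 attribute column into a dict of key -> value."""
    attributes = {}
    for field in column9.strip().split(";"):
        if not field:
            continue  # skip the empty piece left by a trailing ';'
        key, _, value = field.partition("=")
        attributes[key] = value
    return attributes


# The exon line quoted in the message, written with tab separators.
line = ("ctg631\tmaker\texon\t16239\t16243\t.\t-\t.\t"
        "ID=pInfestans_00016306-RA:exon:96;Parent=pInfestans_00016306-RA;")
columns = line.split("\t")
attrs = parse_gff3_attributes(columns[8])

# Split the ID from the right into transcript name, feature type and counter.
transcript, feature_type, counter = attrs["ID"].rsplit(":", 2)
print(transcript, feature_type, counter)
print("Parent:", attrs["Parent"])
```

In GFF3, exons are tied to their transcript through the Parent attribute, so code that groups exons would normally use Parent rather than the number at the end of the ID.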
Thank you
Qihua

From carsonhh at gmail.com Tue Apr 3 13:24:56 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 3 Apr 2018 13:24:56 -0600
Subject: [maker-devel] exon names in gff file
In-Reply-To: <9F179C86-11B2-469D-B13E-E858700ACD3C@ucr.edu>
References: <9F179C86-11B2-469D-B13E-E858700ACD3C@ucr.edu>
Message-ID: <9A0C4A3C-DE99-4599-B867-A4A7854EA5AB@gmail.com>

It's just an iterator to ensure the ID attribute is unique for proper parent/child feature reconstruction. It's required on the computation side and is meaningless biologically.

--Carson

From carsonhh at gmail.com Fri Apr 6 09:40:14 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 6 Apr 2018 09:40:14 -0600
Subject: [maker-devel] SNAP bootstrap training
Message-ID:

More than 2 total training rounds can lead to what is known as the overtraining trap, so I rarely do more than one round of bootstrapping with SNAP.

To evaluate the models, look at them in a browser. If the raw models are similar to the final hint-based models, then SNAP is well trained. If not, then SNAP is poorly trained. Don't use the final models directly to evaluate training; look at the raw models instead. They are what is produced directly from the HMM. A well-trained predictor will perform similarly even outside of MAKER. If it's overpredicting on its own, you may need to filter or even manually curate a subset of models from the initial training round to get better bootstrap training. Also, if you did not build a species-specific repeat library, you may be under-masking and essentially training SNAP to find transposons with the bootstrapping.

--Carson

From timo.metz at googlemail.com Fri Apr 6 07:23:29 2018
From: timo.metz at googlemail.com (Timo Metz)
Date: Fri, 6 Apr 2018 15:23:29 +0200
Subject: [maker-devel] SNAP bootstrap training
Message-ID:

Hello,

I am using MAKER for a non-model organism, and I am currently trying to do the bootstrap training for SNAP as outlined in the tutorial and the MAKER paper.

For the training I am using a set of ~300 conserved sequences of very high quality (no gold-standard genes are available), and I stop training after the third round of bootstrap training.

However, the training does not seem to work properly: when I check the AEDs for each round of bootstrap training, they actually get worse each round. Also, the performance of SNAP after training is practically the same as before training, and significantly worse than when using a training file for a model organism.

Are there any suggestions as to what could be wrong? Is there anything special to check or look at that is not mentioned in the tutorial?

thanks in advance

kind regards
Timo Metz

From ngrundma at uni-muenster.de Sat Apr 7 04:38:46 2018
From: ngrundma at uni-muenster.de (Norbert Grundmann)
Date: Sat, 7 Apr 2018 12:38:46 +0200
Subject: [maker-devel] problems running maker 2.31.9
Message-ID:

Hello,

I successfully installed MAKER version 2.31.9 on my FreeBSD 10.3 server. So far only minor things had to be done. But what does the following mean?

# maker
Thread server rejected connection: 192.168.1.3:29786 does not match allowed IP mask

The thing is that the maker process is running in a "container" (jail) with the IP address mentioned, which is "natted" to the outside. Is there any chance of running it?

Thank you,

Norbert Grundmann

--
Norbert Grundmann
Inst. of Bioinformatics Muenster
Niels Stensen Strasse 14
48149 Muenster / Germany
Tel. 0251 - 83 53 007
(Use *BSD, because Linux is a patch for Linux)

From dandence at gmail.com Tue Apr 10 09:15:51 2018
From: dandence at gmail.com (Daniel Ence)
Date: Tue, 10 Apr 2018 11:15:51 -0400
Subject: [maker-devel] SNAP bootstrap training
In-Reply-To:
References:
Message-ID:

Hi, what evidence are you using to get AEDs for the results of your bootstrap training? I don't find it surprising that the AEDs get worse in subsequent rounds of bootstrap training, since overtraining is a real possibility when training ab initio predictors.
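One way to compare bootstrap rounds is to summarise the AED values per round. The sketch below assumes MAKER-style GFF3 output in which mRNA features carry an `_AED=` attribute in the ninth column; the function name `aed_values` and the sample records are invented for illustration and are not real results.

```python
from statistics import mean


def aed_values(gff3_lines):
    """Yield the _AED value of every mRNA feature in an iterable of GFF3 lines."""
    for line in gff3_lines:
        if line.startswith("#"):
            continue  # skip GFF3 directives and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] != "mRNA":
            continue
        for field in columns[8].split(";"):
            if field.startswith("_AED="):
                yield float(field[len("_AED="):])


# Invented sample records, one list per training round.
round1 = [
    "ctg1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=g1-RA;Parent=g1;_AED=0.10;_eAED=0.10;",
    "ctg1\tmaker\tmRNA\t2000\t2900\t.\t-\t.\tID=g2-RA;Parent=g2;_AED=0.30;_eAED=0.30;",
]
round2 = [
    "ctg1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=g1-RA;Parent=g1;_AED=0.25;_eAED=0.25;",
    "ctg1\tmaker\tmRNA\t2000\t2900\t.\t-\t.\tID=g2-RA;Parent=g2;_AED=0.45;_eAED=0.45;",
]

for name, lines in (("round 1", round1), ("round 2", round2)):
    values = list(aed_values(lines))
    print(name, "n =", len(values), "mean AED =", round(mean(values), 3))
```

A rising mean AED from one round to the next would be consistent with the overtraining concern raised here, although AED depends on the evidence used, so the numbers are only comparable when the evidence set is the same in every round.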
300 genes also might not be enough genes, since I think the tutorials and protocols here and here use 1000 genes for training SNAP. I do find it surprising that a training file from a different organism gives models that match evidence from your organism of interest. Is that correct?

~Daniel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1356 bytes
Desc: not available
URL:

From carsonhh at gmail.com Tue Apr 10 09:23:02 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 10 Apr 2018 09:23:02 -0600
Subject: [maker-devel] SNAP bootstrap training
In-Reply-To:
References:
Message-ID:

If there is something wrong in the assembly (a broken ORF, an altered splice site, or a small string of N's, which is very common in new assemblies), the gene predictor will alter splicing and intron/exon patterns to get around it. The issue is almost always in the assembly. Also, if you are not masking repeats (i.e. you did not build a species-specific library), it will introduce ORFs from transposons that will confuse gene predictors. Finally, some predictors don't work well on some organisms. SNAP has trouble with many vertebrate species, for example.

A higher-quality dataset of ~300 is good enough for training. If you have more (500-1000), most protocols have you split the dataset into a training set and a test set to evaluate sensitivity/specificity using tools like Eval from WashU (i.e. you train on half, then predict on the other half to see if the predictions match the models).

--Carson

> On Apr 9, 2018, at 5:55 AM, Timo Metz wrote:
>
> Hey Carson,
>
> thanks for your advice. Would you then rather go for a small set of high-quality genes, or rather more genes to feed into MAKER for the training?
>
> And I have another question, which is not directly related to this topic, but I hope that you might still answer: it sometimes seems as if the hint-based prediction does not work sufficiently well. I can clearly find examples where MAKER infers gene models directly from a prediction even though the evidence indicates something totally different and the gene model is then probably wrong (I even find such cases when looking at highly conserved regions where I actually know the structure the gene should have).
>
> best
> Timo

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue Apr 10 10:38:59 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 10 Apr 2018 10:38:59 -0600
Subject: [maker-devel] problems running maker 2.31.9
In-Reply-To:
References:
Message-ID:

If you are running with MPI, you may need to test different MPI configurations and settings. For example, if it is running on a single machine (not cross-machine MPI), you can manually specify the host as localhost.

--Carson

From carson.holt at genetics.utah.edu Wed Apr 11 10:36:29 2018
From: carson.holt at genetics.utah.edu (Carson Hinton Holt)
Date: Wed, 11 Apr 2018 16:36:29 +0000
Subject: [maker-devel] MAKER start with masked genome
In-Reply-To: <7EEA2C96-C177-4B43-979D-F105DCDAA1CA@umail.utah.edu>
References: <900B3D58-5EF1-4410-B666-B68E479F8BB8@uni-muenster.de> <7EEA2C96-C177-4B43-979D-F105DCDAA1CA@umail.utah.edu>
Message-ID: <597DA35A-0D01-45CD-A992-FD7B95D85B54@genetics.utah.edu>

There are two ways. First, rerun in the same working directory.
MAKER will reuse previous repeat masking files as long as the repeat masking settings did not change between runs. Second, if you have a genome-wide GFF3 from the previous run, you can pass it in as maker_gff and set the appropriate pass option underneath for repeats (rm_pass=1).

--Carson

> On Apr 11, 2018, at 6:22 AM, Mark Yandell wrote:
>
> On 4/11/18, 5:08 AM, "Jonas Bohn" wrote:
>
> Dear MAKER developers,
>
> I'm a master's student in a bioinformatics group at the University of Muenster and I want to use MAKER for genome annotation of an ant genome. I ran RepeatMasker before and it took some days to get a masked genome, so I am trying to save some time for my master's thesis. My question is: Is there an option to run MAKER2 without running RepeatMasker again (skip the RepeatMasker step)?
>
> I'm looking forward to hearing from you.
>
> Best regards,
>
> Jonas Bohn
>
> MSc. Student
> Evolutionary Bioinformatics
> University of Muenster, Germany

From carsonhh at gmail.com Wed Apr 11 11:57:36 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 11 Apr 2018 11:57:36 -0600
Subject: [maker-devel] Private message regarding: MAKER run error
In-Reply-To:
References:

<7DAB19A4-573C-4E9E-A208-7352228A502B@gmail.com>
Message-ID: <6BB1016E-378F-48A3-B535-3286767216A8@gmail.com>

The issue is with Berkeley DB. BioPerl is using Perl's DB_File module to index the FASTA files.

1. Make sure you do not have an extremely large number of reads in the FASTA files (i.e. mRNA-seq data, which cannot be used directly as input to MAKER; you must first assemble it into transcriptome contigs).
2. Reinstall Perl and compile it against the newly installed Berkeley DB libraries.
3. Remove the brew-installed Berkeley DB and use Perl's precompiled DB_File module.

You can count the sequences in your FASTA input using this command (replace file.fasta):

grep -c ">" file.fasta

If your counts are really high (i.e. higher than a few hundred thousand at most), then you have a data issue. You are giving either too much data or the wrong data as input.

--Carson

> On Apr 11, 2018, at 11:39 AM, ohon Kin wrote:
>
> Hello Carson,
>
> I would really appreciate your help; I am having the same issue. I get this error when I run MAKER. I assumed that it required a lot of memory:
>
> STATUS: Processing and indexing input FASTA files...
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> Filesize limit exceeded: 25
>
> While it is working, 1 TB of hard-disk capacity seems not to be enough for the MAKER annotation. I think something is wrong with my input data or the dependencies. Could you please advise on the matter and elaborate on solutions?
>
> I installed Berkeley DB using brew.
>
> The input given to MAKER is as follows:
> Genome, EST, Protein.
> all in FASTA format, downloaded from NCBI, then given directly to MAKER for annotation.
>
> Do I have to pre-process these data before giving them to MAKER?
>
> On Thursday, 7 December 2017 19:00:52 UTC+3, Carson Holt wrote:
> The FASTA file gets indexed by BioPerl using Berkeley DB.
>
> I'm guessing there is something odd about your input file and the database has run out of hashes for indexing.
>
> You can google whether there is a setting you can configure in Berkeley DB on Mac.
>
> But I suspect you are doing something like giving the raw reads from an mRNA-seq experiment or DNA sequencing to MAKER (resulting in billions of entries to be indexed), which would be incorrect. MAKER can't handle raw data. You must first assemble it using something like Trinity, for example, for mRNA.
>
> Thanks,
> Carson
>
>> On Dec 7, 2017, at 8:53 AM, Scott Cain wrote:
>>
>> Hi Gulnara,
>>
>> I don't know (though my guess would be that you're running out of memory). I'm cc'ing the MAKER developer's mailing list to see if anybody on that list knows.
>>
>> Scott
>>
>> On Wed, Dec 6, 2017 at 8:36 PM, Gulnara Tagirdzhanova wrote:
>> Hello,
>>
>> I got this error running maker on mac:
>>
>> STATUS: Parsing control files...
>> STATUS: Processing and indexing input FASTA files...
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages. Increase page size
>> HASH: Out of overflow pages.
Increase page size
>> Filesize limit exceeded: 25
>>
>> Is there anything that could solve it?
>>
>> Thank you,
>> Gulnara
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D. scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/ ) 216-392-3087
>> Ontario Institute for Cancer Research

From nellerk at yorku.ca Thu Apr 12 11:12:12 2018
From: nellerk at yorku.ca (nellerk at yorku.ca)
Date: Thu, 12 Apr 2018 17:12:12 +0000
Subject: [maker-devel] evidence-only gene annotation
Message-ID: <4e2545f.d2e0683b515ef71cd3e05d9a5e9e83a4@mymail.yorku.ca>

Hello,

I am using MAKER to annotate a novel, non-model plant genome. Following the published protocol, I have run one evidence-only round (est2genome=1, protein2genome=1) followed by two iterative rounds, re-training SNAP and Augustus each time.

I have a curious result in that the gene predictors do not seem to be finding many genes, but are instead creating gene fusions. My evidence-only round resulted in 29,773 genes (mean length = 5071 bp), and my final round yielded 29,845 genes (mean length = 6530 bp). If I am interpreting this correctly, the predictors found only 72 new genes but greatly increased the mean length of all genes. I have inspected the results visually in a genome viewer, and it seems that the predictors often create fusions with nearby pseudogenes. I attempted to reduce this by changing pred_flank from 200 (default) to 100, but it didn't seem to make a difference (at least for the genes I was looking at).

So although my final MAKER round looks good (~30,000 genes, 95% of genes have AED < 0.5), I have greater confidence in the models created by the evidence-only round.
I have two questions:
1) In this case, would it be acceptable to use evidence-only gene models (from Round 1), rather than those from Round 3 (which incorporated trained gene predictors)? I ask because I haven't seen reports of MAKER being used in this way.
2) Do you have any suggestions to improve my ab initio training or prediction? Please note, I have already repeat-masked the genome with a species-specific repeat library.

Thank you for any assistance!

Kira

From aejysselansie at gmail.com Fri Apr 13 00:35:08 2018
From: aejysselansie at gmail.com (Ansie Yssel)
Date: Fri, 13 Apr 2018 08:35:08 +0200
Subject: [maker-devel] Maker, no fasta files in output
In-Reply-To:
References:
Message-ID:

Dear Carson

I am subscribed to the Maker list, but for some reason I cannot post a new topic when I view the Forum page. I hope it is OK if I email you directly?

I am trying to annotate a newly sequenced genome. I have RNAseq data, a species-specific repeat library that was generated with REPET, the unmasked genome, and proteins from a "closely" related species (actually not that close, but my species is the only one in its genus, so I took proteins from another genus in the same family).

I started by following Support Protocol 1, on page 10 of the article "Genome annotation and Curation using MAKER and MAKER-P" published in Curr Protoc Bioinformatics 48.

The input that I used for generating the gene models (before training SNAP) was: the EST data, my genome and the protein data. I also set est2genome=1 and protein2genome=1. I also used repeat masking and included my species-specific repeat library. Softmasking was set to 1. That output was used to train SNAP.

Then I ran MAKER in "Gene prediction mode" as outlined on page 11, using the hmm file as input (and setting est2genome=0 and protein2genome=0). Repeat masking was enabled, again using my species-specific library.

I trained SNAP for a second time.
That output was used as input for Basic Protocol 1 on page 3 of the aforementioned article.

Input for Basic Protocol 1 was:
The SNAP hmm file
my unmasked genome
the species-specific repeat library
RNAseq evidence
the protein evidence from a close relative
softmasking was set to 1
est2genome=0
protein2genome=0

I collected the results as outlined on page 5 of the article. However, I noticed that there were no FASTA files. Do you have any idea what could have gone wrong? Can I send my log files to you? Thanks in advance for any assistance.

Kind Regards
Anna Yssel

--
Kind Regards
A Yssel
Centre of Microbial and Plant Genetics
KU Leuven
Faculteit Bio-ingenieurswetenschappen
Kasteelpark Arenberg 20, bus 2460
B-3001 Heverlee
Belgium

From carsonhh at gmail.com Mon Apr 16 12:21:22 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 16 Apr 2018 12:21:22 -0600
Subject: [maker-devel] Maker, no fasta files in output
In-Reply-To:
References:
Message-ID: <89921910-7CB1-4481-8973-F3E7DF4E688D@gmail.com>

Hi Anna,

The lack of results means you either had no results from SNAP or no evidence supporting results in your run. You can check for SNAP results just by looking for snap_masked features in the GFF3. For evidence, make sure you still provided the protein= and est= files even though you turned off est2genome/protein2genome.

--Carson

From carsonhh at gmail.com Mon Apr 16 12:26:06 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 16 Apr 2018 12:26:06 -0600
Subject: [maker-devel] evidence-only gene annotation
In-Reply-To: <4e2545f.d2e0683b515ef71cd3e05d9a5e9e83a4@mymail.yorku.ca>
References: <4e2545f.d2e0683b515ef71cd3e05d9a5e9e83a4@mymail.yorku.ca>
Message-ID:

Fusions are generated by the evidence alignments. Either transcript assemblies were falsely fused or proteins are bridging neighboring paralogs. For transcript data, you can try building the assembly with Trinity and the jaccard_index option, which will reduce the occurrence of transcript assembly fusion. Also set correct_est_fusion=1 in the options files. For protein-evidence-driven fusions, you can try DeFusion, which is a post-process you run on the MAKER output that will search for and attempt to correct paralog-driven fusions.

--Carson

From carsonhh at gmail.com Tue Apr 17 09:58:16 2018
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Apr 2018 09:58:16 -0600
Subject: [maker-devel] substr outside of string in PhatHits_utils.pm
In-Reply-To:
References: <5E5CA836-91B1-4AA8-8DC3-68FB9885EB43@gmail.com> <182CDDD3-A108-4095-9AC4-A2C198D34107@ibv.uio.no> <381F5EAB-2C0B-4DED-BD32-573D1B1C2B47@bils.se>

<7DAB19A4-573C-4E9E-A208-7352228A502B@gmail.com> <6BB1016E-378F-48A3-B535-3286767216A8@gmail.com>
Message-ID:

grep -c ">" Ca_kacst.fna
32572

The ESTs I have are assembled into contigs:

grep -c ">" Ca_EST
23602

grep -c ">" Ca__protein.faa
26729

These are my input data. I have reinstalled Perl as per your instructions. Please have a look: 1 TB is still not enough, and the tool stops while running. When I run it I get this error:

ad$ ./maker
STATUS: Parsing control files...
WARNING: 'max_dna_len' is set too low. The minimum value permited is 50,000. max_dna_len will be reset to 50,000
STATUS: Processing and indexing input FASTA files...
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
HASH: Out of overflow pages. Increase page size
Filesize limit exceeded: 25

My maker_opt:

#-----Genome (these are always required)
genome=/Users/mohanad/Documents/maker/data/Ca_dromedarius_kacst.fna #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff= #MAKER derived GFF3 file
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=/Users/mohanad/Documents/maker/data/Ca_dromedarius_EST #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/Users/mohanad/Documents/maker/data/Ca_dromedarius_V1.0_protein.faa #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm= #GeneMark HMM file
augustus_species= #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=10000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon' is enabled
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

On 11 April 2018 at 20:57, Carson Holt wrote:

> The issue is with Berkeley DB. BioPerl is using Perl's DB_File module to index the FASTA files.
>
> 1. Make sure you do not have an extremely large number of reads in the FASTA files (i.e. mRNA-seq data, which cannot be used directly as input to MAKER; you must first assemble it into transcriptome contigs).
> 2. Reinstall Perl and compile it against the newly installed Berkeley DB libraries.
> 3. Remove the brew-installed Berkeley DB and use Perl's precompiled DB_File module.
>
> You can count the sequences in your FASTA input using this command (replace file.fasta):
>
> grep -c ">" file.fasta
>
> If your counts are really high (i.e. higher than a few hundred thousand at most), then you have a data issue. You are giving either too much data or the wrong data as input.
>
> --Carson
>
> On Apr 11, 2018, at 11:39 AM, ohon Kin wrote:
>
> Hello Carson,
>
> I would really appreciate your help; I am having the same issue. I get this error when I run MAKER. I assumed that it required a lot of memory:
>
> STATUS: Processing and indexing input FASTA files...
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages. Increase page size
> HASH: Out of overflow pages.
Increase page size > HASH: Out of overflow pages. Increase page size > HASH: Out of overflow pages. Increase page size > HASH: Out of overflow pages. Increase page size > HASH: Out of overflow pages. Increase page size > HASH: Out of overflow pages. Increase page size > Filesize limit exceeded: 25 > > while working 1T of my Hard-disc capacity seems not enough for maker > annotation > i think something wrong in my input data or the dependencies > would you please advice on the matter and elaborate solutions please > > i have install BerkleyDB using brew > > The input giving to Maker as followed : > Genome , EST , Protein. all in Fasta format, downloaded from NCBI ---> > then added it directly to maker for annotation > > do i have to apply these data pre-process before it applied to maker > > > > > > > > > On Thursday, 7 December 2017 19:00:52 UTC+3, Carson Holt wrote: >> >> The FASTA file gets indexed by BioPerl using BerkleyDB. >> > > >> I?m guessing there is something odd about your input file and the >> database has run out of HASHes for indexing. >> > > >> You can google if there is a setting you can configure in BerkleyDB on >> Mac. >> > > >> But I suspect you are doing something like giving the raw reads from an >> mRNA-seq experiment or DNA sequencing to MAKER (resulting in billions of >> entrires to be indexed), which would be incorrect. MAKER can?t handle raw >> data. You must first assemble it using using like Trinity for example for >> mRNA. >> >> Thanks, >> Carson >> >> On Dec 7, 2017, at 8:53 AM, Scott Cain wrote: >> >> Hi Guinara, >> >> I don't know (though my guess would be that you're running out of >> memory). I'm cc'ing the MAKER developer's mailing list to see if anybody >> on that list knows. >> >> Scott >> >> >> On Wed, Dec 6, 2017 at 8:36 PM, Gulnara Tagirdzhanova > a.ca> wrote: >> >>> Hello, >>> >>> I got this error running maker on mac: >>> >>> STATUS: Parsing control files... >>> STATUS: Processing and indexing input FASTA files... 
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> HASH: Out of overflow pages. Increase page size
>>> Filesize limit exceeded: 25
>>>
>>> Is there anything that could solve it?
>>>
>>> Thank you,
>>> Gulnara
>>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D. scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/) 216-392-3087
>> Ontario Institute for Cancer Research
>> _______________________________________________
>> maker-devel mailing list
>> maker... at box290.bluehost. com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>

--
*Warning:* This message and its attachment, if any, are confidential and may contain information protected by law. If you are not the intended recipient, please contact the sender immediately and delete the message and its attachment, if any. You should not copy the message and its attachment, if any, or disclose its contents to any other person or use it for any purpose. Statements and opinions expressed in this e-mail and its attachment, if any, are those of the sender, and do not necessarily reflect those of kacst. accepts no liability for any damage caused by this email.
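
The record-count check suggested in the thread (grep -c ">" file.fasta) can also be sketched in Python. This is only an illustration and not part of MAKER: the function names and the 300,000 cutoff are assumptions, chosen loosely from the "few hundred thousand" figure mentioned above. Unlike grep, which counts any line containing ">", this sketch counts only lines that start with ">".

```python
import io

# Hypothetical cutoff; the thread only says "a few hundred thousand".
MAX_RECORDS = 300_000


def count_fasta_records(handle):
    """Count FASTA header lines (lines beginning with '>') in an open text handle."""
    return sum(1 for line in handle if line.startswith(">"))


def looks_like_raw_reads(n_records, limit=MAX_RECORDS):
    """Return True when the record count exceeds the assumed cutoff."""
    return n_records > limit


# Small in-memory example; for a real file use: with open("file.fasta") as fh: ...
example = io.StringIO(">contig1\nACGT\n>contig2\nGGCC\nTTAA\n")
n = count_fasta_records(example)
print(n, looks_like_raw_reads(n))  # prints: 2 False
```

A count far above the cutoff would suggest unassembled reads were supplied, in which case the advice in the thread is to assemble them first (for example with Trinity for mRNA-seq) before giving them to MAKER.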
From carsonhh at gmail.com Tue Apr 17 10:12:32 2018 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 17 Apr 2018 10:12:32 -0600 Subject: [maker-devel] Private message regarding: MAKER run error In-Reply-To: References: