From carsonhh at gmail.com  Mon Dec  1 13:31:46 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 1 Dec 2014 12:31:46 -0700
Subject: [maker-devel] gff output
In-Reply-To: <5476ED52.3060902@gmail.com>
References: <5476ED52.3060902@gmail.com>
Message-ID: <5A861A9A-5348-44B5-B0F6-C9AF3AA1469E@gmail.com>

If you are using the GFF3 produced directly by Augustus, it will be oddly structured and will not conform to the 'Canonical Gene' example given by the GFF3 format specification. You have to make a couple of search and replace operations to make it work.

Also, it would generally be better to let MAKER run Augustus for you rather than providing its output as GFF3. This is because you lose the hint feedback that MAKER provides to Augustus. As a result there will be no improvement made to the annotations beyond what Augustus has already produced.

--Carson

> On Nov 27, 2014, at 2:22 AM, Muriel Gros-Balthazard wrote:
>
> Hello,
>
> I have been using Maker to generate an annotation.
> I especially set these options:
> - est_gff with a list of transcripts.gff3 (Cufflinks output)
> - model_org=all
> - rmlib=allrepeats.lib
> - repeat_protein=te_prot.fasta
> - pred_gff= Augustus.gff3 (that I generated previously)
>
> I obtain a gff file for each of my contigs.
> However, here are the three possibilities in the second column:
> # est_gff:cufflinks
> # repeatmasker
> # repeatrunner
>
> I have no information about exons and introns.
> And I am wondering if the Augustus.gff3 was used...
>
> On top of that, I forgot to set pred_stats to 1.
> If I understand correctly, I can just change this in the control file and run Maker again. Since the full output is already there, it won't rerun the prediction, only this option. Is that right?
>
> Thank you,
>
> Muriel
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From Alice.Dennis at eawag.ch  Fri Dec 12 08:10:46 2014
From: Alice.Dennis at eawag.ch (Dennis, Alice)
Date: Fri, 12 Dec 2014 14:10:46 +0000
Subject: [maker-devel] iterative Maker2
Message-ID: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>

Hi all,

I am a relatively new user of Maker2, and I'm looking for advice on running many iterations of the same dataset in Maker2.

I have a relatively small genome (~124 Mb) from a wasp that is assembled into ~1,500 scaffolds. I have run several iterations of Maker2 by re-generating .hmms in SNAP and feeding them into the next round, and my gene predictions keep increasing (in number and in size). The only thing that changes at each round is the .hmm.

This is the evidence that I give:
- de novo assembled ESTs from a different strain of the same species (70,000 contigs... I am currently working on improving this assembly with the hope that this will be helpful here)
- 610 proteins extracted from the genome scaffolds using CEGMA and HaMSTr

For my 1st iteration, I used the Nasonia .hmm from SNAP, and the est2genome/protein2genome option.

For the 2nd, 3rd and 4th rounds I have used .hmms generated from the previous round, all without the est2genome/protein2genome option. All other files are the same as in the original run.

As I understand it, after the second round, nothing should change in Maker2. But the differences are obvious between runs. Some entirely new exons are annotated. For example, just counting "exon" in the .gff file gives me 73,000 after the third iteration and 96,000 after the fourth! Actually the biggest leap in this number is between the third and fourth rounds. I can also see that many features are longer when I look at the files in Geneious.

Is this sort of change possible after the second round of Maker2? Is there something I have done wrong in my runs, or am I understanding this output incorrectly?

Thank you,
Alice
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com  Fri Dec 12 09:41:42 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 12 Dec 2014 08:41:42 -0700
Subject: [maker-devel] iterative Maker2
In-Reply-To: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>
References: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>
Message-ID: <7D42E0F3-B601-4D67-AF07-09C98469D8E5@gmail.com>

The gene models are actually produced by SNAP, Augustus, or whatever gene predictor you are using, so if you change the HMM every round, then the models will change too.

But I have one concern. You are using a very sparse protein evidence dataset. The protein dataset is very important to MAKER's performance, and for iterative training of the ab initio predictors. Normally after the second iteration, additional training should not be beneficial, but if you are getting wildly different results on the 3rd and 4th rounds, then you probably aren't getting enough good models to train with. For a protein dataset you should be using the entire proteome from a minimum of two related species, and perhaps all of UniProt/Swiss-Prot, to get a broad protein database. Don't use the proteins extracted by CEGMA and HaMSTr. CEGMA can be used to guide the first HMM creation (the cegma2zff script that comes with MAKER), but don't give the proteins to MAKER as evidence. The HaMSTr results will also be redundant with the ESTs. You need proteins from related species to look for homology not found in the EST dataset.

Also, repeat masking is important for any genome and has a huge effect on ab initio predictor performance. Make sure you run something like RepeatModeler to look for species-specific repeats that will not already be in RepBase.
Then add those results to the rmlib= option in the maker control files.

Thanks,
Carson

> On Dec 12, 2014, at 7:10 AM, Dennis, Alice wrote:
>
> Hi all,
>
> I am a relatively new user of Maker2, and I'm looking for advice on running many iterations of the same dataset in Maker2.
>
> I have a relatively small genome (~124 Mb) from a wasp that is assembled into ~1,500 scaffolds. I have run several iterations of Maker2 by re-generating .hmms in SNAP and feeding them into the next round, and my gene predictions keep increasing (in number and in size). The only thing that changes at each round is the .hmm.
> This is the evidence that I give:
> - de novo assembled ESTs from a different strain of the same species (70,000 contigs... I am currently working on improving this assembly with the hope that this will be helpful here)
> - 610 proteins extracted from the genome scaffolds using CEGMA and HaMSTr
>
> For my 1st iteration, I used the Nasonia .hmm from SNAP, and the est2genome/protein2genome option.
>
> For the 2nd, 3rd and 4th rounds I have used .hmms generated from the previous round, all without the est2genome/protein2genome option. All other files are the same as in the original run.
>
> As I understand it, after the second round, nothing should change in Maker2. But the differences are obvious between runs. Some entirely new exons are annotated. For example, just counting "exon" in the .gff file gives me 73,000 after the third iteration and 96,000 after the fourth! Actually the biggest leap in this number is between the third and fourth rounds. I can also see that many features are longer when I look at the files in Geneious.
>
> Is this sort of change possible after the second round of Maker2? Is there something I have done wrong in my runs, or am I understanding this output incorrectly?
>
> Thank you,
> Alice

From tuanduonganh at gmail.com  Sun Dec 14 06:55:35 2014
From: tuanduonganh at gmail.com (Tuan Duong Anh)
Date: Sun, 14 Dec 2014 14:55:35 +0200
Subject: [maker-devel] Quality filter perl script
Message-ID:

Hi all,

I successfully ran MAKER and am now looking into rescuing rejected gene models using protein domain evidence. I have obtained the tsv file from interproscan and have also updated the GFF3 file with this result using ipr_update_gff. In the next step I will need the quality_filter.pl script to generate the default and standard builds; however, this script is not included in my version of MAKER. Do you know where I can get this script?

Thanks.
Tuan

From michael.s.campbell1 at gmail.com  Mon Dec 15 14:13:29 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 15 Dec 2014 13:13:29 -0700
Subject: [maker-devel] Quality filter perl script
In-Reply-To:
References:
Message-ID:

Hi Tuan,

I've attached a copy of the quality filter script. I've removed the .pl extension because some email services will not accept attachments with that extension.

Take care,
Mike

On Sun, Dec 14, 2014 at 5:55 AM, Tuan Duong Anh wrote:
>
> Hi all,
>
> I successfully ran MAKER and am now looking into rescuing rejected gene
> models using protein domain evidence. I have obtained the tsv file from
> interproscan and have also updated the GFF3 file with this result using
> ipr_update_gff. In the next step I will need the quality_filter.pl script to
> generate the default and standard builds; however, this script is not
> included in my version of MAKER. Do you know where I can get this script?
>
> Thanks.
> Tuan

--
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
A non-text attachment was scrubbed...
Name: quality_filter
Type: application/octet-stream
Size: 4597 bytes
Desc: not available
URL:

From cognitiveshrapnel at gmail.com  Sat Dec 27 20:59:12 2014
From: cognitiveshrapnel at gmail.com (Justin Peyton)
Date: Sat, 27 Dec 2014 21:59:12 -0500
Subject: [maker-devel] openmpi instantly chokes on maker
Message-ID:

I am working on getting maker running on a system running Ubuntu 14.04. I have installed maker and it runs great on a small but real data set. When I try it with openmpi with the exact same inputs, however, I get the error below almost instantly.

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
[molybdenum:23241] *** Process received signal ***
[molybdenum:23241] Signal: Segmentation fault (11)
[molybdenum:23241] Signal code: Address not mapped (1)
[molybdenum:23241] Failing at address: 0x50c
[molybdenum:23241] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
[molybdenum:23241] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7f99bd5155a2]
[molybdenum:23241] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
[molybdenum:23241] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7f99bd19fbad]
[molybdenum:23241] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7f99bcbcc156]
[molybdenum:23241] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7f99bcbc34bb]
[molybdenum:23241] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7f99bce6e97e]
[molybdenum:23241] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f99bc944182]
[molybdenum:23241] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f99bd1acefd]
[molybdenum:23241] *** End of error message ***
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
[molybdenum:23252] *** Process received signal ***
[molybdenum:23252] Signal: Segmentation fault (11)
[molybdenum:23252] Signal code: Address not mapped (1)
[molybdenum:23252] Failing at address: 0x50c
[molybdenum:23252] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
[molybdenum:23252] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7fb191f5e5a2]
[molybdenum:23252] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
[molybdenum:23252] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7fb191be8bad]
[molybdenum:23252] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7fb191615156]
[molybdenum:23252] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7fb19160c4bb]
[molybdenum:23252] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7fb1918b797e]
[molybdenum:23252] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fb19138d182]
[molybdenum:23252] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb191bf5efd]
[molybdenum:23252] *** End of error message ***
SIGTERM received
--------------------------------------------------------------------------
mpiexec noticed that process rank 2 with PID 23241 on node molybdenum exited on signal 11 (Segmentation fault).

I have tried reinstalling both maker and openmpi. I have tried two different versions of both maker and openmpi. I am currently working with maker 2.31.6 and openmpi 1.8.3 because I have had those work together on another system. I have triple-checked that LD_PRELOAD is properly set. I have a feeling that I am missing something small. I appreciate all the help.

Justin Peyton
The Ohio State University

From harini1981 at gmail.com  Tue Dec 23 04:31:46 2014
From: harini1981 at gmail.com (Harini Vinod)
Date: Tue, 23 Dec 2014 16:01:46 +0530
Subject: [maker-devel] regd aed score plot
Message-ID:

Dear Concern,
I used the following script from MAKER-DEVEL,
AED_cdf_generator.pl, to obtain the plot.

I get the following error:
readline() on closed filehandle GEN0 at AED_cdf_generator.pl line 69.
AED scaffold11.gff, scaffold7.gff
Use of uninitialized value $total in division (/) at AED_cdf_generator.pl line 43.
Illegal division by zero at AED_cdf_generator.pl line 43.

Can you kindly suggest what could have gone wrong?
regards
Harini

--
K.Harini
PhD scholar
Lab-25
NCBS,GKVK
Bangalore
560065
harinik at ncbs.res.in
+91 9535292110

From michael.s.campbell1 at gmail.com  Mon Dec 29 13:33:49 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 29 Dec 2014 12:33:49 -0700
Subject: [maker-devel] regd aed score plot
In-Reply-To:
References:
Message-ID:

I think I fixed this in a recent svn commit.
Try the attached version of the script and let me know if it works.

Thanks,
Mike

On Tue, Dec 23, 2014 at 3:31 AM, Harini Vinod wrote:
>
> Dear Concern,
> I used the following script from MAKER-DEVEL,
> AED_cdf_generator.pl, to obtain the plot.
>
> I get the following error:
> readline() on closed filehandle GEN0 at AED_cdf_generator.pl line 69.
> AED scaffold11.gff, scaffold7.gff
> Use of uninitialized value $total in division (/) at AED_cdf_generator.pl
> line 43.
> Illegal division by zero at AED_cdf_generator.pl line 43.
>
> Can you kindly suggest what could have gone wrong?
> regards
> Harini
>
> --
> K.Harini
> PhD scholar
> Lab-25
> NCBS,GKVK
> Bangalore
> 560065
> harinik at ncbs.res.in
> +91 9535292110

--
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AED_cdf_generator.pl.gz
Type: application/x-gzip
Size: 1116 bytes
Desc: not available
URL:

From xvazquezc at gmail.com  Mon Dec 29 22:00:56 2014
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Tue, 30 Dec 2014 15:00:56 +1100
Subject: [maker-devel] few basic questions
Message-ID:

Hi there,

I'm a newbie dealing with genomes and I've been trying to start using Maker for the annotation. I understand the basic concepts but I have doubts about the correct steps to follow. I've been through the 2014 video tutorial and searched for detailed steps and I still have some questions, maybe a bit obvious though...

I have to annotate two fungal genomes and I only have the DNA assembly (no EST or protein files).
I understand that, lacking EST and protein files, I should provide them as alt-est and protein from the closest species I can, but is one EST file from one organism enough for the alt-est?

Regarding the steps to process, would this be correct?
1. Run Maker with the genome, alt-est and protein files, with est2genome=1 and protein2genome=1 (softmask=1 ?)
2. Create the hmm file for SNAP based on this first output
3. Set est2genome=0 and protein2genome=0, set the snaphmm file and run again (using the -base option)
4. Repeat 2 and 3 as necessary*

*How do you know when you get to the point where no more refinement is possible? Would that be the final model? Should it be based on the AED scores? How can I get them without looking into individual sequence headings? Also, do you perform the bootstrapping in the same folder? In the tutorial I saw different folders (e.g. pyu_contig1, pyu_contig2) used on each repetition; I'm not sure if that was just for demonstration purposes or if it is the proper way to go.

I'm also trying to run a gene prediction with Augustus and GeneMark. The first run will include an already trained profile for Augustus and the native hmm file of genemark-ES**. Do they need to repeat the prediction by bootstrap like with SNAP? If so, do I need to generate new hmm files or prediction models based on the results?

**I have been trying to make the hmm file for genemark-ES using the gm_es.pl script, but no matter what parameters I use the cluster shuts the job down as it exceeds 128GB of memory in use. The genome I've been testing for this is about 42Mbp in a roughly 40-50 MB fasta file.

Thank you in advance,
Xabier

--
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
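
On the question of getting AED scores without reading individual sequence headings: one option is to pull them in bulk from the GFF3 output. The sketch below assumes each mRNA feature carries an `_AED=` attribute in column 9 (this attribute name is an assumption about the MAKER output, so check it against your own files). The function names `aed_scores` and `fraction_at_or_below` are illustrative and are not part of MAKER.

```python
def aed_scores(gff_lines):
    """Collect AED values from mRNA features in MAKER-style GFF3 lines."""
    scores = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        for field in cols[8].split(";"):
            key, _, value = field.partition("=")
            if key == "_AED":  # assumed attribute name; "_eAED" is ignored
                scores.append(float(value))
    return scores


def fraction_at_or_below(scores, cutoff):
    """Fraction of transcripts with AED <= cutoff; 0.0 for an empty list."""
    if not scores:
        return 0.0  # avoids a division by zero when no mRNA was found
    return sum(1 for s in scores if s <= cutoff) / len(scores)


# Made-up feature lines, only to show the expected layout.
example = [
    "##gff-version 3",
    "chr1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1",
    "chr1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=g1-RA;Parent=g1;_AED=0.10;_eAED=0.10",
    "chr1\tmaker\tmRNA\t2000\t2900\t.\t-\t.\tID=g2-RA;Parent=g2;_AED=0.60;_eAED=0.55",
]
scores = aed_scores(example)
print(scores, fraction_at_or_below(scores, 0.5))
```

Comparing the fraction at a few cutoffs between rounds gives a rough view of whether another round of training changed anything.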

From carsonhh at gmail.com  Wed Dec 31 14:39:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 31 Dec 2014 13:39:10 -0700
Subject: [maker-devel] few basic questions
In-Reply-To:
References:
Message-ID:

Hi Xabier,

See below -->

> I have to annotate two fungal genomes and I only have the DNA assembly (no EST or protein files).
> I understand that, lacking EST and protein files, I should provide them as alt-est and protein from the closest species I can, but is one EST file from one organism enough for the alt-est?

Provide alt-EST if you have ESTs from a closely related species but do not have the proteome for that species. If you have the proteome, use that. Both are aligned in amino acid space and provide the same hint information; the only difference is that alt-EST takes 10x longer because both target and query must be translated into all 6 reading frames.

> Regarding the steps to process, would this be correct?
> Run Maker with the genome, alt-est and protein files, with est2genome=1 and protein2genome=1 (softmask=1 ?)
> Create the hmm file for SNAP based on this first output
> Set est2genome=0 and protein2genome=0, set the snaphmm file and run again (using the -base option)
> Repeat 2 and 3 as necessary*

If you don't have ESTs, don't do est2genome (alt-ESTs don't count). Just do protein2genome. In general, two rounds of training is the maximum you should do. At that point, ab initio predictions and hint-based predictions will start to look like each other (so the ab initio models are doing well on their own).

> *How do you know when you get to the point where no more refinement is possible? Would that be the final model? Should it be based on the AED scores? How can I get them without looking into individual sequence headings? Also, do you perform the bootstrapping in the same folder? In the tutorial I saw different folders (e.g. pyu_contig1, pyu_contig2) used on each repetition; I'm not sure if that was just for demonstration purposes or if it is the proper way to go.

Run it in the same folder. This will allow MAKER to recycle raw reports from BLAST etc. from the previous run (i.e. MAKER will run faster). In the tutorial we ran separately just to be able to open old results and compare.

> I'm also trying to run a gene prediction with Augustus and GeneMark. The first run will include an already trained profile for Augustus and the native hmm file of genemark-ES**. Do they need to repeat the prediction by bootstrap like with SNAP? If so, do I need to generate new hmm files or prediction models based on the results?

You do with Augustus, but not with GeneMark, which does self-training.

> **I have been trying to make the hmm file for genemark-ES using the gm_es.pl script, but no matter what parameters I use the cluster shuts the job down as it exceeds 128GB of memory in use. The genome I've been testing for this is about 42Mbp in a roughly 40-50 MB fasta file.

You can train GeneMark with just part of the genome. Try using 10Mb made up of the longest contigs. Also, I only recommend using GeneMark on fungi; it tends not to work well on organisms with more complex intron/exon structures.

You should also build a species-specific repeat database to supplement RepeatMasker's internal libraries. I'd recommend using RepeatModeler.

Thanks,
Carson

From carsonhh at gmail.com  Wed Dec 31 14:42:38 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 31 Dec 2014 13:42:38 -0700
Subject: [maker-devel] openmpi instantly chokes on maker
In-Reply-To:
References:
Message-ID: <6BEE4837-A3E1-4FBF-AD18-4FBFD479BB2A@gmail.com>

Hi Justin,

You need to set LD_PRELOAD to the proper location and add the '-mca btl ^openib' flag to your command line.
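
A quick way to sanity-check the LD_PRELOAD setting from the same environment that will launch mpiexec is sketched below. This is only an illustration: `check_ld_preload` is a hypothetical helper, not something shipped with MAKER or OpenMPI, and the only assumption it makes is that LD_PRELOAD should name an existing libmpi.so file.

```python
import os


def check_ld_preload(env=None):
    """Return a short status string describing the LD_PRELOAD setting."""
    env = os.environ if env is None else env
    value = env.get("LD_PRELOAD", "")
    if not value:
        return "LD_PRELOAD is not set"
    # LD_PRELOAD may list several libraries separated by spaces or colons.
    paths = value.replace(":", " ").split()
    mpi = [p for p in paths if os.path.basename(p).startswith("libmpi.so")]
    if not mpi:
        return "LD_PRELOAD does not name a libmpi.so"
    missing = [p for p in mpi if not os.path.isfile(p)]
    if missing:
        return "file not found: " + ", ".join(missing)
    return "ok"


print(check_ld_preload())
```

A result of "ok" only shows that the file exists; it does not show that the library belongs to the same OpenMPI installation as the mpiexec being used.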

The following is from the INSTALL file that should be included with MAKER -->

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile (i.e. export LD_PRELOAD=/location/of/openmpi/lib/libmpi.so).

1. Say yes to the 'configure for MPI' question when running 'perl Build.PL' in step 1 of the EASY INSTALL.
2. Give path to 'mpicc'. Note to make sure you do not give the path to 'mpicc' from another MPI flavor that might be installed on your system.
3. Give path to the folder containing 'mpi.h'. Note to make sure you do not give the path to a folder from another MPI flavor that might be installed on your system. Mixing MPI flavors for 'mpicc' and 'mpi.h' will cause failures. Make sure to read and confirm the auto-detected paths.
4. Finish installation according to steps 2-4 of the EASY INSTALL.

Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings.

Note: If jobs hang or freeze when using mpiexec under OpenMPI, try adding the '-mca btl ^openib' flag to the mpiexec command when running MAKER.
Example: mpiexec -mca btl ^openib -n 20 maker

Thanks,
Carson

> On Dec 27, 2014, at 7:59 PM, Justin Peyton wrote:
>
> I am working on getting maker running on a system running Ubuntu 14.04. I have installed maker and it runs great on a small but real data set. When I try it with openmpi with the exact same inputs, however, I get the error below almost instantly.
>
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> [molybdenum:23241] *** Process received signal ***
> [molybdenum:23241] Signal: Segmentation fault (11)
> [molybdenum:23241] Signal code: Address not mapped (1)
> [molybdenum:23241] Failing at address: 0x50c
> [molybdenum:23241] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
> [molybdenum:23241] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7f99bd5155a2]
> [molybdenum:23241] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
> [molybdenum:23241] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7f99bd19fbad]
> [molybdenum:23241] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7f99bcbcc156]
> [molybdenum:23241] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7f99bcbc34bb]
> [molybdenum:23241] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7f99bce6e97e]
> [molybdenum:23241] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f99bc944182]
> [molybdenum:23241] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f99bd1acefd]
> [molybdenum:23241] *** End of error message ***
> SIGTERM received
> SIGTERM received
> SIGTERM received
> SIGTERM received
> SIGTERM received
> [molybdenum:23252] *** Process received signal ***
> [molybdenum:23252] Signal: Segmentation fault (11)
> [molybdenum:23252] Signal code: Address not mapped (1)
> [molybdenum:23252] Failing at address: 0x50c
> [molybdenum:23252] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
> [molybdenum:23252] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7fb191f5e5a2]
> [molybdenum:23252] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
> [molybdenum:23252] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7fb191be8bad]
> [molybdenum:23252] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7fb191615156]
> [molybdenum:23252] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7fb19160c4bb]
> [molybdenum:23252] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7fb1918b797e]
> [molybdenum:23252] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fb19138d182]
> [molybdenum:23252] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb191bf5efd]
> [molybdenum:23252] *** End of error message ***
> SIGTERM received
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 2 with PID 23241 on node molybdenum exited on signal 11 (Segmentation fault).
>
>
> I have tried reinstalling both maker and openmpi. I have tried two different versions of both maker and openmpi. I am currently working with maker 2.31.6 and openmpi 1.8.3 because I have had those work together on another system. I have triple-checked that LD_PRELOAD is properly set. I have a feeling that I am missing something small. I appreciate all the help.
>
> Justin Peyton
> The Ohio State University

From jerryzhaosjtu at gmail.com  Wed Dec 31 19:48:29 2014
From: jerryzhaosjtu at gmail.com (赵越)
Date: Thu, 1 Jan 2015 09:48:29 +0800
Subject: [maker-devel] some problems using MAKER
Message-ID:

Hi all,

Recently I have been using MAKER to annotate a single chromosome of rice as a preliminary experiment, and I have run into some problems.

After the annotation, when I run eval to compare my result against the gold standard, the gene sensitivity and specificity are only around 20%. After I added the gff3 file that MAKER itself produced and ran MAKER again, the result was even worse than 20%.

My input is a Trinity-processed RNA-seq file and a protein file. I chose SNAP, Augustus and GeneMark as ab initio predictors.
I paste my maker_opts.ctl here: #-----Genome (these are always required) genome=chr12.fasta #genome sequence (fasta file or fasta embeded in GFF3 file) organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic #-----Re-annotation Using MAKER Derived GFF3 maker_gff=chr12.gff #MAKER derived GFF3 file est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no #-----EST Evidence (for best results provide a file for at least one) est=rna-seq_trinity.fasta #set of ESTs or assembled mRNA-seq in fasta format altest= #EST/cDNA sequence file in fasta format from an alternate organism est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file altest_gff= #aligned ESTs from a closly relate species in GFF3 format #-----Protein Homology Evidence (for best results provide a file for at least one) protein=Osativa_193_peptide.fa #protein sequence file in fasta format (i.e. from mutiple oransisms) protein_gff= #aligned protein homology evidence from an external GFF3 file #-----Repeat Masking (leave values blank to skip repeat masking) model_org=Rice #select a model organism for RepBase masking in RepeatMasker rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner rm_gff= #pre-identified repeat elements from an external GFF3 file prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. 
seg and dust filtering)

#-----Gene Prediction
snaphmm=rice #SNAP HMM file
gmhmm=/lustre/home/clswcc/yzhao/MAKER/maker/exe/genemark_hmm_euk_linux_64/ehmm/o_sativa.mod #GeneMark HMM file
augustus_species=arabidopsis #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff=augus.gff3 #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=1 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=16 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

Could you help me? Thank you!!!

--
*Yue Zhao (Jerry)*
Bachelor Candidate of Plant Biotechnology
Researcher in UCLA-CSST program
Shanghai Jiao Tong University, Shanghai
*jerryzhaosjtu at gmail.com*
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Dec 1 12:31:46 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 1 Dec 2014 12:31:46 -0700
Subject: [maker-devel] gff output
In-Reply-To: <5476ED52.3060902@gmail.com>
References: <5476ED52.3060902@gmail.com>
Message-ID: <5A861A9A-5348-44B5-B0F6-C9AF3AA1469E@gmail.com>

If you are using the GFF3 directly produced by Augustus, it will be oddly structured and does not conform to the 'Canonical Gene' example given by the GFF3 format specification. You have to make a couple of search and replace operations to make it work.

Also, it would generally be better to let MAKER run Augustus for you rather than providing its output as GFF3. This is because you lose the hint feedback that MAKER provides to Augustus. As a result, there will be no improvement made to the annotations beyond what Augustus has already produced.

--Carson

> On Nov 27, 2014, at 2:22 AM, Muriel Gros-Balthazard wrote:
>
> Hello,
>
> I have been using Maker to generate an annotation.
> I especially set these options:
> - est_gff with a list of transcripts.gff3 (Cufflinks output)
> - model_org=all
> - rmlib=allrepeats.lib
> - repeat_protein=te_prot.fasta
> - pred_gff= Augustus.gff3 (that I generated previously)
>
> I obtain a gff file for each of my contigs.
> However, here are the three possibilities in the second column:
> # est_gff:cufflinks
> # repeatmasker
> # repeatrunner
>
> I have no information about exons and introns.
> And I am wondering if the Augustus.gff3 was used...
>
> On top of that, I forgot to set up pred_stats to 1.
> If I understand well, I can just change this in the control file, and run Maker again. Since there is the output with everything, it won't run the prediction again, only this option. Is that right?
>
> Thank you,
>
> Muriel

From Alice.Dennis at eawag.ch Fri Dec 12 07:10:46 2014
From: Alice.Dennis at eawag.ch (Dennis, Alice)
Date: Fri, 12 Dec 2014 14:10:46 +0000
Subject: [maker-devel] iterative Maker2
Message-ID: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>

Hi all,

I am a relatively new user to Maker2, and I'm looking for advice on running many iterations of the same dataset in Maker2.

I have a relatively small genome (~124 MB) from a wasp that is assembled into ~1,500 scaffolds. I have run several iterations of Maker2 by re-generating .hmms in SNAP and feeding them into the next round, and my gene predictions keep increasing (in number and in size). The only thing that changes at each round is the .hmm.
This is the evidence that I give:
- de novo assembled ESTs from a different strain of the same species (70,000 contigs... I am currently working on improving this assembly with the hope that this will be helpful here)
- 610 proteins extracted from the genome scaffolds using CEGMA and HaMSTr

For my 1st iteration, I used the Nasonia .hmm from SNAP, and the est2genome/protein2genome option.

For the 2nd, 3rd and 4th rounds I have used .hmms generated from the previous round, all without the est2genome/protein2genome option. All other files are the same as in the original run.

As I understand it, after the second round, nothing should change in Maker2. But the differences are obvious between runs. Some entirely new exons are annotated. For example, just counting "exon" in the .gff file gives me 73,000 after the third iteration and 96,000 after the fourth! Actually the biggest leap in this number is between the third and fourth round. I can also see that many features are longer when I look at the files in Geneious.

Is this sort of change possible after the second round of Maker2? Is there something I have done wrong in my runs, or am I understanding this output incorrectly?

Thank you,
Alice
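
A count of lines containing the string "exon" does not show which program produced each feature. As a minimal sketch, assuming a standard nine-column, tab-separated GFF3 file, the following Python function tallies features by source (column 2) and type (column 3), so that two rounds can be compared source by source. The name `count_features` is illustrative and is not part of MAKER.

```python
from collections import Counter


def count_features(gff_lines):
    """Tally GFF3 feature rows by (source, type), i.e. columns 2 and 3."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature rows
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature row
        counts[(cols[1], cols[2])] += 1
    return counts
```

Calling `count_features(open("round3.gff"))` and `count_features(open("round4.gff"))` (hypothetical file names) and comparing the two results key by key would show whether the extra exons come from one source or from several.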

From carsonhh at gmail.com Fri Dec 12 08:41:42 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 12 Dec 2014 08:41:42 -0700
Subject: [maker-devel] iterative Maker2
In-Reply-To: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>
References: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>
Message-ID: <7D42E0F3-B601-4D67-AF07-09C98469D8E5@gmail.com>

The gene models are actually produced by SNAP, Augustus, or whatever gene predictor you are using, so if you change the HMM every round, then the models will change too.

But I have one concern. You are using a very sparse protein evidence dataset. The protein dataset is very important to MAKER's performance, and for iterative training of the ab initio predictors. Normally after the second iteration, additional training should not be beneficial, but if you are getting wildly different results on the 3rd and 4th rounds, then you probably aren't getting enough good models to train with.

For a protein dataset you should be using the entire proteome from a minimum of two related species and perhaps all of UniProt/Swiss-Prot to get a broad protein database. Don't use the proteins extracted by CEGMA and HaMSTr. CEGMA can be used to guide the first HMM creation (the cegma2zff script that comes with MAKER), but don't give the proteins to MAKER as evidence. Also, the HaMSTr results will be redundant with the ESTs. You need proteins from related species to look for homology not found in the EST dataset.

Also, repeat masking is important for any genome and has a huge effect on ab initio predictor performance. Make sure you run something like RepeatModeler to look for species-specific repeats that will not already be in RepBase. Then add those results to the rmlib= option in the MAKER control files.

Thanks,
Carson

> On Dec 12, 2014, at 7:10 AM, Dennis, Alice wrote:
>
> Hi all,
>
> I am a relatively new user to Maker2, and I'm looking for advice on running many iterations of the same dataset in Maker2.
>
> I have a relatively small genome (~124 MB) from a wasp that is assembled into ~1,500 scaffolds. I have run several iterations of Maker2 by re-generating .hmms in SNAP and feeding them into the next round, and my gene predictions keep increasing (in number and in size). The only thing that changes at each round is the .hmm.
> This is the evidence that I give:
> - de novo assembled ESTs from a different strain of the same species (70,000 contigs... I am currently working on improving this assembly with the hope that this will be helpful here)
> - 610 proteins extracted from the genome scaffolds using CEGMA and HaMSTr
>
> For my 1st iteration, I used the Nasonia .hmm from SNAP, and the est2genome/protein2genome option.
>
> For the 2nd, 3rd and 4th rounds I have used .hmms generated from the previous round, all without the est2genome/protein2genome option. All other files are the same as in the original run.
>
> As I understand it, after the second round, nothing should change in Maker2. But the differences are obvious between runs. Some entirely new exons are annotated. For example, just counting "exon" in the .gff file gives me 73,000 after the third iteration and 96,000 after the fourth! Actually the biggest leap in this number is between the third and fourth round. I can also see that many features are longer when I look at the files in Geneious.
>
> Is this sort of change possible after the second round of Maker2? Is there something I have done wrong in my runs, or am I understanding this output incorrectly?
>
> Thank you,
> Alice

From tuanduonganh at gmail.com Sun Dec 14 05:55:35 2014
From: tuanduonganh at gmail.com (Tuan Duong Anh)
Date: Sun, 14 Dec 2014 14:55:35 +0200
Subject: [maker-devel] Quality filter perl script
Message-ID:

Hi all,

I successfully ran MAKER and am now looking into rescuing rejected gene models using protein domain evidence. I have obtained the tsv file from interproscan and have also updated the GFF3 file with this result using ipr_update_gff. In the next step I will need the quality_filter.pl script to generate the default and standard build; however, this script is not included in my version of MAKER. Do you know where I can get this script?

Thanks.
Tuan

From michael.s.campbell1 at gmail.com Mon Dec 15 13:13:29 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 15 Dec 2014 13:13:29 -0700
Subject: [maker-devel] Quality filter perl script
In-Reply-To:
References:
Message-ID:

Hi Tuan,

I've attached a copy of the quality filter script. I've removed the .pl extension because some email services will not accept them.

Take care,
Mike

On Sun, Dec 14, 2014 at 5:55 AM, Tuan Duong Anh wrote:
>
> Hi all,
>
> I successfully ran MAKER and now looking into rescuing rejected gene models using protein domain evidence. I have obtained the tsv file from interproscan and have also updated the GFF3 file with this result using ipr_update_gff. In the next step I will need quality_filter.pl script to generate the default and standard build, however this script is not included in my version of MAKER. Do you know where can I get this script?
>
> Thanks.
> Tuan

--
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
A non-text attachment was scrubbed...
Name: quality_filter
Type: application/octet-stream
Size: 4597 bytes
Desc: not available
URL:

From cognitiveshrapnel at gmail.com Sat Dec 27 19:59:12 2014
From: cognitiveshrapnel at gmail.com (Justin Peyton)
Date: Sat, 27 Dec 2014 21:59:12 -0500
Subject: [maker-devel] openmpi instantly chokes on maker
Message-ID:

I am working on getting maker running on a system running ubuntu 14.04. I have installed maker and it runs great on a small but real data set. When I try it with openmpi with the exact same inputs, however, I get the error below almost instantly.

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
[molybdenum:23241] *** Process received signal *** [molybdenum:23241] Signal: Segmentation fault (11) [molybdenum:23241] Signal code: Address not mapped (1) [molybdenum:23241] Failing at address: 0x50c [molybdenum:23241] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30] [molybdenum:23241] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7f99bd5155a2] [molybdenum:23241] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30] [molybdenum:23241] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7f99bd19fbad] [molybdenum:23241] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7f99bcbcc156] [molybdenum:23241] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7f99bcbc34bb] [molybdenum:23241] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7f99bce6e97e] [molybdenum:23241] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f99bc944182] [molybdenum:23241] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f99bd1acefd] [molybdenum:23241] *** End of error message *** SIGTERM received SIGTERM received SIGTERM received SIGTERM received SIGTERM received [molybdenum:23252] *** Process received signal *** [molybdenum:23252] Signal: Segmentation fault (11) [molybdenum:23252] Signal code: Address not mapped (1) [molybdenum:23252] Failing at address: 0x50c [molybdenum:23252] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30] [molybdenum:23252] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7fb191f5e5a2] [molybdenum:23252] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30] [molybdenum:23252] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7fb191be8bad] [molybdenum:23252] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7fb191615156] [molybdenum:23252] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7fb19160c4bb] [molybdenum:23252] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7fb1918b797e] 
[molybdenum:23252] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fb19138d182]
[molybdenum:23252] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb191bf5efd]
[molybdenum:23252] *** End of error message ***
SIGTERM received
--------------------------------------------------------------------------
mpiexec noticed that process rank 2 with PID 23241 on node molybdenum exited on signal 11 (Segmentation fault).

I have tried reinstalling both maker and openmpi. I have tried two different versions of both maker and openmpi. I am currently working with maker 2.31.6 and openmpi 1.8.3 because I have had those work together on another system. I have triple checked that LD_PRELOAD is properly set. I have a feeling that I am missing something small. I appreciate all the help.

Justin Peyton
The Ohio State University

From harini1981 at gmail.com Tue Dec 23 03:31:46 2014
From: harini1981 at gmail.com (Harini Vinod)
Date: Tue, 23 Dec 2014 16:01:46 +0530
Subject: [maker-devel] regd aed score plot
Message-ID:

Dear Concern,
I used the following script from MAKER-DEVEL, AED_cdf_generator.pl, to obtain the plot.

I get the following error:
readline() on closed filehandle GEN0 at AED_cdf_generator.pl line 69.
AED scaffold11.gff, scaffold7.gff
Use of uninitialized value $total in division (/) at AED_cdf_generator.pl line 43.
Illegal division by zero at AED_cdf_generator.pl line 43.

Can you kindly suggest what could have gone wrong?
regards
Harini

--
K.Harini
PhD scholar
Lab-25
NCBS,GKVK
Bangalore
560065
harinik at ncbs.res.in
+91 9535292110

From michael.s.campbell1 at gmail.com Mon Dec 29 12:33:49 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 29 Dec 2014 12:33:49 -0700
Subject: [maker-devel] regd aed score plot
In-Reply-To:
References:
Message-ID:

I think I fixed this in a recent svn commit. Try the attached version of the script and let me know if it works.

Thanks,
Mike

On Tue, Dec 23, 2014 at 3:31 AM, Harini Vinod wrote:
>
> Dear Concern,
> I had used the following script from MAKER-DEVEL
> AED_cdf_generator.pl to obtain the plot
>
> I get the following error
> readline() on closed filehandle GEN0 at AED_cdf_generator.pl line 69.
> AED scaffold11.gff, scaffold7.gff
> Use of uninitialized value $total in division (/) at AED_cdf_generator.pl line 43.
> Illegal division by zero at AED_cdf_generator.pl line 43.
>
> Can you kindly suggest what could have gone wrong???
> regards
> Harini
>
> --
> K.Harini
> PhD scholar
> Lab-25
> NCBS,GKVK
> Bangalore
> 560065
> harinik at ncbs.res.in
> +91 9535292110

--
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AED_cdf_generator.pl.gz
Type: application/x-gzip
Size: 1116 bytes
Desc: not available
URL:

From xvazquezc at gmail.com Mon Dec 29 21:00:56 2014
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Tue, 30 Dec 2014 15:00:56 +1100
Subject: [maker-devel] few basic questions
Message-ID:

Hi there,

I'm a newbie dealing with genomes and I've been trying to start using Maker for the annotation. I understand the basic concepts but I have doubts about the correct steps to follow. I've been through the 2014 video tutorial and searched for detailed steps, and I still have some questions, maybe a bit obvious though...

I have to annotate two fungal genomes and I only have the DNA assembly (no EST or protein files).
I understand that, lacking EST and protein files, I should provide alt-est and protein from the closest species I can, but is one EST file from one organism enough for the alt-est?

Regarding the steps to process, would this be correct?

1. Run Maker with the genome, alt-est and protein files, with est2genome=1 and protein2genome=1 (softmask=1 ?)
2. With this first output, create the hmm file for SNAP
3. Set est2genome=0 and protein2genome=0, set the snaphmm file and run again (using -base option)
4. Repeat 2 and 3 as necessary*

*How do you know when you get to the point where no more refinement is possible? Would that be the final model? Should it be based on the AED scores? How can I get them without looking into individual sequence headings? Also, do you perform the bootstrapping in the same folder? In the tutorial I saw different folders (e.g. pyu_contig1, pyu_contig2) used on each repetition; I am not sure if that was just for demonstration purposes or if it is the proper way to go.

I'm also trying to run a gene prediction with Augustus and GeneMark. The first run will include an already trained profile for Augustus and the native hmm file of genemark-ES**. Do they need to repeat the prediction by bootstrap like with SNAP? If so, do I need to generate new hmm files or prediction models based on results?

**I have been trying to make the hmm file for genemark-ES using the gm_es.pl script, but no matter what parameters I use the cluster shuts the job down as it exceeds 128GB of memory in use. The genome I've been testing for this is about 42Mbp in a roughly 40-50 MB fasta file.

Thank you in advance,
Xabier

--
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052
AUSTRALIA
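
On the question of summarizing AED scores without reading individual records, here is a minimal Python sketch. It assumes the GFF3 was written by MAKER and that each mRNA row carries an `_AED=` attribute in column 9. It is not the AED_cdf_generator.pl script mentioned earlier on the list, and the function names are illustrative. It returns an empty result instead of dividing by zero when no scores are found.

```python
import re

# Matches the _AED attribute but not _eAED, which has a different prefix.
AED_RE = re.compile(r"(?:^|;)_AED=(\d+(?:\.\d+)?)")


def aed_values(gff_lines):
    """Collect the _AED attribute from every mRNA row of a GFF3 file."""
    values = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature rows
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        match = AED_RE.search(cols[8])
        if match:
            values.append(float(match.group(1)))
    return values


def cumulative_fraction(values, step=0.1):
    """Return [(threshold, fraction of values <= threshold)] from 0 to 1."""
    if not values:
        return []  # nothing to summarize; avoids a division by zero
    n_bins = round(1 / step)
    table = []
    for i in range(n_bins + 1):
        threshold = round(i * step, 10)
        below = sum(v <= threshold for v in values)
        table.append((threshold, below / len(values)))
    return table
```

Running both functions on the merged GFF3 from each training round gives one table per round. Curves that stop shifting toward lower AED between rounds would be one possible sign that further training is not helping.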

From carsonhh at gmail.com Wed Dec 31 13:39:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 31 Dec 2014 13:39:10 -0700
Subject: [maker-devel] few basic questions
In-Reply-To:
References:
Message-ID:

Hi Xabier,

See below -->

> I have to annotate two fungal genomes and I only have the DNA assembly (no EST or protein files).
> I understand that lacking of EST and protein files I should provide them as alt-est and protein from the closest species I can, but is it enough with one EST file from one organism for the alt-est?

Provide alt-EST if you have ESTs from a closely related species but do not have the proteome for that species. If you have the proteome, use that. Both are aligned in amino acid space and provide the same hint information, the only difference being that alt-EST takes 10x longer because both target and query must be translated into all 6 reading frames.

> Regarding the steps to process would this be correct?:
> 1. run Maker with the genome, alt-est and protein files, with est2genome=1 and protein2genome=1 (softmask=1 ?)
> 2. with this first output, create the hmm file for SNAP based on the first output
> 3. Set est2genome=0 and protein2genome=0, set the snaphmm file and run again (using -base option)
> 4. repeat 2 and 3 as necessary*

If you don't have ESTs, don't do est2genome (alt-ESTs don't count). Just do protein2genome. In general two rounds of training is the maximum you should do. At that point, ab initio predictions and hint-based predictions will start to look like each other (so the ab initio models are doing well on their own).

> *How do you know when you get to the point where no more refinement is possible? Would that the final model? It should be based on the AED scores? How can I get it without looking into individual sequence headings? Also, do you perform the bootstrapping on the same folder? In the tutorial I saw different folders, (e.g. pyu_contig1, pyu_contig2) used on each repetition, not sure if just for demonstration purposes or if it is the proper way to go..

Run it in the same folder. This will allow MAKER to recycle raw reports from BLAST etc. from the previous run (i.e. MAKER will run faster). In the tutorial we ran separately just to be able to open old results and compare.

> I'm trying to run also a gene prediction with Augustus and GeneMark. The first run will include an already trained profile for Augustus and the native hmm file of genemark-ES**. Do they need to repeat the prediction by bootstrap like with SNAP? If so, do I need to generate new hmm files or prediction models based on results?

You do with Augustus, but not with GeneMark, which does self-training.

> **I have been trying to make the hmm file for genemark-ES using the gm_es.pl script but no matter what parameters I use the cluster shut the job down as it exceeds 128GB of memory in use. The genome I've been testing for this is about 42Mbp in a roughly 40-50 MB fasta file

You can train GeneMark with just part of the genome. Try using 10Mb made up of the longest contigs. Also, I only recommend using GeneMark on fungi; it tends not to work well on organisms with more complex intron/exon structures.

Also, you should build a species-specific repeat database to supplement RepeatMasker's internal libraries. I'd recommend using RepeatModeler.

Thanks,
Carson

From carsonhh at gmail.com Wed Dec 31 13:42:38 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 31 Dec 2014 13:42:38 -0700
Subject: [maker-devel] openmpi instantly chokes on maker
In-Reply-To:
References:
Message-ID: <6BEE4837-A3E1-4FBF-AD18-4FBFD479BB2A@gmail.com>

Hi Justin,

You need to set LD_PRELOAD to the proper location and add the '-mca btl ^openib' flag to your command line.

The following is from the INSTALL file that should be included with MAKER -->

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile (i.e. export LD_PRELOAD=/location/of/openmpi/lib/libmpi.so).

1. Say yes to the 'configure for MPI' question when running 'perl Build.PL' in step 1 of the EASY INSTALL.
2. Give path to 'mpicc'. Note to make sure you do not give the path to 'mpicc' from another MPI flavor that might be installed on your system.
3. Give path to the folder containing 'mpi.h'. Note to make sure you do not give the path to a folder from another MPI flavor that might be installed on your system. Mixing MPI flavors for 'mpicc' and 'mpi.h' will cause failures. Make sure to read and confirm the auto-detected paths.
4. Finish installation according to steps 2-4 of the EASY INSTALL.

Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings.

Note: If jobs hang or freeze when using mpiexec under OpenMPI, try adding the '-mca btl ^openib' flag to the mpiexec command when running MAKER.
Example: mpiexec -mca btl ^openib -n 20 maker

Thanks,
Carson

> On Dec 27, 2014, at 7:59 PM, Justin Peyton wrote:
>
> I am working on getting maker running on a system running ubuntu 14.04. I have installed maker and it runs great on a small but real data set. When I try it with openmpi with the exact same inputs, however, I get the below error almost instantly.
>
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> [molybdenum:23241] *** Process received signal *** > [molybdenum:23241] Signal: Segmentation fault (11) > [molybdenum:23241] Signal code: Address not mapped (1) > [molybdenum:23241] Failing at address: 0x50c > [molybdenum:23241] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30] > [molybdenum:23241] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7f99bd5155a2] > [molybdenum:23241] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30] > [molybdenum:23241] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7f99bd19fbad] > [molybdenum:23241] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7f99bcbcc156] > [molybdenum:23241] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7f99bcbc34bb] > [molybdenum:23241] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7f99bce6e97e] > [molybdenum:23241] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f99bc944182] > [molybdenum:23241] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f99bd1acefd] > [molybdenum:23241] *** End of error message *** > SIGTERM received > SIGTERM received > SIGTERM received > SIGTERM received > SIGTERM received > [molybdenum:23252] *** Process received signal *** > [molybdenum:23252] Signal: Segmentation fault (11) > [molybdenum:23252] Signal code: Address not mapped (1) > [molybdenum:23252] Failing at address: 0x50c > [molybdenum:23252] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30] > [molybdenum:23252] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7fb191f5e5a2] > [molybdenum:23252] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30] > [molybdenum:23252] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7fb191be8bad] > [molybdenum:23252] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7fb191615156] > [molybdenum:23252] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7fb19160c4bb] > [molybdenum:23252] [ 6] 
/usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7fb1918b797e]
> [molybdenum:23252] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fb19138d182]
> [molybdenum:23252] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb191bf5efd]
> [molybdenum:23252] *** End of error message ***
> SIGTERM received
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 2 with PID 23241 on node molybdenum exited on signal 11 (Segmentation fault).
>
> I have tried reinstalling both maker and openmpi. I have tried two different versions of both maker and openmpi. I am currently working with maker 2.31.6 and openmpi 1.8.3 because I have had those work together on another system. I have triple checked that LD_PRELOAD is properly set. I have a feeling that I am missing something small. I appreciate all the help.
>
> Justin Peyton
> The Ohio State University

From jerryzhaosjtu at gmail.com Wed Dec 31 18:48:29 2014
From: jerryzhaosjtu at gmail.com (赵越)
Date: Thu, 1 Jan 2015 09:48:29 +0800
Subject: [maker-devel] some problems using MAKER
Message-ID:

Hi all,

Recently I have been using MAKER to annotate a single chromosome of rice as a preliminary experiment, and I have run into some problems. After the annotation, when I run eval to compare my result with the gold standard, the gene sensitivity and specificity are only around 20%. And after I added the GFF3 file that MAKER itself produced and ran MAKER again, I found that the result was worse than 20%.

My input is a Trinity-processed RNA-seq file and a protein file. I chose SNAP, Augustus and GeneMark as ab initio predictors.

I paste my maker_opts.ctl here:

#-----Genome (these are always required)
genome=chr12.fasta #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff=chr12.gff #MAKER derived GFF3 file
est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=rna-seq_trinity.fasta #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=Osativa_193_peptide.fa #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=Rice #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e.
seg and dust filtering) #-----Gene Prediction snaphmm=rice #SNAP HMM file gmhmm=/lustre/home/clswcc/yzhao/MAKER/maker/exe/genemark_hmm_euk_linux_64/ehmm/o_sativa.mod #GeneMark HMM file augustus_species=arabidopsis #Augustus gene prediction species model fgenesh_par_file= #FGENESH parameter file pred_gff=augus.gff3 #ab-initio predictions from an external GFF3 file model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no snoscan_rrna= #rRNA file to have Snoscan find snoRNAs unmask=1 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no #-----Other Annotation Feature Types (features MAKER doesn't recognize) other_gff= #extra features to pass-through to final MAKER generated GFF3 file #-----External Application Behavior Options alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases cpus=16 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) Could you help me? Thank you !!! -- *Yue Zhao (Jerry)* Bachelor Candidate of Plant Biotechnology Researcher in UCLA-CSST program Shanghai Jiao Tong University, Shanghai *jerryzhaosjtu at gmail.com * -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Dec 1 12:31:46 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 1 Dec 2014 12:31:46 -0700 Subject: [maker-devel] gff output In-Reply-To: <5476ED52.3060902@gmail.com> References: <5476ED52.3060902@gmail.com> Message-ID: <5A861A9A-5348-44B5-B0F6-C9AF3AA1469E@gmail.com> If you are using the gff3 directly produced by Augustus, it will be oddly structured and does not conform to the 'Canonical Gene? example given by the GFF3 format specification. 
You have to make a couple of search and replace operations to make it work. Also it would generally be better to let maker run augustus for you rather than providing it as GFF3. This is because you lose the hint feedback that maker provides augustus. AS a result there will be no improvement made to the annotations beyond what augustus has already produced. ?Carson > On Nov 27, 2014, at 2:22 AM, Muriel Gros-Balthazard wrote: > > Hello, > > I have been using Maker to generate an annotation. > I especially set these options: > - est_gff with a list of transcripts.gff3 (Cufflinks output) > - model_org=all > - rmlib=allrepeats.lib > - repeat_protein=te_prot.fasta > - pred_gff= Augustus.gff3 (that I generated previously) > > I obtain a gff file for each of my contigs. > However, here are the three possibilities in the second column : > # est_gff:cufflinks > # repeatmasker > # repeatrunner > > I have no information about exons and introns. > And I am wondering if the Augustus.gff3 was used... > > On top of that, I forgot to set up pred_stats to 1. > If I understand well, I can just change this in the ocntrol file, and run Maker again. Since there is the output with everything, it won't run again the prediction, only this option. Is that right ? > > Thank you, > > Muriel > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From Alice.Dennis at eawag.ch Fri Dec 12 07:10:46 2014 From: Alice.Dennis at eawag.ch (Dennis, Alice) Date: Fri, 12 Dec 2014 14:10:46 +0000 Subject: [maker-devel] iterative Maker2 Message-ID: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch> Hi all, I am a relatively new user to Maker2, and I'm looking for advise on running many iterations of the same dataset in Maker2. I have a relatively small genome (~124 MB) from a wasp that is assembled into ~1,500 scaffold. 
I have run several iterations of Maker2 by re-generating .hmms in SNAP and feeding them into the next round, and my gene predictions keep increasing (in number and in size). The only thing that changes at each round is the .hmm. This is the evidence that I give is: - de novo assembled ESTs from a different strain of the same species (70,000 contigs... I am currently working on improving this assembly with the hope that this will be helpful here) - 610 proteins extracted from the genome scaffolds using CEGMA and HaMSTr For my 1st iteration, I used the Nasonia .hmm from SNAP, and the est2genome/protein2genome option. For the 2nd, 3rd and 4th rounds I have used .hmms generated from the previous round, all without the est2genome/protein2genome option. All other files are the same as in the original run. As I understand it, after the second round, nothing should change in Maker2. But the differences are obvious between runs. Some entirely new exons are annotated. For example, just counting "exon" in the .gff file gives me 73,000 after the third iteration and 96,000 after the fourth! Actually the biggest leap in this number is between the third and fourth round. I can also see that many features are longer when I look at the files in Geneious. Is this sort of change possible after the second round of Maker2? Is there something I have done wrong in my runs, or am a understanding this output incorrectly? Thank you, Alice -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From carsonhh at gmail.com Fri Dec 12 08:41:42 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 12 Dec 2014 08:41:42 -0700
Subject: [maker-devel] iterative Maker2
In-Reply-To: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>
References: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch>
Message-ID: <7D42E0F3-B601-4D67-AF07-09C98469D8E5@gmail.com>

The gene models are actually produced by SNAP, Augustus, or whatever gene predictor you are using, so if you change the HMM every round, then the models will change too.

But I have one concern. You are using a very sparse protein evidence dataset. The protein dataset is very important to MAKER's performance, and for iterative training of the ab initio predictors. Normally after the second iteration, additional training should not be beneficial, but if you are getting wildly different results on the 3rd and 4th rounds, then you probably aren't getting enough good models to train with. For a protein dataset you should be using the entire proteome from a minimum of two related species, and perhaps all of UniProt/Swiss-Prot, to get a broad protein database. Don't use the proteins extracted by CEGMA and HaMSTr. CEGMA can be used to guide the first HMM creation (the cegma2zff script that comes with MAKER), but don't give the proteins to MAKER as evidence. Also, the HaMSTr results will be redundant with the ESTs. You need proteins from related species to look for homology not found in the EST dataset.

Also, repeat masking is important for any genome and has a huge effect on ab initio predictor performance. Make sure you run something like RepeatModeler to look for species-specific repeats that will not already be in RepBase. Then add those results to the rmlib= option in the MAKER control files.
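
For anyone who wants to compare rounds by exon count, as described in this thread, here is a rough Python sketch. It is not from the original messages, and the function name and file names are hypothetical. It tallies exon features by the GFF3 source column (column 2), so the totals for each predictor can be compared from one round to the next rather than relying on a single overall count.

```python
from collections import Counter


def count_features_by_source(gff_lines, feature_type="exon"):
    """Tally GFF3 features of one type, keyed by the source column (column 2)."""
    counts = Counter()
    for line in gff_lines:
        # Stop at the embedded FASTA section, if the file has one.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, comments, and directives.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GFF3 feature line
        source, ftype = fields[1], fields[2]
        if ftype == feature_type:
            counts[source] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python count_exons.py round3.gff round4.gff
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, dict(count_features_by_source(handle)))
```

Splitting the count by source makes it easier to see whether a jump between rounds comes from the raw ab initio predictions or from the final annotations.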
Thanks,
Carson

> On Dec 12, 2014, at 7:10 AM, Dennis, Alice wrote:
>
> Hi all,
>
> I am a relatively new user to Maker2, and I'm looking for advice on running many iterations of the same dataset in Maker2.
>
> I have a relatively small genome (~124 MB) from a wasp that is assembled into ~1,500 scaffolds. I have run several iterations of Maker2 by re-generating .hmms in SNAP and feeding them into the next round, and my gene predictions keep increasing (in number and in size). The only thing that changes at each round is the .hmm.
> This is the evidence that I give:
> - de novo assembled ESTs from a different strain of the same species (70,000 contigs... I am currently working on improving this assembly with the hope that this will be helpful here)
> - 610 proteins extracted from the genome scaffolds using CEGMA and HaMSTr
>
> For my 1st iteration, I used the Nasonia .hmm from SNAP, and the est2genome/protein2genome option.
>
> For the 2nd, 3rd and 4th rounds I have used .hmms generated from the previous round, all without the est2genome/protein2genome option. All other files are the same as in the original run.
>
> As I understand it, after the second round, nothing should change in Maker2. But the differences are obvious between runs. Some entirely new exons are annotated. For example, just counting "exon" in the .gff file gives me 73,000 after the third iteration and 96,000 after the fourth! Actually the biggest leap in this number is between the third and fourth round. I can also see that many features are longer when I look at the files in Geneious.
>
> Is this sort of change possible after the second round of Maker2? Is there something I have done wrong in my runs, or am I understanding this output incorrectly?
>
> Thank you,
> Alice
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tuanduonganh at gmail.com Sun Dec 14 05:55:35 2014
From: tuanduonganh at gmail.com (Tuan Duong Anh)
Date: Sun, 14 Dec 2014 14:55:35 +0200
Subject: [maker-devel] Quality filter perl script
Message-ID:

Hi all,

I successfully ran MAKER and am now looking into rescuing rejected gene models using protein domain evidence. I have obtained the tsv file from interproscan and have also updated the GFF3 file with this result using ipr_update_gff. In the next step I will need the quality_filter.pl script to generate the default and standard builds; however, this script is not included in my version of MAKER. Do you know where I can get this script?

Thanks.
Tuan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From michael.s.campbell1 at gmail.com Mon Dec 15 13:13:29 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 15 Dec 2014 13:13:29 -0700
Subject: [maker-devel] Quality filter perl script
In-Reply-To:
References:
Message-ID:

Hi Tuan,

I've attached a copy of the quality filter script. I've removed the .pl extension because some email services will not accept attachments with that extension.

Take care,
Mike

On Sun, Dec 14, 2014 at 5:55 AM, Tuan Duong Anh wrote:
>
> Hi all,
>
> I successfully ran MAKER and now looking into rescuing rejected gene
> models using protein domain evidence. I have obtained the tsv file from
> interproscan and have also updated the GFF3 file with this result using
> ipr_update_gff. In the next step I will need quality_filter.pl script to
> generate the default and standard build, however this script is not
> included in my version of MAKER. Do you know where can I get this script?
>
> Thanks.
> Tuan
>

--
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: quality_filter
Type: application/octet-stream
Size: 4598 bytes
Desc: not available
URL:

From cognitiveshrapnel at gmail.com Sat Dec 27 19:59:12 2014
From: cognitiveshrapnel at gmail.com (Justin Peyton)
Date: Sat, 27 Dec 2014 21:59:12 -0500
Subject: [maker-devel] openmpi instantly chokes on maker
Message-ID:

I am working on getting maker running on a system running ubuntu 14.04. I have installed maker and it runs great on a small but real data set. When I try it with openmpi with the exact same inputs, however, I get the below error almost instantly.

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
[molybdenum:23241] *** Process received signal *** [molybdenum:23241] Signal: Segmentation fault (11) [molybdenum:23241] Signal code: Address not mapped (1) [molybdenum:23241] Failing at address: 0x50c [molybdenum:23241] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30] [molybdenum:23241] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7f99bd5155a2] [molybdenum:23241] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30] [molybdenum:23241] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7f99bd19fbad] [molybdenum:23241] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7f99bcbcc156] [molybdenum:23241] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7f99bcbc34bb] [molybdenum:23241] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7f99bce6e97e] [molybdenum:23241] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f99bc944182] [molybdenum:23241] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f99bd1acefd] [molybdenum:23241] *** End of error message *** SIGTERM received SIGTERM received SIGTERM received SIGTERM received SIGTERM received [molybdenum:23252] *** Process received signal *** [molybdenum:23252] Signal: Segmentation fault (11) [molybdenum:23252] Signal code: Address not mapped (1) [molybdenum:23252] Failing at address: 0x50c [molybdenum:23252] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30] [molybdenum:23252] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7fb191f5e5a2] [molybdenum:23252] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30] [molybdenum:23252] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7fb191be8bad] [molybdenum:23252] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7fb191615156] [molybdenum:23252] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7fb19160c4bb] [molybdenum:23252] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7fb1918b797e] 
[molybdenum:23252] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fb19138d182]
[molybdenum:23252] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb191bf5efd]
[molybdenum:23252] *** End of error message ***
SIGTERM received
--------------------------------------------------------------------------
mpiexec noticed that process rank 2 with PID 23241 on node molybdenum exited on signal 11 (Segmentation fault).

I have tried reinstalling both maker and openmpi. I have tried two different versions of both maker and openmpi. I am currently working with maker 2.31.6 and openmpi 1.8.3 because I have had those work together on another system. I have triple checked that LD_PRELOAD is properly set. I have a feeling that I am missing something small. I appreciate all the help.

Justin Peyton
The Ohio State University
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From harini1981 at gmail.com Tue Dec 23 03:31:46 2014
From: harini1981 at gmail.com (Harini Vinod)
Date: Tue, 23 Dec 2014 16:01:46 +0530
Subject: [maker-devel] regd aed score plot
Message-ID:

Dear Concern,
I used the AED_cdf_generator.pl script from MAKER-DEVEL to obtain the plot.

I get the following error:
readline() on closed filehandle GEN0 at AED_cdf_generator.pl line 69.
AED scaffold11.gff, scaffold7.gff
Use of uninitialized value $total in division (/) at AED_cdf_generator.pl line 43.
Illegal division by zero at AED_cdf_generator.pl line 43.

Can you kindly suggest what could have gone wrong?
regards
Harini

--
K.Harini
PhD scholar
Lab-25
NCBS,GKVK
Bangalore
560065
harinik at ncbs.res.in
+91 9535292110
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From michael.s.campbell1 at gmail.com Mon Dec 29 12:33:49 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 29 Dec 2014 12:33:49 -0700
Subject: [maker-devel] regd aed score plot
In-Reply-To:
References:
Message-ID:

I think I fixed this in a recent svn commit.
Try the attached version of the script and let me know if it works.

Thanks,
Mike

On Tue, Dec 23, 2014 at 3:31 AM, Harini Vinod wrote:
>
> Dear Concern,
> I had used the following script from MAKER-DEVEL
> AED_cdf_generator.pl to obtain the plot
>
> I get the following error
> readline() on closed filehandle GEN0 at AED_cdf_generator.pl line 69.
> AED scaffold11.gff, scaffold7.gff
> Use of uninitialized value $total in division (/) at AED_cdf_generator.pl
> line 43.
> Illegal division by zero at AED_cdf_generator.pl line 43.
>
> Can you kindly suggest what could have gone wrong???
> regards
> Harini
>
> --
> K.Harini
> PhD scholar
> Lab-25
> NCBS,GKVK
> Bangalore
> 560065
> harinik at ncbs.res.in
> +91 9535292110
>

--
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AED_cdf_generator.pl.gz
Type: application/x-gzip
Size: 1116 bytes
Desc: not available
URL:

From xvazquezc at gmail.com Mon Dec 29 21:00:56 2014
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Tue, 30 Dec 2014 15:00:56 +1100
Subject: [maker-devel] few basic questions
Message-ID:

Hi there,

I'm a newbie dealing with genomes and I've been trying to start using Maker for the annotation. I understand the base concepts but I have doubts about the correct steps to follow. I've been through the 2014 video tutorial and searched for detailed steps and I still have some questions, maybe a bit obvious though...

I have to annotate two fungal genomes and I only have the DNA assembly (no EST or protein files).
I understand that, lacking EST and protein files, I should provide them as alt-est and protein from the closest species I can, but is it enough with one EST file from one organism for the alt-est?

Regarding the steps to process would this be correct?:
1. run Maker with the genome, alt-est and protein files, with est2genome=1 and protein2genome=1 (softmask=1 ?)
2. with this first output, create the hmm file for SNAP based on the first output
3. Set est2genome=0 and protein2genome=0, set the snaphmm file and run again (using -base option)
4. repeat 2 and 3 as necessary*

*How do you know when you get to the point where no more refinement is possible? Would that be the final model? Should it be based on the AED scores? How can I get it without looking into individual sequence headings? Also, do you perform the bootstrapping in the same folder? In the tutorial I saw different folders (e.g. pyu_contig1, pyu_contig2) used on each repetition, not sure if just for demonstration purposes or if it is the proper way to go..

I'm trying to run also a gene prediction with Augustus and GeneMark. The first run will include an already trained profile for Augustus and the native hmm file of genemark-ES**. Do they need to repeat the prediction by bootstrap like with SNAP? If so, do I need to generate new hmm files or prediction models based on results?

**I have been trying to make the hmm file for genemark-ES using the gm_es.pl script but no matter what parameters I use the cluster shuts the job down as it exceeds 128GB of memory in use. The genome I've been testing for this is about 42Mbp in a roughly 40-50 MB fasta file

Thank you in advance,
Xabier

--
Xabier Vázquez Campos
*PhD Candidate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Dec 31 13:39:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 31 Dec 2014 13:39:10 -0700
Subject: [maker-devel] few basic questions
In-Reply-To:
References:
Message-ID:

Hi Xabier,

See below -->

> I have to annotate two fungal genomes and I only have the DNA assembly (no EST or protein files).
> I understand that lacking of EST and protein files I should provide them as alt-est and protein from the closest species I can, but is it enough with one EST file from one organism for the alt-est?

Provide alt-EST if you have ESTs from a closely related species, but do not have the proteome for that species. If you have the proteome, use that. Both are aligned in amino acid space and provide the same hint information, the only difference being that alt-EST takes 10x longer because both target and query must be translated into all 6 reading frames.

> Regarding the steps to process would this be correct?:
> run Maker with the genome, alt-est and protein files, with est2genome=1 and protein2genome=1 (softmask=1 ?)
> with this first output, create the hmm file for SNAP based on the first output
> Set est2genome=0 and protein2genome=0, set the snaphmm file and run again (using -base option)
> repeat 2 and 3 as necessary*

If you don't have ESTs, don't do est2genome (alt-ESTs don't count). Just do protein2genome. In general, two rounds of training is the maximum you should do. At that point, ab initio predictions and hint-based predictions will start to look like each other (so the ab initio models are doing well on their own).

> *How do you know when you get to the point where no more refinement is possible? Would that the final model? It should be based on the AED scores? How can I get it without looking into individual sequence headings? Also, do you perform the bootstrapping on the same folder? In the tutorial I saw different folders, (e.g.
pyu_contig1, pyu_contig2) used on each repetition, not sure if just for demonstration purposes or if it is the proper way to go..

Run it in the same folder. This will allow MAKER to recycle raw reports from BLAST etc. from the previous run (i.e. MAKER will run faster). In the tutorial we ran separately just to be able to open old results and compare.

> I'm trying to run also a gene prediction with Augustus and GeneMark. The first run will include an already trained profile for Augustus and the native hmm file of genemark-ES**. Do they need to repeat the prediction by bootstrap like with SNAP? If so, do I need to generate new hmm files or prediction models based on results?

You do with Augustus, but not with GeneMark, which is self-training.

> **I have been trying to make the hmm file for genemark-ES using the gm_es.pl script but no matter what parameters I use the cluster shut the job down as it exceeds 128GB of memory in use. The genome I've been testing for this is about 42Mbp in a roughly 40-50 MB fasta file

You can train GeneMark with just part of the genome. Try using 10Mb made up of the longest contigs. Also, I only recommend using GeneMark on fungi; it tends not to work well on organisms with more complex intron/exon structures.

Also, you should build a species-specific repeat database to supplement RepeatMasker's internal libraries. I'd recommend using RepeatModeler.

Thanks,
Carson
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Dec 31 13:42:38 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 31 Dec 2014 13:42:38 -0700
Subject: [maker-devel] openmpi instantly chokes on maker
In-Reply-To:
References:
Message-ID: <6BEE4837-A3E1-4FBF-AD18-4FBFD479BB2A@gmail.com>

Hi Justin,

You need to set LD_PRELOAD to the proper location and add the '-mca btl ^openib' flag to your command line.
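
As a minimal sketch of those two checks (not part of the original reply; both helper names are hypothetical), the Python below tests whether LD_PRELOAD names a libmpi.so that actually exists on disk, and assembles an mpiexec command line that carries the '-mca btl ^openib' flag. The argument order follows the example command quoted from the INSTALL file in this thread.

```python
import os


def check_ld_preload(environ=None):
    """Return (ok, message): does LD_PRELOAD name an existing libmpi.so?"""
    environ = os.environ if environ is None else environ
    value = environ.get("LD_PRELOAD", "")
    if not value:
        return False, "LD_PRELOAD is not set"
    # LD_PRELOAD may list several libraries separated by spaces or colons.
    paths = value.replace(":", " ").split()
    mpi_libs = [p for p in paths if os.path.basename(p).startswith("libmpi.so")]
    if not mpi_libs:
        return False, "LD_PRELOAD does not mention libmpi.so"
    missing = [p for p in mpi_libs if not os.path.exists(p)]
    if missing:
        return False, "file not found: " + ", ".join(missing)
    return True, "LD_PRELOAD points at " + ", ".join(mpi_libs)


def build_mpiexec_command(n_procs):
    """Build an mpiexec argument list that includes '-mca btl ^openib'."""
    return ["mpiexec", "-mca", "btl", "^openib", "-n", str(n_procs), "maker"]


if __name__ == "__main__":
    ok, message = check_ld_preload()
    print("OK" if ok else "PROBLEM", "-", message)
    print(" ".join(build_mpiexec_command(20)))
```

Passing the command as a list (for example to subprocess.run) avoids the shell interpreting the '^' character.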
The following is from the INSTALL file that should be included with MAKER -->

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile. (i.e. export LD_PRELOAD=/location/of/openmpi/lib/libmpi.so).

1. Say yes to the 'configure for MPI' question when running 'perl Build.PL' in step 1 of the EASY INSTALL.
2. Give path to 'mpicc'. Note to make sure you do not give the path to 'mpicc' from another MPI flavor that might be installed on your system.
3. Give path to the folder containing 'mpi.h'. Note to make sure you do not give the path to a folder from another MPI flavor that might be installed on your system. Mixing MPI flavors for 'mpicc' and 'mpi.h' will cause failures. Make sure to read and confirm the auto-detected paths.
4. Finish installation according to steps 2-4 of the EASY INSTALL

Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings.

Note: If jobs hang or freeze when using mpiexec under OpenMPI try adding the '-mca btl ^openib' flag to mpiexec command when running MAKER.

Example: mpiexec -mca btl ^openib -n 20 maker

Thanks,
Carson

> On Dec 27, 2014, at 7:59 PM, Justin Peyton wrote:
>
> I am working on getting maker running on a system running ubuntu 14.04. I have installed maker and it runs great on a small but real data set. When I try it with openmpi with the exact same inputs, however, I get the below error almost instantly.
>
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> [molybdenum:23241] *** Process received signal *** > [molybdenum:23241] Signal: Segmentation fault (11) > [molybdenum:23241] Signal code: Address not mapped (1) > [molybdenum:23241] Failing at address: 0x50c > [molybdenum:23241] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30] > [molybdenum:23241] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7f99bd5155a2] > [molybdenum:23241] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30] > [molybdenum:23241] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7f99bd19fbad] > [molybdenum:23241] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7f99bcbcc156] > [molybdenum:23241] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7f99bcbc34bb] > [molybdenum:23241] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7f99bce6e97e] > [molybdenum:23241] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f99bc944182] > [molybdenum:23241] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f99bd1acefd] > [molybdenum:23241] *** End of error message *** > SIGTERM received > SIGTERM received > SIGTERM received > SIGTERM received > SIGTERM received > [molybdenum:23252] *** Process received signal *** > [molybdenum:23252] Signal: Segmentation fault (11) > [molybdenum:23252] Signal code: Address not mapped (1) > [molybdenum:23252] Failing at address: 0x50c > [molybdenum:23252] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30] > [molybdenum:23252] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7fb191f5e5a2] > [molybdenum:23252] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30] > [molybdenum:23252] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7fb191be8bad] > [molybdenum:23252] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7fb191615156] > [molybdenum:23252] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7fb19160c4bb] > [molybdenum:23252] [ 6] 
/usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7fb1918b797e]
> [molybdenum:23252] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fb19138d182]
> [molybdenum:23252] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb191bf5efd]
> [molybdenum:23252] *** End of error message ***
> SIGTERM received
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 2 with PID 23241 on node molybdenum exited on signal 11 (Segmentation fault).
>
>
> I have tried reinstalling both maker and openmpi. I have tried two different versions of both maker and openmpi. I am currently working with maker 2.31.6 and openmpi 1.8.3 because I have had those work together on another system. I have triple checked that LD_PRELOAD is properly set. I have a feeling that I am missing something small. I appreciate all the help.
>
> Justin Peyton
> The Ohio State University
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jerryzhaosjtu at gmail.com Wed Dec 31 18:48:29 2014
From: jerryzhaosjtu at gmail.com (赵越)
Date: Thu, 1 Jan 2015 09:48:29 +0800
Subject: [maker-devel] some problems using MAKER
Message-ID:

Hi all,

Recently I have been using MAKER to annotate a single chromosome of rice as a pilot experiment, and I have run into some problems. After the annotation, when I use eval to compare my result against the gold standard, the gene sensitivity and specificity are only around 20%. After I added the GFF3 file that MAKER itself produced and ran MAKER again, the result was even worse than 20%. My input is a Trinity-processed RNA-seq file and a protein file. I chose SNAP, Augustus and GeneMark as ab initio predictors.
I paste my maker_opts.ctl here:

#-----Genome (these are always required)
genome=chr12.fasta #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff=chr12.gff #MAKER derived GFF3 file
est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=rna-seq_trinity.fasta #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=Osativa_193_peptide.fa #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=Rice #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e.
seg and dust filtering) #-----Gene Prediction snaphmm=rice #SNAP HMM file gmhmm=/lustre/home/clswcc/yzhao/MAKER/maker/exe/genemark_hmm_euk_linux_64/ehmm/o_sativa.mod #GeneMark HMM file augustus_species=arabidopsis #Augustus gene prediction species model fgenesh_par_file= #FGENESH parameter file pred_gff=augus.gff3 #ab-initio predictions from an external GFF3 file model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no snoscan_rrna= #rRNA file to have Snoscan find snoRNAs unmask=1 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no #-----Other Annotation Feature Types (features MAKER doesn't recognize) other_gff= #extra features to pass-through to final MAKER generated GFF3 file #-----External Application Behavior Options alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases cpus=16 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) Could you help me? Thank you !!! -- *Yue Zhao (Jerry)* Bachelor Candidate of Plant Biotechnology Researcher in UCLA-CSST program Shanghai Jiao Tong University, Shanghai *jerryzhaosjtu at gmail.com * -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Dec 1 12:31:46 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 1 Dec 2014 12:31:46 -0700 Subject: [maker-devel] gff output In-Reply-To: <5476ED52.3060902@gmail.com> References: <5476ED52.3060902@gmail.com> Message-ID: <5A861A9A-5348-44B5-B0F6-C9AF3AA1469E@gmail.com> If you are using the gff3 directly produced by Augustus, it will be oddly structured and does not conform to the 'Canonical Gene? example given by the GFF3 format specification. 
You have to make a couple of search and replace operations to make it work. Also it would generally be better to let maker run augustus for you rather than providing it as GFF3. This is because you lose the hint feedback that maker provides augustus. AS a result there will be no improvement made to the annotations beyond what augustus has already produced. ?Carson > On Nov 27, 2014, at 2:22 AM, Muriel Gros-Balthazard wrote: > > Hello, > > I have been using Maker to generate an annotation. > I especially set these options: > - est_gff with a list of transcripts.gff3 (Cufflinks output) > - model_org=all > - rmlib=allrepeats.lib > - repeat_protein=te_prot.fasta > - pred_gff= Augustus.gff3 (that I generated previously) > > I obtain a gff file for each of my contigs. > However, here are the three possibilities in the second column : > # est_gff:cufflinks > # repeatmasker > # repeatrunner > > I have no information about exons and introns. > And I am wondering if the Augustus.gff3 was used... > > On top of that, I forgot to set up pred_stats to 1. > If I understand well, I can just change this in the ocntrol file, and run Maker again. Since there is the output with everything, it won't run again the prediction, only this option. Is that right ? > > Thank you, > > Muriel > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From Alice.Dennis at eawag.ch Fri Dec 12 07:10:46 2014 From: Alice.Dennis at eawag.ch (Dennis, Alice) Date: Fri, 12 Dec 2014 14:10:46 +0000 Subject: [maker-devel] iterative Maker2 Message-ID: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch> Hi all, I am a relatively new user to Maker2, and I'm looking for advise on running many iterations of the same dataset in Maker2. I have a relatively small genome (~124 MB) from a wasp that is assembled into ~1,500 scaffold. 
I have run several iterations of Maker2 by re-generating .hmms in SNAP and feeding them into the next round, and my gene predictions keep increasing (in number and in size). The only thing that changes at each round is the .hmm. This is the evidence that I give is: - de novo assembled ESTs from a different strain of the same species (70,000 contigs... I am currently working on improving this assembly with the hope that this will be helpful here) - 610 proteins extracted from the genome scaffolds using CEGMA and HaMSTr For my 1st iteration, I used the Nasonia .hmm from SNAP, and the est2genome/protein2genome option. For the 2nd, 3rd and 4th rounds I have used .hmms generated from the previous round, all without the est2genome/protein2genome option. All other files are the same as in the original run. As I understand it, after the second round, nothing should change in Maker2. But the differences are obvious between runs. Some entirely new exons are annotated. For example, just counting "exon" in the .gff file gives me 73,000 after the third iteration and 96,000 after the fourth! Actually the biggest leap in this number is between the third and fourth round. I can also see that many features are longer when I look at the files in Geneious. Is this sort of change possible after the second round of Maker2? Is there something I have done wrong in my runs, or am a understanding this output incorrectly? Thank you, Alice -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Fri Dec 12 08:41:42 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 12 Dec 2014 08:41:42 -0700 Subject: [maker-devel] iterative Maker2 In-Reply-To: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch> References: <1FD5809847938F44B92893606806BD53600D845F@EE-MBX1.ee.emp-eaw.ch> Message-ID: <7D42E0F3-B601-4D67-AF07-09C98469D8E5@gmail.com> The gene models are actually produced by SNAP, Augustus, or whatever gene predictor you are using, so if you change the HMM every round, then the models will change too. But I have one concern. You are using a very sparse protein evidence dataset. The protein dataset is very important to MAKER?s performance, and for itterative training of the ab initio predictors. Normally after the second iteration, additional training should not be beneficial, but if you are getting wildly different results on 3rd and 4th round, then you probably aren?t getting sufficient good models to train with. For a protein dataset you should be using the entire a proteome from a minimum of two related species and perhaps all of UniProt/Swiss-prot to get a broad protein database. Don?t use the proteins extracted by CEGMA and HaMSTr. CEGMA can be used to guide the first HMM creation (cegma2zff scrip that comes with MAEKR), but don?t give the proteins to MAKER as evidence, also the HaMSTr results will be redundant with the ESTs. You need proteins from related species to look for homology not found in the EST dataset. Also repeat masking is important for any genome and has a huge effect on ab initio predictor performance. Make sure you run something like RepeatModeler to look for species specific repeats that will not already be in RepBase. Then add those results to the rmlib= option in the maker control files. 
Thanks,
Carson

From tuanduonganh at gmail.com Sun Dec 14 05:55:35 2014
From: tuanduonganh at gmail.com (Tuan Duong Anh)
Date: Sun, 14 Dec 2014 14:55:35 +0200
Subject: [maker-devel] Quality filter perl script
Message-ID:

Hi all,

I successfully ran MAKER and am now looking into rescuing rejected gene models using protein domain evidence. I have obtained the tsv file from interproscan and have also updated the GFF3 file with this result using ipr_update_gff. In the next step I will need the quality_filter.pl script to generate the default and standard build, however this script is not included in my version of MAKER. Do you know where I can get this script?

Thanks.
Tuan

From michael.s.campbell1 at gmail.com Mon Dec 15 13:13:29 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 15 Dec 2014 13:13:29 -0700
Subject: [maker-devel] Quality filter perl script
In-Reply-To:
References:
Message-ID:

Hi Tuan,

I've attached a copy of the quality filter script. I've removed the .pl extension because some email services will not accept them.

Take care,
Mike

--
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543

-------------- next part --------------
A non-text attachment was scrubbed...
Name: quality_filter
Type: application/octet-stream
Size: 4598 bytes
Desc: not available

From cognitiveshrapnel at gmail.com Sat Dec 27 19:59:12 2014
From: cognitiveshrapnel at gmail.com (Justin Peyton)
Date: Sat, 27 Dec 2014 21:59:12 -0500
Subject: [maker-devel] openmpi instantly chokes on maker
Message-ID:

I am working on getting maker running on a system running ubuntu 14.04. I have installed maker and it runs great on a small but real data set. When I try it with openmpi with the exact same inputs, however, I get the below error almost instantly.

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
[molybdenum:23241] *** Process received signal ***
[molybdenum:23241] Signal: Segmentation fault (11)
[molybdenum:23241] Signal code: Address not mapped (1)
[molybdenum:23241] Failing at address: 0x50c
[molybdenum:23241] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
[molybdenum:23241] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7f99bd5155a2]
[molybdenum:23241] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7f99bd0e8c30]
[molybdenum:23241] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7f99bd19fbad]
[molybdenum:23241] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7f99bcbcc156]
[molybdenum:23241] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7f99bcbc34bb]
[molybdenum:23241] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7f99bce6e97e]
[molybdenum:23241] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f99bc944182]
[molybdenum:23241] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f99bd1acefd]
[molybdenum:23241] *** End of error message ***
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
SIGTERM received
[molybdenum:23252] *** Process received signal ***
[molybdenum:23252] Signal: Segmentation fault (11)
[molybdenum:23252] Signal code: Address not mapped (1)
[molybdenum:23252] Failing at address: 0x50c
[molybdenum:23252] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
[molybdenum:23252] [ 1] /usr/lib/libperl.so.5.18(Perl_csighandler+0x22)[0x7fb191f5e5a2]
[molybdenum:23252] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36c30)[0x7fb191b31c30]
[molybdenum:23252] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__poll+0x2d)[0x7fb191be8bad]
[molybdenum:23252] [ 4] /usr/local/openmpi/lib/libopen-pal.so.6(+0x72156)[0x7fb191615156]
[molybdenum:23252] [ 5] /usr/local/openmpi/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x13b)[0x7fb19160c4bb]
[molybdenum:23252] [ 6] /usr/local/openmpi/lib/libopen-rte.so.7(+0x3897e)[0x7fb1918b797e]
[molybdenum:23252] [ 7] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fb19138d182]
[molybdenum:23252] [ 8] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb191bf5efd]
[molybdenum:23252] *** End of error message ***
SIGTERM received
--------------------------------------------------------------------------
mpiexec noticed that process rank 2 with PID 23241 on node molybdenum exited on signal 11 (Segmentation fault).

I have tried reinstalling both maker and openmpi. I have tried two different versions of both maker and openmpi. I am currently working with maker 2.31.6 and openmpi 1.8.3 because I have had those work together on another system. I have triple checked that LD_PRELOAD is properly set. I have a feeling that I am missing something small. I appreciate all the help.

Justin Peyton
The Ohio State University

From harini1981 at gmail.com Tue Dec 23 03:31:46 2014
From: harini1981 at gmail.com (Harini Vinod)
Date: Tue, 23 Dec 2014 16:01:46 +0530
Subject: [maker-devel] regd aed score plot
Message-ID:

Dear Concern,

I used the AED_cdf_generator.pl script from MAKER-DEVEL to obtain the plot, and I get the following error:

readline() on closed filehandle GEN0 at AED_cdf_generator.pl line 69.
AED scaffold11.gff, scaffold7.gff
Use of uninitialized value $total in division (/) at AED_cdf_generator.pl line 43.
Illegal division by zero at AED_cdf_generator.pl line 43.

Can you kindly suggest what could have gone wrong?

regards
Harini

--
K.Harini
PhD scholar
Lab-25
NCBS,GKVK
Bangalore 560065
harinik at ncbs.res.in
+91 9535292110

From michael.s.campbell1 at gmail.com Mon Dec 29 12:33:49 2014
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Mon, 29 Dec 2014 12:33:49 -0700
Subject: [maker-devel] regd aed score plot
In-Reply-To:
References:
Message-ID:

I think I fixed this in a recent svn commit.
Try the attached version of the script and let me know if it works.

Thanks,
Mike

--
Michael Campbell MS, RD.
Doctoral Candidate
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:585-3543

-------------- next part --------------
A non-text attachment was scrubbed...
Name: AED_cdf_generator.pl.gz
Type: application/x-gzip
Size: 1116 bytes
Desc: not available

From xvazquezc at gmail.com Mon Dec 29 21:00:56 2014
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Tue, 30 Dec 2014 15:00:56 +1100
Subject: [maker-devel] few basic questions
Message-ID:

Hi there,

I'm a newbie dealing with genomes and I've been trying to start using Maker for the annotation. I understand the base concepts but I have doubts about the correct steps to follow. I've been through the 2014 video tutorial and searched for detailed steps and I still have some questions, maybe a bit obvious though...

I have to annotate two fungal genomes and I only have the DNA assembly (no EST or protein files).
I understand that, lacking EST and protein files of my own, I should provide them as alt-est and protein from the closest species I can, but is one EST file from one organism enough for the alt-est?

Regarding the steps, would this be correct?

1. Run Maker with the genome, alt-est and protein files, with est2genome=1 and protein2genome=1 (softmask=1 ?)
2. Create the hmm file for SNAP based on this first output
3. Set est2genome=0 and protein2genome=0, set the snaphmm file and run again (using the -base option)
4. Repeat 2 and 3 as necessary*

*How do you know when you get to the point where no more refinement is possible? Would that be the final model? Should it be based on the AED scores? How can I get them without looking into individual sequence headings? Also, do you perform the bootstrapping in the same folder? In the tutorial I saw different folders (e.g. pyu_contig1, pyu_contig2) used on each repetition; I'm not sure if that was just for demonstration purposes or if it is the proper way to go.

I'm also trying to run gene prediction with Augustus and GeneMark. The first run will include an already trained profile for Augustus and the native hmm file of genemark-ES**. Do they need to repeat the prediction by bootstrap like with SNAP? If so, do I need to generate new hmm files or prediction models based on the results?

**I have been trying to make the hmm file for genemark-ES using the gm_es.pl script, but no matter what parameters I use the cluster shuts the job down as it exceeds 128GB of memory in use. The genome I've been testing for this is about 42Mbp in a roughly 40-50 MB fasta file.

Thank you in advance,
Xabier

--
Xabier Vázquez Campos
PhD Candidate
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052
AUSTRALIA
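On the question of getting AED scores without reading individual sequence headers: one option is to pull them from the GFF3 and summarise them as a cumulative distribution. The sketch below is hedged; it assumes that MAKER writes the score as an `_AED=` attribute in column 9 of mRNA features, and the function names and the 0.25 step are illustrative choices, not anything taken from AED_cdf_generator.pl.

```python
import re

# Matches "_AED=0.12" at the start of column 9 or after a ";" separator.
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def aed_values(gff_lines):
    """Collect _AED values from the mRNA feature lines of GFF3 text."""
    values = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        match = AED_RE.search(cols[8])
        if match:
            values.append(float(match.group(1)))
    return values


def cumulative_fraction(values, step=0.25):
    """Return [(threshold, fraction of values <= threshold)] from 0 to 1."""
    if not values:
        return []  # nothing to summarise; avoids dividing by zero
    n_steps = round(1 / step)
    table = []
    for i in range(n_steps + 1):
        threshold = i * step
        hits = sum(1 for v in values if v <= threshold + 1e-9)
        table.append((round(threshold, 3), hits / len(values)))
    return table
```

Comparing the table from one round with the table from the next gives a rough picture of whether another round of training still shifts models toward lower AED.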
From carsonhh at gmail.com Wed Dec 31 13:39:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 31 Dec 2014 13:39:10 -0700
Subject: [maker-devel] few basic questions
In-Reply-To:
References:
Message-ID:

Hi Xabier,

See below.

> I have to annotate two fungal genomes and I only have the DNA assembly (no EST or protein files).
> I understand that lacking of EST and protein files I should provide them as alt-est and protein from the closest species I can, but is it enough with one EST file from one organism for the alt-est?

Provide alt-EST if you have ESTs from a closely related species, but do not have the proteome for that species. If you have the proteome, use that. Both are aligned in amino acid space and provide the same hint information, the only difference being that alt-EST takes 10x longer because both target and query must be translated into all 6 reading frames.

> Regarding the steps to process would this be correct?:
> run Maker with the genome, alt-est and protein files, with est2genome=1 and protein2genome=1 (softmask=1 ?)
> with this first output, create the hmm file for SNAP based on the first output
> Set est2genome=0 and protein2genome=0, set the snaphmm file and run again (using -base option)
> repeat2 and 3 as necessary*

If you don't have ESTs, don't do est2genome (alt-ESTs don't count). Just do protein2genome. In general, two rounds of training is the maximum you should do. At that point, ab initio predictions and hint-based predictions will start to look like each other (so the ab initio models are doing well on their own).

> *How do you know when you get to the point where no more refinement is possible? Would that the final model? It should be based on the AED scores? How can I get it without looking into individual sequence headings? Also, do you perform the bootstrapping on the same folder? In the tutorial I saw different folders, (e.g. pyu_contig1, pyu_contig2) used on each repetition, not sure if just for demonstration purposes or if it is the proper way to go..

Run it in the same folder. This will allow MAKER to recycle raw reports from BLAST etc. from the previous run (i.e. MAKER will run faster). In the tutorial we ran separately just to be able to open old results and compare.

> I'm trying to run also a gene prediction with Augustus and GeneMark. The first run will include an already trained profile for Augustus and the native hmm file of genemark-ES**. Do they need to repeat the prediction by bootstrap like with SNAP? If so, do I need to generate new hmm files or prediction models based on results?

You do with Augustus, but not GeneMark, which does self-training.

> **I have been trying to make the hmm file for genemark-ES using the gm_es.pl script but no matter what parameters I use the cluster shut the job down as it exceeds 128GB of memory in use. The genome I've been testing for this is about 42Mbp in a roughly 40-50 MB fasta file

You can train GeneMark with just part of the genome. Try using 10Mb made up of the longest contigs. Also, I only recommend using GeneMark on fungi; it tends to not work well on organisms with more complex intron/exon structures.

Also, you should build a species-specific repeat database to supplement RepeatMasker's internal libraries. I'd recommend using RepeatModeler.

Thanks,
Carson

From carsonhh at gmail.com Wed Dec 31 13:42:38 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 31 Dec 2014 13:42:38 -0700
Subject: [maker-devel] openmpi instantly chokes on maker
In-Reply-To:
References:
Message-ID: <6BEE4837-A3E1-4FBF-AD18-4FBFD479BB2A@gmail.com>

Hi Justin,

You need to set LD_PRELOAD to the proper location and add the '-mca btl ^openib' flag to your command line.
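Since "properly set" is easy to get subtly wrong (a stale path, or a library from a different MPI install), a small pre-flight check before calling mpiexec may help. This is only a sketch under stated assumptions: the function name is hypothetical, and it merely checks that LD_PRELOAD names a file called libmpi.so that exists. It cannot tell whether that library belongs to the same OpenMPI build that MAKER was configured against.

```python
import os


def check_ld_preload(environ):
    """Return a list of problems found with LD_PRELOAD (empty if none)."""
    value = environ.get("LD_PRELOAD", "")
    if not value:
        return ["LD_PRELOAD is not set"]
    problems = []
    # LD_PRELOAD may list several libraries separated by spaces or colons.
    paths = [p for p in value.replace(":", " ").split() if p]
    mpi_libs = [p for p in paths if os.path.basename(p).startswith("libmpi.so")]
    if not mpi_libs:
        problems.append("LD_PRELOAD does not mention libmpi.so")
    for path in mpi_libs:
        if not os.path.isfile(path):
            problems.append(f"{path} does not exist")
    return problems
```

Calling `check_ld_preload(os.environ)` in the same shell or job script that launches mpiexec shows what the MPI processes will actually inherit, which can differ from what ~/.bash_profile sets in an interactive login.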
The following is from the INSTALL file that should be included with MAKER:

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile (i.e. export LD_PRELOAD=/location/of/openmpi/lib/libmpi.so).

1. Say yes to the 'configure for MPI' question when running 'perl Build.PL' in step 1 of the EASY INSTALL.
2. Give path to 'mpicc'. Note to make sure you do not give the path to 'mpicc' from another MPI flavor that might be installed on your system.
3. Give path to the folder containing 'mpi.h'. Note to make sure you do not give the path to a folder from another MPI flavor that might be installed on your system. Mixing MPI flavors for 'mpicc' and 'mpi.h' will cause failures. Make sure to read and confirm the auto-detected paths.
4. Finish installation according to steps 2-4 of the EASY INSTALL.

Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings.

Note: If jobs hang or freeze when using mpiexec under OpenMPI, try adding the '-mca btl ^openib' flag to the mpiexec command when running MAKER.

Example: mpiexec -mca btl ^openib -n 20 maker

Thanks,
Carson

From jerryzhaosjtu at gmail.com Wed Dec 31 18:48:29 2014
From: jerryzhaosjtu at gmail.com (赵越)
Date: Thu, 1 Jan 2015 09:48:29 +0800
Subject: [maker-devel] some problems using MAKER
Message-ID:

Hi all,

Recently I have been using MAKER to annotate a single chromosome of rice as a pre-experiment, and I have run into some problems. After the annotation, when I use eval to compare my result against the gold standard, the gene sensitivity and specificity are only around 20%. And after I added the gff3 file that maker made itself and ran maker again, I found that the result was worse than 20%.

My input is a Trinity-processed RNA-seq file and a protein file. I chose snap, augustus and genemark as ab initio predictors.
I paste my maker_opts.ctl here:

#-----Genome (these are always required)
genome=chr12.fasta #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff=chr12.gff #MAKER derived GFF3 file
est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=rna-seq_trinity.fasta #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=Osativa_193_peptide.fa #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=Rice #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm=rice #SNAP HMM file
gmhmm=/lustre/home/clswcc/yzhao/MAKER/maker/exe/genemark_hmm_euk_linux_64/ehmm/o_sativa.mod #GeneMark HMM file
augustus_species=arabidopsis #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff=augus.gff3 #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=1 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=16 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

Could you help me? Thank you!

--
Yue Zhao (Jerry)
Bachelor Candidate of Plant Biotechnology
Researcher in UCLA-CSST program
Shanghai Jiao Tong University, Shanghai
jerryzhaosjtu at gmail.com
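With a control file this long it is easy to lose track of which options are actually set. A short script can reduce it to the non-empty settings, which also makes it simpler to compare two runs or to paste a compact summary into a message. This is a minimal sketch, assuming the `key=value #comment` layout shown above; the function names are hypothetical, and a value that itself contains "#" would be cut short.

```python
def parse_ctl(text):
    """Parse MAKER-style control text into {option: value}, dropping comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # remove the trailing comment
        if not line or "=" not in line:
            continue  # section header, blank line, or anything without a key
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def non_empty(options):
    """Keep only the options that were given a value."""
    return {key: value for key, value in options.items() if value}
```

For example, `non_empty(parse_ctl(open("maker_opts.ctl").read()))` lists only the options in effect, so that combinations such as an external pred_gff file alongside an augustus_species setting stand out at a glance.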