From weckalba at asu.edu Tue Apr 3 18:28:28 2012
From: weckalba at asu.edu (Walter Eckalbar)
Date: Tue, 3 Apr 2012 16:28:28 -0700
Subject: [maker-devel] gff3_preds2models usage question
Message-ID:

Hello maker developers and users,

I am attempting to use the gff3_preds2models script, but I am running into a few issues. Initially, I hit errors that seemed to be fixed by installing CGI and its dependencies. However, a few tests failed during that installation. I can provide error logs if that would be helpful, but I went on to install and attempt gff3_preds2models anyway.

What I am currently doing is running gff3_merge first, to gather the MAKER outputs. I have tried this both with and without the -n option. When I provide the GFF3 file that includes the sequence, I get the following error from gff3_preds2models:

    Undefined subroutine &maker::auto_annotator::annotate called at /Users/Walter/Bioinformatics/Tools/maker/bin/gff3_preds2models line 97, line 992291.

This seems to be the same error that someone else reported on this list, but I did not see a later email resolving the issue.

I also tried giving it just the GFF3 without the sequences at the bottom of the file, and then I get this error:

    ERROR: There was a problem in the writing the fasta entry
    Either no sequence was given, or there was an error in writing

This leads me to believe I should be using the file with the sequence, but I am not certain of that.

I see it might be possible to go from MAKER outputs to a Chado database and then to gene->mRNA->exon GFF3s, but I have not set up my machine for XML or Chado yet, and it does not appear trivial.

Thanks for the help,
Walter

From ranjani at uga.edu Tue Apr 3 21:24:49 2012
From: ranjani at uga.edu (Sivaranjani Namasivayam)
Date: Wed, 4 Apr 2012 02:24:49 +0000
Subject: [maker-devel] mRNA-seq data
Message-ID:

Hi,

I am using MAKER to annotate a genome and I would like a couple of clarifications.

In the previous version of MAKER, under EST_evidence in maker_opts.ctl, the user could input est and est_reads (the mRNA-seq reads, although this was not fully implemented). The latest version of MAKER uses mRNA-seq data to improve annotation quality.

I have assembled transcriptome data from Sanger, 454 and Illumina. Do I just provide all this data in FASTA format to the 'est' option? Is this the best way to provide the mRNA-seq evidence? Will this ensure the mRNA-seq data is used to improve the annotations?

Thanks!
Ranjani

From carsonhh at gmail.com Tue Apr 3 21:39:02 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 03 Apr 2012 22:39:02 -0400
Subject: [maker-devel] mRNA-seq data
In-Reply-To:
Message-ID:

Yes. If you have them in FASTA format, just provide them to the est= option and let MAKER align them with exonerate. If you used something like cufflinks or trinity to process them, you can provide the results to the est_gff option (MAKER comes with a cufflinks2gff3 converter to make that easy).

Thanks,
Carson

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From pingouinandsheep at gmail.com Thu Apr 5 09:14:39 2012
From: pingouinandsheep at gmail.com (pingouinandsheep at gmail.com)
Date: Thu, 5 Apr 2012 07:14:39 -0700 (PDT)
Subject: [maker-devel] Huge memory usage
Message-ID: <5338ad1d-dc04-4150-b5ee-a88da7c42549@h5g2000vbx.googlegroups.com>

Hello,

When I try to run the test provided with maker2, maker starts to use a huge amount of memory. I stopped it after it reached ~100 GB of memory used. I believe the test should not use that amount of memory.

In another message someone suggested that the installed BioPerl version could be the cause of the problem, but the BioPerl installed on my cluster is already at version 1.6.

    perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
    1.006901

Unfortunately I don't have an error message to provide that could clarify my problem. But maybe it is a recurring problem and you know a few things I should check.

Thanks,
Ismael

From carsonhh at gmail.com Thu Apr 5 09:26:17 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 05 Apr 2012 10:26:17 -0400
Subject: [maker-devel] Huge memory usage
In-Reply-To: <5338ad1d-dc04-4150-b5ee-a88da7c42549@h5g2000vbx.googlegroups.com>
Message-ID:

The test should not use up more than a few megabytes of RAM.
Even on very large datasets you should never really use more than 1 or 2 GB of RAM per MAKER instance.

It's possible that there may be other Perl modules that are broken and need to be reinstalled on your system. This can happen when Perl gets updated, but you are pointing to modules built for a different Perl version with the PERL5LIB environment variable. Make sure you have the latest version of MAKER and run with --debug set. Collect that output and send it to me (the --debug option does some dependency checking).

I know there is an issue on Macs with updating Perl's DB_File module that causes it to gobble up big sections of the hard drive (it will eventually fill the drive if you let it). It's not a memory issue, but it is one example of how broken modules can cause weird behavior.

Thanks,
Carson

From eernst at cshl.edu Sun Apr 8 17:09:22 2012
From: eernst at cshl.edu (Evan Ernst)
Date: Sun, 8 Apr 2012 18:09:22 -0400
Subject: [maker-devel] Incomplete/Missing lines in datastore index log under openMPI
Message-ID:

Hi Carson,

It looks like there may be a locking issue with the datastore index log in MAKER 2.25/openmpi 1.4.5. I noticed this when running 8 MPI maker instances, each with 32 nodes. Examples from the log:

    scaffold1001.1 genome_datastore/93/A6/scaffold1001.1/ FINISHED
    scaffold1002.1 genome_datastore/72/43/scaffold1002.1/ FINISHED
    scaffold1003.1 genome_datastore/B8/05/scaffold1003.1/ FINISHED
    ...
    scaffold10085.1 genome_datastore/1C/7E/scaffold10085.1/ FINISHED
    scaffold8265.1 genome_datastore/01/E4/scaffold8265.1/ FINISHED
    D
    scaffold8295.1 genome_datastore/63/13/scaffold8295.1/ FINISHED
    ...
    scaffold8351.1 genome_datastore/27/52/scaffold8351.1/ FINISHED
    scaffold8343.1 genome_datastore/BF/31/scaffold8343.1/ FINISHED
    scaffold10167.1 genome_datastore/0B/9A/scaffold10167.1/ FINISHEscaffold10170.1 genome_datastore/F4/FF/scaffold10170.1/ FINISHED
    scaffold10209.1 genome_datastore/2D/AA/scaffold10209.1/ FINISHEscaffold10072.1 genome_datastore/E0/A5/scaffold10072.1/ FINISHED
    scaffold10113.1 genome_datastore/00/23/scaffold10113.1/ FINISHED

I see this even when running a single MPI instance with 32 nodes, when no actual processing is required apart from marking the scaffolds FINISHED. Comparing the result to a single, non-MPI maker instance running on the same completed hierarchy reveals that many entries aren't being written to the log at all when running under MPI. The single-process instance runs just fine, generating a complete log that can be used for the downstream scripts.

Between runs, I execute

    find genome.maker.output/ -name '.NFSLock*' -type f -print0 | xargs -0 rm &

to be sure lingering lock files from badly exiting processes aren't interfering.

This looks like the sort of thing that may be difficult to track down, and there's a clear workaround, but I'm happy to provide more information if you'd like to debug it.

Thanks,
Evan

From carsonhh at gmail.com Tue Apr 10 09:26:40 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 10 Apr 2012 10:26:40 -0400
Subject: [maker-devel] Incomplete/Missing lines in datastore index log under openMPI
In-Reply-To:
Message-ID:

Depending on whether you're using NFS and on other aspects of your architecture, you can get race conditions with the datastore log file. This primarily happens when you have multiple instances of MAKER running at the same time, or thousands of short contigs running in parallel so that many finish at the same time. In a future release, I plan on having the last MAKER job to exit rebuild the log at the end of a run to ensure it is complete. For now though, just run 'maker -dsindex' at the end of a run when it happens. It will rebuild the log and only takes a few seconds.

Thanks,
Carson

From smg283 at gmail.com Fri Apr 13 14:00:29 2012
From: smg283 at gmail.com (Scott Geib)
Date: Fri, 13 Apr 2012 09:00:29 -1000
Subject: [maker-devel] mpi issue on computing cluster
Message-ID:

Hi,

I am trying to run maker 2.24 on a compute cluster and get the following error (I am not worried about the Signal.pm error):

    an into unknown state (hex char: 29) at /mnt/work/scratch/scottge/maker-2.24/maker/bin/../lib/Proc/Signal.pm line 138.
    Fatal error in MPI_Init: Other MPI error, error stack:
    MPIR_Init_thread(388)........:
    MPID_Init(139)...............: channel initialization failed
    MPIDI_CH3_Init(49)...........: progress_init failed
    MPIDI_CH3I_Progress_init(808): This version of MPICH requires the SIGUSR1 signal, but the application has already installed a handler
    [proxy:0:0 at r01n11.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
    [proxy:0:0 at r01n11.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
    [proxy:0:0 at r01n11.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event
    [proxy:0:1 at r01n13.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
    [proxy:0:1 at r01n13.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
    [proxy:0:1 at r01n13.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event
    [proxy:0:3 at r07n27.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
    [proxy:0:3 at r07n27.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
    [proxy:0:3 at r07n27.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event
    [mpiexec at r01n11.local] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
    [mpiexec at r01n11.local] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): launcher returned error waiting for completion
    [mpiexec at r01n11.local] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for completion
    [mpiexec at r01n11.local] main (./ui/mpich/mpiexec.c:404): process manager error waiting for completion

I do not know how mpich2 was compiled. Could this be an --enable-sharedlibs issue? I may need to contact my cluster support, but I thought I would try here first.

Thanks

From sbrubaker at solazyme.com Fri Apr 13 15:12:24 2012
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Fri, 13 Apr 2012 20:12:24 +0000
Subject: [maker-devel] Functional annotation pipeline
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA065AD9@EXCHANGE-05.internal.solazyme.com>

Hi, can you recommend any open source functional annotation pipelines to assign function, GO terms, pathways, etc. to gene models?

Thanks,
Shane

From joseph.fass at gmail.com Fri Apr 13 15:42:51 2012
From: joseph.fass at gmail.com (Joseph Fass)
Date: Fri, 13 Apr 2012 13:42:51 -0700
Subject: [maker-devel] Functional annotation pipeline
In-Reply-To: <61D01ACB70C1E141A150BA9F586D5BFA065AD9@EXCHANGE-05.internal.solazyme.com>
References: <61D01ACB70C1E141A150BA9F586D5BFA065AD9@EXCHANGE-05.internal.solazyme.com>
Message-ID:

Would http://blast2go.de/b2ghome be the kind of thing you're looking for?

HTH,
~Joe

--
Joseph Fass
Lead Data Analyst
UC Davis Bioinformatics Core
joseph.fass -at- gmail.com (professional)
970.227.5928 (c) || 530.752.2698 (w)

From carsonhh at gmail.com Fri Apr 13 14:51:38 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 Apr 2012 15:51:38 -0400
Subject: [maker-devel] Huge memory usage
In-Reply-To:
Message-ID:

You can pre-mask the genome, convert the RepeatMasker results to GFF3 and pass them in, or just run the ./configure script in the RepeatMasker directory to configure wublast to be the default. You can also let MAKER install its own separate installation of RepeatMasker using rmblast. Just go to the maker/src/ directory and run this command:

    ./Build repeatmasker

MAKER will use that installation preferentially if you let it install that.

Thanks,
Carson

From: padioleau ismaël
Date: Fri, 13 Apr 2012 17:42:20 +0200
To: Carson Holt
Subject: Re: [maker-devel] Huge memory usage

Dear Carson,

I have a problem with RepeatMasker on my cluster. It works with wublast but not with Crossmatch. As MAKER tries to run RepeatMasker with the default engine, I cannot run MAKER successfully.

I wanted to know if I can provide MAKER with the genome already masked (if I run RepeatMasker with wublast externally). I thought it was possible, but I can't find where in the configuration files I should provide it. That is, in maker_opts.ctl, should I provide the result from RepeatMasker to 'genome_gff:' and set 'rm_pass' to 1, or set rm_gff in the 'Repeat Masking' part of the file? Or maybe I should provide the masked FASTA directly as the genome reference. Another solution could be to ask MAKER to run RepeatMasker with the option '-e wublast'.

Is it possible to use one of these solutions?

Thanks,
Ismael

2012/4/5 padioleau ismaël
> Dear Carson,
>
> Thank you for your very quick answer.
>
> I realised that I missed some error messages, and the problem seems to be linked to the DB_File package as you suggested. The person in charge of installation told me that he will recover the configuration.
>
> I will test it after the Easter weekend and come back to you if we have other issues.
>
> Have a nice Easter weekend,
>
> Ismael
>
> Here is the error message:
> Use of uninitialized value $DB_File::db_version in numeric ge (>=) at /mnt/common/DevTools/install/Linux/x86_64/perl/perl-5.10.1/lib/5.10.1/x86_64-linux-thread-multi/DB_File.pm line 276.
> Use of uninitialized value $DB_File::db_version in numeric gt (>) at /mnt/common/DevTools/install/Linux/x86_64/perl/perl-5.10.1/lib/5.10.1/x86_64-linux-thread-multi/DB_File.pm line 280.
> Deep recursion on subroutine "DB_File::AUTOLOAD" at /mnt/common/DevTools/install/Linux/x86_64/perl/perl-5.10.1/lib/5.10.1/x86_64-linux-thread-multi/DB_File.pm line 235.

--
Ismaël Padioleau
Evgeny Zdobnov Group (Computational Evolutionary Genomics Group)
Emmanouil Dermitzakis Group
Dpt de Médecine Génétique et Développement
Université de Genève - Faculté de Médecine
CMU - Rue Michel-Servet 1
CH 1211 Genève 4
Tel: 0041 22 379 59 74
ismael.padioleau at unige.ch

--
Tel. 0041 78 77 69 561
ismpadioleau at gmail.com

From carsonhh at gmail.com Fri Apr 13 16:02:51 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 Apr 2012 17:02:51 -0400
Subject: [maker-devel] Functional annotation pipeline
In-Reply-To:
Message-ID:

I would agree: blast2go.

You can also try interproscan from the EBI.

MAKER comes with two scripts, ipr_update_gff and iprscan2gff3, that help integrate interproscan results in the GFF3 files.
There are also a couple of scripts, maker_functional_gff and maker_functional_fasta, that can do putative functional annotation using UniProt/Swiss-Prot.

Thanks,
Carson

From sbrubaker at solazyme.com Fri Apr 13 16:18:10 2012
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Fri, 13 Apr 2012 21:18:10 +0000
Subject: [maker-devel] Functional annotation pipeline
In-Reply-To:
References:
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA065BBA@EXCHANGE-05.internal.solazyme.com>

Great, thank you ... I will take a look at those.

From dsth at ebi.ac.uk Fri Apr 13 16:22:37 2012
From: dsth at ebi.ac.uk (Daniel Hughes)
Date: Fri, 13 Apr 2012 22:22:37 +0100
Subject: [maker-devel] Functional annotation pipeline
In-Reply-To:
References:
Message-ID:

Careful with interproscan at the moment. I believe the executable is still in beta and they definitely aren't recommending it for production use yet. If you do use it, be sure to check the output file if you are using the lookup service, as when I used it recently it would sometimes exit normally despite lookup failures (the lookup problems may have had something to do with running ~800 in parallel; they're looking into the issue at the moment).

dan.

Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
-------------------------------------------------------------------------------------
dsth at cantab.net
dsth at cpan.org

From dsth at ebi.ac.uk Fri Apr 13 16:37:06 2012
From: dsth at ebi.ac.uk (Daniel Hughes)
Date: Fri, 13 Apr 2012 22:37:06 +0100
Subject: [maker-devel] Functional annotation pipeline
In-Reply-To:
References:
Message-ID:

Sorry, that's the new version of course.

dan.

Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
-------------------------------------------------------------------------------------
dsth at cantab.net
dsth at cpan.org

From ranjani at uga.edu Tue Apr 17 11:46:40 2012
From: ranjani at uga.edu (Sivaranjani Namasivayam)
Date: Tue, 17 Apr 2012 16:46:40 +0000
Subject: [maker-devel] MAKER2.23 output
Message-ID:

Hi,

I tried running the latest version of MAKER, 2.23, with my dataset, but without much success.

When I run it without the MPI option, it exits with a segmentation fault:

    STATUS: Processing and indexing input FASTA files...
    Segmentation fault

So I tried it with MPI. The run does start, but I don't see any output files.
I ran it with 20 CPUs for close to 10 hours.

I tested this with the sample data in MAKER's data folder. Input in the maker_opts.ctl file:

    genome=/usr/local/maker/2.23/data/dpp_contig.fasta
    est= /usr/local/maker/2.23/data/dpp_est.fasta
    protein= /usr/local/maker/2.23/data/dpp_protein.fasta
    est2genome=1

This was the command I executed:

    usr/local/mpich2/1.4.1p1/gcc_4.5.3/bin/mpirun -np 2 /usr/local/maker/2.23/bin/maker maker_opts.ctl maker_bopts.ctl maker_exe.ctl

The following folder, containing the protein sequence file, gets created:

    dpp_contig.maker.output

But I can't see any progress after that. Can you please tell me if I might be doing something wrong, or if you need further details?

I was able to run the previous version, MAKER 2.10, successfully with my dataset.

Thanks,
Ranjani

From carsonhh at gmail.com Tue Apr 17 11:56:11 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Apr 2012 12:56:11 -0400
Subject: [maker-devel] MAKER2.23 output
In-Reply-To:
Message-ID:

Segmentation fault means there was a failure in C code. It was likely in one of the modules being used. These are all potential culprits:

    Inline::C
    Proc::ProcessTable
    DB_File
    forks

Based on when the error occurred, I would lean more toward DB_File. Is it possible that BerkeleyDB has been updated on your system, perhaps as part of another installation or a system update? That sometimes breaks this module (which is part of the Perl core). You can try reinstalling that module from CPAN.

Also, if you run MAKER version 2.25 (the latest version), you can run with -debug (i.e. 'maker -debug') to get more information just before the error occurs. You can then capture the error log and send that to me.

Thanks,
Carson

From carsonhh at gmail.com Tue Apr 17 12:09:32 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Apr 2012 13:09:32 -0400
Subject: [maker-devel] mpi issue on computing cluster
In-Reply-To:
Message-ID:

If it's a sharedlibs issue then 'maker -help' would cause the same error. Try that.

Are you sure that you are not worried about Signal.pm causing the error?
Try changing /mnt/work/scratch/scottge/maker-2.24/maker/bin/../lib/Proc/Signal.pm lines 136-143 from this --> require Proc::ProcessTable; my $obj = new Proc::ProcessTable; foreach my $p (@{$obj->table}) { #now check for the id return $p if ($p->pid == $id); } return undef; To this --> my $select; eval{ require Proc::ProcessTable; my $obj = new Proc::ProcessTable; foreach my $p (@{$obj->table}) { #now check for the id if ($p->pid == $id){ $select = $p; last; } } } return $select; If it works, I can generate a cleaner workaround, but I'd like to know If that is the root of the problem. Thanks, Carson From: Scott Geib Date: Fri, 13 Apr 2012 09:00:29 -1000 To: Subject: [maker-devel] mpi issue on computing cluster Hi, I am trying to run maker 2.24 on a compute cluster and get the following error (not worried about Signal.pm error): an into unknown state (hex char: 29) at /mnt/work/scratch/scottge/maker-2.24/maker/bin/../lib/Proc/Signal.pm line 138. Fatal error in MPI_Init: Other MPI error, error stack: MPIR_Init_thread(388)........: MPID_Init(139)...............: channel initialization failed MPIDI_CH3_Init(49)...........: progress_init failed MPIDI_CH3I_Progress_init(808): This version of MPICH requires the SIGUSR1 signal, but the application has already installed a handler [proxy:0:0 at r01n11.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed [proxy:0:0 at r01n11.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:0 at r01n11.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event [proxy:0:1 at r01n13.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed [proxy:0:1 at r01n13.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:1 at r01n13.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event [proxy:0:3 at r07n27.local] 
HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed [proxy:0:3 at r07n27.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:3 at r07n27.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event [mpiexec at r01n11.local] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting [mpiexec at r01n11.local] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): launcher returned error waiting for completion [mpiexec at r01n11.local] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for completion [mpiexec at r01n11.local] main (./ui/mpich/mpiexec.c:404): process manager error waiting for completion I do not know how mpich2 was compiled, I feel this may be a --enable-sharedlibs issue? I may need to contact my cluster support, but I thought I would try here first, Thanks _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Apr 17 15:25:51 2012 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 17 Apr 2012 16:25:51 -0400 Subject: [maker-devel] mpi issue on computing cluster In-Reply-To: Message-ID: Sorry missed the ';' at the end of the eval block. Should be this --> my $select; eval{ require Proc::ProcessTable; my $obj = new Proc::ProcessTable; foreach my $p (@{$obj->table}) { #now check for the id if ($p->pid == $id){ $select = $p; last; } } }; return $select; --Carson From: Carson Holt Date: Tue, 17 Apr 2012 13:09:32 -0400 To: Scott Geib , Subject: Re: [maker-devel] mpi issue on computing cluster If it's a sharedlibs issue then 'maker -help' would cause the same error. Try that. 
Are you sure that you are not worried about Signal.pm causing the error? Try changing /mnt/work/scratch/scottge/maker-2.24/maker/bin/../lib/Proc/Signal.pm lines 136-143 from this --> require Proc::ProcessTable; my $obj = new Proc::ProcessTable; foreach my $p (@{$obj->table}) { #now check for the id return $p if ($p->pid == $id); } return undef; To this --> my $select; eval{ require Proc::ProcessTable; my $obj = new Proc::ProcessTable; foreach my $p (@{$obj->table}) { #now check for the id if ($p->pid == $id){ $select = $p; last; } } }; return $select; If it works, I can generate a cleaner workaround, but I'd like to know If that is the root of the problem. Thanks, Carson From: Scott Geib Date: Fri, 13 Apr 2012 09:00:29 -1000 To: Subject: [maker-devel] mpi issue on computing cluster Hi, I am trying to run maker 2.24 on a compute cluster and get the following error (not worried about Signal.pm error): an into unknown state (hex char: 29) at /mnt/work/scratch/scottge/maker-2.24/maker/bin/../lib/Proc/Signal.pm line 138. 
Fatal error in MPI_Init: Other MPI error, error stack: MPIR_Init_thread(388)........: MPID_Init(139)...............: channel initialization failed MPIDI_CH3_Init(49)...........: progress_init failed MPIDI_CH3I_Progress_init(808): This version of MPICH requires the SIGUSR1 signal, but the application has already installed a handler [proxy:0:0 at r01n11.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed [proxy:0:0 at r01n11.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:0 at r01n11.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event [proxy:0:1 at r01n13.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed [proxy:0:1 at r01n13.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:1 at r01n13.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event [proxy:0:3 at r07n27.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed [proxy:0:3 at r07n27.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:3 at r07n27.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event [mpiexec at r01n11.local] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting [mpiexec at r01n11.local] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): launcher returned error waiting for completion [mpiexec at r01n11.local] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for completion [mpiexec at r01n11.local] main (./ui/mpich/mpiexec.c:404): process manager error waiting for completion I do not know how mpich2 was compiled, I feel this may be a --enable-sharedlibs issue? 
I may need to contact my cluster support, but I thought I would try here first, Thanks _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From elzedliu at gmail.com Tue Apr 17 18:22:53 2012 From: elzedliu at gmail.com (Huanle) Date: Tue, 17 Apr 2012 16:22:53 -0700 (PDT) Subject: [maker-devel] gene predictors in MARKER Message-ID: Hi There, I am using MAKER to annotate a recently assembled plant genome. I followed the tutorial here: http://gmod.org/wiki/MAKER_Tutorial The de novo gene predictors I included in the maker_exe.ctl file are #-----Ab-initio Gene Prediction Algorithms snap=/sw/maker/2.10/bin/../exe/snap/snap #location of snap executable gmhmme3=/sw/GeneMark/20120203/bin/gmhmme3 #location of eukaryotic genemark executable gmhmmp= #location of prokaryotic genemark executable augustus=/sw/maker/2.10/bin/../exe/augustus/bin/augustus #location of augustus executable However, I am not sure whether they were really used. During the run, I could see that repeatmasker, exonerate and wublast were called. But I did not see any information pop up for those gene predictors. So I am wondering if they were actually used. Could you please let me know how to tell whether all or any of those gene predictors were called by MAKER? Kind Regards, Huanle From carsonhh at gmail.com Mon Apr 23 16:04:16 2012 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 23 Apr 2012 17:04:16 -0400 Subject: [maker-devel] gene predictors in MARKER In-Reply-To: Message-ID: The gene predictors have to be trained first, and when they are trained they produce an HMM file that can be supplied to MAKER.
You can either use MAKER's protein2genome option or est2genome option to produce rough models to train with, or you can try one of the models that come prepackaged with those algorithms. SNAP models will be in --> /sw/maker/2.10/bin/../exe/snap/HMM Augustus --> run this to see species in augustus --> /sw/maker/2.10/bin/../exe/augustus/bin/augustus --species=help GeneMark is self-training. Run it once directly on your genome fasta (or, for speed, on just a chromosome or two of the assembly) and it will produce a file called es.mod as part of its results. That is the file you need. If you have any questions or issues with training just let us know. Thanks, Carson On 12-04-17 7:22 PM, "Huanle" wrote: >I am using MAKER to annotate a recently assembled plant genome. >Hi There, > >I am using MAKER to annotate a recently assembled plant genome. > >I followed the tutorial here: http://gmod.org/wiki/MAKER_Tutorial > >The denovo gene predictors i included in the maker_exe.ctl file are >#-----Ab-initio Gene Prediction Algorithms >snap=/sw/maker/2.10/bin/../exe/snap/snap #location of snap executable >gmhmme3=/sw/GeneMark/20120203/bin/gmhmme3 #location of eukaryotic >genemark executable >gmhmmp= #location of prokaryotic genemark executable >augustus=/sw/maker/2.10/bin/../exe/augustus/bin/augustus #location of >augustus executable > >However, I am not sure whether they were really used. > >During the running, i could see repeatmasker, exonerate and wublast >were called. But i did see any information popped up for those gene >predictors. > >So i am wondering if they were actually used. > >Could you please let me know how to know if all or one of those gene >predictors were called by marker?
> >Kind Regards, >Huanle > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From anastasia.gioti at scilifelab.se Wed Apr 25 04:09:36 2012 From: anastasia.gioti at scilifelab.se (Anastasia Gioti) Date: Wed, 25 Apr 2012 11:09:36 +0200 Subject: [maker-devel] Use pass-through system to add missing genes Message-ID: Hi, I have a set of predicted proteins from the genome of a fungus annotated by MAKER using EST data from a closely related species and 3 ab initio predictors (snap iterativelly trained 3 times, genemark trained directly on the assembly and augustus with a model from a less closely related species), along with a set of fungal proteins. I am missing ~ 1000 proteins when I compare to the species i used EST data from, and there is good evidence from alignments that these genes exist. The question is how to proceed from Blast hits to actual gene models here. The idea would be to add these genes to the existing dataset, rather than reannotate the genome. I believe that reannotating it without any further evidence such as RNA-seq from the species itself would not change much,and i d rather stick with actual predictions that i trust and have used in subsequent analyses. The 1000 genes I can accept to annotate with a less stringent and reliable way than MAKER, I just want to add them so that the difference in gene count gets corrected. I was reading the MAKER 2 paper and i was wondering if I can use the legacy annotations scheme to do it, by providing GFF3 of the alignments between the two species in the regions where genes were missed, but as i said, I would not like to reannotate the whole genome, and running MAKER2 might cause slight changes that i d like to avoid. Is this possible? First, is it possible to provide a Gff3 file of specific locations and not the entire genome alignment? (I guess so..) 
Second, how can I tag the existing annotations as 'not to be changed' or alternatively, tag the new models only? How should I run maker2, with which predictors on and which off? Thanks, Anastasia Anastasia Gioti Post-doctoral Researcher anastasia.gioti at scilifelab.se anastasia.gioti at ebc.uu.se http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsth at ebi.ac.uk Wed Apr 25 04:22:03 2012 From: dsth at ebi.ac.uk (Daniel Hughes) Date: Wed, 25 Apr 2012 10:22:03 +0100 Subject: [maker-devel] Use pass-through system to add missing genes In-Reply-To: References: Message-ID: For cross-species comparisons you might have be better off including the actual peptide sequences of the other fungi too in the annotation run - I'd be very surprised if you really did get the same result. dan. Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) ------------------------------------------------------------------------------------- dsth at cantab.net dsth at cpan.org 2012/4/25 Anastasia Gioti > Hi, > I have a set of predicted proteins from the genome of a fungus annotated > by MAKER using EST data from a closely related species and 3 ab initio > predictors (snap iterativelly trained 3 times, genemark trained directly > on the assembly and augustus with a model from a less closely related > species), along with a set of fungal proteins. I am missing ~ 1000 proteins > when I compare to the species i used EST data from, and there is good > evidence from alignments that these genes exist. The question is how to > proceed from Blast hits to actual gene models here. The idea would be to > add these genes to the existing dataset, rather than reannotate the genome. 
> I believe that reannotating it without any further evidence such as RNA-seq > from the species itself would not change much,and i d rather stick with > actual predictions that i trust and have used in subsequent analyses. The > 1000 genes I can accept to annotate with a less stringent and reliable way > than MAKER, I just want to add them so that the difference in gene count > gets corrected. > I was reading the MAKER 2 paper and i was wondering if I can use the > legacy annotations scheme to do it, by providing GFF3 of the alignments > between the two species in the regions where genes were missed, but as i > said, I would not like to reannotate the whole genome, and running MAKER2 > might cause slight changes that i d like to avoid. Is this possible? First, > is it possible to provide a Gff3 file of specific locations and not the > entire genome alignment? (I guess so..) Second, how can I tag the existing > annotations as 'not to be changed' or alternatively, tag the new models > only? How should I run maker2, with which predictors on and which off? > Thanks, > Anastasia > > Anastasia Gioti > Post-doctoral Researcher > > anastasia.gioti at scilifelab.se > anastasia.gioti at ebc.uu.se > > http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anastasia.gioti at scilifelab.se Wed Apr 25 04:29:30 2012 From: anastasia.gioti at scilifelab.se (Anastasia Gioti) Date: Wed, 25 Apr 2012 11:29:30 +0200 Subject: [maker-devel] Use pass-through system to add missing genes In-Reply-To: References: Message-ID: Hi, Do you mean that I should have not include the proteins of the closely related species in my fungal protein fasta file that I used as evidence in MAKER? 
i do not see why... What I have been trying to do now is further 'bias' the annotations in favor of this species, so as to get the missing genes. Can you explain a bit more whta you mean? Thanks, Anastasia On Apr 25, 2012, at 11:22 AM, Daniel Hughes wrote: > For cross-species comparisons you might have be better off including the actual peptide sequences of the other fungi too in the annotation run - I'd be very surprised if you really did get the same result. > > dan. > > > Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) > ------------------------------------------------------------------------------------- > dsth at cantab.net > dsth at cpan.org > > > 2012/4/25 Anastasia Gioti > Hi, > I have a set of predicted proteins from the genome of a fungus annotated by MAKER using EST data from a closely related species and 3 ab initio predictors (snap iterativelly trained 3 times, genemark trained directly on the assembly and augustus with a model from a less closely related species), along with a set of fungal proteins. I am missing ~ 1000 proteins when I compare to the species i used EST data from, and there is good evidence from alignments that these genes exist. The question is how to proceed from Blast hits to actual gene models here. The idea would be to add these genes to the existing dataset, rather than reannotate the genome. I believe that reannotating it without any further evidence such as RNA-seq from the species itself would not change much,and i d rather stick with actual predictions that i trust and have used in subsequent analyses. The 1000 genes I can accept to annotate with a less stringent and reliable way than MAKER, I just want to add them so that the difference in gene count gets corrected. 
> I was reading the MAKER 2 paper and i was wondering if I can use the legacy annotations scheme to do it, by providing GFF3 of the alignments between the two species in the regions where genes were missed, but as i said, I would not like to reannotate the whole genome, and running MAKER2 might cause slight changes that i d like to avoid. Is this possible? First, is it possible to provide a Gff3 file of specific locations and not the entire genome alignment? (I guess so..) Second, how can I tag the existing annotations as 'not to be changed' or alternatively, tag the new models only? How should I run maker2, with which predictors on and which off? > Thanks, > Anastasia > > Anastasia Gioti > Post-doctoral Researcher > > anastasia.gioti at scilifelab.se > anastasia.gioti at ebc.uu.se > > http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > Anastasia Gioti Post-doctoral Researcher anastasia.gioti at scilifelab.se anastasia.gioti at ebc.uu.se http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsth at ebi.ac.uk Wed Apr 25 04:39:49 2012 From: dsth at ebi.ac.uk (Daniel Hughes) Date: Wed, 25 Apr 2012 10:39:49 +0100 Subject: [maker-devel] Use pass-through system to add missing genes In-Reply-To: References: Message-ID: sorry my bad, i missed the part about you having already included the fungal proteins as fasta ;/ - too early for me. in that case have you viewed the full gff output for specific instances of such missing proteins in something like apollo to try and work out why maker hasn't made a call at those loci (aed score...)? dan. Daniel S. T. 
Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) ------------------------------------------------------------------------------------- dsth at cantab.net dsth at cpan.org 2012/4/25 Anastasia Gioti > Hi, > Do you mean that I should have not include the proteins of the closely > related species in my fungal protein fasta file that I used as evidence in > MAKER? i do not see why... What I have been trying to do now is further > 'bias' the annotations in favor of this species, so as to get the missing > genes. Can you explain a bit more whta you mean? > Thanks, > Anastasia > > On Apr 25, 2012, at 11:22 AM, Daniel Hughes wrote: > > For cross-species comparisons you might have be better off including the > actual peptide sequences of the other fungi too in the annotation run - I'd > be very surprised if you really did get the same result. > > dan. > > > Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) > > ------------------------------------------------------------------------------------- > dsth at cantab.net > dsth at cpan.org > > > 2012/4/25 Anastasia Gioti > >> Hi, >> I have a set of predicted proteins from the genome of a fungus annotated >> by MAKER using EST data from a closely related species and 3 ab initio >> predictors (snap iterativelly trained 3 times, genemark trained directly >> on the assembly and augustus with a model from a less closely related >> species), along with a set of fungal proteins. I am missing ~ 1000 proteins >> when I compare to the species i used EST data from, and there is good >> evidence from alignments that these genes exist. The question is how to >> proceed from Blast hits to actual gene models here. The idea would be to >> add these genes to the existing dataset, rather than reannotate the genome. 
>> I believe that reannotating it without any further evidence such as RNA-seq >> from the species itself would not change much,and i d rather stick with >> actual predictions that i trust and have used in subsequent analyses. The >> 1000 genes I can accept to annotate with a less stringent and reliable way >> than MAKER, I just want to add them so that the difference in gene count >> gets corrected. >> I was reading the MAKER 2 paper and i was wondering if I can use the >> legacy annotations scheme to do it, by providing GFF3 of the alignments >> between the two species in the regions where genes were missed, but as i >> said, I would not like to reannotate the whole genome, and running MAKER2 >> might cause slight changes that i d like to avoid. Is this possible? First, >> is it possible to provide a Gff3 file of specific locations and not the >> entire genome alignment? (I guess so..) Second, how can I tag the existing >> annotations as 'not to be changed' or alternatively, tag the new models >> only? How should I run maker2, with which predictors on and which off? >> Thanks, >> Anastasia >> >> Anastasia Gioti >> Post-doctoral Researcher >> >> anastasia.gioti at scilifelab.se >> anastasia.gioti at ebc.uu.se >> >> http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ >> >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > Anastasia Gioti > Post-doctoral Researcher > > anastasia.gioti at scilifelab.se > anastasia.gioti at ebc.uu.se > > http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Wed Apr 25 09:29:01 2012 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 25 Apr 2012 10:29:01 -0400 Subject: [maker-devel] Use pass-through system to add missing genes In-Reply-To: Message-ID: The way you proceed depends on why the genes are not there to begin with. Are they not there because of a lack of evidence? If that's the case, just adding the new fasta file should do the trick. Or are they not there because an assembly error makes it impossible to get a logical model for the region (i.e. reading frame breaks)? Are there ab initio models already called in those regions that could just be promoted to the annotation tier? You can test that one by blasting against the nonoverlaping_abinits.fasta files. For any of the cases described, you can provide the existing annotation set as the input in GFF3 format, and previous models will be maintained preferentially. If you know which ab initio predictions you want to add (i.e. the ab initio promotion scenario I described), you can provide those predictions through the pred_gff option and then set keep_preds=1, and they will be maintained even without evidence. Attached is a script that would make selecting those easier. It takes the MAKER-generated GFF3 and a list of predictions to keep (one name per line). These might be the results of a BLAST analysis, for example. It will then return the GFF3 entries for just those models selected. If the situation is more complex, just provide more detail, and I am sure we can help you come up with a plan.
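For readers without the attachment, the selection step described above can be sketched roughly as follows. This is only an illustrative Python sketch, not the attached gff3_select script (its contents are not shown in the archive and it may work differently). It assumes that predictions are named through the GFF3 ID or Name attribute and that child features point to them through Parent; the function names and the demo feature names are hypothetical.

```python
def parse_attributes(column9):
    """Parse GFF3 column 9 ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def select_gff3(gff3_lines, keep_names):
    """Return feature lines whose ID or Name is in keep_names, plus their children.

    Assumption: children reference the selected features through Parent.
    Parents of a selected feature are NOT pulled in by this sketch.
    """
    keep_names = set(keep_names)
    features = []
    for line in gff3_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # the sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        features.append((line, parse_attributes(cols[8])))

    # First pass: features named directly in the keep list.
    kept_ids = set()
    keep_flags = []
    for _, attrs in features:
        hit = attrs.get("ID") in keep_names or attrs.get("Name") in keep_names
        keep_flags.append(hit)
        if hit and "ID" in attrs:
            kept_ids.add(attrs["ID"])

    # Repeat until stable so that nested children are kept in any file order.
    changed = True
    while changed:
        changed = False
        for i, (_, attrs) in enumerate(features):
            if keep_flags[i]:
                continue
            parents = attrs.get("Parent", "").split(",")
            if any(p in kept_ids for p in parents):
                keep_flags[i] = True
                if "ID" in attrs:
                    kept_ids.add(attrs["ID"])
                changed = True

    return [line for (line, _), keep in zip(features, keep_flags) if keep]


if __name__ == "__main__":
    # Hypothetical two-prediction example; real MAKER names will differ.
    demo = [
        "##gff-version 3",
        "ctg1\tsnap\tmatch\t100\t500\t.\t+\t.\tID=m1;Name=pred-A",
        "ctg1\tsnap\tmatch_part\t100\t200\t.\t+\t.\tID=m1:e1;Parent=m1",
        "ctg1\tsnap\tmatch\t900\t1500\t.\t-\t.\tID=m2;Name=pred-B",
        "ctg1\tsnap\tmatch_part\t900\t1500\t.\t-\t.\tID=m2:e1;Parent=m2",
    ]
    for row in select_gff3(demo, ["pred-A"]):
        print(row)
```

The filtered lines could then be written to a file and passed to the pred_gff option with keep_preds=1, as described above. Whether the names from a BLAST report match the ID or the Name attribute in a given MAKER GFF3 would need to be checked against the actual file.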
Thanks, Carson From: Anastasia Gioti Date: Wed, 25 Apr 2012 11:09:36 +0200 To: Subject: [maker-devel] Use pass-through system to add missing genes Hi, I have a set of predicted proteins from the genome of a fungus annotated by MAKER using EST data from a closely related species and 3 ab initio predictors (snap iterativelly trained 3 times, genemark trained directly on the assembly and augustus with a model from a less closely related species), along with a set of fungal proteins. I am missing ~ 1000 proteins when I compare to the species i used EST data from, and there is good evidence from alignments that these genes exist. The question is how to proceed from Blast hits to actual gene models here. The idea would be to add these genes to the existing dataset, rather than reannotate the genome. I believe that reannotating it without any further evidence such as RNA-seq from the species itself would not change much,and i d rather stick with actual predictions that i trust and have used in subsequent analyses. The 1000 genes I can accept to annotate with a less stringent and reliable way than MAKER, I just want to add them so that the difference in gene count gets corrected. I was reading the MAKER 2 paper and i was wondering if I can use the legacy annotations scheme to do it, by providing GFF3 of the alignments between the two species in the regions where genes were missed, but as i said, I would not like to reannotate the whole genome, and running MAKER2 might cause slight changes that i d like to avoid. Is this possible? First, is it possible to provide a Gff3 file of specific locations and not the entire genome alignment? (I guess so..) Second, how can I tag the existing annotations as 'not to be changed' or alternatively, tag the new models only? How should I run maker2, with which predictors on and which off? 
Thanks, Anastasia Anastasia Gioti Post-doctoral Researcher anastasia.gioti at scilifelab.se anastasia.gioti at ebc.uu.se http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gff3_select Type: application/octet-stream Size: 3066 bytes Desc: not available URL: From anastasia.gioti at scilifelab.se Fri Apr 27 03:43:14 2012 From: anastasia.gioti at scilifelab.se (Anastasia Gioti) Date: Fri, 27 Apr 2012 10:43:14 +0200 Subject: [maker-devel] Use pass-through system to add missing genes In-Reply-To: References: Message-ID: <4FE7CD5B-FC1C-43E7-AC41-A05823348B99@scilifelab.se> Hi Carlson, Thanks for your help! > The way you proceed depends on why the genes are not there to begin > with. Are they not there because of a lack of evidence? It is a mixture of cases, and I can only look at some examples to say that. There are cases where all 3 used ab initio predictors provide models, there are blastx hits, or both blastx and protein2 genome, but no EST evidence, thus no model is retained. i guess my default parameters could be responsible for these cases at least. > If that's the case just adding the new fasta file should do the trick. which fasta do you refer to? The proteins file I use as evidence contains all proteins i can actually use. > Or are they not there because an assembly error makes it impossible > to get a logical model for the region (I.e reading frame breaks). This is not the case in general. > Are there ab initio models already called in those regions that > could just be promoted to the annotation tier? You can test that > one by blasting against the nonoverlaping_abinits.fasta files. 
I have not done this, will do! > > For any of the cases described, you can provide the existing > annotation set as the input in GFF3 format, and previous models will > be maintained preferentially. You mean in a new maker run? is this possible with the old maker as well, not maker2, right? > If you know which ab initio predictions you want to add (I.e. the ab > initio promoting scenario I descibed), you can provide those > predictions to the use the pred_gff option and then set keep_preds=1 > and they will be maintained even without evidence. Attached is a > script that would make selecting those easier. It take the MAKER > generated GFF3 and a list of predictions to keep (one name per > line). These might be the results of a BLAST analysis for example. > It will then return the GFF3 entries for just those models selected. The thing is, for the few cases I have looked at, I cannot really decide which model is the best, and the 3 models from the ab initio predictors do not agree on the exact intron-exon junctions or the start and stop codons. > > If the situation is more complex, just provide more detail, and I am > sure we can help you come up with a plan. > What i was thinking to do was to provide a gff file of alignments (eg by exonerate) to the proteins of the closely related species that i am missing, and somehow keep the previous annotations and get the extra ones by this gff file. But how exactly maker should be run to do this I am not sure. if I want to keep the previous annotations I need the gff file of the last maker run as input, but then how do I discriminate with the exonerate gff file? And which mode of rediction should be on, and with which parameters? You mention keep_preds=1 for the existing annotations, but how do i also promote evidence from alignments on the same way in the same run? Looks feasible though. 
Thanks again, Anastasia > Thanks, > Carson > > From: Anastasia Gioti > Date: Wed, 25 Apr 2012 11:09:36 +0200 > To: > Subject: [maker-devel] Use pass-through system to add missing genes > > Hi, > I have a set of predicted proteins from the genome of a fungus > annotated by MAKER using EST data from a closely related species > and 3 ab initio predictors (snap iterativelly trained 3 times, > genemark trained directly on the assembly and augustus with a model > from a less closely related species), along with a set of fungal > proteins. I am missing ~ 1000 proteins when I compare to the species > i used EST data from, and there is good evidence from alignments > that these genes exist. The question is how to proceed from Blast > hits to actual gene models here. The idea would be to add these > genes to the existing dataset, rather than reannotate the genome. I > believe that reannotating it without any further evidence such as > RNA-seq from the species itself would not change much,and i d rather > stick with actual predictions that i trust and have used in > subsequent analyses. The 1000 genes I can accept to annotate with a > less stringent and reliable way than MAKER, I just want to add them > so that the difference in gene count gets corrected. > I was reading the MAKER 2 paper and i was wondering if I can use the > legacy annotations scheme to do it, by providing GFF3 of the > alignments between the two species in the regions where genes were > missed, but as i said, I would not like to reannotate the whole > genome, and running MAKER2 might cause slight changes that i d like > to avoid. Is this possible? First, is it possible to provide a Gff3 > file of specific locations and not the entire genome alignment? (I > guess so..) Second, how can I tag the existing annotations as 'not > to be changed' or alternatively, tag the new models only? How should > I run maker2, with which predictors on and which off? 
> Thanks, > Anastasia > > Anastasia Gioti > Post-doctoral Researcher > > anastasia.gioti at scilifelab.se > anastasia.gioti at ebc.uu.se > > http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ > > > > _______________________________________________ maker-devel mailing > list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > Anastasia Gioti Post-doctoral Researcher anastasia.gioti at scilifelab.se anastasia.gioti at ebc.uu.se http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry.moore at genetics.utah.edu Fri Apr 27 06:57:01 2012 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Fri, 27 Apr 2012 05:57:01 -0600 Subject: [maker-devel] Use pass-through system to add missing genes In-Reply-To: <4FE7CD5B-FC1C-43E7-AC41-A05823348B99@scilifelab.se> References: <4FE7CD5B-FC1C-43E7-AC41-A05823348B99@scilifelab.se> Message-ID: <03439C8F-75B0-42FE-894C-CC564AEB73E9@genetics.utah.edu> Hi Anastasia, On Apr 27, 2012, at 2:43 AM, Anastasia Gioti wrote: > Hi Carlson, > Thanks for your help! > >> The way you proceed depends on why the genes are not there to begin with. Are they not there because of a lack of evidence? > > It is a mixture of cases, and I can only look at some examples to say that. There are cases where all 3 used ab initio predictors provide models, there are blastx hits, or both blastx and protein2 genome, but no EST evidence, thus no model is retained. i guess my default parameters could be responsible for these cases at least. > This doesn't sound right. If there are predicted models and blastx protein evidence overlapping them you should get a model retained. 
I know for the EST evidence that it has to support a splice site before it will be promoted and I can't remember if protein evidence is the same but certainly if you pass back those protein2genome predictions and the original proteins as evidence then they will be retained as models. >> If that's the case just adding the new fasta file should do the trick. > > which fasta do you refer to? The proteins file I use as evidence contains all proteins i can actually use. > Yes using the protein fasta from the closely related species as evidence. I think you said you've already done that right? >> Or are they not there because an assembly error makes it impossible to get a logical model for the region (I.e reading frame breaks). > > This is not the case in general. > >> Are there ab initio models already called in those regions that could just be promoted to the annotation tier? You can test that one by blasting against the nonoverlaping_abinits.fasta files. > > I have not done this, will do! > >> >> For any of the cases described, you can provide the existing annotation set as the input in GFF3 format, and previous models will be maintained preferentially. > > You mean in a new maker run? is this possible with the old maker as well, not maker2, right? > Yes, the original MAKER will do this. >> If you know which ab initio predictions you want to add (I.e. the ab initio promoting scenario I descibed), you can provide those predictions to the use the pred_gff option and then set keep_preds=1 and they will be maintained even without evidence. Attached is a script that would make selecting those easier. It take the MAKER generated GFF3 and a list of predictions to keep (one name per line). These might be the results of a BLAST analysis for example. It will then return the GFF3 entries for just those models selected. 
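A minimal sketch of the kind of selection described above, written in Python. This is not the attached gff3_select script, whose contents are not shown in this thread. It assumes the predictions carry a Name attribute and that child features point to them through Parent, as in standard GFF3. The function names and the demo records are hypothetical.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def select_features(gff3_lines, names_to_keep):
    """Return feature lines whose Name is listed, plus all their descendants."""
    records = []
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        records.append((line.rstrip("\n"), parse_attributes(cols[8])))

    # IDs of the features that were asked for by name
    kept_ids = {a["ID"] for _, a in records
                if a.get("Name") in names_to_keep and "ID" in a}

    # Follow Parent links until no new descendants turn up
    changed = True
    while changed:
        changed = False
        for _, a in records:
            parents = a.get("Parent", "").split(",")
            if ("ID" in a and a["ID"] not in kept_ids
                    and any(p in kept_ids for p in parents)):
                kept_ids.add(a["ID"])
                changed = True

    selected = []
    for line, a in records:
        parents = a.get("Parent", "").split(",")
        if a.get("ID") in kept_ids or any(p in kept_ids for p in parents):
            selected.append(line)
    return selected


# Hypothetical demo records: two predictions, one of which is kept
demo = [
    "##gff-version 3\n",
    "chr1\tsnap_masked\tmatch\t100\t500\t.\t+\t.\tID=m1;Name=pred-A\n",
    "chr1\tsnap_masked\tmatch_part\t100\t200\t.\t+\t.\tID=m1:hsp1;Parent=m1\n",
    "chr1\tsnap_masked\tmatch\t900\t1200\t.\t-\t.\tID=m2;Name=pred-B\n",
    "chr1\tsnap_masked\tmatch_part\t900\t1200\t.\t-\t.\tID=m2:hsp1;Parent=m2\n",
]
for feature in select_features(demo, {"pred-A"}):
    print(feature)
```

In practice the list of names would be read from a text file (one name per line) and the features from the MAKER-generated GFF3; the selected lines could then be written to the file given to pred_gff.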
>
> The thing is, for the few cases I have looked at, I cannot really decide which model is the best, and the 3 models from the ab initio predictors do not agree on the exact intron-exon junctions or the start and stop codons.
>>
>> If the situation is more complex, just provide more detail, and I am sure we can help you come up with a plan.
>>
> What i was thinking to do was to provide a gff file of alignments (eg by exonerate) to the proteins of the closely related species that i am missing, and somehow keep the previous annotations and get the extra ones by this gff file. But how exactly maker should be run to do this I am not sure. if I want to keep the previous annotations I need the gff file of the last maker run as input, but then how do I discriminate with the exonerate gff file? And which mode of rediction should be on, and with which parameters? You mention keep_preds=1 for the existing annotations, but how do i also promote evidence from alignments on the same way in the same run?
> Looks feasible though. Thanks again,
> Anastasia
>

Let me just restate what you've said so that I can be sure that I am correct about what you've already done. You have run Maker with SNAP, Genemark and Augustus using ESTs from a closely related species (passed to altest) and protein evidence from other fungi. You are missing about 1,000 genes compared to the species that provided the EST alignments. You say there is good evidence from the alignments that these genes exist, and I assume by this that you mean the EST/protein alignments that Maker produced.

1) Is the closely related fungus annotated, and if so, have you included its proteins in the evidence set that you provided to Maker? If you haven't provided these proteins as evidence to Maker, then you should do this.
You can re-run Maker, passing your original models back through, like this:

#-----Re-annotation Using MAKER Derived GFF3
genome_gff=original_maker_annotations.gff3
est_pass=1
altest_pass=1
protein_pass=1
rm_pass=1
model_pass=1
pred_pass=1
other_pass=1

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=proteins_from_closely_related.fasta
## OR it sounds like you've already aligned these with exonerate?
protein_gff=proteins_from_closely_related_already_aligned.gff

2) If you've already included those closely related species' proteins but still didn't get the 1,000 genes, then take your nonoverlaping_abinits.fasta and BLAST them directly against your closely related proteins. Presumably they don't hit too well, because if they did they should have been promoted to annotations by Maker the first time, but here you can decide for yourself what thresholds to allow in order to keep the ab initio predictions that hit the closely related species' proteins. If you filter your BLAST hits the way you want and keep the names of the ab initio predictions that pass your filter, then use the script Carson attached, and it will generate an ab initio prediction GFF file with only the predictions you selected. You can then pass those predictions back to Maker and force it to keep them, and Maker will turn them from predictions (match/match_part) into gene models.
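The filtering step in 2) could be sketched roughly as follows, in Python. This assumes the BLAST results were saved in the 12-column tabular format of BLAST+ (-outfmt 6), with the ab initio predictions as queries. The function name and the identity and e-value thresholds are illustrative assumptions only, not values recommended anywhere in this thread.

```python
def names_passing_filter(blast_tab_lines, min_identity=50.0, max_evalue=1e-10):
    """Return sorted query names with at least one hit passing both thresholds.

    Expects BLAST+ tabular rows (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    passing = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # not a complete tabular row
        query = cols[0]
        pident = float(cols[2])   # percent identity of the hit
        evalue = float(cols[10])  # expectation value of the hit
        if pident >= min_identity and evalue <= max_evalue:
            passing.add(query)
    return sorted(passing)


# Hypothetical rows: pred1 has one good hit, pred2 only a weak one
demo_rows = [
    "pred1\tprotA\t80.0\t200\t40\t0\t1\t200\t1\t200\t1e-50\t300\n",
    "pred2\tprotB\t30.0\t90\t63\t2\t1\t90\t5\t94\t1e-3\t40\n",
]
for name in names_passing_filter(demo_rows):
    print(name)
```

The printed names, one per line, are the kind of list that the selection script expects alongside the MAKER-generated GFF3.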
#-----Re-annotation Using MAKER Derived GFF3
genome_gff=original_maker_annotations.gff3
est_pass=1
altest_pass=1
protein_pass=1
rm_pass=1
model_pass=1
pred_pass=0
other_pass=1

#-----Gene Prediction
snaphmm=
gmhmm=
augustus_species=
fgenesh_par_file=
pred_gff=ab_init_predictions_rescued_by_blast.gff
keep_preds=1

Barry

>> Thanks,
>> Carson
>>
>> From: Anastasia Gioti
>> Date: Wed, 25 Apr 2012 11:09:36 +0200
>> To:
>> Subject: [maker-devel] Use pass-through system to add missing genes
>>
>> Hi,
>> I have a set of predicted proteins from the genome of a fungus annotated by MAKER using EST data from a closely related species and 3 ab initio predictors (snap iterativelly trained 3 times, genemark trained directly on the assembly and augustus with a model from a less closely related species), along with a set of fungal proteins. I am missing ~ 1000 proteins when I compare to the species i used EST data from, and there is good evidence from alignments that these genes exist. The question is how to proceed from Blast hits to actual gene models here. The idea would be to add these genes to the existing dataset, rather than reannotate the genome. I believe that reannotating it without any further evidence such as RNA-seq from the species itself would not change much,and i d rather stick with actual predictions that i trust and have used in subsequent analyses. The 1000 genes I can accept to annotate with a less stringent and reliable way than MAKER, I just want to add them so that the difference in gene count gets corrected.
>> I was reading the MAKER 2 paper and i was wondering if I can use the legacy annotations scheme to do it, by providing GFF3 of the alignments between the two species in the regions where genes were missed, but as i said, I would not like to reannotate the whole genome, and running MAKER2 might cause slight changes that i d like to avoid. Is this possible?
First, is it possible to provide a Gff3 file of specific locations and not the entire genome alignment? (I guess so..) Second, how can I tag the existing annotations as 'not to be changed' or alternatively, tag the new models only? How should I run maker2, with which predictors on and which off? >> Thanks, >> Anastasia >> >> Anastasia Gioti >> Post-doctoral Researcher >> >> anastasia.gioti at scilifelab.se >> anastasia.gioti at ebc.uu.se >> >> http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ >> >> >> >> _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > Anastasia Gioti > Post-doctoral Researcher > > anastasia.gioti at scilifelab.se > anastasia.gioti at ebc.uu.se > > http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Apr 27 08:27:24 2012 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 27 Apr 2012 09:27:24 -0400 Subject: [maker-devel] Use pass-through system to add missing genes In-Reply-To: <03439C8F-75B0-42FE-894C-CC564AEB73E9@genetics.utah.edu> Message-ID: > It is a mixture of cases, and I can only look at some examples to say that. > There are cases where all 3 used ab initio predictors provide models, there > are blastx hits, or both blastx and protein2 genome, but no EST evidence, thus > no model is retained. i guess my default parameters could be responsible for > these cases at least. 
The only way you should be able to get BLASTX overlap and still not get a model for the region is if:

1. The protein alignment is in a different reading frame than your models for every single base pair of the alignment (in which case it's not true overlap).
2. The BLASTX HSPs are stacked on each other again and again in weird rearranged overlaps to produce a very deep alignment, which would mean this is a repetitive region and not really a significant alignment.

Otherwise this should not happen unless you have AED_threshold set to some value where MAKER will ignore genes unless they have a minimum amount of support (by default this option is always off). The other two possibilities can be tested by just looking at the alignments manually in Apollo. Also take a look at the AED and eAED values for your missing genes. Anything below 1 should always be kept by MAKER by default because it has at least some evidence support.

> which fasta do you refer to? The proteins file I use as evidence contains all
> proteins i can actually use.

If they are already in your current run, ignore this.

Barry provided detailed instructions on how to configure MAKER for your particular case, so just follow his excellent instructions.

Thanks,
Carson

From: Barry Moore
Date: Friday, 27 April, 2012 7:57 AM
To: Anastasia Gioti
Cc: Carson Holt ,
Subject: Re: [maker-devel] Use pass-through system to add missing genes

Hi Anastasia,

On Apr 27, 2012, at 2:43 AM, Anastasia Gioti wrote:

> Hi Carlson,
> Thanks for your help!
>
>> The way you proceed depends on why the genes are not there to begin with.
>> Are they not there because of a lack of evidence?
>
> It is a mixture of cases, and I can only look at some examples to say that.
> There are cases where all 3 used ab initio predictors provide models, there
> are blastx hits, or both blastx and protein2 genome, but no EST evidence, thus
> no model is retained. i guess my default parameters could be responsible for
> these cases at least.
> This doesn't sound right. If there are predicted models and blastx protein evidence overlapping them you should get a model retained. I know for the EST evidence that it has to support a splice site before it will be promoted and I can't remember if protein evidence is the same but certainly if you pass back those protein2genome predictions and the original proteins as evidence then they will be retained as models. >> If that's the case just adding the new fasta file should do the trick. > > which fasta do you refer to? The proteins file I use as evidence contains all > proteins i can actually use. > Yes using the protein fasta from the closely related species as evidence. I think you said you've already done that right? >> Or are they not there because an assembly error makes it impossible to get a >> logical model for the region (I.e reading frame breaks). > > This is not the case in general. > >> Are there ab initio models already called in those regions that could just be >> promoted to the annotation tier? You can test that one by blasting against >> the nonoverlaping_abinits.fasta files. > > I have not done this, will do! > >> >> For any of the cases described, you can provide the existing annotation set >> as the input in GFF3 format, and previous models will be maintained >> preferentially. > > You mean in a new maker run? is this possible with the old maker as well, not > maker2, right? > Yes, the original MAKER will do this. >> If you know which ab initio predictions you want to add (I.e. the ab initio >> promoting scenario I descibed), you can provide those predictions to the use >> the pred_gff option and then set keep_preds=1 and they will be maintained >> even without evidence. Attached is a script that would make selecting those >> easier. It take the MAKER generated GFF3 and a list of predictions to keep >> (one name per line). These might be the results of a BLAST analysis for >> example. 
It will then return the GFF3 entries for just those models >> selected. > > The thing is, for the few cases I have looked at, I cannot really decide which > model is the best, and the 3 models from the ab initio predictors do not agree > on the exact intron-exon junctions or the start and stop codons. >> >> If the situation is more complex, just provide more detail, and I am sure we >> can help you come up with a plan. >> > What i was thinking to do was to provide a gff file of alignments (eg by > exonerate) to the proteins of the closely related species that i am missing, > and somehow keep the previous annotations and get the extra ones by this gff > file. But how exactly maker should be run to do this I am not sure. if I want > to keep the previous annotations I need the gff file of the last maker run as > input, but then how do I discriminate with the exonerate gff file? And which > mode of rediction should be on, and with which parameters? You mention > keep_preds=1 for the existing annotations, but how do i also promote evidence > from alignments on the same way in the same run? > Looks feasible though. Thanks again, > Anastasia > Let me just restate what you've said so that I can be sure that I am correct about what you've already done. You have run Maker with SNAP, Genemark and Augustus using EST from a closely related species (passed to altest) and protein evidence from other fungi. You are missing about 1,000 genes compared to the species that provided the EST alignments. You say their is good evidence that these genes exist from the alignments and I assume by this that you mean the EST/protein alignments that Maker produced. 1) Is the closely related fungus annotated and if so have you included it's proteins in the evidence set that you provided to Maker. If you haven't provided these proteins as evidence to maker then you should do this. 
You can re-run maker passing your original models back through like this: #-----Re-annotation Using MAKER Derived GFF3 genome_gff=original_maker_annotations.gff3 est_pass=1 altest_pass=1 protein_pass=1 rm_pass=1 model_pass=1 pred_pass=1 other_pass=1 #-----Protein Homology Evidence (for best results provide a file for at least one) protein=proteins_from_closely_related.fasta ## OR it sounds like you've already aligned these with exonerate? protein_gff=proteins_from_closely_related_already_aligned.gff 2) If you've already included those closely related species proteins but still didn't get the 1,000 genes, then take your nonoverlaping_abinits.fasta and blast them directly against your closely related proteins. Presumably they don't hit too well because if they did they should have been promoted to predictions by Maker the first time, but here you can decide yourself what thresholds to allow to keep the abinit predictions that hit the closely related species proteins. If you filter you blast hits the way you want and keep the names of the abinit predictions that pass your filter, then use the script Carson attached it it will generate a abinit precidtion GFF file with only the predictions you selected. You can then pass those predictions back to Maker and force it to keep them and Maker will turn them from predictions (match/match_part) into gene models. 
#-----Re-annotation Using MAKER Derived GFF3 genome_gff=original_maker_annotations.gff3 est_pass=1 altest_pass=1 protein_pass=1 rm_pass=1 model_pass=1 pred_pass=0 other_pass=1 #-----Gene Prediction snaphmm= gmhmm= augustus_species= fgenesh_par_file= pred_gff=ab_init_predictions_rescued_by_blast.gff keep_preds=1 Barry >> Thanks, >> Carson >> >> From: Anastasia Gioti >> Date: Wed, 25 Apr 2012 11:09:36 +0200 >> To: >> Subject: [maker-devel] Use pass-through system to add missing genes >> >> Hi, >> I have a set of predicted proteins from the genome of a fungus annotated by >> MAKER using EST data from a closely related species and 3 ab initio >> predictors (snap iterativelly trained 3 times, genemark trained directly on >> the assembly and augustus with a model from a less closely related species), >> along with a set of fungal proteins. I am missing ~ 1000 proteins when I >> compare to the species i used EST data from, and there is good evidence from >> alignments that these genes exist. The question is how to proceed from Blast >> hits to actual gene models here. The idea would be to add these genes to the >> existing dataset, rather than reannotate the genome. I believe that >> reannotating it without any further evidence such as RNA-seq from the species >> itself would not change much,and i d rather stick with actual predictions >> that i trust and have used in subsequent analyses. The 1000 genes I can >> accept to annotate with a less stringent and reliable way than MAKER, I just >> want to add them so that the difference in gene count gets corrected. >> I was reading the MAKER 2 paper and i was wondering if I can use the legacy >> annotations scheme to do it, by providing GFF3 of the alignments between the >> two species in the regions where genes were missed, but as i said, I would >> not like to reannotate the whole genome, and running MAKER2 might cause >> slight changes that i d like to avoid. Is this possible? 
First, is it >> possible to provide a Gff3 file of specific locations and not the entire >> genome alignment? (I guess so..) Second, how can I tag the existing >> annotations as 'not to be changed' or alternatively, tag the new models only? >> How should I run maker2, with which predictors on and which off? >> Thanks, >> Anastasia >> >> Anastasia Gioti >> Post-doctoral Researcher >> >> anastasia.gioti at scilifelab.se >> anastasia.gioti at ebc.uu.se >> >> http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ >> >> >> >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/ma >> ker-devel_yandell-lab.org >> > > Anastasia Gioti > Post-doctoral Researcher > > anastasia.gioti at scilifelab.se > anastasia.gioti at ebc.uu.se > > http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.collett at pnnl.gov Fri Apr 27 11:51:05 2012 From: james.collett at pnnl.gov (Collett, James R) Date: Fri, 27 Apr 2012 09:51:05 -0700 Subject: [maker-devel] Use pass-through system to add missing genes In-Reply-To: References: Message-ID: Hi Carson, Could you please send me (or make available for download) the perl script that you mentioned in this previous post in this thread? >> Attached is a >> script that would make selecting those easier. It take the MAKER >> generated GFF3 and a list of predictions to keep (one name per line). >> These might be the results of a BLAST analysis for example. 
It will
>> then return the GFF3 entries for just those models selected.

Thanks,
Jim

__________________________________________________
James R. Collett, Ph.D.
Senior Scientist
Chemical and Biological Process Development Group
Energy and Environment Directorate
Pacific Northwest National Laboratory

> -----Original Message-----
> From: maker-devel-bounces at yandell-lab.org [mailto:maker-devel-bounces at yandell-lab.org] On Behalf Of maker-devel-request at yandell-lab.org
> Sent: Friday, April 27, 2012 6:48 AM
> To: maker-devel at yandell-lab.org
> Subject: maker-devel Digest, Vol 47, Issue 14
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: lab.org/attachments/20120427/72b70d49/attachment.html>
>
> ------------------------------
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
> End of maker-devel Digest, Vol 47, Issue 14
> *******************************************

From carsonhh at gmail.com Fri Apr 27 12:18:23 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Apr 2012 13:18:23 -0400
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To:
Message-ID:

Here you go. This will also be part of the next MAKER release in some
form.

Thanks,
Carson

On 12-04-27 12:51 PM, "Collett, James R" wrote:

>Hi Carson,
>
>Could you please send me (or make available for download) the perl script
>that you mentioned in this previous post in this thread?
>
>>> Attached is a
>>> script that would make selecting those easier. It take the MAKER
>>> generated GFF3 and a list of predictions to keep (one name per line).
>>> These might be the results of a BLAST analysis for example. It will
>>> then return the GFF3 entries for just those models selected.
>
>Thanks,
>
>Jim
>__________________________________________________
>James R. Collett, Ph.D.
>Senior Scientist
>Chemical and Biological Process Development Group
>Energy and Environment Directorate
>Pacific Northwest National Laboratory
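The selection step described in this thread (keep only the ab initio predictions named in a list, then hand the resulting file to pred_gff with keep_preds=1) can be sketched in a few lines. This is not the attached gff3_select script, whose contents are not shown in the archive. It is a minimal Python sketch that assumes the predictions are GFF3 match features carrying a Name attribute, with match_part children linked through Parent. The function names and the sample feature names are hypothetical.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict (no URL-decoding)."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def select_entries(gff_lines, keep_names):
    """Return feature lines whose Name is in keep_names, plus their
    direct children (features whose Parent is a selected ID)."""
    keep = set(keep_names)
    features = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a feature line
        features.append((line.rstrip("\n"), parse_attributes(columns[8])))

    # IDs of the features that were selected by name.
    selected_ids = {attrs["ID"] for _, attrs in features
                    if attrs.get("Name") in keep and "ID" in attrs}

    selected = []
    for line, attrs in features:
        parents = attrs.get("Parent", "").split(",")
        if attrs.get("ID") in selected_ids or any(
                parent in selected_ids for parent in parents):
            selected.append(line)
    return selected


if __name__ == "__main__":
    # Hypothetical two-prediction example; only pred_A is kept.
    example = [
        "##gff-version 3",
        "chr1\tsnap\tmatch\t100\t500\t.\t+\t.\tID=m1;Name=pred_A",
        "chr1\tsnap\tmatch_part\t100\t200\t.\t+\t.\tID=m1:p1;Parent=m1",
        "chr1\tsnap\tmatch\t900\t1200\t.\t-\t.\tID=m2;Name=pred_B",
    ]
    for entry in select_entries(example, ["pred_A"]):
        print(entry)
```

The list of names would come from whatever BLAST filter was applied, one name per line, and the returned lines would be written to the file given to pred_gff.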
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gff3_select
Type: application/octet-stream
Size: 3066 bytes
Desc: not available
URL:

From pingouinandsheep at gmail.com Thu Apr 5 08:14:39 2012
From: pingouinandsheep at gmail.com (pingouinandsheep at gmail.com)
Date: Thu, 5 Apr 2012 07:14:39 -0700 (PDT)
Subject: [maker-devel] Huge memory usage
Message-ID: <5338ad1d-dc04-4150-b5ee-a88da7c42549@h5g2000vbx.googlegroups.com>

Hello,

When I try to run the test provided with maker2, maker starts to use a
huge amount of memory. I stopped it after it reached ~100 GB of memory
used. I believe the test should not use that amount of memory.

In another message someone suggested that the installed BioPerl version
could be the cause of the problem, but the BioPerl installed on my
cluster is already at version 1.6.

perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
1.006901

Unfortunately I don't have an error message to provide that could
clarify my problem.

But maybe it is a recurrent problem and you know a few things I should
check.

Thanks,

Ismael

From carsonhh at gmail.com Thu Apr 5 08:26:17 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 05 Apr 2012 10:26:17 -0400
Subject: [maker-devel] Huge memory usage
In-Reply-To: <5338ad1d-dc04-4150-b5ee-a88da7c42549@h5g2000vbx.googlegroups.com>
Message-ID:

The test should not use more than a few megabytes of RAM. Even on very
large datasets you should never really use more than 1 or 2 GB of RAM
per MAKER instance.

It's possible that there are other Perl modules that are broken and need
to be reinstalled on your system. This can happen when Perl gets updated
but the PERL5LIB environment variable still points to modules built for
a different Perl version. Make sure you have the latest version of MAKER
and run with --debug set. Collect that output and send it to me (the
--debug option does some dependency checking).

I know there is an issue on Macs with updating Perl's DB_File module
that causes it to gobble up big sections of the hard drive (it will
eventually fill the drive if you let it). It's not a memory issue, but
it is one example of how broken modules can cause weird behavior.

Thanks,
Carson

From eernst at cshl.edu Sun Apr 8 16:09:22 2012
From: eernst at cshl.edu (Evan Ernst)
Date: Sun, 8 Apr 2012 18:09:22 -0400
Subject: [maker-devel] Incomplete/Missing lines in datastore index log under openMPI
Message-ID:

Hi Carson,

It looks like there may be a locking issue with the datastore index log
in MAKER 2.25/openmpi 1.4.5. I noticed this when running 8 MPI maker
instances, each with 32 nodes. Examples from the log:

scaffold1001.1 genome_datastore/93/A6/scaffold1001.1/ FINISHED
scaffold1002.1 genome_datastore/72/43/scaffold1002.1/ FINISHED
scaffold1003.1 genome_datastore/B8/05/scaffold1003.1/ FINISHED
...
scaffold10085.1 genome_datastore/1C/7E/scaffold10085.1/ FINISHED
scaffold8265.1 genome_datastore/01/E4/scaffold8265.1/ FINISHED
D
scaffold8295.1 genome_datastore/63/13/scaffold8295.1/ FINISHED
...
scaffold8351.1 genome_datastore/27/52/scaffold8351.1/ FINISHED
scaffold8343.1 genome_datastore/BF/31/scaffold8343.1/ FINISHED
scaffold10167.1 genome_datastore/0B/9A/scaffold10167.1/ FINISHEscaffold10170.1 genome_datastore/F4/FF/scaffold10170.1/ FINISHED
scaffold10209.1 genome_datastore/2D/AA/scaffold10209.1/ FINISHEscaffold10072.1 genome_datastore/E0/A5/scaffold10072.1/ FINISHED
scaffold10113.1 genome_datastore/00/23/scaffold10113.1/ FINISHED

I see this even when running a single MPI instance, 32 nodes, when no
actual processing is required apart from marking the scaffolds FINISHED.
Comparing the result to a single, non-MPI maker instance running on the
same completed hierarchy reveals that many entries aren't being written
to the log at all when running under MPI. The single process instance
runs just fine, generating a complete log that can be used for the
downstream scripts.

Between runs, I execute

find genome.maker.output/ -name '.NFSLock*' -type f -print0 | xargs -0 rm &

to be sure lingering lock files from badly exiting processes weren't
interfering.

This looks like the sort of thing that may be difficult to track down,
and there's a clear workaround, but I'm happy to provide more
information if you'd like to debug it.

Thanks,
Evan

From carsonhh at gmail.com Tue Apr 10 08:26:40 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 10 Apr 2012 10:26:40 -0400
Subject: [maker-devel] Incomplete/Missing lines in datastore index log under openMPI
In-Reply-To:
Message-ID:

Depending on whether you're using NFS and on other aspects of the
cluster architecture, you can get race conditions with the datastore log
file. This primarily happens when you have multiple instances of MAKER
running at the same time, or thousands of short contigs running in
parallel so that many finish at the same time. In a future release, I
plan on having the last MAKER job to exit just rebuild the log at the
end of a run to ensure it is complete. For now though, just run
'maker -dsindex' at the end of a run when it happens. It will rebuild
the log and only takes a few seconds.

Thanks,
Carson
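The truncated and fused entries in the log excerpt from this thread can be spotted mechanically before deciding whether the index needs rebuilding with 'maker -dsindex'. Below is a minimal Python sketch, assuming that a well-formed entry has exactly three whitespace-separated fields (name, datastore path, status), as the intact lines in the excerpt do. find_malformed_entries is a hypothetical helper, not part of MAKER, and it does not detect entries that are missing altogether.

```python
def find_malformed_entries(lines):
    """Return (line_number, text) for entries without exactly three fields.

    Assumption: a well-formed entry is "<name> <datastore path> <status>",
    as in the intact lines of the log excerpt in this thread.
    """
    malformed = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # ignore blank lines
        if len(text.split()) != 3:
            malformed.append((number, text))
    return malformed


if __name__ == "__main__":
    # Lines copied from the excerpt: one intact, one stray fragment,
    # one pair of entries fused by interleaved writes, one intact.
    excerpt = [
        "scaffold8265.1 genome_datastore/01/E4/scaffold8265.1/ FINISHED",
        "D",
        "scaffold10167.1 genome_datastore/0B/9A/scaffold10167.1/ "
        "FINISHEscaffold10170.1 genome_datastore/F4/FF/scaffold10170.1/ FINISHED",
        "scaffold10113.1 genome_datastore/00/23/scaffold10113.1/ FINISHED",
    ]
    for number, text in find_malformed_entries(excerpt):
        print(f"line {number}: {text}")
```

Any hit is a sign that the log was written concurrently and should be rebuilt before running downstream scripts on it.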
From smg283 at gmail.com Fri Apr 13 13:00:29 2012
From: smg283 at gmail.com (Scott Geib)
Date: Fri, 13 Apr 2012 09:00:29 -1000
Subject: [maker-devel] mpi issue on computing cluster
Message-ID:

Hi,

I am trying to run maker 2.24 on a compute cluster and get the following
error (not worried about Signal.pm error):

an into unknown state (hex char: 29) at /mnt/work/scratch/scottge/maker-2.24/maker/bin/../lib/Proc/Signal.pm line 138.
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(388)........:
MPID_Init(139)...............: channel initialization failed
MPIDI_CH3_Init(49)...........: progress_init failed
MPIDI_CH3I_Progress_init(808): This version of MPICH requires the SIGUSR1 signal, but the application has already installed a handler
[proxy:0:0@r01n11.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
[proxy:0:0@r01n11.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@r01n11.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event
[proxy:0:1@r01n13.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
[proxy:0:1@r01n13.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1@r01n13.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event
[proxy:0:3@r07n27.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
[proxy:0:3@r07n27.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:3@r07n27.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event
[mpiexec@r01n11.local] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
[mpiexec@r01n11.local] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): launcher returned error waiting for completion
[mpiexec@r01n11.local] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for completion
[mpiexec@r01n11.local] main (./ui/mpich/mpiexec.c:404): process manager error waiting for completion

I do not know how mpich2 was compiled, I feel this may be a
--enable-sharedlibs issue? I may need to contact my cluster support, but
I thought I would try here first,

Thanks

From sbrubaker at solazyme.com Fri Apr 13 14:12:24 2012
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Fri, 13 Apr 2012 20:12:24 +0000
Subject: [maker-devel] Functional annotation pipeline
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA065AD9@EXCHANGE-05.internal.solazyme.com>

Hi, can you recommend any open source functional annotation pipelines -
to assign function, GO terms, pathways, etc. to gene models?

Thanks,
Shane

From joseph.fass at gmail.com Fri Apr 13 14:42:51 2012
From: joseph.fass at gmail.com (Joseph Fass)
Date: Fri, 13 Apr 2012 13:42:51 -0700
Subject: [maker-devel] Functional annotation pipeline
In-Reply-To: <61D01ACB70C1E141A150BA9F586D5BFA065AD9@EXCHANGE-05.internal.solazyme.com>
References: <61D01ACB70C1E141A150BA9F586D5BFA065AD9@EXCHANGE-05.internal.solazyme.com>
Message-ID:

Would http://blast2go.de/b2ghome be the kind of thing you're looking for?

HTH,
~Joe

On Fri, Apr 13, 2012 at 1:12 PM, Shane Brubaker wrote:

> Hi, can you recommend any open source functional annotation pipelines - to
> assign function, GO terms, pathways, etc. to gene models?
> > Thanks, > Shane > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- Joseph Fass Lead Data Analyst UC Davis Bioinformatics Core joseph.fass -at- gmail.com (professional) 970.227.5928 (c) || 530.752.2698 (w) -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Apr 13 13:51:38 2012 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 13 Apr 2012 15:51:38 -0400 Subject: [maker-devel] Huge memory usage In-Reply-To: Message-ID: You can pre-mask the genome, convert the RepaetMasker results to GFF3 and pass them in, or just run the ./configure script in the RepeatMasker directory to configure wublast to be the default. You can also let MAKER install it's own separate installation of RepeatMasker using rmblast. Just go to the maker/src/ directory and run this command --> ./Build repeatmasker MAKER will use that installation preferentially if you let it install that. Thanks, Carson From: padioleau isma?l Date: Fri, 13 Apr 2012 17:42:20 +0200 To: Carson Holt Subject: Re: [maker-devel] Huge memory usage Dear Carson, I have a problem with RepeatMasker on my cluster. It work with wublast but not with Crossmatch. As maker try to run RepeatMasker with default I can not successfully run maker. I wanted to know if I can provide to maker the genome already masked (if I run with wublast externally), I though it was possible but I can't found in the configuration files where I should provide it i.e : In maker_opts.ctl, should I provide the result from RepeatMasker to 'genome_gff:' and set 'rm_pass' to 1, or set rm_gff in the 'Repeat Masking' part of the file? Or maybe I should provide directly the masked fasta as genome reference. An other solution could be to ask maker to run RepeatMasker with the option '-e wublast'. Is it possible to use one of these solutions? 
Thanks,
Ismael

2012/4/5 padioleau ismaël
> Dear Carson,
>
> Thank you for your very quick answer.
>
> I realised that I missed some error messages, and the problem seems to be
> linked to the DB_File package as you suggested. The person in charge of
> installation told me that he will recover the configuration.
>
> I will test it after the Easter weekend and get back to you if we have other
> issues.
>
> Have a nice Easter weekend,
>
> Ismael
>
> Here is the error message:
> Use of uninitialized value $DB_File::db_version in numeric ge (>=) at /mnt/common/DevTools/install/Linux/x86_64/perl/perl-5.10.1/lib/5.10.1/x86_64-linux-thread-multi/DB_File.pm line 276.
> Use of uninitialized value $DB_File::db_version in numeric gt (>) at /mnt/common/DevTools/install/Linux/x86_64/perl/perl-5.10.1/lib/5.10.1/x86_64-linux-thread-multi/DB_File.pm line 280.
> Deep recursion on subroutine "DB_File::AUTOLOAD" at /mnt/common/DevTools/install/Linux/x86_64/perl/perl-5.10.1/lib/5.10.1/x86_64-linux-thread-multi/DB_File.pm line 235.
>
> 2012/4/5 Carson Holt
>> The test should not use up more than a few megabytes of RAM. Even on very
>> large datasets you should never really use more than 1 or 2 GB of RAM
>> per MAKER instance.
>>
>> It's possible that there may be other perl modules that are broken and need
>> to be reinstalled on your system. This can happen when perl gets updated,
>> but you are pointing to modules built for a different perl version with
>> the PERL5LIB environment variable. Make sure you have the latest
>> version of MAKER and run with --debug set. Collect that output and send
>> it to me (the --debug option does some dependency checking).
>>
>> I know there is an issue on Macs with updating perl's DB_File module that
>> causes it to gobble up big sections of the hard drive (it will eventually
>> fill the drive if you let it). It's not a memory issue but just one
>> example of how broken modules can cause weird behavior.
>>
>> Thanks,
>> Carson
>>
>> On 12-04-05 10:14 AM, "pingouinandsheep at gmail.com" wrote:
>>
>>> >Hello,
>>> >
>>> >When I try to run the test provided with maker2, maker starts to use a
>>> >huge amount of memory. I stopped it after it reached ~100 GB of memory
>>> >used. I believe the test should not use that amount of memory.
>>> >
>>> >In another message someone suggested that the bioperl version installed
>>> >could be the cause of the problem, but the bioperl installed on my
>>> >cluster is already at version 1.6.
>>> >
>>> >perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
>>> >1.006901
>>> >
>>> >Unfortunately I don't have an error message to provide that could
>>> >clarify my problem.
>>> >
>>> >But maybe it is a recurrent problem and you know a few things I should
>>> >check.
>>> >
>>> >Thanks,
>>> >
>>> >Ismael

--
Ismaël Padioleau
Evgeny Zdobnov Group (Computational Evolutionary Genomics Group)
Emmanouil Dermitzakis Group
Dpt de Médecine Génétique et Développement
Université de Genève - Faculté de Médecine
CMU - Rue Michel-Servet 1
CH 1211 Genève 4
Tel: 0041 22 379 59 74
ismael.padioleau at unige.ch

--
Tel. 0041 78 77 69 561
ismpadioleau at gmail.com
From carsonhh at gmail.com Fri Apr 13 15:02:51 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 Apr 2012 17:02:51 -0400
Subject: [maker-devel] Functional annotation pipeline
In-Reply-To:
Message-ID:

I would agree: blast2go.

You can also try interproscan from the EBI.

MAKER comes with two scripts, ipr_update_gff and iprscan2gff3, that help integrate interproscan results into the GFF3 files. There are also a couple of scripts, maker_functional_gff and maker_functional_fasta, that can do putative functional annotation using UniProt/Swiss-Prot.

Thanks,
Carson

From sbrubaker at solazyme.com Fri Apr 13 15:18:10 2012
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Fri, 13 Apr 2012 21:18:10 +0000
Subject: [maker-devel] Functional annotation pipeline
In-Reply-To:
References:
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA065BBA@EXCHANGE-05.internal.solazyme.com>

Great, thank you ... I will take a look at those.
From dsth at ebi.ac.uk Fri Apr 13 15:22:37 2012
From: dsth at ebi.ac.uk (Daniel Hughes)
Date: Fri, 13 Apr 2012 22:22:37 +0100
Subject: [maker-devel] Functional annotation pipeline
In-Reply-To:
References:
Message-ID:

Careful with interproscan at the moment. I believe the executable is still in beta and they definitely aren't recommending it for production use yet. If you do use it, be sure to check the output file if using the lookup service, as when I used it recently it would sometimes exit normally despite lookup failures (the lookup problems may have had something to do with running ~800 in parallel; they're looking into the issue at the moment).

dan.

Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
-------------------------------------------------------------------------------------
dsth at cantab.net
dsth at cpan.org

From dsth at ebi.ac.uk Fri Apr 13 15:37:06 2012
From: dsth at ebi.ac.uk (Daniel Hughes)
Date: Fri, 13 Apr 2012 22:37:06 +0100
Subject: [maker-devel] Functional annotation pipeline
In-Reply-To:
References:
Message-ID:

Sorry, that's the new version, of course.

dan.

Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
-------------------------------------------------------------------------------------
dsth at cantab.net
dsth at cpan.org
From ranjani at uga.edu Tue Apr 17 10:46:40 2012
From: ranjani at uga.edu (Sivaranjani Namasivayam)
Date: Tue, 17 Apr 2012 16:46:40 +0000
Subject: [maker-devel] MAKER2.23 output
Message-ID:

Hi,

I tried running the latest version of MAKER, 2.23, with my dataset but without much success. When I run it without the mpi option, it exits with a segmentation fault:

STATUS: Processing and indexing input FASTA files...
Segmentation fault

So I tried it with mpi. The run does start, but I don't see any output files. I ran it with 20 cpus for close to 10 hrs.

I tested this with the sample data in maker's data folder. Input in the maker_opts.ctl file:

genome=/usr/local/maker/2.23/data/dpp_contig.fasta
est= /usr/local/maker/2.23/data/dpp_est.fasta
protein= /usr/local/maker/2.23/data/dpp_protein.fasta
est2genome=1

This was the command I executed:

usr/local/mpich2/1.4.1p1/gcc_4.5.3/bin/mpirun -np 2 /usr/local/maker/2.23/bin/maker maker_opts.ctl maker_bopts.ctl maker_exe.ctl

The following folder, with the protein sequence file, gets created:

dpp_contig.maker.output

But I can't see any progress after that. Can you please tell me if I might be doing something wrong, or whether you need further details? I was able to run the previous version, MAKER 2.10, successfully with my dataset.

Thanks,
Ranjani

From carsonhh at gmail.com Tue Apr 17 10:56:11 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Apr 2012 12:56:11 -0400
Subject: [maker-devel] MAKER2.23 output
In-Reply-To:
Message-ID:

A segmentation fault means there was a failure in C code, likely in one of the modules being used. These are all potential culprits:

Inline::C
Proc::ProcessTable
DB_File
forks

Based on when the error occurred, I would lean more toward DB_File. Is it possible that Berkeley DB has been updated on your system, perhaps as part of another installation or a system update? That sometimes breaks this module (which is part of the perl core). You can try reinstalling that module from CPAN.

Also, if you run MAKER version 2.25 (latest version), you can run with -debug (i.e. 'maker -debug') to get more information just before the error occurs. You can then capture the error log and send that to me.

Thanks,
Carson

From carsonhh at gmail.com Tue Apr 17 11:09:32 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Apr 2012 13:09:32 -0400
Subject: [maker-devel] mpi issue on computing cluster
In-Reply-To:
Message-ID:

If it's a sharedlibs issue then 'maker -help' would cause the same error. Try that.

Are you sure that you are not worried about Signal.pm causing the error? Try changing /mnt/work/scratch/scottge/maker-2.24/maker/bin/../lib/Proc/Signal.pm lines 136-143 from this -->

    require Proc::ProcessTable;
    my $obj = new Proc::ProcessTable;
    foreach my $p (@{$obj->table}) {
        #now check for the id
        return $p if ($p->pid == $id);
    }
    return undef;

To this -->

    my $select;
    eval{
        require Proc::ProcessTable;
        my $obj = new Proc::ProcessTable;
        foreach my $p (@{$obj->table}) {
            #now check for the id
            if ($p->pid == $id){
                $select = $p;
                last;
            }
        }
    }
    return $select;

If it works, I can generate a cleaner workaround, but I'd like to know if that is the root of the problem.

Thanks,
Carson

From carsonhh at gmail.com Tue Apr 17 14:25:51 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Apr 2012 16:25:51 -0400
Subject: [maker-devel] mpi issue on computing cluster
In-Reply-To:
Message-ID:

Sorry, I missed the ';' at the end of the eval block. It should be this -->

    my $select;
    eval{
        require Proc::ProcessTable;
        my $obj = new Proc::ProcessTable;
        foreach my $p (@{$obj->table}) {
            #now check for the id
            if ($p->pid == $id){
                $select = $p;
                last;
            }
        }
    };
    return $select;

--Carson

From elzedliu at gmail.com Tue Apr 17 17:22:53 2012
From: elzedliu at gmail.com (Huanle)
Date: Tue, 17 Apr 2012 16:22:53 -0700 (PDT)
Subject: [maker-devel] gene predictors in MARKER
Message-ID:

Hi there,

I am using MAKER to annotate a recently assembled plant genome.

I followed the tutorial here: http://gmod.org/wiki/MAKER_Tutorial

The de novo gene predictors I included in the maker_exe.ctl file are:

#-----Ab-initio Gene Prediction Algorithms
snap=/sw/maker/2.10/bin/../exe/snap/snap #location of snap executable
gmhmme3=/sw/GeneMark/20120203/bin/gmhmme3 #location of eukaryotic genemark executable
gmhmmp= #location of prokaryotic genemark executable
augustus=/sw/maker/2.10/bin/../exe/augustus/bin/augustus #location of augustus executable

However, I am not sure whether they were really used.

During the run, I could see that repeatmasker, exonerate and wublast were called, but I did not see any information pop up for those gene predictors.

So I am wondering if they were actually used.

Could you please let me know how to tell whether all or any of those gene predictors were called by MAKER?
Kind Regards,
Huanle

From carsonhh at gmail.com Mon Apr 23 15:04:16 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 23 Apr 2012 17:04:16 -0400
Subject: [maker-devel] gene predictors in MARKER
In-Reply-To:
Message-ID:

The gene predictors have to be trained first, and when they are trained they produce an HMM file that can be supplied to MAKER. You can either use MAKER's protein2genome or est2genome option to produce rough models to train with, or you can try one of the models that come prepackaged with those algorithms.

SNAP models will be in -->
/sw/maker/2.10/bin/../exe/snap/HMM

Augustus --> run this to see the species available in augustus -->
/sw/maker/2.10/bin/../exe/augustus/bin/augustus --species=help

GeneMark is self-training. Run it once directly on your genome fasta (or, for speed, on just a chromosome or two of the assembly) and it will produce a file called es.mod as part of its results. That is the file you need.

If you have any questions or issues with training, just let us know.

Thanks,
Carson

From anastasia.gioti at scilifelab.se Wed Apr 25 03:09:36 2012
From: anastasia.gioti at scilifelab.se (Anastasia Gioti)
Date: Wed, 25 Apr 2012 11:09:36 +0200
Subject: [maker-devel] Use pass-through system to add missing genes
Message-ID:

Hi,

I have a set of predicted proteins from the genome of a fungus annotated by MAKER using EST data from a closely related species and 3 ab initio predictors (snap iteratively trained 3 times, genemark trained directly on the assembly, and augustus with a model from a less closely related species), along with a set of fungal proteins. I am missing ~1000 proteins when I compare to the species I used EST data from, and there is good evidence from alignments that these genes exist. The question is how to proceed from Blast hits to actual gene models here.

The idea would be to add these genes to the existing dataset, rather than reannotate the genome. I believe that reannotating it without any further evidence, such as RNA-seq from the species itself, would not change much, and I'd rather stick with the current predictions that I trust and have used in subsequent analyses. I can accept annotating the 1000 genes in a less stringent and less reliable way than MAKER; I just want to add them so that the difference in gene count gets corrected.

I was reading the MAKER 2 paper and I was wondering if I can use the legacy annotations scheme to do it, by providing GFF3 of the alignments between the two species in the regions where genes were missed. But as I said, I would not like to reannotate the whole genome, and running MAKER2 might cause slight changes that I'd like to avoid. Is this possible?
First, is it possible to provide a GFF3 file of specific locations and not the entire genome alignment? (I guess so.) Second, how can I tag the existing annotations as 'not to be changed' or, alternatively, tag only the new models? How should I run MAKER2, with which predictors on and which off?

Thanks,
Anastasia

Anastasia Gioti
Post-doctoral Researcher
anastasia.gioti at scilifelab.se
anastasia.gioti at ebc.uu.se
http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dsth at ebi.ac.uk Wed Apr 25 03:22:03 2012
From: dsth at ebi.ac.uk (Daniel Hughes)
Date: Wed, 25 Apr 2012 10:22:03 +0100
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To:
References:
Message-ID:

For cross-species comparisons you might be better off including the actual peptide sequences of the other fungi in the annotation run too - I'd be very surprised if you really did get the same result.

dan.

Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
-------------------------------------------------------------------------------------
dsth at cantab.net
dsth at cpan.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From anastasia.gioti at scilifelab.se Wed Apr 25 03:29:30 2012
From: anastasia.gioti at scilifelab.se (Anastasia Gioti)
Date: Wed, 25 Apr 2012 11:29:30 +0200
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To:
References:
Message-ID:

Hi,
Do you mean that I should not have included the proteins of the closely related species in the fungal protein fasta file that I used as evidence in MAKER?
I do not see why... What I have been trying to do now is to further 'bias' the annotations in favor of this species, so as to get the missing genes. Can you explain a bit more what you mean?

Thanks,
Anastasia

Anastasia Gioti
Post-doctoral Researcher
anastasia.gioti at scilifelab.se
anastasia.gioti at ebc.uu.se
http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dsth at ebi.ac.uk Wed Apr 25 03:39:49 2012
From: dsth at ebi.ac.uk (Daniel Hughes)
Date: Wed, 25 Apr 2012 10:39:49 +0100
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To:
References:
Message-ID:

Sorry, my bad, I missed the part about you having already included the fungal proteins as fasta ;/ - too early for me. In that case, have you viewed the full GFF output for specific instances of such missing proteins in something like Apollo, to try to work out why MAKER hasn't made a call at those loci (AED score...)?

dan.

Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
-------------------------------------------------------------------------------------
dsth at cantab.net
dsth at cpan.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Apr 25 08:29:01 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Apr 2012 10:29:01 -0400
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To:
Message-ID:

The way you proceed depends on why the genes are not there to begin with. Are they not there because of a lack of evidence? If that's the case, just adding the new fasta file should do the trick. Or are they not there because an assembly error makes it impossible to get a logical model for the region (i.e. reading frame breaks)? Are there ab initio models already called in those regions that could just be promoted to the annotation tier? You can test that one by blasting against the nonoverlaping_abinits.fasta files.

For any of the cases described, you can provide the existing annotation set as the input in GFF3 format, and previous models will be maintained preferentially.

If you know which ab initio predictions you want to add (i.e. the ab initio promoting scenario I described), you can provide those predictions via the pred_gff option and then set keep_preds=1, and they will be maintained even without evidence. Attached is a script that would make selecting those easier. It takes the MAKER-generated GFF3 and a list of predictions to keep (one name per line). These might be the results of a BLAST analysis, for example. It will then return the GFF3 entries for just the models selected.

If the situation is more complex, just provide more detail, and I am sure we can help you come up with a plan.
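For readers without the attachment, the selection step described above (a GFF3 file plus a list of prediction names, one per line) can be sketched in Python. This is not the attached gff3_select script, whose contents are not preserved in the archive; it is a minimal illustration under stated assumptions: each wanted name appears in the ID or Name attribute of a parent feature, child features (for example match_part) come after their parent and point to it through Parent, and the function names are hypothetical.

```python
def parse_attributes(column9):
    """Turn a GFF3 attribute string such as 'ID=a;Name=b;Parent=c' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def select_gff3(gff3_lines, wanted_names):
    """Return the feature lines whose ID or Name is wanted, plus their children.

    Assumption (not taken from the original script): parents precede their
    children in the file, so one pass over the lines is enough.
    """
    wanted = set(wanted_names)
    kept_ids = set()   # IDs of features already selected
    selected = []
    for line in gff3_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break      # the embedded sequence section starts here; stop reading
        if not line or line.startswith("#"):
            continue   # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue   # not a well-formed feature line
        attrs = parse_attributes(cols[8])
        parents = attrs.get("Parent", "").split(",")
        is_wanted = attrs.get("ID") in wanted or attrs.get("Name") in wanted
        is_child = any(parent in kept_ids for parent in parents)
        if is_wanted or is_child:
            if "ID" in attrs:
                kept_ids.add(attrs["ID"])
            selected.append(line)
    return selected
```

A typical use would be to read the names from a text file into a list, call select_gff3 on the open MAKER GFF3 file, and write the returned lines to a new file that is then given to pred_gff.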
Thanks,
Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gff3_select
Type: application/octet-stream
Size: 3066 bytes
Desc: not available
URL:

From anastasia.gioti at scilifelab.se Fri Apr 27 02:43:14 2012
From: anastasia.gioti at scilifelab.se (Anastasia Gioti)
Date: Fri, 27 Apr 2012 10:43:14 +0200
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To:
References:
Message-ID: <4FE7CD5B-FC1C-43E7-AC41-A05823348B99@scilifelab.se>

Hi Carlson,
Thanks for your help!

> The way you proceed depends on why the genes are not there to begin
> with. Are they not there because of a lack of evidence?

It is a mixture of cases, and I can only look at some examples to say that. There are cases where all 3 ab initio predictors provide models and there are blastx hits, or both blastx and protein2genome hits, but no EST evidence, and thus no model is retained. I guess my default parameters could be responsible for these cases at least.

> If that's the case just adding the new fasta file should do the trick.

Which fasta do you refer to? The protein file I use as evidence contains all the proteins I can actually use.

> Or are they not there because an assembly error makes it impossible
> to get a logical model for the region (i.e. reading frame breaks).

This is not the case in general.

> Are there ab initio models already called in those regions that
> could just be promoted to the annotation tier? You can test that
> one by blasting against the nonoverlaping_abinits.fasta files.
I have not done this, will do!

> For any of the cases described, you can provide the existing
> annotation set as the input in GFF3 format, and previous models will
> be maintained preferentially.

You mean in a new MAKER run? Is this also possible with the old MAKER, not only MAKER2?

> If you know which ab initio predictions you want to add (i.e. the ab
> initio promoting scenario I described), you can provide those
> predictions via the pred_gff option and then set keep_preds=1
> and they will be maintained even without evidence. Attached is a
> script that would make selecting those easier. It takes the MAKER
> generated GFF3 and a list of predictions to keep (one name per
> line). These might be the results of a BLAST analysis for example.
> It will then return the GFF3 entries for just those models selected.

The thing is, for the few cases I have looked at, I cannot really decide which model is the best, and the 3 models from the ab initio predictors do not agree on the exact intron-exon junctions or on the start and stop codons.

> If the situation is more complex, just provide more detail, and I am
> sure we can help you come up with a plan.

What I was thinking of doing was to provide a GFF file of alignments (e.g. by exonerate) to the proteins of the closely related species that I am missing, and somehow keep the previous annotations and get the extra ones from this GFF file. But I am not sure exactly how MAKER should be run to do this. If I want to keep the previous annotations I need the GFF file of the last MAKER run as input, but then how do I distinguish it from the exonerate GFF file? And which mode of prediction should be on, and with which parameters? You mention keep_preds=1 for the existing annotations, but how do I also promote evidence from alignments in the same way in the same run? Looks feasible though.

Thanks again,
Anastasia

Anastasia Gioti
Post-doctoral Researcher
anastasia.gioti at scilifelab.se
anastasia.gioti at ebc.uu.se
http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From barry.moore at genetics.utah.edu Fri Apr 27 05:57:01 2012
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 27 Apr 2012 05:57:01 -0600
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To: <4FE7CD5B-FC1C-43E7-AC41-A05823348B99@scilifelab.se>
References: <4FE7CD5B-FC1C-43E7-AC41-A05823348B99@scilifelab.se>
Message-ID: <03439C8F-75B0-42FE-894C-CC564AEB73E9@genetics.utah.edu>

Hi Anastasia,

On Apr 27, 2012, at 2:43 AM, Anastasia Gioti wrote:

> Hi Carlson,
> Thanks for your help!
>
>> The way you proceed depends on why the genes are not there to begin with. Are they not there because of a lack of evidence?
>
> It is a mixture of cases, and I can only look at some examples to say that. There are cases where all 3 ab initio predictors provide models and there are blastx hits, or both blastx and protein2genome hits, but no EST evidence, and thus no model is retained. I guess my default parameters could be responsible for these cases at least.

This doesn't sound right. If there are predicted models and blastx protein evidence overlapping them you should get a model retained.
I know that EST evidence has to support a splice site before it will be promoted, and I can't remember if protein evidence is the same, but certainly if you pass back those protein2genome predictions and the original proteins as evidence then they will be retained as models.

>> If that's the case just adding the new fasta file should do the trick.
>
> Which fasta do you refer to? The protein file I use as evidence contains all the proteins I can actually use.

Yes, using the protein fasta from the closely related species as evidence. I think you said you've already done that, right?

>> Or are they not there because an assembly error makes it impossible to get a logical model for the region (i.e. reading frame breaks).
>
> This is not the case in general.
>
>> Are there ab initio models already called in those regions that could just be promoted to the annotation tier? You can test that one by blasting against the nonoverlaping_abinits.fasta files.
>
> I have not done this, will do!
>
>> For any of the cases described, you can provide the existing annotation set as the input in GFF3 format, and previous models will be maintained preferentially.
>
> You mean in a new MAKER run? Is this also possible with the old MAKER, not only MAKER2?

Yes, the original MAKER will do this.

>> If you know which ab initio predictions you want to add (i.e. the ab initio promoting scenario I described), you can provide those predictions via the pred_gff option and then set keep_preds=1 and they will be maintained even without evidence. Attached is a script that would make selecting those easier. It takes the MAKER generated GFF3 and a list of predictions to keep (one name per line). These might be the results of a BLAST analysis for example. It will then return the GFF3 entries for just those models selected.
>
> The thing is, for the few cases I have looked at, I cannot really decide which model is the best, and the 3 models from the ab initio predictors do not agree on the exact intron-exon junctions or on the start and stop codons.
>
>> If the situation is more complex, just provide more detail, and I am sure we can help you come up with a plan.
>
> What I was thinking of doing was to provide a GFF file of alignments (e.g. by exonerate) to the proteins of the closely related species that I am missing, and somehow keep the previous annotations and get the extra ones from this GFF file. But I am not sure exactly how MAKER should be run to do this. If I want to keep the previous annotations I need the GFF file of the last MAKER run as input, but then how do I distinguish it from the exonerate GFF file? And which mode of prediction should be on, and with which parameters? You mention keep_preds=1 for the existing annotations, but how do I also promote evidence from alignments in the same way in the same run?
> Looks feasible though. Thanks again,
> Anastasia

Let me just restate what you've said so that I can be sure that I understand what you've already done. You have run MAKER with SNAP, GeneMark and Augustus using ESTs from a closely related species (passed to altest) and protein evidence from other fungi. You are missing about 1,000 genes compared to the species that provided the EST alignments. You say there is good evidence from the alignments that these genes exist, and I assume by this that you mean the EST/protein alignments that MAKER produced.

1) Is the closely related fungus annotated, and if so, have you included its proteins in the evidence set that you provided to MAKER? If you haven't provided these proteins as evidence to MAKER then you should do this. You can re-run MAKER, passing your original models back through, like this:

#-----Re-annotation Using MAKER Derived GFF3
genome_gff=original_maker_annotations.gff3
est_pass=1
altest_pass=1
protein_pass=1
rm_pass=1
model_pass=1
pred_pass=1
other_pass=1

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=proteins_from_closely_related.fasta
## OR it sounds like you've already aligned these with exonerate?
protein_gff=proteins_from_closely_related_already_aligned.gff

2) If you've already included those closely related species proteins but still didn't get the 1,000 genes, then take your nonoverlaping_abinits.fasta and blast them directly against your closely related proteins. Presumably they don't hit too well, because if they did they should have been promoted to predictions by MAKER the first time, but here you can decide for yourself what thresholds to allow in order to keep the ab initio predictions that hit the closely related species proteins. If you filter your BLAST hits the way you want and keep the names of the ab initio predictions that pass your filter, then the script Carson attached will generate an ab initio prediction GFF file with only the predictions you selected. You can then pass those predictions back to MAKER and force it to keep them, and MAKER will turn them from predictions (match/match_part) into gene models.
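The BLAST filtering step in point 2 could be sketched as follows. This is only an illustration, not part of MAKER or of the attached script: it assumes 12-column tabular BLAST output (query name in column 1, percent identity in column 3, e-value in column 11), and the thresholds and the function name are hypothetical values to be chosen by the user.

```python
def passing_queries(blast_tab_lines, max_evalue=1e-10, min_identity=50.0):
    """Return query names with at least one hit passing both thresholds.

    Names are returned once each, in the order they are first seen.
    Assumed input: 12-column tabular BLAST output, tab-separated.
    """
    keep = []
    seen = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        cols = line.rstrip("\n").split("\t")
        query = cols[0]
        identity = float(cols[2])   # percent identity
        evalue = float(cols[10])    # e-value
        if evalue <= max_evalue and identity >= min_identity and query not in seen:
            seen.add(query)
            keep.append(query)
    return keep
```

Writing the returned names to a file, one per line, gives the kind of list that the selection script expects alongside the MAKER GFF3.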
#-----Re-annotation Using MAKER Derived GFF3
genome_gff=original_maker_annotations.gff3
est_pass=1
altest_pass=1
protein_pass=1
rm_pass=1
model_pass=1
pred_pass=0
other_pass=1

#-----Gene Prediction
snaphmm=
gmhmm=
augustus_species=
fgenesh_par_file=
pred_gff=ab_init_predictions_rescued_by_blast.gff
keep_preds=1

Barry

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Apr 27 07:27:24 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Apr 2012 09:27:24 -0400
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To: <03439C8F-75B0-42FE-894C-CC564AEB73E9@genetics.utah.edu>
Message-ID:

> It is a mixture of cases, and I can only look at some examples to say that.
> There are cases where all 3 used ab initio predictors provide models, there
> are blastx hits, or both blastx and protein2 genome, but no EST evidence, thus
> no model is retained. i guess my default parameters could be responsible for
> these cases at least.

The only way you should be able to get BLASTX overlap and still not get a model for the region is if:

1. The protein alignment is in a different reading frame than your models for every single base pair of the alignment (in which case it's not true overlap).
2. The BLASTX HSPs are stacked on each other again and again in weird rearranged overlaps to produce a very deep alignment, which would mean this is a repetitive region and not really a significant alignment.

Otherwise this should not happen unless you have AED_threshold set to some value where MAKER will ignore genes unless they have a minimum amount of support (by default this option is always off). The other two possibilities can be tested by just looking at the alignments manually in Apollo. Also take a look at the AED and eAED values for your missing genes. Anything below 1 should always be kept by MAKER by default because it has at least some evidence support.

> which fasta do you refer to? The proteins file I use as evidence contains all
> proteins i can actually use.

If they are already in your current run, ignore this.

Barry provided detailed instructions on how to configure MAKER for your particular case, so just follow his excellent instructions.

Thanks,
Carson

From: Barry Moore
Date: Friday, 27 April, 2012 7:57 AM
To: Anastasia Gioti
Cc: Carson Holt
Subject: Re: [maker-devel] Use pass-through system to add missing genes

Hi Anastasia,

On Apr 27, 2012, at 2:43 AM, Anastasia Gioti wrote:

> Hi Carlson,
> Thanks for your help!
>
>> The way you proceed depends on why the genes are not there to begin with.
>> Are they not there because of a lack of evidence?
>
> It is a mixture of cases, and I can only look at some examples to say that.
> There are cases where all 3 used ab initio predictors provide models, there
> are blastx hits, or both blastx and protein2 genome, but no EST evidence, thus
> no model is retained. i guess my default parameters could be responsible for
> these cases at least.
>
This doesn't sound right. If there are predicted models and blastx protein
evidence overlapping them, you should get a model retained. I know that EST
evidence has to support a splice site before it will be promoted, and I
can't remember if protein evidence is the same, but certainly if you pass
back those protein2genome predictions and the original proteins as evidence
then they will be retained as models.

>> If that's the case just adding the new fasta file should do the trick.
>
> which fasta do you refer to? The proteins file I use as evidence contains all
> proteins i can actually use.
>
Yes, using the protein fasta from the closely related species as evidence.
I think you said you've already done that, right?

>> Or are they not there because an assembly error makes it impossible to get a
>> logical model for the region (i.e. reading frame breaks).
>
> This is not the case in general.
>
>> Are there ab initio models already called in those regions that could just be
>> promoted to the annotation tier? You can test that one by blasting against
>> the nonoverlaping_abinits.fasta files.
>
> I have not done this, will do!
>
>>
>> For any of the cases described, you can provide the existing annotation set
>> as the input in GFF3 format, and previous models will be maintained
>> preferentially.
>
> You mean in a new maker run? is this possible with the old maker as well, not
> maker2, right?
>
Yes, the original MAKER will do this.

>> If you know which ab initio predictions you want to add (i.e. the ab initio
>> promoting scenario I described), you can provide those predictions to
>> the pred_gff option and then set keep_preds=1 and they will be maintained
>> even without evidence. Attached is a script that would make selecting those
>> easier. It takes the MAKER generated GFF3 and a list of predictions to keep
>> (one name per line). These might be the results of a BLAST analysis for
>> example.
>> It will then return the GFF3 entries for just those models
>> selected.
>
> The thing is, for the few cases I have looked at, I cannot really decide which
> model is the best, and the 3 models from the ab initio predictors do not agree
> on the exact intron-exon junctions or the start and stop codons.
>>
>> If the situation is more complex, just provide more detail, and I am sure we
>> can help you come up with a plan.
>>
> What i was thinking to do was to provide a gff file of alignments (eg by
> exonerate) to the proteins of the closely related species that i am missing,
> and somehow keep the previous annotations and get the extra ones by this gff
> file. But how exactly maker should be run to do this I am not sure. if I want
> to keep the previous annotations I need the gff file of the last maker run as
> input, but then how do I discriminate with the exonerate gff file? And which
> mode of prediction should be on, and with which parameters? You mention
> keep_preds=1 for the existing annotations, but how do i also promote evidence
> from alignments in the same way in the same run?
> Looks feasible though. Thanks again,
> Anastasia
>

Let me just restate what you've said so that I can be sure that I am correct
about what you've already done. You have run Maker with SNAP, Genemark and
Augustus using ESTs from a closely related species (passed to altest) and
protein evidence from other fungi. You are missing about 1,000 genes compared
to the species that provided the EST alignments. You say there is good
evidence that these genes exist from the alignments, and I assume by this
that you mean the EST/protein alignments that Maker produced.

1) Is the closely related fungus annotated, and if so have you included its
proteins in the evidence set that you provided to Maker? If you haven't
provided these proteins as evidence to Maker then you should do this.
You can re-run maker passing your original models back through like this:

#-----Re-annotation Using MAKER Derived GFF3
genome_gff=original_maker_annotations.gff3
est_pass=1
altest_pass=1
protein_pass=1
rm_pass=1
model_pass=1
pred_pass=1
other_pass=1

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=proteins_from_closely_related.fasta
## OR it sounds like you've already aligned these with exonerate?
protein_gff=proteins_from_closely_related_already_aligned.gff

2) If you've already included those closely related species proteins but
still didn't get the 1,000 genes, then take your nonoverlaping_abinits.fasta
and blast them directly against your closely related proteins. Presumably
they don't hit too well, because if they did they should have been promoted
to predictions by Maker the first time, but here you can decide yourself what
thresholds to allow to keep the abinit predictions that hit the closely
related species proteins. If you filter your blast hits the way you want and
keep the names of the abinit predictions that pass your filter, then use the
script Carson attached; it will generate an abinit prediction GFF file with
only the predictions you selected. You can then pass those predictions back
to Maker and force it to keep them, and Maker will turn them from predictions
(match/match_part) into gene models.
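[Editor's sketch] The filtering part of step 2 could look roughly like the following. This is not the script Carson attached. It assumes the BLAST search of the ab initio predictions against the related species' proteins was run with 12-column tabular output (BLAST+ -outfmt 6), and the function name and the two thresholds are hypothetical choices for illustration only; pick cutoffs that suit your own data.

```python
def select_predictions(blast_lines, max_evalue=1e-10, min_identity=40.0):
    """Return names of query sequences with at least one acceptable hit.

    blast_lines: iterable of lines in BLAST+ tabular format (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    The thresholds are illustrative assumptions, not MAKER defaults.
    """
    kept = []
    seen = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular record
        query = fields[0]
        identity = float(fields[2])   # percent identity
        evalue = float(fields[10])    # expectation value
        if evalue <= max_evalue and identity >= min_identity:
            if query not in seen:     # report each prediction once
                seen.add(query)
                kept.append(query)
    return kept
```

Writing the returned names to a text file, one per line, gives the kind of list of predictions to keep that the attached script is described as taking alongside the MAKER generated GFF3.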
#-----Re-annotation Using MAKER Derived GFF3
genome_gff=original_maker_annotations.gff3
est_pass=1
altest_pass=1
protein_pass=1
rm_pass=1
model_pass=1
pred_pass=0
other_pass=1

#-----Gene Prediction
snaphmm=
gmhmm=
augustus_species=
fgenesh_par_file=
pred_gff=ab_init_predictions_rescued_by_blast.gff
keep_preds=1

Barry

>> Thanks,
>> Carson
>>
>> From: Anastasia Gioti
>> Date: Wed, 25 Apr 2012 11:09:36 +0200
>> To:
>> Subject: [maker-devel] Use pass-through system to add missing genes
>>
>> Hi,
>> I have a set of predicted proteins from the genome of a fungus annotated by
>> MAKER using EST data from a closely related species and 3 ab initio
>> predictors (snap iteratively trained 3 times, genemark trained directly on
>> the assembly and augustus with a model from a less closely related species),
>> along with a set of fungal proteins. I am missing ~ 1000 proteins when I
>> compare to the species i used EST data from, and there is good evidence from
>> alignments that these genes exist. The question is how to proceed from Blast
>> hits to actual gene models here. The idea would be to add these genes to the
>> existing dataset, rather than reannotate the genome. I believe that
>> reannotating it without any further evidence such as RNA-seq from the species
>> itself would not change much, and I'd rather stick with actual predictions
>> that i trust and have used in subsequent analyses. The 1000 genes I can
>> accept to annotate with a less stringent and reliable way than MAKER, I just
>> want to add them so that the difference in gene count gets corrected.
>> I was reading the MAKER 2 paper and i was wondering if I can use the legacy
>> annotations scheme to do it, by providing GFF3 of the alignments between the
>> two species in the regions where genes were missed, but as i said, I would
>> not like to reannotate the whole genome, and running MAKER2 might cause
>> slight changes that I'd like to avoid. Is this possible?
>> First, is it possible to provide a Gff3 file of specific locations and not
>> the entire genome alignment? (I guess so..) Second, how can I tag the existing
>> annotations as 'not to be changed' or alternatively, tag the new models only?
>> How should I run maker2, with which predictors on and which off?
>> Thanks,
>> Anastasia
>>
>> Anastasia Gioti
>> Post-doctoral Researcher
>>
>> anastasia.gioti at scilifelab.se
>> anastasia.gioti at ebc.uu.se
>>
>> http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>
> Anastasia Gioti
> Post-doctoral Researcher
>
> anastasia.gioti at scilifelab.se
> anastasia.gioti at ebc.uu.se
>
> http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From james.collett at pnnl.gov Fri Apr 27 10:51:05 2012
From: james.collett at pnnl.gov (Collett, James R)
Date: Fri, 27 Apr 2012 09:51:05 -0700
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To:
References:
Message-ID:

Hi Carson,

Could you please send me (or make available for download) the perl script
that you mentioned in this previous post in this thread?

>> Attached is a
>> script that would make selecting those easier. It take the MAKER
>> generated GFF3 and a list of predictions to keep (one name per line).
>> These might be the results of a BLAST analysis for example.
>> It will
>> then return the GFF3 entries for just those models selected.

Thanks,

Jim
__________________________________________________
James R. Collett, Ph.D.
Senior Scientist
Chemical and Biological Process Development Group
Energy and Environment Directorate
Pacific Northwest National Laboratory

From carsonhh at gmail.com Fri Apr 27 11:18:23 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Apr 2012 13:18:23 -0400
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To:
Message-ID:

Here you go. This will also be part of the next MAKER release in some form.

Thanks,
Carson

On 12-04-27 12:51 PM, "Collett, James R" wrote:

>Hi Carson,
>
>Could you please send me (or make available for download) the perl script
>that you mentioned in this previous post in this thread?
>
>>> Attached is a
>>> script that would make selecting those easier. It take the MAKER
>>> generated GFF3 and a list of predictions to keep (one name per line).
>>> These might be the results of a BLAST analysis for example. It will
>>> then return the GFF3 entries for just those models selected.
>
>Thanks,
>
>Jim
>__________________________________________________
>James R. Collett, Ph.D.
>Senior Scientist
>Chemical and Biological Process Development Group
>Energy and Environment Directorate
>Pacific Northwest National Laboratory
>> URL: > lab.org/attachments/20120427/72b70d49/attachment.html> >> >> ------------------------------ >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> End of maker-devel Digest, Vol 47, Issue 14 >> ******************************************* > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- A non-text attachment was scrubbed... Name: gff3_select Type: application/octet-stream Size: 3066 bytes Desc: not available URL:
From pingouinandsheep at gmail.com Thu Apr 5 08:14:39 2012 From: pingouinandsheep at gmail.com (pingouinandsheep at gmail.com) Date: Thu, 5 Apr 2012 07:14:39 -0700 (PDT) Subject: [maker-devel] Huge memory usage Message-ID: <5338ad1d-dc04-4150-b5ee-a88da7c42549@h5g2000vbx.googlegroups.com> Hello, When I try to run the test provided with maker2, maker start to use a huge amount of memory. I stoped it after it reach ~100go of memory used. I believe the test should not use that amount of memory. In an other message someone suggest that the bioperl version installed could be the cause of the problem, but the bioperl installed on my cluster is already at version 1.6.
perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
1.006901

Unfortunately I don't have an error message to provide that could clarify my problem.

But maybe it is a recurrent problem and you know a few things I should check.

Thanks,

Ismael

From carsonhh at gmail.com Thu Apr 5 08:26:17 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 05 Apr 2012 10:26:17 -0400
Subject: [maker-devel] Huge memory usage
In-Reply-To: <5338ad1d-dc04-4150-b5ee-a88da7c42549@h5g2000vbx.googlegroups.com>
Message-ID:

The test should not use more than a few megabytes of RAM. Even on very large datasets you should never really use more than 1 or 2 gigabytes of RAM per MAKER instance.

It's possible that there are other perl modules that are broken and need to be reinstalled on your system. This can happen when perl gets updated, but you are pointing to modules built for a different perl version with the PERL5LIB environment variable. Make sure you have the latest version of MAKER and run it with --debug set. Collect that output and send it to me (the --debug option does some dependency checking).

I know there is an issue on Macs with updating perl's DB_File module that causes it to gobble up big sections of the hard drive (it will eventually fill the drive if you let it). It's not a memory issue, but it is one example of how broken modules can cause weird behavior.

Thanks,
Carson

On 12-04-05 10:14 AM, "pingouinandsheep at gmail.com" wrote:

>Hello,
>
>When I try to run the test provided with maker2, maker start to use a
>huge amount of memory. I stoped it after it reach ~100go of memory
>used. I believe the test should not use that amount of memory.
>
>In an other message someone suggest that the bioperl version installed
>could be the cause of the problem, but the bioperl installed on my
>cluster is already at version 1.6.
> >perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' >1.006901 > >Unfortunately I don't have an error message to provide, that could >clarify my problem. > >But maybe it is a recurrent problem and you know a few things I should >check. > >Thanks, > >Ismael > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From eernst at cshl.edu Sun Apr 8 16:09:22 2012 From: eernst at cshl.edu (Evan Ernst) Date: Sun, 8 Apr 2012 18:09:22 -0400 Subject: [maker-devel] Incomplete/Missing lines in datastore index log under openMPI Message-ID: Hi Carson, It looks like there may be a locking issue with the datastore index log in MAKER 2.25/openmpi 1.4.5. I noticed this when running 8 MPI maker instances, each with 32 nodes. Examples from the log: scaffold1001.1 genome_datastore/93/A6/scaffold1001.1/ FINISHED scaffold1002.1 genome_datastore/72/43/scaffold1002.1/ FINISHED scaffold1003.1 genome_datastore/B8/05/scaffold1003.1/ FINISHED ... scaffold10085.1 genome_datastore/1C/7E/scaffold10085.1/ FINISHED scaffold8265.1 genome_datastore/01/E4/scaffold8265.1/ FINISHED D scaffold8295.1 genome_datastore/63/13/scaffold8295.1/ FINISHED ... scaffold8351.1 genome_datastore/27/52/scaffold8351.1/ FINISHED scaffold8343.1 genome_datastore/BF/31/scaffold8343.1/ FINISHED scaffold10167.1 genome_datastore/0B/9A/scaffold10167.1/ FINISHEscaffold10170.1 genome_datastore/F4/FF/scaffold10170.1/ FINISHED scaffold10209.1 genome_datastore/2D/AA/scaffold10209.1/ FINISHEscaffold10072.1 genome_datastore/E0/A5/scaffold10072.1/ FINISHED scaffold10113.1 genome_datastore/00/23/scaffold10113.1/ FINISHED I see this even when running a single MPI instance, 32 nodes, when no actual processing is required apart from marking the scaffolds FINISHED. 
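A rough way to spot damaged entries like the ones quoted above is sketched below. It assumes that every well-formed line of the datastore index log holds exactly three whitespace-separated fields (sequence name, datastore path, status), which is what the intact lines in the excerpt look like. The function name find_malformed_entries is hypothetical, and the set of valid status words is left to the caller because only FINISHED is confirmed by this thread.

```python
def find_malformed_entries(log_lines, known_statuses):
    """Return (line_number, text) pairs for index-log lines that look damaged.

    A line counts as well formed when it has exactly three
    whitespace-separated fields and the third field is one of
    known_statuses. Blank lines are ignored.
    """
    problems = []
    for number, line in enumerate(log_lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3 or fields[2] not in known_statuses:
            problems.append((number, line.rstrip("\n")))
    return problems


if __name__ == "__main__":
    # Lines modelled on the excerpt above; STARTED is an assumed status word.
    sample = [
        "scaffold1001.1\tgenome_datastore/93/A6/scaffold1001.1/\tFINISHED\n",
        "D\n",
        "scaffold10167.1\tgenome_datastore/0B/9A/scaffold10167.1/\t"
        "FINISHEscaffold10170.1\tgenome_datastore/F4/FF/scaffold10170.1/\tFINISHED\n",
    ]
    for number, text in find_malformed_entries(sample, {"STARTED", "FINISHED"}):
        print(number, repr(text))
```

A check like this only detects truncated or interleaved lines; entries that were never written at all would have to be found by comparing the sequence names in the log against the names in the input FASTA.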
Comparing the result to a single, non-MPI maker instance running on the same completed hierarchy reveals that many entries aren't being written to the log at all when running under MPI. The single process instance runs just fine, generating a complete log that can be used for the downstream scripts. Between runs, I execute a find genome.maker.output/ -name .NFSLock* -type f -print0 | xargs -0 rm & to be sure lingering lock files from badly exiting processes weren't interfering. This looks like the sort of thing that may be difficult to track down, and there's a clear workaround, but I'm happy to provide more information if you'd like to debug it. Thanks, Evan -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Apr 10 08:26:40 2012 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 10 Apr 2012 10:26:40 -0400 Subject: [maker-devel] Incomplete/Missing lines in datastore index log under openMPI In-Reply-To: Message-ID: Depending on if your using NFS and other architecture design you can get race conditions with the datastore log file. This primarily happens when you have multiple instances of MAKER running at the same time or thousands of short contigs running in parallel so many finish at the same time. In a future release, I plan on having the last MAKER job to exit just rebuild the log at the end of a run to ensure it is complete. For now though, just run 'maker -dsindex' at the end of a run when it happens. It will rebbuild the log and only takes a few seconds. Thanks, Carson From: Evan Ernst Date: Sun, 8 Apr 2012 18:09:22 -0400 To: Subject: [maker-devel] Incomplete/Missing lines in datastore index log under openMPI Hi Carson, It looks like there may be a locking issue with the datastore index log in MAKER 2.25/openmpi 1.4.5. I noticed this when running 8 MPI maker instances, each with 32 nodes. 
Examples from the log: scaffold1001.1 genome_datastore/93/A6/scaffold1001.1/ FINISHED scaffold1002.1 genome_datastore/72/43/scaffold1002.1/ FINISHED scaffold1003.1 genome_datastore/B8/05/scaffold1003.1/ FINISHED ... scaffold10085.1 genome_datastore/1C/7E/scaffold10085.1/ FINISHED scaffold8265.1 genome_datastore/01/E4/scaffold8265.1/ FINISHED D scaffold8295.1 genome_datastore/63/13/scaffold8295.1/ FINISHED ... scaffold8351.1 genome_datastore/27/52/scaffold8351.1/ FINISHED scaffold8343.1 genome_datastore/BF/31/scaffold8343.1/ FINISHED scaffold10167.1 genome_datastore/0B/9A/scaffold10167.1/ FINISHEscaffold10170.1 genome_datastore/F4/FF/scaffold10170.1/ FINISHED scaffold10209.1 genome_datastore/2D/AA/scaffold10209.1/ FINISHEscaffold10072.1 genome_datastore/E0/A5/scaffold10072.1/ FINISHED scaffold10113.1 genome_datastore/00/23/scaffold10113.1/ FINISHED I see this even when running a single MPI instance, 32 nodes, when no actual processing is required apart from marking the scaffolds FINISHED. Comparing the result to a single, non-MPI maker instance running on the same completed hierarchy reveals that many entries aren't being written to the log at all when running under MPI. The single process instance runs just fine, generating a complete log that can be used for the downstream scripts. Between runs, I execute a find genome.maker.output/ -name .NFSLock* -type f -print0 | xargs -0 rm & to be sure lingering lock files from badly exiting processes weren't interfering. This looks like the sort of thing that may be difficult to track down, and there's a clear workaround, but I'm happy to provide more information if you'd like to debug it. Thanks, Evan _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
From smg283 at gmail.com Fri Apr 13 13:00:29 2012
From: smg283 at gmail.com (Scott Geib)
Date: Fri, 13 Apr 2012 09:00:29 -1000
Subject: [maker-devel] mpi issue on computing cluster
Message-ID:

Hi,

I am trying to run maker 2.24 on a compute cluster and get the following error (not worried about Signal.pm error):

an into unknown state (hex char: 29) at /mnt/work/scratch/scottge/maker-2.24/maker/bin/../lib/Proc/Signal.pm line 138.
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(388)........:
MPID_Init(139)...............: channel initialization failed
MPIDI_CH3_Init(49)...........: progress_init failed
MPIDI_CH3I_Progress_init(808): This version of MPICH requires the SIGUSR1 signal, but the application has already installed a handler
[proxy:0:0 at r01n11.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
[proxy:0:0 at r01n11.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at r01n11.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event
[proxy:0:1 at r01n13.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
[proxy:0:1 at r01n13.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1 at r01n13.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event
[proxy:0:3 at r07n27.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
[proxy:0:3 at r07n27.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:3 at r07n27.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event
[mpiexec at r01n11.local] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
[mpiexec at r01n11.local] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): launcher returned error waiting for completion
[mpiexec at r01n11.local] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for completion
[mpiexec at r01n11.local] main (./ui/mpich/mpiexec.c:404): process manager error waiting for completion

I do not know how mpich2 was compiled, I feel this may be a --enable-sharedlibs issue? I may need to contact my cluster support, but I thought I would try here first,

Thanks
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sbrubaker at solazyme.com Fri Apr 13 14:12:24 2012
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Fri, 13 Apr 2012 20:12:24 +0000
Subject: [maker-devel] Functional annotation pipeline
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA065AD9@EXCHANGE-05.internal.solazyme.com>

Hi, can you recommend any open source functional annotation pipelines - to assign function, GO terms, pathways, etc. to gene models?

Thanks,
Shane

From joseph.fass at gmail.com Fri Apr 13 14:42:51 2012
From: joseph.fass at gmail.com (Joseph Fass)
Date: Fri, 13 Apr 2012 13:42:51 -0700
Subject: [maker-devel] Functional annotation pipeline
In-Reply-To: <61D01ACB70C1E141A150BA9F586D5BFA065AD9@EXCHANGE-05.internal.solazyme.com>
References: <61D01ACB70C1E141A150BA9F586D5BFA065AD9@EXCHANGE-05.internal.solazyme.com>
Message-ID:

Would http://blast2go.de/b2ghome be the kind of thing you're looking for?
HTH,
~Joe

On Fri, Apr 13, 2012 at 1:12 PM, Shane Brubaker wrote:

> Hi, can you recommend any open source functional annotation pipelines - to
> assign function, GO terms, pathways, etc. to gene models?
>
> Thanks,
> Shane
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

--
Joseph Fass
Lead Data Analyst
UC Davis Bioinformatics Core
joseph.fass -at- gmail.com (professional)
970.227.5928 (c) || 530.752.2698 (w)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Apr 13 13:51:38 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 Apr 2012 15:51:38 -0400
Subject: [maker-devel] Huge memory usage
In-Reply-To:
Message-ID:

You can pre-mask the genome, convert the RepeatMasker results to GFF3 and pass them in, or just run the ./configure script in the RepeatMasker directory to configure wublast to be the default.

You can also let MAKER install its own separate installation of RepeatMasker using rmblast. Just go to the maker/src/ directory and run this command --> ./Build repeatmasker

MAKER will use that installation preferentially if you let it install that.

Thanks,
Carson

From: padioleau ismaël
Date: Fri, 13 Apr 2012 17:42:20 +0200
To: Carson Holt
Subject: Re: [maker-devel] Huge memory usage

Dear Carson,

I have a problem with RepeatMasker on my cluster. It works with wublast but not with Crossmatch. As maker tries to run RepeatMasker with its default configuration, I cannot successfully run maker.

I wanted to know if I can provide maker with the genome already masked (if I run RepeatMasker with wublast externally). I thought it was possible, but I can't find where in the configuration files I should provide it, i.e.: in maker_opts.ctl, should I provide the result from RepeatMasker to 'genome_gff:' and set 'rm_pass' to 1, or set rm_gff in the 'Repeat Masking' part of the file? Or maybe I should provide the masked fasta directly as the genome reference.

Another solution could be to ask maker to run RepeatMasker with the option '-e wublast'.

Is it possible to use one of these solutions?
Thanks, Ismael 2012/4/5 padioleau isma?l > Dear Carson, > > Thank you for your very quick answering. > > I realised that I missed some error messages and the problem seems to be > linked to the DB_file package as you suggested. The person in charge of > installation told me that he will recover the configuration. > > I will test it after the Easter weekend and come back to you if we have other > issues. > > Have a nice Easter weekend, > > Ismael > > Here Is the error message: > Use of uninitialized value $DB_File::db_version in numeric ge (>=) at > /mnt/common/DevTools/install/Linux/x86_64/perl/perl-5.10.1/lib/5.10.1/x86_64-l > inux-thread-multi/DB_File.pm line 276. > Use of uninitialized value $DB_File::db_version in numeric gt (>) at > /mnt/common/DevTools/install/Linux/x86_64/perl/perl-5.10.1/lib/5.10.1/x86_64-l > inux-thread-multi/DB_File.pm line 280. > Deep recursion on subroutine "DB_File::AUTOLOAD" at > /mnt/common/DevTools/install/Linux/x86_64/perl/perl-5.10.1/lib/5.10.1/x86_64-l > inux-thread-multi/DB_File.pm line 235. > > > > 2012/4/5 Carson Holt >> The test should not use up more then a few megabytes of RAM. Even on very >> large datasets you should never really use more that 1 or 2 gig of RAM >> perl MAKER instance >> >> It's possible that their may be other perl modules that are broken need to >> be reinstalled on your system. This can happen when perl gets updated, >> but you are pointing to modules built for a different perl version with >> the PERL5LIB environmental variable. Make sure you you have the latest >> version of MAKER and run with --debug set. Collect that output and send >> it to me (the --debug option does some dependancy checking). >> >> I know there is an issue on Macs with updating perl's DB_File module that >> causes it to gobble up big sections of the hard drive (it will eventually >> fill the drive if you let it). It's not a memory issue but just one >> example of how broken modules can cause weird behavior. 
>> >> Thanks, >> Carson >> >> >> >> >> On 12-04-05 10:14 AM, "pingouinandsheep at gmail.com" >> wrote: >> >>> >Hello, >>> > >>> >When I try to run the test provided with maker2, maker start to use a >>> >huge amount of memory. I stoped it after it reach ~100go of memory >>> >used. I believe the test should not use that amount of memory. >>> > >>> >In an other message someone suggest that the bioperl version installed >>> >could be the cause of the problem, but the bioperl installed on my >>> >cluster is already at version 1.6. >>> > >>> >perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' >>> >1.006901 >>> > >>> >Unfortunately I don't have an error message to provide, that could >>> >clarify my problem. >>> > >>> >But maybe it is a recurrent problem and you know a few things I should >>> >check. >>> > >>> >Thanks, >>> > >>> >Ismael >>> > >>> >_______________________________________________ >>> >maker-devel mailing list >>> >maker-devel at box290.bluehost.com >>> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > > > -- > Isma?l Padioleau > Evgeny Zdobnov Group (Computational Evolutionary Genomics Group) > Emmanouil Dermitzakis Group > Dpt de M?decine G?n?tique et D?veloppement > Universit? de Gen?ve - Facult? de M?decine > CMU - Rue Michel-Servet 1 > CH 1211 Gen?ve 4 > Tel: 0041 22 379 59 74 > ismael.padioleau at unige.ch > > -- > Tel. 0041 78 77 69 561 > ismpadioleau at gmail.com -- Isma?l Padioleau Evgeny Zdobnov Group (Computational Evolutionary Genomics Group) Emmanouil Dermitzakis Group Dpt de M?decine G?n?tique et D?veloppement Universit? de Gen?ve - Facult? de M?decine CMU - Rue Michel-Servet 1 CH 1211 Gen?ve 4 Tel: 0041 22 379 59 74 ismael.padioleau at unige.ch -- Tel. 0041 78 77 69 561 ismpadioleau at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Fri Apr 13 15:02:51 2012 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 13 Apr 2012 17:02:51 -0400 Subject: [maker-devel] Functional annotation pipeline In-Reply-To: Message-ID: I would agree blast2go. You can also try interproscan fro the EBI MAKER comes with two scripts ipr_update_gff and iprscan2gff3 that help integrate interproscan results in the GFF3 files. There are also a couple of scripts maker_functional_gff and maker_functional_fasta that can do putative functional annotation using uniprot/swiss-prot. Thanks, Carson From: Joseph Fass Date: Fri, 13 Apr 2012 13:42:51 -0700 To: Shane Brubaker Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Functional annotation pipeline Would http://blast2go.de/b2ghome be the kind of thing you're looking for? HTH, ~Joe On Fri, Apr 13, 2012 at 1:12 PM, Shane Brubaker wrote: > Hi, can you recommend any open source functional annotation pipelines - to > assign function, GO terms, pathways, etc. to gene models? > > Thanks, > Shane > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -- Joseph Fass Lead Data Analyst UC Davis Bioinformatics Core joseph.fass -at- gmail.com (professional) 970.227.5928 (c) || 530.752.2698 (w) _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbrubaker at solazyme.com Fri Apr 13 15:18:10 2012 From: sbrubaker at solazyme.com (Shane Brubaker) Date: Fri, 13 Apr 2012 21:18:10 +0000 Subject: [maker-devel] Functional annotation pipeline In-Reply-To: References: Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA065BBA@EXCHANGE-05.internal.solazyme.com> Great thank you ... I will take a look at those. 
From: Carson Holt [mailto:carsonhh at gmail.com] Sent: Friday, April 13, 2012 2:03 PM To: Joseph Fass; Shane Brubaker Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Functional annotation pipeline I would agree blast2go. You can also try interproscan fro the EBI MAKER comes with two scripts ipr_update_gff and iprscan2gff3 that help integrate interproscan results in the GFF3 files. There are also a couple of scripts maker_functional_gff and maker_functional_fasta that can do putative functional annotation using uniprot/swiss-prot. Thanks, Carson From: Joseph Fass > Date: Fri, 13 Apr 2012 13:42:51 -0700 To: Shane Brubaker > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Functional annotation pipeline Would http://blast2go.de/b2ghome be the kind of thing you're looking for? HTH, ~Joe On Fri, Apr 13, 2012 at 1:12 PM, Shane Brubaker > wrote: Hi, can you recommend any open source functional annotation pipelines - to assign function, GO terms, pathways, etc. to gene models? Thanks, Shane _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -- Joseph Fass Lead Data Analyst UC Davis Bioinformatics Core joseph.fass -at- gmail.com (professional) 970.227.5928 (c) || 530.752.2698 (w) _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsth at ebi.ac.uk Fri Apr 13 15:22:37 2012 From: dsth at ebi.ac.uk (Daniel Hughes) Date: Fri, 13 Apr 2012 22:22:37 +0100 Subject: [maker-devel] Functional annotation pipeline In-Reply-To: References: Message-ID: Careful of interproscan atm.. I believe the executable is still in beta and they definitely aren't recommending it for production use yet. 
If you do use it, be sure to check the output file if using the lookup service, as when I used it recently it would sometimes exit normally despite lookup failures (the lookup problems may have had something to do with running ~800 in parallel - they're looking into the issue atm.).

dan.

Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
-------------------------------------------------------------------------------------
dsth at cantab.net
dsth at cpan.org

2012/4/13 Carson Holt

> I would agree blast2go.
>
> You can also try interproscan from the EBI
>
> MAKER comes with two scripts ipr_update_gff and iprscan2gff3 that help
> integrate interproscan results in the GFF3 files. There are also a couple
> of scripts maker_functional_gff and maker_functional_fasta that can do
> putative functional annotation using uniprot/swiss-prot.
>
> Thanks,
> Carson
>
> From: Joseph Fass
> Date: Fri, 13 Apr 2012 13:42:51 -0700
> To: Shane Brubaker
> Cc: "maker-devel at yandell-lab.org"
> Subject: Re: [maker-devel] Functional annotation pipeline
>
> Would http://blast2go.de/b2ghome be the kind of thing you're looking for?
> HTH,
> ~Joe
>
> On Fri, Apr 13, 2012 at 1:12 PM, Shane Brubaker wrote:
>
>> Hi, can you recommend any open source functional annotation pipelines -
>> to assign function, GO terms, pathways, etc. to gene models?
>>
>> Thanks,
>> Shane
>
> --
> Joseph Fass
> Lead Data Analyst
> UC Davis Bioinformatics Core
> joseph.fass -at- gmail.com (professional)
> 970.227.5928 (c) || 530.752.2698 (w)

From dsth at ebi.ac.uk Fri Apr 13 15:37:06 2012
From: dsth at ebi.ac.uk (Daniel Hughes)
Date: Fri, 13 Apr 2012 22:37:06 +0100
Subject: [maker-devel] Functional annotation pipeline
In-Reply-To:
References:
Message-ID:

sorry, that's the new version of course.

dan.

Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
-------------------------------------------------------------------------------------
dsth at cantab.net
dsth at cpan.org

From ranjani at uga.edu Tue Apr 17 10:46:40 2012
From: ranjani at uga.edu (Sivaranjani Namasivayam)
Date: Tue, 17 Apr 2012 16:46:40 +0000
Subject: [maker-devel] MAKER2.23 output
Message-ID:

Hi,

I tried running the latest version of Maker 2.23 with my dataset but without much success. When I run it without the mpi option it exits with a segmentation fault:

STATUS: Processing and indexing input FASTA files...
Segmentation fault

So, I tried it with mpi. The run does start but I don't see any output files. I ran it with 20 cpus for close to 10 hrs.

I tested this with the sample data in maker's data folder.

Input in maker_opts.ctl file:

genome=/usr/local/maker/2.23/data/dpp_contig.fasta
est= /usr/local/maker/2.23/data/dpp_est.fasta
protein= /usr/local/maker/2.23/data/dpp_protein.fasta
est2genome=1

This was the command I executed:

usr/local/mpich2/1.4.1p1/gcc_4.5.3/bin/mpirun -np 2 /usr/local/maker/2.23/bin/maker maker_opts.ctl maker_bopts.ctl maker_exe.ctl

The following folder with the protein sequence file gets created:

dpp_contig.maker.output

But I can't see any progress after that. Can you please tell me if I might be doing something wrong or if you need further details?

I was able to run the previous version of Maker 2.10 successfully with my dataset.

Thanks,
Ranjani

From carsonhh at gmail.com Tue Apr 17 10:56:11 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Apr 2012 12:56:11 -0400
Subject: [maker-devel] MAKER2.23 output
In-Reply-To:
Message-ID:

Segmentation fault means there was a failure with C code.
It was likely in one of the modules being used. These are all potential culprits:

Inline::C
Proc::ProcessTable
DB_File
forks

Based on when the error occurred, I would lean more toward DB_File. Is it possible that Berkeley DB has been updated on your system, perhaps as part of another installation or a system update? That sometimes breaks this module (which is part of the perl core). You can try reinstalling that module from CPAN.

Also if you run MAKER version 2.25 (latest version), you can run with -debug (i.e. 'maker -debug') to get more information just before the error occurs. You can then capture the error log and send that to me.

Thanks,
Carson

From carsonhh at gmail.com Tue Apr 17 11:09:32 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Apr 2012 13:09:32 -0400
Subject: [maker-devel] mpi issue on computing cluster
In-Reply-To:
Message-ID:

If it's a sharedlibs issue then 'maker -help' would cause the same error. Try that.

Are you sure that you are not worried about Signal.pm causing the error? Try changing /mnt/work/scratch/scottge/maker-2.24/maker/bin/../lib/Proc/Signal.pm lines 136-143 from this -->

    require Proc::ProcessTable;
    my $obj = new Proc::ProcessTable;
    foreach my $p (@{$obj->table}) {
        #now check for the id
        return $p if ($p->pid == $id);
    }
    return undef;

To this -->

    my $select;
    eval{
        require Proc::ProcessTable;
        my $obj = new Proc::ProcessTable;
        foreach my $p (@{$obj->table}) {
            #now check for the id
            if ($p->pid == $id){
                $select = $p;
                last;
            }
        }
    }
    return $select;

If it works, I can generate a cleaner workaround, but I'd like to know if that is the root of the problem.

Thanks,
Carson

From: Scott Geib
Date: Fri, 13 Apr 2012 09:00:29 -1000
To:
Subject: [maker-devel] mpi issue on computing cluster

Hi,

I am trying to run maker 2.24 on a compute cluster and get the following error (not worried about Signal.pm error):

an into unknown state (hex char: 29) at /mnt/work/scratch/scottge/maker-2.24/maker/bin/../lib/Proc/Signal.pm line 138.
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(388)........:
MPID_Init(139)...............: channel initialization failed
MPIDI_CH3_Init(49)...........: progress_init failed
MPIDI_CH3I_Progress_init(808): This version of MPICH requires the SIGUSR1 signal, but the application has already installed a handler
[proxy:0:0 at r01n11.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
[proxy:0:0 at r01n11.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at r01n11.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event
[proxy:0:1 at r01n13.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
[proxy:0:1 at r01n13.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1 at r01n13.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event
[proxy:0:3 at r07n27.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
[proxy:0:3 at r07n27.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:3 at r07n27.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event
[mpiexec at r01n11.local] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
[mpiexec at r01n11.local] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): launcher returned error waiting for completion
[mpiexec at r01n11.local] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for completion
[mpiexec at r01n11.local] main (./ui/mpich/mpiexec.c:404): process manager error waiting for completion

I do not know how mpich2 was compiled, I feel this may be a --enable-sharedlibs issue?
I may need to contact my cluster support, but I thought I would try here first,

Thanks

From carsonhh at gmail.com Tue Apr 17 14:25:51 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Apr 2012 16:25:51 -0400
Subject: [maker-devel] mpi issue on computing cluster
In-Reply-To:
Message-ID:

Sorry, I missed the ';' at the end of the eval block. It should be this -->

    my $select;
    eval{
        require Proc::ProcessTable;
        my $obj = new Proc::ProcessTable;
        foreach my $p (@{$obj->table}) {
            #now check for the id
            if ($p->pid == $id){
                $select = $p;
                last;
            }
        }
    };
    return $select;

--Carson

From elzedliu at gmail.com Tue Apr 17 17:22:53 2012
From: elzedliu at gmail.com (Huanle)
Date: Tue, 17 Apr 2012 16:22:53 -0700 (PDT)
Subject: [maker-devel] gene predictors in MARKER
Message-ID:

Hi There,

I am using MAKER to annotate a recently assembled plant genome.

I followed the tutorial here: http://gmod.org/wiki/MAKER_Tutorial

The de novo gene predictors I included in the maker_exe.ctl file are:

#-----Ab-initio Gene Prediction Algorithms
snap=/sw/maker/2.10/bin/../exe/snap/snap #location of snap executable
gmhmme3=/sw/GeneMark/20120203/bin/gmhmme3 #location of eukaryotic genemark executable
gmhmmp= #location of prokaryotic genemark executable
augustus=/sw/maker/2.10/bin/../exe/augustus/bin/augustus #location of augustus executable

However, I am not sure whether they were really used.

During the run, I could see that repeatmasker, exonerate and wublast were called. But I did not see any information pop up for those gene predictors.

So I am wondering if they were actually used.

Could you please let me know how to tell whether all or any of those gene predictors were called by MAKER?
Kind Regards,
Huanle

From carsonhh at gmail.com Mon Apr 23 15:04:16 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 23 Apr 2012 17:04:16 -0400
Subject: [maker-devel] gene predictors in MARKER
In-Reply-To:
Message-ID:

The gene predictors have to be trained first, and when they are trained they produce an HMM file that can be supplied to MAKER. You can either use MAKER's protein2genome option or est2genome option to produce rough models to train with, or you can try one of the models that come prepackaged with those algorithms.

SNAP models will be in --> /sw/maker/2.10/bin/../exe/snap/HMM

Augustus --> run this to see species in augustus --> /sw/maker/2.10/bin/../exe/augustus/bin/augustus --species=help

GeneMark is self training. Run it once directly on your genome fasta, or for speed just a chromosome or two of the assembly, and it will produce a file called es.mod as part of its results. That is the file you need.

If you have any questions or issues with training just let us know.

Thanks,
Carson

From anastasia.gioti at scilifelab.se Wed Apr 25 03:09:36 2012
From: anastasia.gioti at scilifelab.se (Anastasia Gioti)
Date: Wed, 25 Apr 2012 11:09:36 +0200
Subject: [maker-devel] Use pass-through system to add missing genes
Message-ID:

Hi,

I have a set of predicted proteins from the genome of a fungus annotated by MAKER using EST data from a closely related species and 3 ab initio predictors (snap iteratively trained 3 times, genemark trained directly on the assembly and augustus with a model from a less closely related species), along with a set of fungal proteins. I am missing ~1000 proteins when I compare to the species I used EST data from, and there is good evidence from alignments that these genes exist. The question is how to proceed from Blast hits to actual gene models here.

The idea would be to add these genes to the existing dataset, rather than reannotate the genome. I believe that reannotating it without any further evidence such as RNA-seq from the species itself would not change much, and I'd rather stick with actual predictions that I trust and have used in subsequent analyses. The 1000 genes I can accept to annotate in a less stringent and reliable way than MAKER, I just want to add them so that the difference in gene count gets corrected.

I was reading the MAKER 2 paper and I was wondering if I can use the legacy annotations scheme to do it, by providing GFF3 of the alignments between the two species in the regions where genes were missed, but as I said, I would not like to reannotate the whole genome, and running MAKER2 might cause slight changes that I'd like to avoid. Is this possible?
First, is it possible to provide a GFF3 file of specific locations and not the entire genome alignment? (I guess so..) Second, how can I tag the existing annotations as 'not to be changed' or alternatively, tag the new models only? How should I run maker2, with which predictors on and which off?

Thanks,
Anastasia

Anastasia Gioti
Post-doctoral Researcher
anastasia.gioti at scilifelab.se
anastasia.gioti at ebc.uu.se
http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/

From dsth at ebi.ac.uk Wed Apr 25 03:22:03 2012
From: dsth at ebi.ac.uk (Daniel Hughes)
Date: Wed, 25 Apr 2012 10:22:03 +0100
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To:
References:
Message-ID:

For cross-species comparisons you might be better off including the actual peptide sequences of the other fungi too in the annotation run - I'd be very surprised if you really did get the same result.

dan.

Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
-------------------------------------------------------------------------------------
dsth at cantab.net
dsth at cpan.org

From anastasia.gioti at scilifelab.se Wed Apr 25 03:29:30 2012
From: anastasia.gioti at scilifelab.se (Anastasia Gioti)
Date: Wed, 25 Apr 2012 11:29:30 +0200
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To:
References:
Message-ID:

Hi,

Do you mean that I should not have included the proteins of the closely related species in my fungal protein fasta file that I used as evidence in MAKER? I do not see why... What I have been trying to do now is further 'bias' the annotations in favor of this species, so as to get the missing genes. Can you explain a bit more what you mean?

Thanks,
Anastasia

Anastasia Gioti
Post-doctoral Researcher
anastasia.gioti at scilifelab.se
anastasia.gioti at ebc.uu.se
http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/

From dsth at ebi.ac.uk Wed Apr 25 03:39:49 2012
From: dsth at ebi.ac.uk (Daniel Hughes)
Date: Wed, 25 Apr 2012 10:39:49 +0100
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To:
References:
Message-ID:

sorry my bad, i missed the part about you having already included the fungal proteins as fasta ;/ - too early for me. in that case have you viewed the full gff output for specific instances of such missing proteins in something like apollo to try and work out why maker hasn't made a call at those loci (aed score...)?

dan.

Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
-------------------------------------------------------------------------------------
dsth at cantab.net
dsth at cpan.org

From carsonhh at gmail.com Wed Apr 25 08:29:01 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Apr 2012 10:29:01 -0400
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To:
Message-ID:

The way you proceed depends on why the genes are not there to begin with. Are they not there because of a lack of evidence? If that's the case just adding the new fasta file should do the trick. Or are they not there because an assembly error makes it impossible to get a logical model for the region (i.e. reading frame breaks)? Are there ab initio models already called in those regions that could just be promoted to the annotation tier? You can test that one by blasting against the nonoverlaping_abinits.fasta files.

For any of the cases described, you can provide the existing annotation set as the input in GFF3 format, and previous models will be maintained preferentially.

If you know which ab initio predictions you want to add (i.e. the ab initio promoting scenario I described), you can provide those predictions to the pred_gff option and then set keep_preds=1, and they will be maintained even without evidence. Attached is a script that would make selecting those easier. It takes the MAKER generated GFF3 and a list of predictions to keep (one name per line). These might be the results of a BLAST analysis for example. It will then return the GFF3 entries for just those models selected.

If the situation is more complex, just provide more detail, and I am sure we can help you come up with a plan.
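
The gff3_select attachment itself was scrubbed from this archive, so what follows is only a minimal Python sketch of the selection step described above, not the attached script. It assumes standard GFF3 (nine tab-separated columns, with ID, Name and Parent attributes in column 9); the function names and the feature names in the demo are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 attribute column such as 'ID=x;Name=y' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def select_models(gff3_lines: Iterable[str], wanted: Set[str]) -> List[str]:
    """Return feature lines whose ID or Name is in `wanted`, plus descendants."""
    features = []
    for line in gff3_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        features.append((line, parse_attributes(cols[8])))

    # Seed with the IDs of directly requested features.
    keep_ids = set()
    for _, attrs in features:
        if attrs.get("ID") in wanted or attrs.get("Name") in wanted:
            if "ID" in attrs:
                keep_ids.add(attrs["ID"])

    # Follow Parent links until no new child IDs are found.
    changed = True
    while changed:
        changed = False
        for _, attrs in features:
            parents = attrs.get("Parent", "").split(",")
            child_id = attrs.get("ID")
            if child_id and child_id not in keep_ids:
                if any(p in keep_ids for p in parents):
                    keep_ids.add(child_id)
                    changed = True

    selected = []
    for line, attrs in features:
        parents = attrs.get("Parent", "").split(",")
        if (attrs.get("ID") in keep_ids
                or attrs.get("Name") in wanted
                or any(p in keep_ids for p in parents)):
            selected.append(line)
    return selected


if __name__ == "__main__":
    # Hypothetical feature names, for illustration only.
    demo = [
        "##gff-version 3",
        "ctg1\tsnap_masked\tmatch\t100\t900\t.\t+\t.\tID=pred1;Name=snap-gene-0.1",
        "ctg1\tsnap_masked\tmatch_part\t100\t400\t.\t+\t.\tID=pred1:hsp1;Parent=pred1",
        "ctg1\taugustus_masked\tmatch\t2000\t2600\t.\t-\t.\tID=pred2;Name=aug-gene-0.2",
    ]
    for kept in select_models(demo, {"snap-gene-0.1"}):
        print(kept)
```

In practice the wanted set would be read from the one-name-per-line list, and the selected lines written to a new GFF3 file that could then be given to pred_gff as described above.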
Thanks, Carson From: Anastasia Gioti Date: Wed, 25 Apr 2012 11:09:36 +0200 To: Subject: [maker-devel] Use pass-through system to add missing genes Hi, I have a set of predicted proteins from the genome of a fungus annotated by MAKER using EST data from a closely related species and 3 ab initio predictors (snap iterativelly trained 3 times, genemark trained directly on the assembly and augustus with a model from a less closely related species), along with a set of fungal proteins. I am missing ~ 1000 proteins when I compare to the species i used EST data from, and there is good evidence from alignments that these genes exist. The question is how to proceed from Blast hits to actual gene models here. The idea would be to add these genes to the existing dataset, rather than reannotate the genome. I believe that reannotating it without any further evidence such as RNA-seq from the species itself would not change much,and i d rather stick with actual predictions that i trust and have used in subsequent analyses. The 1000 genes I can accept to annotate with a less stringent and reliable way than MAKER, I just want to add them so that the difference in gene count gets corrected. I was reading the MAKER 2 paper and i was wondering if I can use the legacy annotations scheme to do it, by providing GFF3 of the alignments between the two species in the regions where genes were missed, but as i said, I would not like to reannotate the whole genome, and running MAKER2 might cause slight changes that i d like to avoid. Is this possible? First, is it possible to provide a Gff3 file of specific locations and not the entire genome alignment? (I guess so..) Second, how can I tag the existing annotations as 'not to be changed' or alternatively, tag the new models only? How should I run maker2, with which predictors on and which off? 
Thanks,
Anastasia

Anastasia Gioti
Post-doctoral Researcher
anastasia.gioti at scilifelab.se
anastasia.gioti at ebc.uu.se
http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gff3_select
Type: application/octet-stream
Size: 3067 bytes
Desc: not available
URL:

From anastasia.gioti at scilifelab.se Fri Apr 27 02:43:14 2012
From: anastasia.gioti at scilifelab.se (Anastasia Gioti)
Date: Fri, 27 Apr 2012 10:43:14 +0200
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To:
References:
Message-ID: <4FE7CD5B-FC1C-43E7-AC41-A05823348B99@scilifelab.se>

Hi Carson,
Thanks for your help!

> The way you proceed depends on why the genes are not there to begin
> with. Are they not there because of a lack of evidence?

It is a mixture of cases, and I can only look at some examples to say that. There are cases where all 3 ab initio predictors used provide models, and there are blastx hits, or both blastx and protein2genome hits, but no EST evidence, thus no model is retained. I guess my default parameters could be responsible for these cases at least.

> If that's the case just adding the new fasta file should do the trick.

Which fasta do you refer to? The proteins file I use as evidence contains all the proteins I can actually use.

> Or are they not there because an assembly error makes it impossible
> to get a logical model for the region (i.e. reading frame breaks).

This is not the case in general.

> Are there ab initio models already called in those regions that
> could just be promoted to the annotation tier? You can test that
> one by blasting against the nonoverlaping_abinits.fasta files.
I have not done this, will do!

> For any of the cases described, you can provide the existing
> annotation set as the input in GFF3 format, and previous models will
> be maintained preferentially.

You mean in a new MAKER run? Is this possible with the old MAKER as well, not just MAKER2?

> If you know which ab initio predictions you want to add (i.e. the ab
> initio promoting scenario I described), you can provide those
> predictions to the pred_gff option and then set keep_preds=1
> and they will be maintained even without evidence. Attached is a
> script that would make selecting those easier. It takes the MAKER
> generated GFF3 and a list of predictions to keep (one name per
> line). These might be the results of a BLAST analysis for example.
> It will then return the GFF3 entries for just those models selected.

The thing is, for the few cases I have looked at, I cannot really decide which model is the best, and the 3 models from the ab initio predictors do not agree on the exact intron-exon junctions or the start and stop codons.

> If the situation is more complex, just provide more detail, and I am
> sure we can help you come up with a plan.

What I was thinking of doing was to provide a GFF file of alignments (e.g. by exonerate) to the proteins of the closely related species that I am missing, and somehow keep the previous annotations and get the extra ones from this GFF file. But I am not sure exactly how MAKER should be run to do this. If I want to keep the previous annotations, I need the GFF file of the last MAKER run as input, but then how do I distinguish it from the exonerate GFF file? And which mode of prediction should be on, and with which parameters? You mention keep_preds=1 for the existing annotations, but how do I also promote evidence from alignments in the same way in the same run? Looks feasible though.
Thanks again,
Anastasia
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From barry.moore at genetics.utah.edu Fri Apr 27 05:57:01 2012
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 27 Apr 2012 05:57:01 -0600
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To: <4FE7CD5B-FC1C-43E7-AC41-A05823348B99@scilifelab.se>
References: <4FE7CD5B-FC1C-43E7-AC41-A05823348B99@scilifelab.se>
Message-ID: <03439C8F-75B0-42FE-894C-CC564AEB73E9@genetics.utah.edu>

Hi Anastasia,

On Apr 27, 2012, at 2:43 AM, Anastasia Gioti wrote:

> Hi Carson,
> Thanks for your help!
>
>> The way you proceed depends on why the genes are not there to begin with. Are they not there because of a lack of evidence?
>
> It is a mixture of cases, and I can only look at some examples to say that. There are cases where all 3 ab initio predictors used provide models, and there are blastx hits, or both blastx and protein2genome hits, but no EST evidence, thus no model is retained. I guess my default parameters could be responsible for these cases at least.

This doesn't sound right. If there are predicted models and blastx protein evidence overlapping them, you should get a model retained.
I know that EST evidence has to support a splice site before it will be promoted, and I can't remember if protein evidence is the same, but certainly if you pass back those protein2genome predictions and the original proteins as evidence then they will be retained as models.

>> If that's the case just adding the new fasta file should do the trick.
>
> Which fasta do you refer to? The proteins file I use as evidence contains all the proteins I can actually use.

Yes, using the protein fasta from the closely related species as evidence. I think you said you've already done that, right?

>> Or are they not there because an assembly error makes it impossible to get a logical model for the region (i.e. reading frame breaks).
>
> This is not the case in general.
>
>> Are there ab initio models already called in those regions that could just be promoted to the annotation tier? You can test that one by blasting against the nonoverlaping_abinits.fasta files.
>
> I have not done this, will do!
>
>> For any of the cases described, you can provide the existing annotation set as the input in GFF3 format, and previous models will be maintained preferentially.
>
> You mean in a new MAKER run? Is this possible with the old MAKER as well, not just MAKER2?

Yes, the original MAKER will do this.

>> If you know which ab initio predictions you want to add (i.e. the ab initio promoting scenario I described), you can provide those predictions to the pred_gff option and then set keep_preds=1 and they will be maintained even without evidence. Attached is a script that would make selecting those easier. It takes the MAKER generated GFF3 and a list of predictions to keep (one name per line). These might be the results of a BLAST analysis for example. It will then return the GFF3 entries for just those models selected.
>
> The thing is, for the few cases I have looked at, I cannot really decide which model is the best, and the 3 models from the ab initio predictors do not agree on the exact intron-exon junctions or the start and stop codons.
>
>> If the situation is more complex, just provide more detail, and I am sure we can help you come up with a plan.
>
> What I was thinking of doing was to provide a GFF file of alignments (e.g. by exonerate) to the proteins of the closely related species that I am missing, and somehow keep the previous annotations and get the extra ones from this GFF file. But I am not sure exactly how MAKER should be run to do this. If I want to keep the previous annotations, I need the GFF file of the last MAKER run as input, but then how do I distinguish it from the exonerate GFF file? And which mode of prediction should be on, and with which parameters? You mention keep_preds=1 for the existing annotations, but how do I also promote evidence from alignments in the same way in the same run?
> Looks feasible though. Thanks again,
> Anastasia

Let me just restate what you've said so that I can be sure that I am correct about what you've already done. You have run MAKER with SNAP, GeneMark and Augustus using ESTs from a closely related species (passed to altest) and protein evidence from other fungi. You are missing about 1,000 genes compared to the species that provided the EST alignments. You say there is good evidence from the alignments that these genes exist, and I assume by this that you mean the EST/protein alignments that MAKER produced.

1) Is the closely related fungus annotated, and if so, have you included its proteins in the evidence set that you provided to MAKER? If you haven't provided these proteins as evidence to MAKER, then you should do this.
You can re-run MAKER, passing your original models back through, like this:

#-----Re-annotation Using MAKER Derived GFF3
genome_gff=original_maker_annotations.gff3
est_pass=1
altest_pass=1
protein_pass=1
rm_pass=1
model_pass=1
pred_pass=1
other_pass=1

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=proteins_from_closely_related.fasta
## OR it sounds like you've already aligned these with exonerate?
protein_gff=proteins_from_closely_related_already_aligned.gff

2) If you've already included those closely related species proteins but still didn't get the 1,000 genes, then take your nonoverlaping_abinits.fasta and blast them directly against your closely related proteins. Presumably they don't hit too well, because if they did they should have been promoted to predictions by MAKER the first time, but here you can decide for yourself what thresholds to allow in order to keep the ab initio predictions that hit the closely related species proteins. If you filter your BLAST hits the way you want and keep the names of the ab initio predictions that pass your filter, then use the script Carson attached; it will generate an ab initio prediction GFF file with only the predictions you selected. You can then pass those predictions back to MAKER and force it to keep them, and MAKER will turn them from predictions (match/match_part) into gene models.
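The BLAST filtering step in point 2 (keep the names of the ab initio predictions whose hits pass thresholds of your own choosing) could be sketched as follows. This is only an illustration under stated assumptions: the BLAST results are assumed to be in the standard 12-column tabular format (BLAST+ `-outfmt 6`), with the ab initio predictions as queries, and the e-value and percent-identity cutoffs are arbitrary placeholders rather than recommended values. The function name is hypothetical.

```python
def names_passing_filter(blast_lines, max_evalue=1e-10, min_identity=50.0):
    """Return query names with at least one hit passing both cutoffs.

    Assumes tab-separated columns in the order: qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    Names are returned once each, in order of first appearance.
    """
    kept = []
    seen = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular hit line
        query = fields[0]
        identity = float(fields[2])
        evalue = float(fields[10])
        if evalue <= max_evalue and identity >= min_identity and query not in seen:
            seen.add(query)
            kept.append(query)
    return kept


# Made-up hits: pred1 passes, pred2 is too weak to keep.
demo_hits = [
    "pred1\tprotA\t82.5\t300\t50\t2\t1\t300\t5\t304\t1e-50\t400",
    "pred2\tprotB\t35.0\t80\t52\t3\t1\t80\t10\t89\t1e-3\t40",
    "pred1\tprotC\t60.0\t250\t100\t4\t1\t250\t1\t250\t1e-20\t200",
]
for name in names_passing_filter(demo_hits):
    print(name)
```

Written one per line to a text file, the resulting names are the kind of list the selection script expects as input.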
#-----Re-annotation Using MAKER Derived GFF3
genome_gff=original_maker_annotations.gff3
est_pass=1
altest_pass=1
protein_pass=1
rm_pass=1
model_pass=1
pred_pass=0
other_pass=1

#-----Gene Prediction
snaphmm=
gmhmm=
augustus_species=
fgenesh_par_file=
pred_gff=ab_init_predictions_rescued_by_blast.gff
keep_preds=1

Barry

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Apr 27 07:27:24 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Apr 2012 09:27:24 -0400
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To: <03439C8F-75B0-42FE-894C-CC564AEB73E9@genetics.utah.edu>
Message-ID:

> It is a mixture of cases, and I can only look at some examples to say that.
> There are cases where all 3 used ab initio predictors provide models, there
> are blastx hits, or both blastx and protein2 genome, but no EST evidence, thus
> no model is retained. i guess my default parameters could be responsible for
> these cases at least.
The only way you should be able to get BLASTX overlap and still not get a model for the region is if:

1. The protein alignment is in a different reading frame than your models for every single base pair of the alignment (in which case it's not true overlap).
2. The BLASTX HSPs are stacked on each other again and again in weird rearranged overlaps to produce a very deep alignment, which would mean this is a repetitive region and is not really a significant alignment.

Otherwise this should not happen unless you have the AED_threshold set to some value where MAKER will ignore genes unless they have a minimum amount of support (by default this option is always off). The other two possibilities can be tested by just looking at the alignments manually in Apollo. Also take a look at the AED and eAED values for your missing genes. Anything below 1 should always be kept by MAKER by default because it has at least some evidence support.

> which fasta do you refer to? The proteins file I use as evidence contains all
> proteins i can actually use.

If they are already in your current run, ignore this.

Barry provided detailed instructions on how to configure MAKER for your particular case. So just follow his excellent instructions.

Thanks,
Carson
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From james.collett at pnnl.gov Fri Apr 27 10:51:05 2012
From: james.collett at pnnl.gov (Collett, James R)
Date: Fri, 27 Apr 2012 09:51:05 -0700
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To:
References:
Message-ID:

Hi Carson,

Could you please send me (or make available for download) the Perl script that you mentioned in a previous post in this thread?

>> Attached is a
>> script that would make selecting those easier. It takes the MAKER
>> generated GFF3 and a list of predictions to keep (one name per line).
>> These might be the results of a BLAST analysis for example.
>> It will then return the GFF3 entries for just those models selected.

Thanks,
Jim
__________________________________________________
James R. Collett, Ph.D.
Senior Scientist
Chemical and Biological Process Development Group
Energy and Environment Directorate
Pacific Northwest National Laboratory
> > #-----Re-annotation Using MAKER Derived GFF3 > genome_gff=original_maker_annotations.gff3 > est_pass=1 > altest_pass=1 > protein_pass=1 > rm_pass=1 > model_pass=1 > pred_pass=0 > other_pass=1 > > #-----Gene Prediction > snaphmm= > gmhmm= > augustus_species= > fgenesh_par_file= > pred_gff=ab_init_predictions_rescued_by_blast.gff > > keep_preds=1 > > Barry > > >> Thanks, > >> Carson > >> > >> From: Anastasia Gioti > >> Date: Wed, 25 Apr 2012 11:09:36 +0200 > >> To: > >> Subject: [maker-devel] Use pass-through system to add missing genes > >> > >> Hi, > >> I have a set of predicted proteins from the genome of a fungus > >> annotated by MAKER using EST data from a closely related species > and > >> 3 ab initio predictors (snap iterativelly trained 3 times, genemark > >> trained directly on the assembly and augustus with a model from a > >> less closely related species), along with a set of fungal proteins. > I > >> am missing ~ 1000 proteins when I compare to the species i used EST > >> data from, and there is good evidence from alignments that these > >> genes exist. The question is how to proceed from Blast hits to > actual > >> gene models here. The idea would be to add these genes to the > >> existing dataset, rather than reannotate the genome. I believe that > >> reannotating it without any further evidence such as RNA-seq from > the > >> species itself would not change much,and i d rather stick with > actual > >> predictions that i trust and have used in subsequent analyses. The > >> 1000 genes I can accept to annotate with a less stringent and > reliable way than MAKER, I just want to add them so that the difference > in gene count gets corrected. 
> >> I was reading the MAKER 2 paper and i was wondering if I can use the > >> legacy annotations scheme to do it, by providing GFF3 of the > >> alignments between the two species in the regions where genes were > >> missed, but as i said, I would not like to reannotate the whole > >> genome, and running MAKER2 might cause slight changes that i d like > >> to avoid. Is this possible? First, is it possible to provide a Gff3 > >> file of specific locations and not the entire genome alignment? (I > >> guess so..) Second, how can I tag the existing annotations as 'not > to be changed' or alternatively, tag the new models only? > >> How should I run maker2, with which predictors on and which off? > >> Thanks, > >> Anastasia > >> > >> Anastasia Gioti > >> Post-doctoral Researcher > >> > >> anastasia.gioti at scilifelab.se > >> anastasia.gioti at ebc.uu.se > >> > >> > http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia > >> / > >> > >> > >> > >> _______________________________________________ maker-devel mailing > >> list > >> maker- > devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/lis > >> tinfo/ma > >> ker-devel_yandell-lab.org > >> > > > > Anastasia Gioti > > Post-doctoral Researcher > > > > anastasia.gioti at scilifelab.se > > anastasia.gioti at ebc.uu.se > > > > > http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ > > > > > > > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell- > lab.or > > g > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > > > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: lab.org/attachments/20120427/72b70d49/attachment.html>
>
> ------------------------------
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
> End of maker-devel Digest, Vol 47, Issue 14
> *******************************************

From carsonhh at gmail.com Fri Apr 27 11:18:23 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Apr 2012 13:18:23 -0400
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To:
Message-ID:

Here you go. This will also be part of the next MAKER release in some
form.

Thanks,
Carson

On 12-04-27 12:51 PM, "Collett, James R" wrote:

>Hi Carson,
>
>Could you please send me (or make available for download) the perl script
>that you mentioned in this previous post in this thread?
>
>>> Attached is a
>>> script that would make selecting those easier. It take the MAKER
>>> generated GFF3 and a list of predictions to keep (one name per line).
>>> These might be the results of a BLAST analysis for example. It will
>>> then return the GFF3 entries for just those models selected.
>
>Thanks,
>
>Jim
>__________________________________________________
>James R. Collett, Ph.D.
>Senior Scientist
>Chemical and Biological Process Development Group
>Energy and Environment Directorate
>Pacific Northwest National Laboratory
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gff3_select
Type: application/octet-stream
Size: 3067 bytes
Desc: not available
URL:

From weckalba at asu.edu Tue Apr 3 17:28:28 2012
From: weckalba at asu.edu (Walter Eckalbar)
Date: Tue, 3 Apr 2012 16:28:28 -0700
Subject: [maker-devel] gff3_preds2models usage question
Message-ID:

Hello maker developers and users,

I am attempting to use the gff3_preds2models script, but I am running
into a few issues. Initially, I hit errors that seemed to be fixed by
installing CGI and its dependencies. However, a few tests failed during
that installation. I can provide error logs if that would be helpful;
in any case, I went on to install and try gff3_preds2models anyway.

What I am currently doing is running gff3_merge first, to gather the
maker outputs. I am doing so with the -n option both on and off. When
providing the gff3 file with the sequence, I get the following error
from gff3_preds2models:

Undefined subroutine &maker::auto_annotator::annotate called at /Users/Walter/Bioinformatics/Tools/maker/bin/gff3_preds2models line 97, line 992291.

This seems to be the same error that someone else saw on these boards,
but I did not see a later email resolving the issue.
I also tried giving it just the gff3 without the sequences at the bottom
of the file, and then I get this error:

ERROR: There was a problem in the writing the fasta entry
Either no sequence was given, or there was an error in writing

This leads me to believe I should be using the one with the sequence,
but I am not certain of that.

I see it might be possible to go from maker outputs to a chado database
and then to gene->mRNA->exon gff3s, but I have not set up my machine for
XML or chado yet, and it does not appear trivial.

Thanks for the help,
Walter

From ranjani at uga.edu Tue Apr 3 20:24:49 2012
From: ranjani at uga.edu (Sivaranjani Namasivayam)
Date: Wed, 4 Apr 2012 02:24:49 +0000
Subject: [maker-devel] mRNA-seq data
Message-ID:

Hi,

I am using MAKER to annotate a genome and I would like a couple of
clarifications.

In the previous version of MAKER, under EST_evidence in maker_opts.ctl,
the user could input est and est_reads (the mRNA-seq reads), although
this was not fully implemented. The latest version of MAKER uses
mRNA-seq data to improve annotation quality.

I have assembled transcriptome data from Sanger, 454 and Illumina.
Do I just provide all this data in fasta format to the 'est' option?
Is this the best way to provide the mRNA-seq evidence? Will this ensure
that the mRNA-seq data is used to improve the annotations?

Thanks!
Ranjani

From carsonhh at gmail.com Tue Apr 3 20:39:02 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 03 Apr 2012 22:39:02 -0400
Subject: [maker-devel] mRNA-seq data
In-Reply-To:
Message-ID:

Yes. If you have them in fasta format, just provide them to the est=
option and let MAKER align them with exonerate.
If you used something like cufflinks or trinity to process them, you can
provide them to the est_gff option (MAKER comes with a cufflinks2gff3
converter to make that easy).

Thanks,
Carson

From: Sivaranjani Namasivayam
Date: Wed, 4 Apr 2012 02:24:49 +0000
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] mRNA-seq data

Hi,

I am using MAKER to annotate a genome and I would like a couple of
clarifications.

In the previous version of MAKER, under EST_evidence in maker_opts.ctl,
the user could input est and est_reads (the mRNA-seq reads), although
this was not fully implemented. The latest version of MAKER uses
mRNA-seq data to improve annotation quality.

I have assembled transcriptome data from Sanger, 454 and Illumina.
Do I just provide all this data in fasta format to the 'est' option?
Is this the best way to provide the mRNA-seq evidence? Will this ensure
that the mRNA-seq data is used to improve the annotations?

Thanks!
Ranjani
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From pingouinandsheep at gmail.com Thu Apr 5 08:14:39 2012
From: pingouinandsheep at gmail.com (pingouinandsheep at gmail.com)
Date: Thu, 5 Apr 2012 07:14:39 -0700 (PDT)
Subject: [maker-devel] Huge memory usage
Message-ID: <5338ad1d-dc04-4150-b5ee-a88da7c42549@h5g2000vbx.googlegroups.com>

Hello,

When I try to run the test provided with maker2, maker starts to use a
huge amount of memory. I stopped it after it reached ~100 GB of memory
used. I believe the test should not use that amount of memory.

In another message someone suggested that the installed bioperl version
could be the cause of the problem, but the bioperl installed on my
cluster is already at version 1.6.

perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
1.006901

Unfortunately I don't have an error message to provide that could
clarify my problem.

But maybe it is a recurrent problem and you know a few things I should
check.

Thanks,

Ismael

From carsonhh at gmail.com Thu Apr 5 08:26:17 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 05 Apr 2012 10:26:17 -0400
Subject: [maker-devel] Huge memory usage
In-Reply-To: <5338ad1d-dc04-4150-b5ee-a88da7c42549@h5g2000vbx.googlegroups.com>
Message-ID:

The test should not use more than a few megabytes of RAM. Even on very
large datasets you should never really use more than 1 or 2 GB of RAM
per MAKER instance.

It's possible that there are other perl modules that are broken and
need to be reinstalled on your system. This can happen when perl gets
updated but the PERL5LIB environment variable still points to modules
built for a different perl version.

Make sure you have the latest version of MAKER and run with --debug set.
Collect that output and send it to me (the --debug option does some
dependency checking).

I know there is an issue on Macs with updating perl's DB_File module
that causes it to gobble up big sections of the hard drive (it will
eventually fill the drive if you let it). It's not a memory issue, but
it is one example of how broken modules can cause weird behavior.

Thanks,
Carson

On 12-04-05 10:14 AM, "pingouinandsheep at gmail.com" wrote:

>Hello,
>
>When I try to run the test provided with maker2, maker starts to use a
>huge amount of memory. I stopped it after it reached ~100 GB of memory
>used. I believe the test should not use that amount of memory.
>
>In another message someone suggested that the installed bioperl version
>could be the cause of the problem, but the bioperl installed on my
>cluster is already at version 1.6.
>
>perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
>1.006901
>
>Unfortunately I don't have an error message to provide, that could
>clarify my problem.
>
>But maybe it is a recurrent problem and you know a few things I should
>check.
>
>Thanks,
>
>Ismael
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From eernst at cshl.edu Sun Apr 8 16:09:22 2012
From: eernst at cshl.edu (Evan Ernst)
Date: Sun, 8 Apr 2012 18:09:22 -0400
Subject: [maker-devel] Incomplete/Missing lines in datastore index log under openMPI
Message-ID:

Hi Carson,

It looks like there may be a locking issue with the datastore index log
in MAKER 2.25/openmpi 1.4.5. I noticed this when running 8 MPI maker
instances, each with 32 nodes. Examples from the log:

scaffold1001.1 genome_datastore/93/A6/scaffold1001.1/ FINISHED
scaffold1002.1 genome_datastore/72/43/scaffold1002.1/ FINISHED
scaffold1003.1 genome_datastore/B8/05/scaffold1003.1/ FINISHED
...
scaffold10085.1 genome_datastore/1C/7E/scaffold10085.1/ FINISHED
scaffold8265.1 genome_datastore/01/E4/scaffold8265.1/ FINISHED
D
scaffold8295.1 genome_datastore/63/13/scaffold8295.1/ FINISHED
...
scaffold8351.1 genome_datastore/27/52/scaffold8351.1/ FINISHED
scaffold8343.1 genome_datastore/BF/31/scaffold8343.1/ FINISHED
scaffold10167.1 genome_datastore/0B/9A/scaffold10167.1/ FINISHEscaffold10170.1 genome_datastore/F4/FF/scaffold10170.1/ FINISHED
scaffold10209.1 genome_datastore/2D/AA/scaffold10209.1/ FINISHEscaffold10072.1 genome_datastore/E0/A5/scaffold10072.1/ FINISHED
scaffold10113.1 genome_datastore/00/23/scaffold10113.1/ FINISHED

I see this even when running a single MPI instance, 32 nodes, when no
actual processing is required apart from marking the scaffolds FINISHED.
Comparing the result to a single, non-MPI maker instance running on the
same completed hierarchy reveals that many entries aren't being written
to the log at all when running under MPI. The single-process instance
runs just fine, generating a complete log that can be used for the
downstream scripts.

Between runs, I execute

find genome.maker.output/ -name .NFSLock* -type f -print0 | xargs -0 rm &

to be sure lingering lock files from badly exiting processes weren't
interfering.

This looks like the sort of thing that may be difficult to track down,
and there's a clear workaround, but I'm happy to provide more
information if you'd like to debug it.

Thanks,
Evan

From carsonhh at gmail.com Tue Apr 10 08:26:40 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 10 Apr 2012 10:26:40 -0400
Subject: [maker-devel] Incomplete/Missing lines in datastore index log under openMPI
In-Reply-To:
Message-ID:

Depending on whether you're using NFS and on other details of your
architecture, you can get race conditions with the datastore log file.
This primarily happens when you have multiple instances of MAKER running
at the same time, or thousands of short contigs running in parallel so
that many finish at the same time. In a future release, I plan on having
the last MAKER job to exit rebuild the log at the end of a run to
ensure it is complete. For now though, just run 'maker -dsindex' at the
end of a run when this happens. It will rebuild the log and only takes a
few seconds.

Thanks,
Carson

From: Evan Ernst
Date: Sun, 8 Apr 2012 18:09:22 -0400
To:
Subject: [maker-devel] Incomplete/Missing lines in datastore index log under openMPI

Hi Carson,

It looks like there may be a locking issue with the datastore index log
in MAKER 2.25/openmpi 1.4.5. I noticed this when running 8 MPI maker
instances, each with 32 nodes.
Examples from the log: scaffold1001.1 genome_datastore/93/A6/scaffold1001.1/ FINISHED scaffold1002.1 genome_datastore/72/43/scaffold1002.1/ FINISHED scaffold1003.1 genome_datastore/B8/05/scaffold1003.1/ FINISHED ... scaffold10085.1 genome_datastore/1C/7E/scaffold10085.1/ FINISHED scaffold8265.1 genome_datastore/01/E4/scaffold8265.1/ FINISHED D scaffold8295.1 genome_datastore/63/13/scaffold8295.1/ FINISHED ... scaffold8351.1 genome_datastore/27/52/scaffold8351.1/ FINISHED scaffold8343.1 genome_datastore/BF/31/scaffold8343.1/ FINISHED scaffold10167.1 genome_datastore/0B/9A/scaffold10167.1/ FINISHEscaffold10170.1 genome_datastore/F4/FF/scaffold10170.1/ FINISHED scaffold10209.1 genome_datastore/2D/AA/scaffold10209.1/ FINISHEscaffold10072.1 genome_datastore/E0/A5/scaffold10072.1/ FINISHED scaffold10113.1 genome_datastore/00/23/scaffold10113.1/ FINISHED I see this even when running a single MPI instance, 32 nodes, when no actual processing is required apart from marking the scaffolds FINISHED. Comparing the result to a single, non-MPI maker instance running on the same completed hierarchy reveals that many entries aren't being written to the log at all when running under MPI. The single process instance runs just fine, generating a complete log that can be used for the downstream scripts. Between runs, I execute a find genome.maker.output/ -name .NFSLock* -type f -print0 | xargs -0 rm & to be sure lingering lock files from badly exiting processes weren't interfering. This looks like the sort of thing that may be difficult to track down, and there's a clear workaround, but I'm happy to provide more information if you'd like to debug it. Thanks, Evan _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
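For anyone hitting the same symptom, here is a minimal sketch (not part of MAKER) of how the damaged lines shown in the examples above could be flagged before deciding to rebuild the index with 'maker -dsindex'. It assumes that every intact entry has three whitespace-separated fields, as in the quoted log lines: a sequence ID, a datastore directory ending in '/', and an upper-case status word. The function name find_damaged_lines is hypothetical, and the check cannot see entries that were never written at all.

```python
import re

# Assumed shape of one intact entry, based only on the lines quoted in this thread:
#   <sequence id> <datastore directory ending in "/"> <UPPER_CASE status>
ENTRY = re.compile(r"^\S+\s+\S+/\s+[A-Z_]+$")


def find_damaged_lines(lines):
    """Return (line_number, text) for each non-empty line that is not one clean entry."""
    damaged = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        # Blank lines are ignored; anything else must match the assumed entry shape.
        if text and not ENTRY.match(text):
            damaged.append((number, text))
    return damaged


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python check_index.py <path to the datastore index log>
    with open(sys.argv[1]) as handle:
        for number, text in find_damaged_lines(handle):
            print(f"line {number}: {text}")
```

An empty result does not prove the log is complete, since missing entries leave no trace. Comparing the number of entries with the number of sequences in the genome FASTA is still worthwhile.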
URL: From smg283 at gmail.com Fri Apr 13 13:00:29 2012 From: smg283 at gmail.com (Scott Geib) Date: Fri, 13 Apr 2012 09:00:29 -1000 Subject: [maker-devel] mpi issue on computing cluster Message-ID: Hi, I am trying to run maker 2.24 on a compute cluster and get the following error (not worried about Signal.pm error): an into unknown state (hex char: 29) at /mnt/work/scratch/scottge/maker-2.24/maker/bin/../lib/Proc/Signal.pm line 138. Fatal error in MPI_Init: Other MPI error, error stack: MPIR_Init_thread(388)........: MPID_Init(139)...............: channel initialization failed MPIDI_CH3_Init(49)...........: progress_init failed MPIDI_CH3I_Progress_init(808): This version of MPICH requires the SIGUSR1 signal, but the application has already installed a handler [proxy:0:0 at r01n11.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed [proxy:0:0 at r01n11.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:0 at r01n11.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event [proxy:0:1 at r01n13.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed [proxy:0:1 at r01n13.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:1 at r01n13.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event [proxy:0:3 at r07n27.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed [proxy:0:3 at r07n27.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:3 at r07n27.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event [mpiexec at r01n11.local] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting [mpiexec at r01n11.local] HYDT_bsci_wait_for_completion 
(./tools/bootstrap/src/bsci_wait.c:18): launcher returned error waiting for completion [mpiexec at r01n11.local] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for completion [mpiexec at r01n11.local] main (./ui/mpich/mpiexec.c:404): process manager error waiting for completion I do not know how mpich2 was compiled, I feel this may be a --enable-sharedlibs issue? I may need to contact my cluster support, but I thought I would try here first, Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbrubaker at solazyme.com Fri Apr 13 14:12:24 2012 From: sbrubaker at solazyme.com (Shane Brubaker) Date: Fri, 13 Apr 2012 20:12:24 +0000 Subject: [maker-devel] Functional annotation pipeline Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA065AD9@EXCHANGE-05.internal.solazyme.com> Hi, can you recommend any open source functional annotation pipelines - to assign function, GO terms, pathways, etc. to gene models? Thanks, Shane From joseph.fass at gmail.com Fri Apr 13 14:42:51 2012 From: joseph.fass at gmail.com (Joseph Fass) Date: Fri, 13 Apr 2012 13:42:51 -0700 Subject: [maker-devel] Functional annotation pipeline In-Reply-To: <61D01ACB70C1E141A150BA9F586D5BFA065AD9@EXCHANGE-05.internal.solazyme.com> References: <61D01ACB70C1E141A150BA9F586D5BFA065AD9@EXCHANGE-05.internal.solazyme.com> Message-ID: Would http://blast2go.de/b2ghome be the kind of thing you're looking for? HTH, ~Joe On Fri, Apr 13, 2012 at 1:12 PM, Shane Brubaker wrote: > Hi, can you recommend any open source functional annotation pipelines - to > assign function, GO terms, pathways, etc. to gene models? 
> > Thanks, > Shane > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- Joseph Fass Lead Data Analyst UC Davis Bioinformatics Core joseph.fass -at- gmail.com (professional) 970.227.5928 (c) || 530.752.2698 (w) -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Apr 13 13:51:38 2012 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 13 Apr 2012 15:51:38 -0400 Subject: [maker-devel] Huge memory usage In-Reply-To: Message-ID: You can pre-mask the genome, convert the RepaetMasker results to GFF3 and pass them in, or just run the ./configure script in the RepeatMasker directory to configure wublast to be the default. You can also let MAKER install it's own separate installation of RepeatMasker using rmblast. Just go to the maker/src/ directory and run this command --> ./Build repeatmasker MAKER will use that installation preferentially if you let it install that. Thanks, Carson From: padioleau isma?l Date: Fri, 13 Apr 2012 17:42:20 +0200 To: Carson Holt Subject: Re: [maker-devel] Huge memory usage Dear Carson, I have a problem with RepeatMasker on my cluster. It work with wublast but not with Crossmatch. As maker try to run RepeatMasker with default I can not successfully run maker. I wanted to know if I can provide to maker the genome already masked (if I run with wublast externally), I though it was possible but I can't found in the configuration files where I should provide it i.e : In maker_opts.ctl, should I provide the result from RepeatMasker to 'genome_gff:' and set 'rm_pass' to 1, or set rm_gff in the 'Repeat Masking' part of the file? Or maybe I should provide directly the masked fasta as genome reference. An other solution could be to ask maker to run RepeatMasker with the option '-e wublast'. Is it possible to use one of these solutions? 
Thanks, Ismael 2012/4/5 padioleau isma?l > Dear Carson, > > Thank you for your very quick answering. > > I realised that I missed some error messages and the problem seems to be > linked to the DB_file package as you suggested. The person in charge of > installation told me that he will recover the configuration. > > I will test it after the Easter weekend and come back to you if we have other > issues. > > Have a nice Easter weekend, > > Ismael > > Here Is the error message: > Use of uninitialized value $DB_File::db_version in numeric ge (>=) at > /mnt/common/DevTools/install/Linux/x86_64/perl/perl-5.10.1/lib/5.10.1/x86_64-l > inux-thread-multi/DB_File.pm line 276. > Use of uninitialized value $DB_File::db_version in numeric gt (>) at > /mnt/common/DevTools/install/Linux/x86_64/perl/perl-5.10.1/lib/5.10.1/x86_64-l > inux-thread-multi/DB_File.pm line 280. > Deep recursion on subroutine "DB_File::AUTOLOAD" at > /mnt/common/DevTools/install/Linux/x86_64/perl/perl-5.10.1/lib/5.10.1/x86_64-l > inux-thread-multi/DB_File.pm line 235. > > > > 2012/4/5 Carson Holt >> The test should not use up more then a few megabytes of RAM. Even on very >> large datasets you should never really use more that 1 or 2 gig of RAM >> perl MAKER instance >> >> It's possible that their may be other perl modules that are broken need to >> be reinstalled on your system. This can happen when perl gets updated, >> but you are pointing to modules built for a different perl version with >> the PERL5LIB environmental variable. Make sure you you have the latest >> version of MAKER and run with --debug set. Collect that output and send >> it to me (the --debug option does some dependancy checking). >> >> I know there is an issue on Macs with updating perl's DB_File module that >> causes it to gobble up big sections of the hard drive (it will eventually >> fill the drive if you let it). It's not a memory issue but just one >> example of how broken modules can cause weird behavior. 
>> >> Thanks, >> Carson >> >> >> >> >> On 12-04-05 10:14 AM, "pingouinandsheep at gmail.com" >> wrote: >> >>> >Hello, >>> > >>> >When I try to run the test provided with maker2, maker start to use a >>> >huge amount of memory. I stoped it after it reach ~100go of memory >>> >used. I believe the test should not use that amount of memory. >>> > >>> >In an other message someone suggest that the bioperl version installed >>> >could be the cause of the problem, but the bioperl installed on my >>> >cluster is already at version 1.6. >>> > >>> >perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' >>> >1.006901 >>> > >>> >Unfortunately I don't have an error message to provide, that could >>> >clarify my problem. >>> > >>> >But maybe it is a recurrent problem and you know a few things I should >>> >check. >>> > >>> >Thanks, >>> > >>> >Ismael >>> > >>> >_______________________________________________ >>> >maker-devel mailing list >>> >maker-devel at box290.bluehost.com >>> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > > > -- > Isma?l Padioleau > Evgeny Zdobnov Group (Computational Evolutionary Genomics Group) > Emmanouil Dermitzakis Group > Dpt de M?decine G?n?tique et D?veloppement > Universit? de Gen?ve - Facult? de M?decine > CMU - Rue Michel-Servet 1 > CH 1211 Gen?ve 4 > Tel: 0041 22 379 59 74 > ismael.padioleau at unige.ch > > -- > Tel. 0041 78 77 69 561 > ismpadioleau at gmail.com -- Isma?l Padioleau Evgeny Zdobnov Group (Computational Evolutionary Genomics Group) Emmanouil Dermitzakis Group Dpt de M?decine G?n?tique et D?veloppement Universit? de Gen?ve - Facult? de M?decine CMU - Rue Michel-Servet 1 CH 1211 Gen?ve 4 Tel: 0041 22 379 59 74 ismael.padioleau at unige.ch -- Tel. 0041 78 77 69 561 ismpadioleau at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
From carsonhh at gmail.com Fri Apr 13 15:02:51 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 Apr 2012 17:02:51 -0400
Subject: [maker-devel] Functional annotation pipeline
In-Reply-To:
Message-ID:

I would agree with blast2go.

You can also try interproscan from the EBI.

MAKER comes with two scripts, ipr_update_gff and iprscan2gff3, that help integrate interproscan results into the GFF3 files. There are also a couple of scripts, maker_functional_gff and maker_functional_fasta, that can do putative functional annotation using uniprot/swiss-prot.

Thanks,
Carson

From: Joseph Fass
Date: Fri, 13 Apr 2012 13:42:51 -0700
To: Shane Brubaker
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Functional annotation pipeline

Would http://blast2go.de/b2ghome be the kind of thing you're looking for?
HTH,
~Joe

On Fri, Apr 13, 2012 at 1:12 PM, Shane Brubaker wrote:
> Hi, can you recommend any open source functional annotation pipelines - to
> assign function, GO terms, pathways, etc. to gene models?
>
> Thanks,
> Shane

From sbrubaker at solazyme.com Fri Apr 13 15:18:10 2012
From: sbrubaker at solazyme.com (Shane Brubaker)
Date: Fri, 13 Apr 2012 21:18:10 +0000
Subject: [maker-devel] Functional annotation pipeline
In-Reply-To:
References:
Message-ID: <61D01ACB70C1E141A150BA9F586D5BFA065BBA@EXCHANGE-05.internal.solazyme.com>

Great, thank you ... I will take a look at those.
From: Carson Holt [mailto:carsonhh at gmail.com] Sent: Friday, April 13, 2012 2:03 PM To: Joseph Fass; Shane Brubaker Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Functional annotation pipeline I would agree blast2go. You can also try interproscan fro the EBI MAKER comes with two scripts ipr_update_gff and iprscan2gff3 that help integrate interproscan results in the GFF3 files. There are also a couple of scripts maker_functional_gff and maker_functional_fasta that can do putative functional annotation using uniprot/swiss-prot. Thanks, Carson From: Joseph Fass > Date: Fri, 13 Apr 2012 13:42:51 -0700 To: Shane Brubaker > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Functional annotation pipeline Would http://blast2go.de/b2ghome be the kind of thing you're looking for? HTH, ~Joe On Fri, Apr 13, 2012 at 1:12 PM, Shane Brubaker > wrote: Hi, can you recommend any open source functional annotation pipelines - to assign function, GO terms, pathways, etc. to gene models? Thanks, Shane _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -- Joseph Fass Lead Data Analyst UC Davis Bioinformatics Core joseph.fass -at- gmail.com (professional) 970.227.5928 (c) || 530.752.2698 (w) _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsth at ebi.ac.uk Fri Apr 13 15:22:37 2012 From: dsth at ebi.ac.uk (Daniel Hughes) Date: Fri, 13 Apr 2012 22:22:37 +0100 Subject: [maker-devel] Functional annotation pipeline In-Reply-To: References: Message-ID: Careful of interproscan atm.. I believe the executable is still in beta and they definitely aren't recommending it for production use yet. 
If you do use it be sure to check output file if using the lookup service as when i used it recently it would sometimes exit normally despite lookup failures (the lookup problems may have had something to do with running ~800 in parallel - they're looking into the issue atm.). dan. Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) ------------------------------------------------------------------------------------- dsth at cantab.net dsth at cpan.org 2012/4/13 Carson Holt > I would agree blast2go. > > You can also try interproscan fro the EBI > > MAKER comes with two scripts ipr_update_gff and iprscan2gff3 that help > integrate interproscan results in the GFF3 files. There are also a couple > of scripts maker_functional_gff and maker_functional_fasta that can do > putative functional annotation using uniprot/swiss-prot. > > Thanks, > Carson > > > > From: Joseph Fass > Date: Fri, 13 Apr 2012 13:42:51 -0700 > To: Shane Brubaker > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Functional annotation pipeline > > Would http://blast2go.de/b2ghome be the kind of thing you're looking for? > HTH, > ~Joe > > On Fri, Apr 13, 2012 at 1:12 PM, Shane Brubaker wrote: > >> Hi, can you recommend any open source functional annotation pipelines - >> to assign function, GO terms, pathways, etc. to gene models? 
>> >> Thanks, >> Shane >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > > > -- > Joseph Fass > Lead Data Analyst > UC Davis Bioinformatics Core > joseph.fass -at- gmail.com (professional) > 970.227.5928 (c) || 530.752.2698 (w) > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsth at ebi.ac.uk Fri Apr 13 15:37:06 2012 From: dsth at ebi.ac.uk (Daniel Hughes) Date: Fri, 13 Apr 2012 22:37:06 +0100 Subject: [maker-devel] Functional annotation pipeline In-Reply-To: References: Message-ID: sorry, that's the new version of course. dan. Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) ------------------------------------------------------------------------------------- dsth at cantab.net dsth at cpan.org 2012/4/13 Daniel Hughes > Careful of interproscan atm.. I believe the executable is still in beta > and they definitely aren't recommending it for production use yet. If you > do use it be sure to check output file if using the lookup service as when > i used it recently it would sometimes exit normally despite lookup failures > (the lookup problems may have had something to do with running ~800 in > parallel - they're looking into the issue atm.). > > dan. > > > Daniel S. T. 
Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) > > ------------------------------------------------------------------------------------- > dsth at cantab.net > dsth at cpan.org > > > > 2012/4/13 Carson Holt > >> I would agree blast2go. >> >> You can also try interproscan fro the EBI >> >> MAKER comes with two scripts ipr_update_gff and iprscan2gff3 that help >> integrate interproscan results in the GFF3 files. There are also a couple >> of scripts maker_functional_gff and maker_functional_fasta that can do >> putative functional annotation using uniprot/swiss-prot. >> >> Thanks, >> Carson >> >> >> >> From: Joseph Fass >> Date: Fri, 13 Apr 2012 13:42:51 -0700 >> To: Shane Brubaker >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] Functional annotation pipeline >> >> Would http://blast2go.de/b2ghome be the kind of thing you're looking for? >> HTH, >> ~Joe >> >> On Fri, Apr 13, 2012 at 1:12 PM, Shane Brubaker wrote: >> >>> Hi, can you recommend any open source functional annotation pipelines - >>> to assign function, GO terms, pathways, etc. to gene models? >>> >>> Thanks, >>> Shane >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >> >> >> >> -- >> Joseph Fass >> Lead Data Analyst >> UC Davis Bioinformatics Core >> joseph.fass -at- gmail.com (professional) >> 970.227.5928 (c) || 530.752.2698 (w) >> >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
From ranjani at uga.edu Tue Apr 17 10:46:40 2012
From: ranjani at uga.edu (Sivaranjani Namasivayam)
Date: Tue, 17 Apr 2012 16:46:40 +0000
Subject: [maker-devel] MAKER2.23 output
Message-ID:

Hi,

I tried running the latest version of Maker 2.23 with my dataset but without much success.

When I run it without the mpi option it exits with a segmentation fault:

STATUS: Processing and indexing input FASTA files...
Segmentation fault

So I tried it with mpi. The run does start, but I don't see any output files. I ran it with 20 cpus for close to 10 hrs.

I tested this with the sample data in maker's data folder.

Input in maker_opts.ctl file:

genome=/usr/local/maker/2.23/data/dpp_contig.fasta
est= /usr/local/maker/2.23/data/dpp_est.fasta
protein= /usr/local/maker/2.23/data/dpp_protein.fasta
est2genome=1

This was the command I executed:

usr/local/mpich2/1.4.1p1/gcc_4.5.3/bin/mpirun -np 2 /usr/local/maker/2.23/bin/maker maker_opts.ctl maker_bopts.ctl maker_exe.ctl

The following folder with the protein sequence file gets created:

dpp_contig.maker.output

But I can't see any progress after that. Can you please tell me if I might be doing something wrong, or whether you need further details?

I was able to run the previous version, Maker 2.10, successfully with my dataset.

Thanks,
Ranjani

From carsonhh at gmail.com Tue Apr 17 10:56:11 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Apr 2012 12:56:11 -0400
Subject: [maker-devel] MAKER2.23 output
In-Reply-To:
Message-ID:

Segmentation fault means there was a failure with C code.
It was likely in one of the modules being used. These are all potential culprits:

Inline::C
Proc::ProcessTable
DB_File
forks

Based on when the error occurred, I would lean more toward DB_File. Is it possible that Berkeley DB has been updated on your system, perhaps as part of another installation or a system update? That sometimes breaks this module (which is part of the perl core). You can try reinstalling that module from CPAN.

Also, if you run MAKER version 2.25 (latest version), you can run with -debug (i.e. 'maker -debug') to get more information just before the error occurs. You can then capture the error log and send that to me.

Thanks,
Carson

From: Sivaranjani Namasivayam
Date: Tue, 17 Apr 2012 16:46:40 +0000
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER2.23 output

Hi,

I tried running the latest version of Maker 2.23 with my dataset but with out much success.

When I run it without the mpi option I exits with a segmentation fault

STATUS: Processing and indexing input FASTA files...
Segmentation fault

So, I tried it with mpi, the run does start but I don't see any output files. I ran it with 20 cpus for close to 10 hrs

I tested this with the sample data in maker's data folder.

Input in maker_opts.ctl file

genome=/usr/local/maker/2.23/data/dpp_contig.fasta
est= /usr/local/maker/2.23/data/dpp_est.fasta
protein= /usr/local/maker/2.23/data/dpp_protein.fasta
est2genome=1

This was the command I executed

usr/local/mpich2/1.4.1p1/gcc_4.5.3/bin/mpirun -np 2 /usr/local/maker/2.23/bin/maker maker_opts.ctl maker_bopts.ctl maker_exe.ctl

The following folder with the protein sequence file gets created

dpp_contig.maker.output

But I can't see any progress after that. Can you please tell me if I might be doing something wrong or need further details

I was able run the previous version of Maker 2.10 successfully with my dataset.
Thanks, Ranjani _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Apr 17 11:09:32 2012 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 17 Apr 2012 13:09:32 -0400 Subject: [maker-devel] mpi issue on computing cluster In-Reply-To: Message-ID: If it's a sharedlibs issue then 'maker -help' would cause the same error. Try that. Are you sure that you are not worried about Signal.pm causing the error? Try changing /mnt/work/scratch/scottge/maker-2.24/maker/bin/../lib/Proc/Signal.pm lines 136-143 from this --> require Proc::ProcessTable; my $obj = new Proc::ProcessTable; foreach my $p (@{$obj->table}) { #now check for the id return $p if ($p->pid == $id); } return undef; To this --> my $select; eval{ require Proc::ProcessTable; my $obj = new Proc::ProcessTable; foreach my $p (@{$obj->table}) { #now check for the id if ($p->pid == $id){ $select = $p; last; } } } return $select; If it works, I can generate a cleaner workaround, but I'd like to know If that is the root of the problem. Thanks, Carson From: Scott Geib Date: Fri, 13 Apr 2012 09:00:29 -1000 To: Subject: [maker-devel] mpi issue on computing cluster Hi, I am trying to run maker 2.24 on a compute cluster and get the following error (not worried about Signal.pm error): an into unknown state (hex char: 29) at /mnt/work/scratch/scottge/maker-2.24/maker/bin/../lib/Proc/Signal.pm line 138. 
Fatal error in MPI_Init: Other MPI error, error stack: MPIR_Init_thread(388)........: MPID_Init(139)...............: channel initialization failed MPIDI_CH3_Init(49)...........: progress_init failed MPIDI_CH3I_Progress_init(808): This version of MPICH requires the SIGUSR1 signal, but the application has already installed a handler [proxy:0:0 at r01n11.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed [proxy:0:0 at r01n11.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:0 at r01n11.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event [proxy:0:1 at r01n13.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed [proxy:0:1 at r01n13.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:1 at r01n13.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event [proxy:0:3 at r07n27.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed [proxy:0:3 at r07n27.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:3 at r07n27.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event [mpiexec at r01n11.local] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting [mpiexec at r01n11.local] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): launcher returned error waiting for completion [mpiexec at r01n11.local] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for completion [mpiexec at r01n11.local] main (./ui/mpich/mpiexec.c:404): process manager error waiting for completion I do not know how mpich2 was compiled, I feel this may be a --enable-sharedlibs issue? 
I may need to contact my cluster support, but I thought I would try here first,

Thanks

From carsonhh at gmail.com Tue Apr 17 14:25:51 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 17 Apr 2012 16:25:51 -0400
Subject: [maker-devel] mpi issue on computing cluster
In-Reply-To:
Message-ID:

Sorry, I missed the ';' at the end of the eval block. It should be this -->

    my $select;
    eval{
        require Proc::ProcessTable;
        my $obj = new Proc::ProcessTable;
        foreach my $p (@{$obj->table}) { #now check for the id
            if ($p->pid == $id){
                $select = $p;
                last;
            }
        }
    };
    return $select;

--Carson

From: Carson Holt
Date: Tue, 17 Apr 2012 13:09:32 -0400
To: Scott Geib
Subject: Re: [maker-devel] mpi issue on computing cluster

If it's a sharedlibs issue then 'maker -help' would cause the same error. Try that.

Are you sure that you are not worried about Signal.pm causing the error? Try changing /mnt/work/scratch/scottge/maker-2.24/maker/bin/../lib/Proc/Signal.pm lines 136-143 from this -->

    require Proc::ProcessTable;
    my $obj = new Proc::ProcessTable;
    foreach my $p (@{$obj->table}) { #now check for the id
        return $p if ($p->pid == $id);
    }
    return undef;

To this -->

    my $select;
    eval{
        require Proc::ProcessTable;
        my $obj = new Proc::ProcessTable;
        foreach my $p (@{$obj->table}) { #now check for the id
            if ($p->pid == $id){
                $select = $p;
                last;
            }
        }
    };
    return $select;

If it works, I can generate a cleaner workaround, but I'd like to know if that is the root of the problem.
Thanks, Carson From: Scott Geib Date: Fri, 13 Apr 2012 09:00:29 -1000 To: Subject: [maker-devel] mpi issue on computing cluster Hi, I am trying to run maker 2.24 on a compute cluster and get the following error (not worried about Signal.pm error): an into unknown state (hex char: 29) at /mnt/work/scratch/scottge/maker-2.24/maker/bin/../lib/Proc/Signal.pm line 138. Fatal error in MPI_Init: Other MPI error, error stack: MPIR_Init_thread(388)........: MPID_Init(139)...............: channel initialization failed MPIDI_CH3_Init(49)...........: progress_init failed MPIDI_CH3I_Progress_init(808): This version of MPICH requires the SIGUSR1 signal, but the application has already installed a handler [proxy:0:0 at r01n11.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed [proxy:0:0 at r01n11.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:0 at r01n11.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event [proxy:0:1 at r01n13.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed [proxy:0:1 at r01n13.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:1 at r01n13.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event [proxy:0:3 at r07n27.local] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed [proxy:0:3 at r07n27.local] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status [proxy:0:3 at r07n27.local] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event [mpiexec at r01n11.local] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting [mpiexec at r01n11.local] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): launcher returned error waiting for completion [mpiexec at 
r01n11.local] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for completion
[mpiexec at r01n11.local] main (./ui/mpich/mpiexec.c:404): process manager error waiting for completion

I do not know how mpich2 was compiled, I feel this may be a --enable-sharedlibs issue? I may need to contact my cluster support, but I thought I would try here first,

Thanks

From elzedliu at gmail.com Tue Apr 17 17:22:53 2012
From: elzedliu at gmail.com (Huanle)
Date: Tue, 17 Apr 2012 16:22:53 -0700 (PDT)
Subject: [maker-devel] gene predictors in MARKER
Message-ID:

Hi There,

I am using MAKER to annotate a recently assembled plant genome.

I followed the tutorial here: http://gmod.org/wiki/MAKER_Tutorial

The de novo gene predictors I included in the maker_exe.ctl file are:

#-----Ab-initio Gene Prediction Algorithms
snap=/sw/maker/2.10/bin/../exe/snap/snap #location of snap executable
gmhmme3=/sw/GeneMark/20120203/bin/gmhmme3 #location of eukaryotic genemark executable
gmhmmp= #location of prokaryotic genemark executable
augustus=/sw/maker/2.10/bin/../exe/augustus/bin/augustus #location of augustus executable

However, I am not sure whether they were really used.

While it was running, I could see that repeatmasker, exonerate and wublast were called, but I did not see any information pop up for those gene predictors.

So I am wondering if they were actually used.

Could you please let me know how to tell whether all or any of those gene predictors were called by MAKER?
Kind Regards,
Huanle

From carsonhh at gmail.com Mon Apr 23 15:04:16 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 23 Apr 2012 17:04:16 -0400
Subject: [maker-devel] gene predictors in MARKER
In-Reply-To:
Message-ID:

The gene predictors have to be trained first, and when they are trained they produce an HMM file that can be supplied to MAKER. You can either use MAKER's protein2genome option or est2genome option to produce rough models to train with, or you can try one of the models that come prepackaged with those algorithms.

SNAP models will be in --> /sw/maker/2.10/bin/../exe/snap/HMM
Augustus --> run this to see species in augustus --> /sw/maker/2.10/bin/../exe/augustus/bin/augustus --species=help

GeneMark is self training. Run it directly on your genome fasta, or for speed on just a chromosome or two of the assembly, and it will produce a file called es.mod as part of its results. That is the file you need.

If you have any questions or issues with training just let us know.

Thanks,
Carson

On 12-04-17 7:22 PM, "Huanle" wrote:

>I am using MAKER to annotate a recently assembled plant genome.
>Hi There,
>
>I am using MAKER to annotate a recently assembled plant genome.
>
>I followed the tutorial here: http://gmod.org/wiki/MAKER_Tutorial
>
>The denovo gene predictors i included in the maker_exe.ctl file are
>#-----Ab-initio Gene Prediction Algorithms
>snap=/sw/maker/2.10/bin/../exe/snap/snap #location of snap executable
>gmhmme3=/sw/GeneMark/20120203/bin/gmhmme3 #location of eukaryotic
>genemark executable
>gmhmmp= #location of prokaryotic genemark executable
>augustus=/sw/maker/2.10/bin/../exe/augustus/bin/augustus #location of
>augustus executable
>
>However, I am not sure whether they were really used.
>
>During the running, i could see repeatmasker, exonerate and wublast
>were called. But i did see any information popped up for those gene
>predictors.
>
>So i am wondering if they were actually used.
> >Could you please let me know how to know if all or one of those gene >predictors were called by marker? > >Kind Regards, >Huanle > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From anastasia.gioti at scilifelab.se Wed Apr 25 03:09:36 2012 From: anastasia.gioti at scilifelab.se (Anastasia Gioti) Date: Wed, 25 Apr 2012 11:09:36 +0200 Subject: [maker-devel] Use pass-through system to add missing genes Message-ID: Hi, I have a set of predicted proteins from the genome of a fungus annotated by MAKER using EST data from a closely related species and 3 ab initio predictors (snap iterativelly trained 3 times, genemark trained directly on the assembly and augustus with a model from a less closely related species), along with a set of fungal proteins. I am missing ~ 1000 proteins when I compare to the species i used EST data from, and there is good evidence from alignments that these genes exist. The question is how to proceed from Blast hits to actual gene models here. The idea would be to add these genes to the existing dataset, rather than reannotate the genome. I believe that reannotating it without any further evidence such as RNA-seq from the species itself would not change much,and i d rather stick with actual predictions that i trust and have used in subsequent analyses. The 1000 genes I can accept to annotate with a less stringent and reliable way than MAKER, I just want to add them so that the difference in gene count gets corrected. I was reading the MAKER 2 paper and i was wondering if I can use the legacy annotations scheme to do it, by providing GFF3 of the alignments between the two species in the regions where genes were missed, but as i said, I would not like to reannotate the whole genome, and running MAKER2 might cause slight changes that i d like to avoid. Is this possible? 
First, is it possible to provide a Gff3 file of specific locations and not the entire genome alignment? (I guess so..) Second, how can I tag the existing annotations as 'not to be changed' or alternatively, tag the new models only? How should I run maker2, with which predictors on and which off? Thanks, Anastasia Anastasia Gioti Post-doctoral Researcher anastasia.gioti at scilifelab.se anastasia.gioti at ebc.uu.se http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsth at ebi.ac.uk Wed Apr 25 03:22:03 2012 From: dsth at ebi.ac.uk (Daniel Hughes) Date: Wed, 25 Apr 2012 10:22:03 +0100 Subject: [maker-devel] Use pass-through system to add missing genes In-Reply-To: References: Message-ID: For cross-species comparisons you might have be better off including the actual peptide sequences of the other fungi too in the annotation run - I'd be very surprised if you really did get the same result. dan. Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) ------------------------------------------------------------------------------------- dsth at cantab.net dsth at cpan.org 2012/4/25 Anastasia Gioti > Hi, > I have a set of predicted proteins from the genome of a fungus annotated > by MAKER using EST data from a closely related species and 3 ab initio > predictors (snap iterativelly trained 3 times, genemark trained directly > on the assembly and augustus with a model from a less closely related > species), along with a set of fungal proteins. I am missing ~ 1000 proteins > when I compare to the species i used EST data from, and there is good > evidence from alignments that these genes exist. The question is how to > proceed from Blast hits to actual gene models here. The idea would be to > add these genes to the existing dataset, rather than reannotate the genome. 
> I believe that reannotating it without any further evidence such as RNA-seq > from the species itself would not change much,and i d rather stick with > actual predictions that i trust and have used in subsequent analyses. The > 1000 genes I can accept to annotate with a less stringent and reliable way > than MAKER, I just want to add them so that the difference in gene count > gets corrected. > I was reading the MAKER 2 paper and i was wondering if I can use the > legacy annotations scheme to do it, by providing GFF3 of the alignments > between the two species in the regions where genes were missed, but as i > said, I would not like to reannotate the whole genome, and running MAKER2 > might cause slight changes that i d like to avoid. Is this possible? First, > is it possible to provide a Gff3 file of specific locations and not the > entire genome alignment? (I guess so..) Second, how can I tag the existing > annotations as 'not to be changed' or alternatively, tag the new models > only? How should I run maker2, with which predictors on and which off? > Thanks, > Anastasia > > Anastasia Gioti > Post-doctoral Researcher > > anastasia.gioti at scilifelab.se > anastasia.gioti at ebc.uu.se > > http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anastasia.gioti at scilifelab.se Wed Apr 25 03:29:30 2012 From: anastasia.gioti at scilifelab.se (Anastasia Gioti) Date: Wed, 25 Apr 2012 11:29:30 +0200 Subject: [maker-devel] Use pass-through system to add missing genes In-Reply-To: References: Message-ID: Hi, Do you mean that I should have not include the proteins of the closely related species in my fungal protein fasta file that I used as evidence in MAKER? 
i do not see why... What I have been trying to do now is further 'bias' the annotations in favor of this species, so as to get the missing genes. Can you explain a bit more whta you mean? Thanks, Anastasia On Apr 25, 2012, at 11:22 AM, Daniel Hughes wrote: > For cross-species comparisons you might have be better off including the actual peptide sequences of the other fungi too in the annotation run - I'd be very surprised if you really did get the same result. > > dan. > > > Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) > ------------------------------------------------------------------------------------- > dsth at cantab.net > dsth at cpan.org > > > 2012/4/25 Anastasia Gioti > Hi, > I have a set of predicted proteins from the genome of a fungus annotated by MAKER using EST data from a closely related species and 3 ab initio predictors (snap iterativelly trained 3 times, genemark trained directly on the assembly and augustus with a model from a less closely related species), along with a set of fungal proteins. I am missing ~ 1000 proteins when I compare to the species i used EST data from, and there is good evidence from alignments that these genes exist. The question is how to proceed from Blast hits to actual gene models here. The idea would be to add these genes to the existing dataset, rather than reannotate the genome. I believe that reannotating it without any further evidence such as RNA-seq from the species itself would not change much,and i d rather stick with actual predictions that i trust and have used in subsequent analyses. The 1000 genes I can accept to annotate with a less stringent and reliable way than MAKER, I just want to add them so that the difference in gene count gets corrected. 
> I was reading the MAKER 2 paper and i was wondering if I can use the legacy annotations scheme to do it, by providing GFF3 of the alignments between the two species in the regions where genes were missed, but as i said, I would not like to reannotate the whole genome, and running MAKER2 might cause slight changes that i d like to avoid. Is this possible? First, is it possible to provide a Gff3 file of specific locations and not the entire genome alignment? (I guess so..) Second, how can I tag the existing annotations as 'not to be changed' or alternatively, tag the new models only? How should I run maker2, with which predictors on and which off? > Thanks, > Anastasia > > Anastasia Gioti > Post-doctoral Researcher > > anastasia.gioti at scilifelab.se > anastasia.gioti at ebc.uu.se > > http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > Anastasia Gioti Post-doctoral Researcher anastasia.gioti at scilifelab.se anastasia.gioti at ebc.uu.se http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsth at ebi.ac.uk Wed Apr 25 03:39:49 2012 From: dsth at ebi.ac.uk (Daniel Hughes) Date: Wed, 25 Apr 2012 10:39:49 +0100 Subject: [maker-devel] Use pass-through system to add missing genes In-Reply-To: References: Message-ID: sorry my bad, i missed the part about you having already included the fungal proteins as fasta ;/ - too early for me. in that case have you viewed the full gff output for specific instances of such missing proteins in something like apollo to try and work out why maker hasn't made a call at those loci (aed score...)? dan. Daniel S. T. 
Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) ------------------------------------------------------------------------------------- dsth at cantab.net dsth at cpan.org 2012/4/25 Anastasia Gioti > Hi, > Do you mean that I should have not include the proteins of the closely > related species in my fungal protein fasta file that I used as evidence in > MAKER? i do not see why... What I have been trying to do now is further > 'bias' the annotations in favor of this species, so as to get the missing > genes. Can you explain a bit more whta you mean? > Thanks, > Anastasia > > On Apr 25, 2012, at 11:22 AM, Daniel Hughes wrote: > > For cross-species comparisons you might have be better off including the > actual peptide sequences of the other fungi too in the annotation run - I'd > be very surprised if you really did get the same result. > > dan. > > > Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) > > ------------------------------------------------------------------------------------- > dsth at cantab.net > dsth at cpan.org > > > 2012/4/25 Anastasia Gioti > >> Hi, >> I have a set of predicted proteins from the genome of a fungus annotated >> by MAKER using EST data from a closely related species and 3 ab initio >> predictors (snap iterativelly trained 3 times, genemark trained directly >> on the assembly and augustus with a model from a less closely related >> species), along with a set of fungal proteins. I am missing ~ 1000 proteins >> when I compare to the species i used EST data from, and there is good >> evidence from alignments that these genes exist. The question is how to >> proceed from Blast hits to actual gene models here. The idea would be to >> add these genes to the existing dataset, rather than reannotate the genome. 
>> I believe that reannotating it without any further evidence such as RNA-seq >> from the species itself would not change much,and i d rather stick with >> actual predictions that i trust and have used in subsequent analyses. The >> 1000 genes I can accept to annotate with a less stringent and reliable way >> than MAKER, I just want to add them so that the difference in gene count >> gets corrected. >> I was reading the MAKER 2 paper and i was wondering if I can use the >> legacy annotations scheme to do it, by providing GFF3 of the alignments >> between the two species in the regions where genes were missed, but as i >> said, I would not like to reannotate the whole genome, and running MAKER2 >> might cause slight changes that i d like to avoid. Is this possible? First, >> is it possible to provide a Gff3 file of specific locations and not the >> entire genome alignment? (I guess so..) Second, how can I tag the existing >> annotations as 'not to be changed' or alternatively, tag the new models >> only? How should I run maker2, with which predictors on and which off? >> Thanks, >> Anastasia >> >> Anastasia Gioti >> Post-doctoral Researcher >> >> anastasia.gioti at scilifelab.se >> anastasia.gioti at ebc.uu.se >> >> http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ >> >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > Anastasia Gioti > Post-doctoral Researcher > > anastasia.gioti at scilifelab.se > anastasia.gioti at ebc.uu.se > > http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From carsonhh at gmail.com Wed Apr 25 08:29:01 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 25 Apr 2012 10:29:01 -0400
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To:
Message-ID:

The way you proceed depends on why the genes are not there to begin with. Are they not there because of a lack of evidence? If that's the case, just adding the new fasta file should do the trick. Or are they not there because an assembly error makes it impossible to get a logical model for the region (i.e. reading frame breaks)? Are there ab initio models already called in those regions that could just be promoted to the annotation tier? You can test that one by blasting against the nonoverlaping_abinits.fasta files.

For any of the cases described, you can provide the existing annotation set as the input in GFF3 format, and previous models will be maintained preferentially. If you know which ab initio predictions you want to add (i.e. the ab initio promoting scenario I described), you can provide those predictions via the pred_gff option and then set keep_preds=1, and they will be maintained even without evidence. Attached is a script that would make selecting those easier. It takes the MAKER generated GFF3 and a list of predictions to keep (one name per line). These might be the results of a BLAST analysis, for example. It will then return the GFF3 entries for just those models selected.

If the situation is more complex, just provide more detail, and I am sure we can help you come up with a plan.
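
The selection step described above (a GFF3 file plus a list of names to keep, one per line) can be sketched roughly as follows. This is not the attached gff3_select script, whose contents are not shown in the archive; it is only an illustration in Python. It assumes that each prediction carries an ID and a Name attribute and that child features (for example match_part) point to it through Parent. The function names parse_attributes and select_gff3 are made up for this sketch.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Parse the ninth GFF3 column into a dict of tag -> list of values."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if "=" not in pair:
            continue
        tag, value = pair.split("=", 1)
        attributes[tag] = [unquote(v) for v in value.split(",")]
    return attributes


def select_gff3(gff3_lines, keep_names):
    """Return feature lines whose ID or Name is in keep_names, plus descendants.

    Hypothetical helper: assumes selected features are linked to their
    sub-features through the standard GFF3 ID/Parent attributes.
    """
    keep_names = set(keep_names)
    features = []
    for line in gff3_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # the sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        features.append((line, parse_attributes(columns[8])))

    # Pass 1: features named directly in the keep list.
    selected_ids = set()
    for _, attrs in features:
        names = attrs.get("ID", []) + attrs.get("Name", [])
        if keep_names.intersection(names):
            selected_ids.update(attrs.get("ID", []))

    # Pass 2: follow Parent links until no new descendants are found.
    changed = True
    while changed:
        changed = False
        for _, attrs in features:
            ids = attrs.get("ID", [])
            parents = attrs.get("Parent", [])
            if ids and ids[0] not in selected_ids and selected_ids.intersection(parents):
                selected_ids.add(ids[0])
                changed = True

    # Emit selected features and their children in the original file order.
    return [
        line
        for line, attrs in features
        if selected_ids.intersection(attrs.get("ID", []) + attrs.get("Parent", []))
    ]
```

To use it, read the names file into a list, call select_gff3(open(gff3_path), names), and write the returned lines to a new file that can be given to pred_gff. Whether the names in a BLAST report match the Name or the ID attribute of a given MAKER output should be checked by hand first.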
Thanks, Carson From: Anastasia Gioti Date: Wed, 25 Apr 2012 11:09:36 +0200 To: Subject: [maker-devel] Use pass-through system to add missing genes Hi, I have a set of predicted proteins from the genome of a fungus annotated by MAKER using EST data from a closely related species and 3 ab initio predictors (snap iterativelly trained 3 times, genemark trained directly on the assembly and augustus with a model from a less closely related species), along with a set of fungal proteins. I am missing ~ 1000 proteins when I compare to the species i used EST data from, and there is good evidence from alignments that these genes exist. The question is how to proceed from Blast hits to actual gene models here. The idea would be to add these genes to the existing dataset, rather than reannotate the genome. I believe that reannotating it without any further evidence such as RNA-seq from the species itself would not change much,and i d rather stick with actual predictions that i trust and have used in subsequent analyses. The 1000 genes I can accept to annotate with a less stringent and reliable way than MAKER, I just want to add them so that the difference in gene count gets corrected. I was reading the MAKER 2 paper and i was wondering if I can use the legacy annotations scheme to do it, by providing GFF3 of the alignments between the two species in the regions where genes were missed, but as i said, I would not like to reannotate the whole genome, and running MAKER2 might cause slight changes that i d like to avoid. Is this possible? First, is it possible to provide a Gff3 file of specific locations and not the entire genome alignment? (I guess so..) Second, how can I tag the existing annotations as 'not to be changed' or alternatively, tag the new models only? How should I run maker2, with which predictors on and which off? 
Thanks, Anastasia Anastasia Gioti Post-doctoral Researcher anastasia.gioti at scilifelab.se anastasia.gioti at ebc.uu.se http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gff3_select Type: application/octet-stream Size: 3067 bytes Desc: not available URL: From anastasia.gioti at scilifelab.se Fri Apr 27 02:43:14 2012 From: anastasia.gioti at scilifelab.se (Anastasia Gioti) Date: Fri, 27 Apr 2012 10:43:14 +0200 Subject: [maker-devel] Use pass-through system to add missing genes In-Reply-To: References: Message-ID: <4FE7CD5B-FC1C-43E7-AC41-A05823348B99@scilifelab.se> Hi Carlson, Thanks for your help! > The way you proceed depends on why the genes are not there to begin > with. Are they not there because of a lack of evidence? It is a mixture of cases, and I can only look at some examples to say that. There are cases where all 3 used ab initio predictors provide models, there are blastx hits, or both blastx and protein2 genome, but no EST evidence, thus no model is retained. i guess my default parameters could be responsible for these cases at least. > If that's the case just adding the new fasta file should do the trick. which fasta do you refer to? The proteins file I use as evidence contains all proteins i can actually use. > Or are they not there because an assembly error makes it impossible > to get a logical model for the region (I.e reading frame breaks). This is not the case in general. > Are there ab initio models already called in those regions that > could just be promoted to the annotation tier? You can test that > one by blasting against the nonoverlaping_abinits.fasta files. 
I have not done this, will do! > > For any of the cases described, you can provide the existing > annotation set as the input in GFF3 format, and previous models will > be maintained preferentially. You mean in a new maker run? is this possible with the old maker as well, not maker2, right? > If you know which ab initio predictions you want to add (I.e. the ab > initio promoting scenario I descibed), you can provide those > predictions to the use the pred_gff option and then set keep_preds=1 > and they will be maintained even without evidence. Attached is a > script that would make selecting those easier. It take the MAKER > generated GFF3 and a list of predictions to keep (one name per > line). These might be the results of a BLAST analysis for example. > It will then return the GFF3 entries for just those models selected. The thing is, for the few cases I have looked at, I cannot really decide which model is the best, and the 3 models from the ab initio predictors do not agree on the exact intron-exon junctions or the start and stop codons. > > If the situation is more complex, just provide more detail, and I am > sure we can help you come up with a plan. > What i was thinking to do was to provide a gff file of alignments (eg by exonerate) to the proteins of the closely related species that i am missing, and somehow keep the previous annotations and get the extra ones by this gff file. But how exactly maker should be run to do this I am not sure. if I want to keep the previous annotations I need the gff file of the last maker run as input, but then how do I discriminate with the exonerate gff file? And which mode of rediction should be on, and with which parameters? You mention keep_preds=1 for the existing annotations, but how do i also promote evidence from alignments on the same way in the same run? Looks feasible though. 
Thanks again, Anastasia > Thanks, > Carson > > From: Anastasia Gioti > Date: Wed, 25 Apr 2012 11:09:36 +0200 > To: > Subject: [maker-devel] Use pass-through system to add missing genes > > Hi, > I have a set of predicted proteins from the genome of a fungus > annotated by MAKER using EST data from a closely related species > and 3 ab initio predictors (snap iterativelly trained 3 times, > genemark trained directly on the assembly and augustus with a model > from a less closely related species), along with a set of fungal > proteins. I am missing ~ 1000 proteins when I compare to the species > i used EST data from, and there is good evidence from alignments > that these genes exist. The question is how to proceed from Blast > hits to actual gene models here. The idea would be to add these > genes to the existing dataset, rather than reannotate the genome. I > believe that reannotating it without any further evidence such as > RNA-seq from the species itself would not change much,and i d rather > stick with actual predictions that i trust and have used in > subsequent analyses. The 1000 genes I can accept to annotate with a > less stringent and reliable way than MAKER, I just want to add them > so that the difference in gene count gets corrected. > I was reading the MAKER 2 paper and i was wondering if I can use the > legacy annotations scheme to do it, by providing GFF3 of the > alignments between the two species in the regions where genes were > missed, but as i said, I would not like to reannotate the whole > genome, and running MAKER2 might cause slight changes that i d like > to avoid. Is this possible? First, is it possible to provide a Gff3 > file of specific locations and not the entire genome alignment? (I > guess so..) Second, how can I tag the existing annotations as 'not > to be changed' or alternatively, tag the new models only? How should > I run maker2, with which predictors on and which off? 
> Thanks, > Anastasia > > Anastasia Gioti > Post-doctoral Researcher > > anastasia.gioti at scilifelab.se > anastasia.gioti at ebc.uu.se > > http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ > > > > _______________________________________________ maker-devel mailing > list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > Anastasia Gioti Post-doctoral Researcher anastasia.gioti at scilifelab.se anastasia.gioti at ebc.uu.se http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry.moore at genetics.utah.edu Fri Apr 27 05:57:01 2012 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Fri, 27 Apr 2012 05:57:01 -0600 Subject: [maker-devel] Use pass-through system to add missing genes In-Reply-To: <4FE7CD5B-FC1C-43E7-AC41-A05823348B99@scilifelab.se> References: <4FE7CD5B-FC1C-43E7-AC41-A05823348B99@scilifelab.se> Message-ID: <03439C8F-75B0-42FE-894C-CC564AEB73E9@genetics.utah.edu> Hi Anastasia, On Apr 27, 2012, at 2:43 AM, Anastasia Gioti wrote: > Hi Carlson, > Thanks for your help! > >> The way you proceed depends on why the genes are not there to begin with. Are they not there because of a lack of evidence? > > It is a mixture of cases, and I can only look at some examples to say that. There are cases where all 3 used ab initio predictors provide models, there are blastx hits, or both blastx and protein2 genome, but no EST evidence, thus no model is retained. i guess my default parameters could be responsible for these cases at least. > This doesn't sound right. If there are predicted models and blastx protein evidence overlapping them you should get a model retained. 
I know for the EST evidence that it has to support a splice site before it will be promoted and I can't remember if protein evidence is the same but certainly if you pass back those protein2genome predictions and the original proteins as evidence then they will be retained as models. >> If that's the case just adding the new fasta file should do the trick. > > which fasta do you refer to? The proteins file I use as evidence contains all proteins i can actually use. > Yes using the protein fasta from the closely related species as evidence. I think you said you've already done that right? >> Or are they not there because an assembly error makes it impossible to get a logical model for the region (I.e reading frame breaks). > > This is not the case in general. > >> Are there ab initio models already called in those regions that could just be promoted to the annotation tier? You can test that one by blasting against the nonoverlaping_abinits.fasta files. > > I have not done this, will do! > >> >> For any of the cases described, you can provide the existing annotation set as the input in GFF3 format, and previous models will be maintained preferentially. > > You mean in a new maker run? is this possible with the old maker as well, not maker2, right? > Yes, the original MAKER will do this. >> If you know which ab initio predictions you want to add (I.e. the ab initio promoting scenario I descibed), you can provide those predictions to the use the pred_gff option and then set keep_preds=1 and they will be maintained even without evidence. Attached is a script that would make selecting those easier. It take the MAKER generated GFF3 and a list of predictions to keep (one name per line). These might be the results of a BLAST analysis for example. It will then return the GFF3 entries for just those models selected. 
> > The thing is, for the few cases I have looked at, I cannot really decide which model is the best, and the 3 models from the ab initio predictors do not agree on the exact intron-exon junctions or the start and stop codons.
>>
>> If the situation is more complex, just provide more detail, and I am sure we can help you come up with a plan.
>>
> What i was thinking to do was to provide a gff file of alignments (eg by exonerate) to the proteins of the closely related species that i am missing, and somehow keep the previous annotations and get the extra ones by this gff file. But how exactly maker should be run to do this I am not sure. if I want to keep the previous annotations I need the gff file of the last maker run as input, but then how do I discriminate with the exonerate gff file? And which mode of rediction should be on, and with which parameters? You mention keep_preds=1 for the existing annotations, but how do i also promote evidence from alignments on the same way in the same run?
> Looks feasible though. Thanks again,
> Anastasia
>

Let me just restate what you've said so that I can be sure that I am correct about what you've already done. You have run Maker with SNAP, Genemark and Augustus using ESTs from a closely related species (passed to altest) and protein evidence from other fungi. You are missing about 1,000 genes compared to the species that provided the EST alignments. You say there is good evidence from the alignments that these genes exist, and I assume by this that you mean the EST/protein alignments that Maker produced.

1) Is the closely related fungus annotated, and if so have you included its proteins in the evidence set that you provided to Maker? If you haven't provided these proteins as evidence to Maker then you should do this.
You can re-run maker passing your original models back through like this:

#-----Re-annotation Using MAKER Derived GFF3
genome_gff=original_maker_annotations.gff3
est_pass=1
altest_pass=1
protein_pass=1
rm_pass=1
model_pass=1
pred_pass=1
other_pass=1

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=proteins_from_closely_related.fasta
## OR it sounds like you've already aligned these with exonerate?
protein_gff=proteins_from_closely_related_already_aligned.gff

2) If you've already included those closely related species proteins but still didn't get the 1,000 genes, then take your nonoverlaping_abinits.fasta and blast them directly against your closely related proteins. Presumably they don't hit too well, because if they did they should have been promoted to predictions by Maker the first time, but here you can decide yourself what thresholds to allow to keep the ab initio predictions that hit the closely related species proteins. If you filter your blast hits the way you want and keep the names of the ab initio predictions that pass your filter, then the script Carson attached will generate an ab initio prediction GFF file with only the predictions you selected. You can then pass those predictions back to Maker and force it to keep them, and Maker will turn them from predictions (match/match_part) into gene models.
#-----Re-annotation Using MAKER Derived GFF3
genome_gff=original_maker_annotations.gff3
est_pass=1
altest_pass=1
protein_pass=1
rm_pass=1
model_pass=1
pred_pass=0
other_pass=1

#-----Gene Prediction
snaphmm=
gmhmm=
augustus_species=
fgenesh_par_file=
pred_gff=ab_init_predictions_rescued_by_blast.gff
keep_preds=1

Barry

>> Thanks,
>> Carson
>>
>> From: Anastasia Gioti
>> Date: Wed, 25 Apr 2012 11:09:36 +0200
>> To:
>> Subject: [maker-devel] Use pass-through system to add missing genes
>>
>> Hi,
>> I have a set of predicted proteins from the genome of a fungus annotated by MAKER using EST data from a closely related species and 3 ab initio predictors (snap iterativelly trained 3 times, genemark trained directly on the assembly and augustus with a model from a less closely related species), along with a set of fungal proteins. I am missing ~ 1000 proteins when I compare to the species i used EST data from, and there is good evidence from alignments that these genes exist. The question is how to proceed from Blast hits to actual gene models here. The idea would be to add these genes to the existing dataset, rather than reannotate the genome. I believe that reannotating it without any further evidence such as RNA-seq from the species itself would not change much,and i d rather stick with actual predictions that i trust and have used in subsequent analyses. The 1000 genes I can accept to annotate with a less stringent and reliable way than MAKER, I just want to add them so that the difference in gene count gets corrected.
>> I was reading the MAKER 2 paper and i was wondering if I can use the legacy annotations scheme to do it, by providing GFF3 of the alignments between the two species in the regions where genes were missed, but as i said, I would not like to reannotate the whole genome, and running MAKER2 might cause slight changes that i d like to avoid. Is this possible?
First, is it possible to provide a Gff3 file of specific locations and not the entire genome alignment? (I guess so..) Second, how can I tag the existing annotations as 'not to be changed' or alternatively, tag the new models only? How should I run maker2, with which predictors on and which off? >> Thanks, >> Anastasia >> >> Anastasia Gioti >> Post-doctoral Researcher >> >> anastasia.gioti at scilifelab.se >> anastasia.gioti at ebc.uu.se >> >> http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ >> >> >> >> _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > Anastasia Gioti > Post-doctoral Researcher > > anastasia.gioti at scilifelab.se > anastasia.gioti at ebc.uu.se > > http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/ > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Apr 27 07:27:24 2012 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 27 Apr 2012 09:27:24 -0400 Subject: [maker-devel] Use pass-through system to add missing genes In-Reply-To: <03439C8F-75B0-42FE-894C-CC564AEB73E9@genetics.utah.edu> Message-ID: > It is a mixture of cases, and I can only look at some examples to say that. > There are cases where all 3 used ab initio predictors provide models, there > are blastx hits, or both blastx and protein2 genome, but no EST evidence, thus > no model is retained. i guess my default parameters could be responsible for > these cases at least. 

The only way you should be able to get BLASTX overlap and still not get a model for the region is if:

1. The protein alignment is in a different reading frame than your models for every single base pair of the alignment (in which case it's not true overlap).
2. The BLASTX HSPs are stacked on each other again and again in weird rearranged overlaps to produce a very deep alignment, which would mean this is a repetitive region and is not really a significant alignment.

Otherwise this should not happen unless you have the AED_threshold set to some value where MAKER will ignore genes unless they have a minimum amount of support (by default this option is always off). The other two possibilities can be tested by just looking at the alignments manually in Apollo. Also take a look at the AED and eAED values for your missing genes. Anything below 1 should always be kept by MAKER by default because it has at least some evidence support.

> which fasta do you refer to? The proteins file I use as evidence contains all
> proteins i can actually use.

If they are already in your current run, ignore this.

Barry provided detailed instructions on how to configure MAKER for your particular case. So just follow his excellent instructions.

Thanks,
Carson

From: Barry Moore
Date: Friday, 27 April, 2012 7:57 AM
To: Anastasia Gioti
Cc: Carson Holt ,
Subject: Re: [maker-devel] Use pass-through system to add missing genes

Hi Anastasia,

On Apr 27, 2012, at 2:43 AM, Anastasia Gioti wrote:

> Hi Carlson,
> Thanks for your help!
>
>> The way you proceed depends on why the genes are not there to begin with.
>> Are they not there because of a lack of evidence?
>
> It is a mixture of cases, and I can only look at some examples to say that.
> There are cases where all 3 used ab initio predictors provide models, there
> are blastx hits, or both blastx and protein2 genome, but no EST evidence, thus
> no model is retained. i guess my default parameters could be responsible for
> these cases at least.
>

This doesn't sound right. If there are predicted models and blastx protein evidence overlapping them you should get a model retained. I know for the EST evidence that it has to support a splice site before it will be promoted, and I can't remember if protein evidence is the same, but certainly if you pass back those protein2genome predictions and the original proteins as evidence then they will be retained as models.

>> If that's the case just adding the new fasta file should do the trick.
>
> which fasta do you refer to? The proteins file I use as evidence contains all
> proteins i can actually use.
>

Yes, using the protein fasta from the closely related species as evidence. I think you said you've already done that, right?

>> Or are they not there because an assembly error makes it impossible to get a
>> logical model for the region (I.e reading frame breaks).
>
> This is not the case in general.
>
>> Are there ab initio models already called in those regions that could just be
>> promoted to the annotation tier? You can test that one by blasting against
>> the nonoverlaping_abinits.fasta files.
>
> I have not done this, will do!
>
>>
>> For any of the cases described, you can provide the existing annotation set
>> as the input in GFF3 format, and previous models will be maintained
>> preferentially.
>
> You mean in a new maker run? is this possible with the old maker as well, not
> maker2, right?
>

Yes, the original MAKER will do this.

>> If you know which ab initio predictions you want to add (I.e. the ab initio
>> promoting scenario I descibed), you can provide those predictions to the use
>> the pred_gff option and then set keep_preds=1 and they will be maintained
>> even without evidence. Attached is a script that would make selecting those
>> easier. It take the MAKER generated GFF3 and a list of predictions to keep
>> (one name per line). These might be the results of a BLAST analysis for
>> example. It will then return the GFF3 entries for just those models
>> selected.
>
> The thing is, for the few cases I have looked at, I cannot really decide which
> model is the best, and the 3 models from the ab initio predictors do not agree
> on the exact intron-exon junctions or the start and stop codons.
>>
>> If the situation is more complex, just provide more detail, and I am sure we
>> can help you come up with a plan.
>>
> What i was thinking to do was to provide a gff file of alignments (eg by
> exonerate) to the proteins of the closely related species that i am missing,
> and somehow keep the previous annotations and get the extra ones by this gff
> file. But how exactly maker should be run to do this I am not sure. if I want
> to keep the previous annotations I need the gff file of the last maker run as
> input, but then how do I discriminate with the exonerate gff file? And which
> mode of rediction should be on, and with which parameters? You mention
> keep_preds=1 for the existing annotations, but how do i also promote evidence
> from alignments on the same way in the same run?
> Looks feasible though. Thanks again,
> Anastasia
>

Let me just restate what you've said so that I can be sure that I am correct about what you've already done. You have run Maker with SNAP, Genemark and Augustus using EST from a closely related species (passed to altest) and protein evidence from other fungi. You are missing about 1,000 genes compared to the species that provided the EST alignments. You say there is good evidence that these genes exist from the alignments, and I assume by this that you mean the EST/protein alignments that Maker produced.

1) Is the closely related fungus annotated, and if so have you included its proteins in the evidence set that you provided to Maker? If you haven't provided these proteins as evidence to maker then you should do this.
You can re-run maker passing your original models back through like this: #-----Re-annotation Using MAKER Derived GFF3 genome_gff=original_maker_annotations.gff3 est_pass=1 altest_pass=1 protein_pass=1 rm_pass=1 model_pass=1 pred_pass=1 other_pass=1 #-----Protein Homology Evidence (for best results provide a file for at least one) protein=proteins_from_closely_related.fasta ## OR it sounds like you've already aligned these with exonerate? protein_gff=proteins_from_closely_related_already_aligned.gff 2) If you've already included those closely related species proteins but still didn't get the 1,000 genes, then take your nonoverlaping_abinits.fasta and blast them directly against your closely related proteins. Presumably they don't hit too well because if they did they should have been promoted to predictions by Maker the first time, but here you can decide yourself what thresholds to allow to keep the abinit predictions that hit the closely related species proteins. If you filter you blast hits the way you want and keep the names of the abinit predictions that pass your filter, then use the script Carson attached it it will generate a abinit precidtion GFF file with only the predictions you selected. You can then pass those predictions back to Maker and force it to keep them and Maker will turn them from predictions (match/match_part) into gene models. 
#-----Re-annotation Using MAKER Derived GFF3 genome_gff=original_maker_annotations.gff3 est_pass=1 altest_pass=1 protein_pass=1 rm_pass=1 model_pass=1 pred_pass=0 other_pass=1 #-----Gene Prediction snaphmm= gmhmm= augustus_species= fgenesh_par_file= pred_gff=ab_init_predictions_rescued_by_blast.gff keep_preds=1 Barry >> Thanks, >> Carson >> >> From: Anastasia Gioti >> Date: Wed, 25 Apr 2012 11:09:36 +0200 >> To: >> Subject: [maker-devel] Use pass-through system to add missing genes >> >> Hi, >> I have a set of predicted proteins from the genome of a fungus annotated by >> MAKER using EST data from a closely related species and 3 ab initio >> predictors (snap iterativelly trained 3 times, genemark trained directly on >> the assembly and augustus with a model from a less closely related species), >> along with a set of fungal proteins. I am missing ~ 1000 proteins when I >> compare to the species i used EST data from, and there is good evidence from >> alignments that these genes exist. The question is how to proceed from Blast >> hits to actual gene models here. The idea would be to add these genes to the >> existing dataset, rather than reannotate the genome. I believe that >> reannotating it without any further evidence such as RNA-seq from the species >> itself would not change much,and i d rather stick with actual predictions >> that i trust and have used in subsequent analyses. The 1000 genes I can >> accept to annotate with a less stringent and reliable way than MAKER, I just >> want to add them so that the difference in gene count gets corrected. >> I was reading the MAKER 2 paper and i was wondering if I can use the legacy >> annotations scheme to do it, by providing GFF3 of the alignments between the >> two species in the regions where genes were missed, but as i said, I would >> not like to reannotate the whole genome, and running MAKER2 might cause >> slight changes that i d like to avoid. Is this possible? 
First, is it
>> possible to provide a Gff3 file of specific locations and not the entire
>> genome alignment? (I guess so..) Second, how can I tag the existing
>> annotations as 'not to be changed' or alternatively, tag the new models only?
>> How should I run maker2, with which predictors on and which off?
>> Thanks,
>> Anastasia
>>
>> Anastasia Gioti
>> Post-doctoral Researcher
>>
>> anastasia.gioti at scilifelab.se
>> anastasia.gioti at ebc.uu.se
>>
>> http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/
>>
>>
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>
> Anastasia Gioti
> Post-doctoral Researcher
>
> anastasia.gioti at scilifelab.se
> anastasia.gioti at ebc.uu.se
>
> http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/
>
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From james.collett at pnnl.gov Fri Apr 27 10:51:05 2012
From: james.collett at pnnl.gov (Collett, James R)
Date: Fri, 27 Apr 2012 09:51:05 -0700
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To:
References:
Message-ID:

Hi Carson,

Could you please send me (or make available for download) the perl script that you mentioned in this previous post in this thread?

>> Attached is a
>> script that would make selecting those easier. It take the MAKER
>> generated GFF3 and a list of predictions to keep (one name per line).
>> These might be the results of a BLAST analysis for example.
It will
>> then return the GFF3 entries for just those models selected.

Thanks,

Jim
__________________________________________________
James R. Collett, Ph.D.
Senior Scientist
Chemical and Biological Process Development Group
Energy and Environment Directorate
Pacific Northwest National Laboratory

From carsonhh at gmail.com Fri Apr 27 11:18:23 2012
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 Apr 2012 13:18:23 -0400
Subject: [maker-devel] Use pass-through system to add missing genes
In-Reply-To:
Message-ID:

Here you go. This will also be part of the next MAKER release in some form.

Thanks,
Carson

On 12-04-27 12:51 PM, "Collett, James R" wrote:

>Hi Carson,
>
>Could you please send me (or make available for download) the perl script
>that you mentioned in this previous post in this thread?
>
>>> Attached is a
>>> script that would make selecting those easier. It take the MAKER
>>> generated GFF3 and a list of predictions to keep (one name per line).
>>> These might be the results of a BLAST analysis for example. It will
>>> then return the GFF3 entries for just those models selected.
>
>Thanks,
>
>Jim
>__________________________________________________
>James R. Collett, Ph.D.
>Senior Scientist >Chemical and Biological Process Development Group >Energy and Environment Directorate >Pacific Northwest National Laboratory > >> -----Original Message----- >> From: maker-devel-bounces at yandell-lab.org [mailto:maker-devel- >> bounces at yandell-lab.org] On Behalf Of maker-devel-request at yandell- >> lab.org >> Sent: Friday, April 27, 2012 6:48 AM >> To: maker-devel at yandell-lab.org >> Subject: maker-devel Digest, Vol 47, Issue 14 >> >> Send maker-devel mailing list submissions to >> maker-devel at yandell-lab.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell- >> lab.org >> >> or, via email, send a message with subject or body 'help' to >> maker-devel-request at yandell-lab.org >> >> You can reach the person managing the list at >> maker-devel-owner at yandell-lab.org >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of maker-devel digest..." >> >> >> Today's Topics: >> >> 1. Re: Use pass-through system to add missing genes (Carson Holt) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Fri, 27 Apr 2012 09:27:24 -0400 >> From: Carson Holt >> To: Barry Moore , Anastasia Gioti >> >> Cc: maker-devel at yandell-lab.org >> Subject: Re: [maker-devel] Use pass-through system to add missing >> genes >> Message-ID: >> Content-Type: text/plain; charset="us-ascii" >> >> > It is a mixture of cases, and I can only look at some examples to say >> that. >> > There are cases where all 3 used ab initio predictors provide models, >> > there are blastx hits, or both blastx and protein2 genome, but no EST >> > evidence, thus no model is retained. i guess my default parameters >> > could be responsible for these cases at least. >> >> The only way you should be able to get BLASTX overlap and still not get >> a model for the region is if 1. 
The protein alignment in in a >> different reading frame then your models for every single base pair of >> the alignment (in which case it's not true overlap). 2. The BLASTX >> HSPs are stacked on each other again and again in weird rearranged >> overlaps to produce a very deep alignment which would mean this is a >> repetitive region and is not really a significant alignment. Otherwise >> this should not happen unless you have the AED_threshold set to some >> value where MAKER will ignore genes unless they have a minimum amount >> of support (by default this option is always off). The other two >> possibilities can be tested by just looking at the alignments manually >> in Apollo. Also take a look at the AED and eAED values for your >> missing genes. Anything below 1 should always be kept by MAKER by >> default because it has at least some evidence supported. >> >> > which fasta do you refer to? The proteins file I use as evidence >> > contains all proteins i can actually use. >> >> If they are already in your current run ignore this. >> >> Barry provided detailed instructions on how to configure MAKER, for >> your particular case. So just follow his excellent instructions. >> >> Thanks, >> Carson >> >> >> >> From: Barry Moore >> Date: Friday, 27 April, 2012 7:57 AM >> To: Anastasia Gioti >> Cc: Carson Holt , >> Subject: Re: [maker-devel] Use pass-through system to add missing >> genes >> >> Hi Anastasia, >> >> On Apr 27, 2012, at 2:43 AM, Anastasia Gioti wrote: >> >> > Hi Carlson, >> > Thanks for your help! >> > >> >> The way you proceed depends on why the genes are not there to begin >> with. >> >> Are they not there because of a lack of evidence? >> > >> > It is a mixture of cases, and I can only look at some examples to say >> that. >> > There are cases where all 3 used ab initio predictors provide models, >> > there are blastx hits, or both blastx and protein2 genome, but no EST >> > evidence, thus no model is retained. 
i guess my default parameters >> > could be responsible for these cases at least. >> > >> >> This doesn't sound right. If there are predicted models and blastx >> protein evidence overlapping them you should get a model retained. I >> know for the EST evidence that it has to support a splice site before >> it will be promoted and I can't remember if protein evidence is the >> same but certainly if you pass back those protein2genome predictions >> and the original proteins as evidence then they will be retained as >> models. >> >> >> If that's the case just adding the new fasta file should do the >> trick. >> > >> > which fasta do you refer to? The proteins file I use as evidence >> > contains all proteins i can actually use. >> > >> >> Yes using the protein fasta from the closely related species as >> evidence. I think you said you've already done that right? >> >> >> >> Or are they not there because an assembly error makes it impossible >> >> to get a logical model for the region (I.e reading frame breaks). >> > >> > This is not the case in general. >> > >> >> Are there ab initio models already called in those regions that >> could >> >> just be promoted to the annotation tier? You can test that one by >> >> blasting against the nonoverlaping_abinits.fasta files. >> > >> > I have not done this, will do! >> > >> >> >> >> For any of the cases described, you can provide the existing >> >> annotation set as the input in GFF3 format, and previous models will >> >> be maintained preferentially. >> > >> > You mean in a new maker run? is this possible with the old maker as >> > well, not maker2, right? >> > >> >> Yes, the original MAKER will do this. >> >> >> >> If you know which ab initio predictions you want to add (I.e. the ab >> >> initio promoting scenario I descibed), you can provide those >> >> predictions to the use the pred_gff option and then set keep_preds=1 >> >> and they will be maintained even without evidence. 
>> >> Attached is a script that would make selecting those easier. It takes the MAKER generated GFF3 and a list of predictions to keep (one name per line). These might be the results of a BLAST analysis for example. It will then return the GFF3 entries for just those models selected.
>> >
>> > The thing is, for the few cases I have looked at, I cannot really decide which model is the best, and the 3 models from the ab initio predictors do not agree on the exact intron-exon junctions or the start and stop codons.
>> >>
>> >> If the situation is more complex, just provide more detail, and I am sure we can help you come up with a plan.
>> >>
>> > What I was thinking to do was to provide a gff file of alignments (e.g. by exonerate) to the proteins of the closely related species that I am missing, and somehow keep the previous annotations and get the extra ones from this gff file. But how exactly maker should be run to do this I am not sure. If I want to keep the previous annotations I need the gff file of the last maker run as input, but then how do I discriminate with the exonerate gff file? And which mode of prediction should be on, and with which parameters? You mention keep_preds=1 for the existing annotations, but how do I also promote evidence from alignments in the same way in the same run?
>> > Looks feasible though. Thanks again,
>> > Anastasia
>> >
>>
>> Let me just restate what you've said so that I can be sure that I am correct about what you've already done. You have run Maker with SNAP, Genemark and Augustus using EST from a closely related species (passed to altest) and protein evidence from other fungi. You are missing about 1,000 genes compared to the species that provided the EST alignments. You say there is good evidence that these genes exist from the alignments and I assume by this that you mean the EST/protein alignments that Maker produced.
>>
>> 1) Is the closely related fungus annotated, and if so have you included its proteins in the evidence set that you provided to Maker? If you haven't provided these proteins as evidence to Maker then you should do this. You can re-run Maker passing your original models back through like this:
>>
>> #-----Re-annotation Using MAKER Derived GFF3
>> genome_gff=original_maker_annotations.gff3
>> est_pass=1
>> altest_pass=1
>> protein_pass=1
>> rm_pass=1
>> model_pass=1
>> pred_pass=1
>> other_pass=1
>>
>> #-----Protein Homology Evidence (for best results provide a file for at least one)
>> protein=proteins_from_closely_related.fasta
>> ## OR it sounds like you've already aligned these with exonerate?
>> protein_gff=proteins_from_closely_related_already_aligned.gff
>>
>> 2) If you've already included those closely related species proteins but still didn't get the 1,000 genes, then take your nonoverlaping_abinits.fasta and blast them directly against your closely related proteins. Presumably they don't hit too well, because if they did they should have been promoted to predictions by Maker the first time, but here you can decide yourself what thresholds to allow to keep the abinit predictions that hit the closely related species proteins. If you filter your blast hits the way you want and keep the names of the abinit predictions that pass your filter, then use the script Carson attached and it will generate an abinit prediction GFF file with only the predictions you selected. You can then pass those predictions back to Maker and force it to keep them, and Maker will turn them from predictions (match/match_part) into gene models.
>>
>> #-----Re-annotation Using MAKER Derived GFF3
>> genome_gff=original_maker_annotations.gff3
>> est_pass=1
>> altest_pass=1
>> protein_pass=1
>> rm_pass=1
>> model_pass=1
>> pred_pass=0
>> other_pass=1
>>
>> #-----Gene Prediction
>> snaphmm=
>> gmhmm=
>> augustus_species=
>> fgenesh_par_file=
>> pred_gff=ab_init_predictions_rescued_by_blast.gff
>>
>> keep_preds=1
>>
>> Barry
>>
>> >> Thanks,
>> >> Carson
>> >>
>> >> From: Anastasia Gioti
>> >> Date: Wed, 25 Apr 2012 11:09:36 +0200
>> >> To:
>> >> Subject: [maker-devel] Use pass-through system to add missing genes
>> >>
>> >> Hi,
>> >> I have a set of predicted proteins from the genome of a fungus annotated by MAKER using EST data from a closely related species and 3 ab initio predictors (snap iteratively trained 3 times, genemark trained directly on the assembly and augustus with a model from a less closely related species), along with a set of fungal proteins. I am missing ~ 1000 proteins when I compare to the species I used EST data from, and there is good evidence from alignments that these genes exist. The question is how to proceed from Blast hits to actual gene models here. The idea would be to add these genes to the existing dataset, rather than reannotate the genome. I believe that reannotating it without any further evidence such as RNA-seq from the species itself would not change much, and I'd rather stick with actual predictions that I trust and have used in subsequent analyses. The 1000 genes I can accept to annotate in a less stringent and reliable way than MAKER, I just want to add them so that the difference in gene count gets corrected.
>> >> I was reading the MAKER 2 paper and I was wondering if I can use the legacy annotations scheme to do it, by providing GFF3 of the alignments between the two species in the regions where genes were missed, but as I said, I would not like to reannotate the whole genome, and running MAKER2 might cause slight changes that I'd like to avoid. Is this possible? First, is it possible to provide a GFF3 file of specific locations and not the entire genome alignment? (I guess so..) Second, how can I tag the existing annotations as 'not to be changed' or alternatively, tag the new models only?
>> >> How should I run maker2, with which predictors on and which off?
>> >> Thanks,
>> >> Anastasia
>> >>
>> >> Anastasia Gioti
>> >> Post-doctoral Researcher
>> >>
>> >> anastasia.gioti at scilifelab.se
>> >> anastasia.gioti at ebc.uu.se
>> >>
>> >> http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/
>> >>
>> >> _______________________________________________
>> >> maker-devel mailing list
>> >> maker-devel at box290.bluehost.com
>> >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>> Barry Moore
>> Research Scientist
>> Dept. of Human Genetics
>> University of Utah
>> Salt Lake City, UT 84112
>> --------------------------------------------
>> (801) 585-3543
>>
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL: > lab.org/attachments/20120427/72b70d49/attachment.html>
>>
>> ------------------------------
>>
>> End of maker-devel Digest, Vol 47, Issue 14
>> *******************************************
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gff3_select
Type: application/octet-stream
Size: 3067 bytes
Desc: not available
URL:
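Editor's sketch for step 2 of Barry's reply above (filter the BLAST hits yourself and keep the names of the ab initio predictions that pass). This is only an illustration, not part of MAKER and not the attached gff3_select script. It assumes the BLAST results were written in tabular form (BLAST+ `-outfmt 6`); the function name, the thresholds and the example prediction names are all hypothetical and should be replaced with whatever cutoffs you trust.

```python
import csv
import io


def select_prediction_names(blast_tab, max_evalue=1e-10, min_identity=50.0):
    """Return query names that have at least one hit passing both thresholds.

    blast_tab: file-like object holding BLAST tabular output (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    The default thresholds are placeholders, not recommendations.
    """
    keep = []
    seen = set()
    for row in csv.reader(blast_tab, delimiter="\t"):
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        query, pident, evalue = row[0], float(row[2]), float(row[10])
        if query in seen:
            continue
        if evalue <= max_evalue and pident >= min_identity:
            seen.add(query)
            keep.append(query)
    return keep


# Hypothetical hits: ab initio predictions (queries) vs. related-species proteins.
example = (
    "snap-contig1-abinit-gene-0.1-mRNA-1\tprotA\t82.5\t200\t35\t0\t1\t200\t1\t200\t1e-50\t300\n"
    "augustus-contig1-abinit-gene-0.2-mRNA-1\tprotB\t31.0\t80\t55\t2\t1\t80\t5\t84\t0.002\t40\n"
    "snap-contig1-abinit-gene-0.1-mRNA-1\tprotC\t60.0\t150\t60\t1\t1\t150\t1\t150\t1e-20\t120\n"
)
names = select_prediction_names(io.StringIO(example))

# One name per line, the list format the thread describes for the selection script.
print("\n".join(names))
```

With a real run you would pass an open file handle instead of the in-memory example and write the printed names to a text file, which is the kind of list (one name per line) that Carson describes as input for the attached script.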