From darasappan at gmail.com  Mon Feb  3 10:31:16 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Mon, 3 Feb 2014 10:31:16 -0600
Subject: [maker-devel] maker annotation with cufflinks output
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com>
Message-ID: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>

Hi Daniel,

I was able to check on some of those questions.

1. From the Trinity assembly: I started with 102,000 contigs. I used Trinotate to annotate the proteins in it.

I ran maker on this data with est2genome set to 1. The output looks like this (most important features on top):

   6653 gene
  46675 exon
 280534 protein_match
  59934 CDS
    969 contig
 105388 expressed_sequence_match
  12584 five_prime_UTR
  78565 match
1401369 match_part
  10180 mRNA
  11545 three_prime_UTR

2. From the cufflinks assembly: I started with 133,380 entries (of which 29,000 are transcripts). I used the protein sequences from the Trinity assembly.

I ran maker on this data with est2genome set to 1. The output looks like this:

     29 gene
     75 exon
 573659 protein_match
     67 CDS
   1099 contig
 269298 expressed_sequence_match
     23 five_prime_UTR
 173844 match
2221846 match_part
     29 mRNA
     23 three_prime_UTR

The number of genes annotated using the Trinity assembly was lower than expected, so I went the cufflinks route. I don't understand why even fewer genes are found when I use the cufflinks transcripts.

3. Training SNAP: I used the maker results from step 1 to train SNAP. I then used that training set to rerun maker:

snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
est2genome=0

And again I got results with no entries for gene, exon, CDS, etc.:

    957 contig
  46555 expressed_sequence_match
  43651 match
 553633 match_part
 113738 protein_match

As I mentioned in another email, CEGMA results indicated that the genome was more than 90% complete. Any suggestions would be helpful.
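The tallies above look like counts of the feature-type column (column 3) of a merged MAKER GFF3 file. A minimal Python sketch of that tally follows, assuming a standard tab-delimited GFF3 file that may end with an embedded `##FASTA` sequence section; the function names are illustrative and not part of MAKER.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Tally column 3 (feature type) of GFF3 lines, like `cut -f3 | sort | uniq -c`."""
    counts = Counter()
    for raw in gff_lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        fields = line.split("\t")
        if len(fields) >= 9:  # GFF3 feature lines have nine tab-separated columns
            counts[fields[2]] += 1
    return counts


def print_counts(path):
    """Print counts in the same 'count type' layout used in this thread."""
    with open(path) as fh:
        counts = count_feature_types(fh)
    for feature_type, n in sorted(counts.items()):
        print(f"{n:>8} {feature_type}")


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print_counts(sys.argv[1])
```

Usage would be something like `python count_types.py merged.gff` (both names hypothetical). In the cufflinks and SNAP runs above, the evidence types (`expressed_sequence_match`, `protein_match`) are abundant while `gene` and `mRNA` are near zero, which suggests the evidence aligned but few gene models were built from it.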
Thank you
Dhivya

On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:

> Hi Dhivya,
>
> I think there are a few numbers that could help us understand what's happening here.
>
> How many transcripts did Trinity assemble the RNA-seq data into? Also, you had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it the cufflinks data. How many transcripts did MAKER identify with the cufflinks data? Did you still get more than the 10,000 transcripts that you found with just the Trinity data?
>
> A key part of MAKER's approach to genome annotation that might be affecting its performance is that it only annotates a gene where there is both evidence (like your RNA-seq data) and an ab initio prediction. If a prediction is unsupported by the evidence, MAKER won't annotate a gene, and if evidence aligns where there's no prediction, MAKER won't annotate a gene either. What ab initio predictors are you using, and have they been trained for your specific genome?
>
> You can force MAKER to automatically promote evidence alignments to gene models by setting the est2genome option to 1, but that will usually give you many false positives.
>
> Try rerunning it with either the Trinity data or the Cufflinks data and with est2genome set to 1, and let us know how that affects the MAKER results.
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ________________________________________
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya arasappan [darasappan at gmail.com]
> Sent: Thursday, January 30, 2014 11:18 AM
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] maker annotation with cufflinks output
>
> Hello,
>
> I am trying to annotate a 200 Mb plant genome for which I have a very good assembly.
>
> I tried to de novo assemble RNA-seq data using Trinity and ran maker using my genome assembly and the Trinity results. I did not get as many transcripts as expected, only around 10,000.
>
> So I decided to try a different approach. I did a genome-assisted assembly of the RNA-seq data using TopHat/Cufflinks. This pipeline generated 21,000 genes and 29,000 transcripts. I then ran maker using my genome assembly and the cufflinks result, and got far fewer transcripts.
>
> If cufflinks found 29,000 transcripts by mapping to the genome, I'm confused as to why maker is not finding the same.
>
> Any suggestions would be appreciated.
>
> Thanks
> Dhivya
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From rebzi87 at gmail.com  Tue Feb  4 16:29:41 2014
From: rebzi87 at gmail.com (Rebecca Harris)
Date: Tue, 4 Feb 2014 14:29:41 -0800
Subject: [maker-devel] maker output

Hi,

I'm running maker on a cluster and am having some problems with the run ending prematurely. I would like to know if there is a straightforward way to figure out whether maker has completed. I've tried: 1) counting the number of run.log files in the datastore directory, and 2) counting the instances of "FINISHED" in the master_datastore_index.log.

These numbers are inconsistent. I have 200,000 contigs in my fasta file; should I expect 200,000 run.log files? I've had to restart maker a few times, and it appears that maker is appending to the master_datastore_index.log, as I find multiple instances of the same contig being finished.

Thanks!

Cheers,
Rebecca

From darasappan at gmail.com  Tue Feb  4 16:43:19 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 4 Feb 2014 16:43:19 -0600
Subject: [maker-devel] Fwd: maker annotation with cufflinks output
References: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>

Resending this since it didn't make it to the mailing list before.

From dence at genetics.utah.edu  Tue Feb  4 16:42:52 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 4 Feb 2014 22:42:52 +0000
Subject: [maker-devel] maker output

Hi Rebecca,

If you're looking at the master_datastore_index.log, then you're looking for lines with the "FINISHED" status.
If you count those (with "grep -c", for example), that will tell you how many contigs have finished.

If you have 200,000 contigs that you're trying to annotate, you might also consider setting the "min_contig" parameter in the maker_opts.ctl file. This parameter sets the minimum length a contig must have before MAKER tries to annotate it. Usually 5000 bp or larger is what you want. That will save you some time in the long run.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From mikael.durling at slu.se  Tue Feb  4 16:49:46 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Tue, 4 Feb 2014 22:49:46 +0000
Subject: [maker-devel] maker output

> On 4 Feb 2014, at 23:32, "Rebecca Harris" wrote:
>
> Hi,
>
> I'm running maker on a cluster and am having some problems with the run ending prematurely.
> I would like to know if there is a straightforward way to figure out whether maker has completed. I've tried: 1) counting the number of run.log files in the datastore directory, and 2) counting the instances of "FINISHED" in the master_datastore_index.log.

This is usually what I do to check whether maker has finished all scaffolds. There should be one FINISHED statement for each entry in the fasta file (or possibly one for every scaffold longer than the given minimum length).

> These numbers are inconsistent. I have 200,000 contigs in my fasta file; should I expect 200,000 run.log files? I've had to restart maker a few times, and it appears that maker is appending to the master_datastore_index.log, as I find multiple instances of the same contig being finished.

Run maker -dsindex to rebuild the file if you like. The number of FINISHED entries should not change, though.

Mikael

From carsonhh at gmail.com  Tue Feb  4 16:50:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 04 Feb 2014 15:50:10 -0700
Subject: [maker-devel] maker output

Clusters are notoriously flaky, so maker is restartable (hence the need for the log file). Also, since multiple nodes may write to the log simultaneously, they can munge its contents. You can rerun maker with the -dsindex flag to regenerate the master_datastore_index.log without processing anything else. You can even delete the log before rebuilding it if you want to ensure all entries are unique (run on a single CPU when you do this). Then count the number of FINISHED entries in the log.
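The advice above amounts to counting unique contigs with a FINISHED status and comparing them with the contigs MAKER was expected to process. A minimal sketch of that check, under stated assumptions: master_datastore_index.log is tab-separated with the contig ID in the first column and the status in the third, the ID matches the first word of the FASTA header, and contigs shorter than `min_contig` are not expected to finish (whether MAKER's cutoff is inclusive is also an assumption here).

```python
def fasta_lengths(fasta_lines):
    """Yield (sequence ID, length) for each record in a FASTA file."""
    name, length = None, 0
    for raw in fasta_lines:
        line = raw.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            header = line[1:].split()
            name = header[0] if header else ""  # ID = first word of the header
            length = 0
        elif line:
            length += len(line)
    if name is not None:
        yield name, length


def finished_contigs(index_lines):
    """Return the set of contig IDs that have at least one FINISHED line.

    Assumed line layout: contig_id <TAB> datastore_dir <TAB> status.
    A set counts duplicate FINISHED lines left by restarts only once;
    munged lines that do not parse are skipped.
    """
    finished = set()
    for raw in index_lines:
        fields = raw.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[2].strip() == "FINISHED":
            finished.add(fields[0])
    return finished


def missing_contigs(fasta_lines, index_lines, min_contig=0):
    """Sorted IDs of contigs at least min_contig bp long with no FINISHED entry."""
    expected = {name for name, n in fasta_lengths(fasta_lines) if n >= min_contig}
    return sorted(expected - finished_contigs(index_lines))


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 2:
        min_len = int(sys.argv[3]) if len(sys.argv) > 3 else 0
        with open(sys.argv[1]) as fa, open(sys.argv[2]) as idx:
            missing = missing_contigs(fa, idx, min_len)
        print(f"{len(missing)} contigs still lack a FINISHED entry")
```

For example, `python check_finished.py genome.fasta master_datastore_index.log 5000` (script and FASTA names hypothetical). Unlike a plain `grep -c FINISHED`, which counts every duplicate line appended by restarted runs, this counts each contig once.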
Thanks,
Carson

From carsonhh at gmail.com  Wed Feb  5 12:38:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Feb 2014 11:38:50 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>

Do you have any features of type snap in your results from step 3? We've had a couple of recent posts where, after training, SNAP was giving no results, and as a result maker couldn't produce any genes. One cause of something like that may be your step 2. Make sure the ZFF you used to train with wasn't empty. The maker2zff script uses filters to put only the best genes in the ZFF file, and if all your genes fail the filtering, then you are training with an empty ZFF.

Also, you should use proteins from a related species as your protein file.
I see that your protein matches are varying wildly from run to run, and so is your contig count. Were the subsets of contigs you have results for long enough to contain genes?

Carson

From dence at genetics.utah.edu  Wed Feb  5 13:28:48 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 5 Feb 2014 19:28:48 +0000
Subject: [maker-devel] maker annotation with cufflinks output
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>

Hi Dhivya,

Are the protein matches in your results coming from your annotations of the transcriptome? You should really use amino-acid sequences from related organisms and some kind of omnibus source like SwissProt.
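Carson's earlier suggestion in this thread, making sure the ZFF file used to train SNAP is not empty, can also be checked with a short script. This is a sketch under assumptions: it expects ZFF `>sequence` header lines followed by whitespace-separated exon lines whose first field is `Einit`, `Exon`, `Eterm` or `Esngl` and whose last field is the gene (group) name; the file name `genome.ann` is an assumption about what maker2zff wrote.

```python
from collections import defaultdict

# Exon labels used in ZFF annotation files (assumed from the SNAP README).
EXON_TYPES = {"Einit", "Exon", "Eterm", "Esngl"}


def zff_gene_counts(zff_lines):
    """Map each sequence in a ZFF file to the number of distinct gene names on it."""
    genes = defaultdict(set)
    current = None
    for raw in zff_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            genes.setdefault(current, set())  # keep sequences with no genes too
            continue
        fields = line.split()
        # Feature line: label, begin, end, ..., group name as the last field.
        if current is not None and len(fields) >= 4 and fields[0] in EXON_TYPES:
            genes[current].add(fields[-1])
    return {seq: len(names) for seq, names in genes.items()}


def check_zff(path):
    """Print a summary and return the total number of gene models in the file."""
    with open(path) as fh:
        counts = zff_gene_counts(fh)
    total = sum(counts.values())
    print(f"{path}: {len(counts)} sequences, {total} gene models")
    if total == 0:
        print("No gene models: SNAP would be trained on an empty set.")
    return total


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        check_zff(sys.argv[1])
```

A total of 0 from `check_zff("genome.ann")` would be consistent with the symptom in step 3: a retrained HMM that yields no SNAP predictions and therefore no MAKER genes.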
Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From darasappan at gmail.com  Wed Feb  5 14:13:57 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Wed, 5 Feb 2014 14:13:57 -0600
Subject: [maker-devel] maker annotation with cufflinks output
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
Message-ID: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>

Hello Daniel and Carson,

Thanks for your replies. Yes, I used the protein sequences resulting from the annotation of the Trinity assembly (using Trinotate). I'll try using protein sequences from related species (though there aren't sequences from closely related organisms). Could you tell me a little about why protein data from annotating my RNA-seq data would not work best here?

Thanks
Dhivya

On Feb 5, 2014, at 1:28 PM, Daniel Ence wrote:

> Hi Dhivya, Are the protein matches in your results coming from your annotations of the transcriptome? You should really use amino-acid sequences from related organisms and some kind of omnibus source like SwissProt.
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> From: Carson Holt [carsonhh at gmail.com]
> Sent: Wednesday, February 05, 2014 11:38 AM
> To: dhivya arasappan; Daniel Ence
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Do you have any features of type snap in your results from step 3? We've had a couple of recent posts where, after training, SNAP was giving no results, and as a result maker couldn't produce any genes. One cause of something like that may be your step 2. Make sure the ZFF you used to train with wasn't empty. The maker2zff script uses filters to put only the best genes in the ZFF file, and if all your genes fail the filtering, then you are training with an empty ZFF.
-------------- next part --------------
An HTML attachment was scrubbed...
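The per-type tallies posted earlier in this thread (gene, exon, mRNA, match_part and so on) look like counts of the third (type) column of MAKER's GFF3 output. Below is a minimal Python sketch for producing such a tally. It assumes tab-separated, nine-column feature lines and a file that may end in a `##FASTA` section, as the GFF3 specification allows. The function names are hypothetical, and this is not necessarily how the counts in the thread were made.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Tally the type column (column 3) of GFF3 feature lines.

    Assumes tab-separated, nine-column feature lines; stops at a
    '##FASTA' directive because everything after it is sequence data.
    """
    counts: Counter = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # remaining lines are FASTA records, not features
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts


def format_tally(counts: Counter) -> str:
    """Render counts as 'count type' lines, most frequent first."""
    return "\n".join(f"{n} {kind}" for kind, n in counts.most_common())
```

For example, `print(format_tally(count_feature_types(open("maker_output.gff"))))` prints one "count type" line per feature type, in the same shape as the tallies above. The file name is a placeholder.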
From dence at genetics.utah.edu Wed Feb 5 14:36:26 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 5 Feb 2014 20:36:26 +0000
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
Message-ID:

Hi Dhivya,

In genome annotation you often want to use as many sources of evidence as is reasonable, but those sources should be distinct. It will confuse downstream annotation efforts if your protein evidence is actually derived from the RNA-seq data.

Using the trinotate results as protein evidence here restricts you, first, to the proteins encoded by the transcripts in the RNA-seq data, which may be incomplete, and second, to the proteins that trinotate could annotate from among those transcripts.

The problem that Carson mentioned with the SNAP HMM file is also a real possibility.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
-------------- next part --------------
An HTML attachment was scrubbed...
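Carson's two checks from earlier in the thread, behind the SNAP problem Daniel mentions, can be scripted: whether the training ZFF really contains gene models, and whether the step 3 output contains any SNAP predictions at all. Below is a rough Python sketch that rests on two assumptions. First, in ZFF a line starting with `>` names a sequence, and every other non-blank line is an exon feature whose last field is the model name (e.g. `MODEL5547`). Second, SNAP-derived GFF3 features are recognized by a source column (column 2) beginning with `snap`, which is a guess about MAKER's labeling rather than documented behavior. Both function names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def zff_model_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Count distinct gene models per sequence in a ZFF annotation file.

    Assumption: '>' lines name a sequence; every other non-blank line is
    an exon feature whose last whitespace-separated field is the model name.
    """
    models: Dict[str, Set[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
            models.setdefault(current, set())
        elif current is not None:
            models[current].add(line.split()[-1])
    return {seq: len(names) for seq, names in models.items()}


def has_snap_features(gff3_lines: Iterable[str]) -> bool:
    """True if any GFF3 feature line has a source (column 2) starting with 'snap'.

    The 'snap' prefix is an assumption about how MAKER labels SNAP output.
    """
    for raw in gff3_lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence data follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.split("\t")
        if len(fields) == 9 and fields[1].lower().startswith("snap"):
            return True
    return False
```

If `sum(zff_model_counts(open("genome.ann")).values())` is zero, SNAP was trained on an empty ZFF, which is the failure Carson describes.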
From carsonhh at gmail.com Wed Feb 5 14:38:44 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Feb 2014 13:38:44 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
Message-ID:

Protein data doesn't have to come from a very closely related species, because genes maintain homology at the amino-acid level across even very large evolutionary distances. A more closely related species just ensures that the genome contents are similar (fewer gene losses/gains relative to each other). Do use the entire proteome of at least one related species; just using a database like Swiss-Prot is not sufficient.

Using translated mRNA-seq data will not give you any information that was not already available from the untranslated sequence. It will also carry the artifacts that mRNA-seq generates (gene merging, incorrect assembly, and false calls caused by background transcription) into the protein part of the pipeline. A big gotcha with mRNA-seq is that your whole genome gets transcribed at a low level, not just the genes, so you will always have contamination that does not represent real gene models. Also, in the end you can really only expect to capture about 50% of the genes with mRNA-seq (maybe 70% if you are fortunate, and most of those will be partial). So using proteins from another species is important both to improve sensitivity and to fix many of the issues that arise from the noisy nature of mRNA-seq.

In fact, if you were forced to use only one kind of evidence (either protein evidence or mRNA-seq), in most cases you would get better annotations from the protein evidence. You get the best annotations when you use both, but if you use only one, proteins from another species are the better choice, and noisy mRNA-seq will be the primary source of annotation error.

Thanks,
Carson
-------------- next part --------------
An HTML attachment was scrubbed...
From darasappan at gmail.com Wed Feb 5 23:16:43 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Wed, 5 Feb 2014 23:16:43 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To:
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
Message-ID: <1188173E-53C1-4FFE-B790-B710C3A55B86@gmail.com>

Thank you both for those explanations. I'll get back to you after I try rerunning maker.

Dhivya
-------------- next part --------------
An HTML attachment was scrubbed...
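The tRNAscan-SE table in Mikael's message below has nine columns. Assuming the classic tabular layout (sequence, tRNA number, begin, end, tRNA type, anticodon, intron begin, intron end, Cove score), the two intron columns read `0 0` for intron-less tRNAs, so any non-zero pair marks an intron-containing tRNA of the kind MAKER's error message refers to. Below is a minimal Python sketch that parses such rows and lists those with introns. The function and type names are hypothetical, and whether MAKER applies exactly this test is an assumption.

```python
from typing import Iterable, List, NamedTuple


class TRNAHit(NamedTuple):
    seq: str
    number: int
    begin: int
    end: int
    aa: str
    anticodon: str
    intron_begin: int
    intron_end: int
    score: float


def parse_trnascan(lines: Iterable[str]) -> List[TRNAHit]:
    """Parse data rows of tabular tRNAscan-SE output, skipping header lines."""
    hits: List[TRNAHit] = []
    for line in lines:
        fields = line.split()
        # data rows have 9 columns and a numeric tRNA number in column 2
        if len(fields) != 9 or not fields[1].isdigit():
            continue
        hits.append(TRNAHit(fields[0], int(fields[1]), int(fields[2]),
                            int(fields[3]), fields[4], fields[5],
                            int(fields[6]), int(fields[7]), float(fields[8])))
    return hits


def hits_with_introns(hits: Iterable[TRNAHit]) -> List[TRNAHit]:
    """Return hits whose intron bounds are not '0 0'."""
    return [h for h in hits if h.intron_begin or h.intron_end]
```

Applied to the five rows Mikael pasted, `hits_with_introns` flags tRNAs 3, 4 and 5 (all Leu), which is consistent with the error he hit on scf_013.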
From mikael.durling at slu.se Thu Feb 6 05:02:37 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Thu, 6 Feb 2014 11:02:37 +0000
Subject: [maker-devel] ncRNA support in maker
In-Reply-To:
References:
Message-ID:

Hi Carson,

It's nice to see all these new features in maker. I gave the trnascan option a try by enabling it in the config file for one of my fungal genomes. It failed, though, with this error message:

ERROR: You found a tRNA with an intron! This should not happen
--> rank=12, hostname=my-mgrid6
ERROR: Failed while gathering ab-init output files
ERROR: Chunk failed at level:1, tier_type:2
FAILED CONTIG:scf_013
ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scf_013

I checked the trnascan output (scf_013.abinit_nomask.0.eukaryotic.trnascan) in theVoid for that contig, and the output seems valid to me:

scf_013 1 189339 189410 Thr AGT 0 0 82.83
scf_013 2 510381 510462 Ser AGA 0 0 67.09
scf_013 3 586886 587000 Leu CAA 586924 586956 57.97
scf_013 4 942166 942069 Leu AAG 942128 942113 57.48
scf_013 5 169102 168993 Leu TAA 169065 169037 56.49

Hope this can be of some help while debugging. I'll leave trnascan off for now.

Thanks,
Mikael

On 10 Jan 2014, at 22:03, Carson Holt wrote:

> Hi Mikael,
>
> The options are part of the new MAKER-P integration (http://www.plantphysiol.org/content/early/2013/12/06/pp.113.230144.abstract). Additional documentation/tutorials will be forthcoming - probably in a nice wiki page as part of the upcoming GMOD Malaysia courses in February, or alternatively with the annual GMOD summer school. The tRNA option is easy enough to turn on (just set trna=1 in the maker_opts.ctl file).
>
> Thanks,
> Carson
>
> On 1/10/14, 2:48 AM, "Mikael Brandström Durling" wrote:
>
>> Hi Carson and other maker developers,
>>
>> I was reading the source code of the latest maker release and noted several references to ncRNAs, snoscan and trnascan.
Can these be >> incorporated into the normal annotation workflow? If so, are there any >> instructions available for that? >> >> best regards, >> Mikael Durling >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > From darasappan at gmail.com Thu Feb 6 08:52:12 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Thu, 6 Feb 2014 08:52:12 -0600 Subject: [maker-devel] maker annotation with cufflinks output In-Reply-To: References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> Message-ID: <73AFCD9F-3B60-4C9C-9E03-35BC682E14ED@gmail.com> Hello, I does appear than my genome.ann file from maker2zff script has data in it. However, the SNAP steps after that have created empty files. The following are all empty: alt.dna err.dna export.dna genome.dna olp.dna uni.dna wrn.dna alt.ann err.ann export.ann genome.ann olp.ann uni.ann wrn.ann When I tried to get gene stats or validate genome.ann, I get errors like this for all of them: fathom genome.ann genome.dna -gene-stats |more MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds exon-1:out_of_bounds MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds 
exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds exon-21:out_of_bounds I'm not sure why the annotations I'm seeing in genome.ann are all showing up as errors. I realize this may be an issue with SNAP, but are you familiar with anything like this? A snippet of my genome.ann file is attached (since it's too big for the list) for reference. Thanks Dhivya On Feb 5, 2014, at 12:38 PM, Carson Holt wrote: > Do you have any features of type snap in your results from step 3? > We've had a couple of recent posts where, after training, snap was > giving no results, and as a result maker couldn't give any genes. > One cause of something like that may be your step 2. Make sure the > ZFF you used to train with wasn't empty. The maker2zff script uses > filters to only put the best genes in the ZFF file, and if all your > genes fail the filtering then you are training with an empty ZFF. > > Also, you should use proteins from a related species as your protein > file. I see that your protein matches are varying wildly from run to > run? So is your contig count? Were the subset of contigs you have > results for long enough to contain genes? > > Carson > > From: dhivya arasappan > Date: Monday, February 3, 2014 at 9:31 AM > To: Daniel Ence > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] maker annotation with cufflinks output > > Hi Daniel, > > I was able to check on some of those questions. > > 1. From the Trinity assembly: I started with 102000 contigs. I used > Trinotate to annotate proteins in this. > > I ran maker on this data with est2genome set to 1. The output looks > like this (most important parts on top): > > 6653 gene > 46675 exon > 280534 protein_match > 59934 CDS > 969 contig > 105388 expressed_sequence_match > 12584 five_prime_UTR > 78565 match > 1401369 match_part > 10180 mRNA > 11545 three_prime_UTR > > 2. From the cufflinks assembly: I started with 133380 entries (out of > which there are 29,000 transcripts).
I used the protein sequences > from the Trinity assembly. > > I ran maker on this data with est2genome set to 1. The output looks > like this: > 29 gene > 75 exon > 573659 protein_match > 67 CDS > 1099 contig > 269298 expressed_sequence_match > 23 five_prime_UTR > 173844 match > 2221846 match_part > 29 mRNA > 23 three_prime_UTR > > The number of genes annotated using the Trinity assembly is lower than > expected, so I went the cufflinks route. I don't understand why, when > using the cufflinks transcripts, even fewer genes are being found. > > 3. Training SNAP: I used the results of maker from step 1 to train > SNAP. I then used that training set to rerun maker: > snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm > est2genome=0 > > And again I got results with no entries for gene, exon, CDS, etc.: > 957 contig > 46555 expressed_sequence_match > 43651 match > 553633 match_part > 113738 protein_match > > As I mentioned in another email, CEGMA results indicated that the > genome was more than 90% complete. Any suggestions would be helpful. > > Thank you > Dhivya > > > > > On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: > >> Hi Dhivya, >> >> I think there are a few numbers that could be helpful to understand >> what's happening here. >> >> How many transcripts did Trinity assemble the RNA-seq data into? >> Also, you had 29,000 transcripts from cufflinks, but fewer from >> MAKER when you gave it the cufflinks data. How many transcripts did >> MAKER identify with the cufflinks data? Did you still get more than >> the 10,000 transcripts that you found with just the Trinity data? >> >> A key part of MAKER's approach to genome annotation that might be >> affecting its performance is that it only annotates a gene where >> there is both evidence (like your RNA-seq data) and an ab initio >> prediction.
If a prediction is unsupported by the evidence, then >> MAKER won't annotate a gene, and if evidence aligns where there's no >> prediction, MAKER won't annotate a gene either. What ab initio >> predictors are you using, and have they been trained on this specific genome? >> >> You can force MAKER to automatically promote evidence alignments to >> a gene model by setting the est2genome option to 1, but that will >> usually give you many false positives. >> >> Try rerunning it with either the Trinity data or the Cufflinks data >> and with est2genome set to 1, and let us know how that affects the >> MAKER results. >> >> Thanks, >> Daniel >> >> Daniel Ence >> Graduate Student >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ________________________________________ >> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf >> of dhivya arasappan [darasappan at gmail.com] >> Sent: Thursday, January 30, 2014 11:18 AM >> To: maker-devel at yandell-lab.org >> Subject: [maker-devel] maker annotation with cufflinks output >> >> Hello, >> >> I am trying to annotate a 200 Mb plant genome for which I have a very >> good assembly. >> >> I tried to de novo assemble the RNA-seq data using Trinity and ran maker >> using my genome assembly and the Trinity results. I did not get as >> many transcripts as expected, only around 10,000. >> >> So, I decided to try a different approach. I did a genome-assisted >> assembly of the RNA-seq data using TopHat/Cufflinks. This pipeline >> generated 21,000 genes and 29,000 transcripts. I then ran maker using my >> genome assembly and the cufflinks result. I get far fewer >> transcripts as a result. >> >> If cufflinks found 29,000 transcripts by mapping to the genome, I'm >> confused as to why maker is not finding the same number. >> >> Any suggestions would be appreciated.
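One way to compare the cufflinks transcript count with what MAKER produced is to tally features by type in MAKER's GFF3 output, as in the per-type counts quoted in this thread. The sketch below is a hypothetical helper (count_feature_types is not part of MAKER); it assumes standard 9-column, tab-separated GFF3 and stops at the ##FASTA directive, since everything after it is sequence rather than features.

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Tally column 3 (feature type) of GFF3 records.

    Hypothetical helper: assumes tab-separated, 9-column GFF3 and stops at
    the ##FASTA directive, because the lines after it are sequence data.
    """
    counts = Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, comments, and other directives.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts
```

For example, passing an open file handle and reading the "mRNA" entry of the result gives the number of transcript models to set against the 29,000 cufflinks transcripts; which GFF3 file you point it at is up to you.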
>> >> Thanks >> Dhivya > > -------------- next part -------------- A non-text attachment was scrubbed... Name: head.genome.ann Type: application/octet-stream Size: 15761 bytes Desc: not available -------------- next part -------------- A non-text attachment was scrubbed... Name: head.genome.dna Type: application/octet-stream Size: 3075 bytes Desc: not available

From carsonhh at gmail.com Thu Feb 6 10:01:04 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Feb 2014 09:01:04 -0700 Subject: [maker-devel] ncRNA support in maker In-Reply-To: References: Message-ID: I'm making a new release this weekend, but if you have access to the devel version, you can test now. All changes have been committed to the Subversion repository. Thanks, Carson On 2/6/14, 4:02 AM, "Mikael Brandström Durling" wrote: >Hi Carson, > >It's nice to see all these new features in maker. > >I gave the trnascan option a try by enabling it in the config file for >one of my fungal genomes. It failed, though, with this error message: > >ERROR: You found a tRNA with an intron!
This should not happen >--> rank=12, hostname=my-mgrid6 >ERROR: Failed while gathering ab-init output files >ERROR: Chunk failed at level:1, tier_type:2 >FAILED CONTIG:scf_013 > >ERROR: Chunk failed at level:4, tier_type:0 >FAILED CONTIG:scf_013 > >I checked the trnascan output >(scf_013.abinit_nomask.0.eukaryotic.trnascan) in theVoid for that contig, >and the output seems valid to me: > >scf_013 1 189339 189410 Thr AGT 0 0 82.83 >scf_013 2 510381 510462 Ser AGA 0 0 67.09 >scf_013 3 586886 587000 Leu CAA 586924 586956 57.97 >scf_013 4 942166 942069 Leu AAG 942128 942113 57.48 >scf_013 5 169102 168993 Leu TAA 169065 169037 56.49 > > >I hope this can be of some help while debugging. I'll leave trnascan off >for now. > >Thanks, > >Mikael > > >On 10 Jan 2014, at 22:03, Carson Holt wrote: > >> Hi Mikael, >> >> The options are part of the new MAKER-P integration >> (http://www.plantphysiol.org/content/early/2013/12/06/pp.113.230144.abstract). Additional documentation/tutorials will be forthcoming - probably in >> a nice wiki page as part of the upcoming GMOD Malaysia courses in February >> or alternatively with the annual GMOD summer school. The tRNA option is >> easy enough to turn on (just set trna=1 in the maker_opts.ctl file). >> >> Thanks, >> Carson >> >> >> >> On 1/10/14, 2:48 AM, "Mikael Brandström Durling" >> wrote: >> >>> Hi Carson and other maker developers, >>> >>> I was reading the source code of the latest maker release and noted >>> several references to ncRNAs, snoscan and trnascan. Can these be >>> incorporated into the normal annotation workflow? If so, are there any >>> instructions available for that?
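The tRNAscan-SE rows Mikael quotes can be checked mechanically. In this tabular layout the seventh and eighth columns give the intron begin and end, and both are 0 when no intron is predicted; rows 3 to 5 carry non-zero values, which appears to be what trips MAKER's "tRNA with an intron" check. A minimal sketch, assuming the nine-column layout shown above; the function and class names are hypothetical.

```python
from typing import NamedTuple


class TRNAHit(NamedTuple):
    seq: str
    num: int
    begin: int
    end: int
    aa: str
    anticodon: str
    intron_begin: int
    intron_end: int
    score: float


def parse_trnascan(lines):
    """Parse whitespace-separated tRNAscan-SE tabular rows into TRNAHit records.

    Assumes the 9-column layout quoted in the thread; header and separator
    lines are skipped because their second column is not a number.
    """
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) != 9 or not fields[1].isdigit():
            continue
        seq, num, b, e, aa, codon, ib, ie, score = fields
        hits.append(TRNAHit(seq, int(num), int(b), int(e), aa, codon,
                            int(ib), int(ie), float(score)))
    return hits


def with_introns(hits):
    # "0 0" in the intron columns means no intron was predicted.
    return [h for h in hits if h.intron_begin != 0 or h.intron_end != 0]
```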
>>> >>> best regards, >>> Mikael Durling >> >> >

From carsonhh at gmail.com Thu Feb 6 10:05:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Feb 2014 09:05:05 -0700 Subject: [maker-devel] maker annotation with cufflinks output In-Reply-To: References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> Message-ID: Your genome.dna file has no sequence? Did you by any chance strip the FASTA sequence from the GFF3 you are using as input to maker2zff? There should be FASTA sequence at the end of that file. Also, can I see the GFF3 file you are using as input to maker2zff? Thanks, Carson From: dhivya arasappan Date: Thursday, February 6, 2014 at 7:47 AM To: Carson Holt Cc: Daniel Ence , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] maker annotation with cufflinks output Hello, It does appear that my genome.ann file from the maker2zff script has data in it. However, the SNAP steps after that have created empty files.
The following are all empty: alt.dna err.dna export.dna genome.dna olp.dna uni.dna wrn.dna alt.ann err.ann export.ann genome.ann olp.ann uni.ann wrn.ann When I tried to get gene stats or validate genome.ann, I got errors like this for all of them: fathom genome.ann genome.dna -gene-stats | more MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds exon-1:out_of_bounds MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds exon-21:out_of_bounds I'm not sure why the annotations I'm seeing in genome.ann are all showing up as errors. I realize this may be an issue with SNAP, but are you familiar with anything like this? My genome.ann file is attached for reference. Thanks Dhivya On Feb 5, 2014, at 12:38 PM, Carson Holt wrote: > Do you have any features of type snap in your results from step 3? We've had > a couple of recent posts where, after training, snap was giving no results, and > as a result maker couldn't give any genes. One cause of something like that > may be your step 2. Make sure the ZFF you used to train with wasn't empty. > The maker2zff script uses filters to only put the best genes in the ZFF file, > and if all your genes fail the filtering then you are training with an empty > ZFF.
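Two quick checks follow from Carson's advice: confirm that the training ZFF actually contains gene models, and confirm that every feature lies inside a sequence present in the matching .dna file. The sketch assumes SNAP's ZFF layout as it appears here (a >name line per sequence, then whitespace-separated rows that begin label, begin, end, strand and end with the model name). Treating coordinates beyond the sequence as "out of bounds" is our reading of fathom's message, not its documented definition. An empty genome.dna makes every feature fail this check, consistent with the output above.

```python
from collections import defaultdict


def read_fasta_lengths(lines):
    """Map each FASTA record name (first word after '>') to its length."""
    lengths, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def check_zff(ann_lines, dna_lengths):
    """Count gene models in a ZFF .ann file and collect features whose
    coordinates fall outside the matching sequence (assumed ZFF layout)."""
    models = defaultdict(set)
    problems = []
    seq = None
    for line in ann_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith(">"):
            seq = fields[0][1:]
            continue
        label, begin, end, group = fields[0], int(fields[1]), int(fields[2]), fields[-1]
        models[seq].add(group)
        seq_len = dna_lengths.get(seq, 0)  # a missing sequence counts as length 0
        if min(begin, end) < 1 or max(begin, end) > seq_len:
            problems.append((seq, group, label, begin, end))
    n_models = sum(len(groups) for groups in models.values())
    return n_models, problems
```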
> > Also, you should use proteins from a related species as your protein file. I > see that your protein matches are varying wildly from run to run? So is your > contig count? Were the subset of contigs you have results for long enough to > contain genes? > > Carson > > From: dhivya arasappan > Date: Monday, February 3, 2014 at 9:31 AM > To: Daniel Ence > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] maker annotation with cufflinks output > > Hi Daniel, > > I was able to check on some of those questions. > > 1. From the Trinity assembly: I started with 102000 contigs. I used Trinotate to > annotate proteins in this. > > I ran maker on this data with est2genome set to 1. The output looks like this > (most important parts on top): > > 6653 gene > 46675 exon > 280534 protein_match > 59934 CDS > 969 contig > 105388 expressed_sequence_match > 12584 five_prime_UTR > 78565 match > 1401369 match_part > 10180 mRNA > 11545 three_prime_UTR > > 2. From the cufflinks assembly: I started with 133380 entries (out of which there > are 29,000 transcripts). I used the protein sequences from the Trinity assembly. > > I ran maker on this data with est2genome set to 1. The output looks like this: > 29 gene > 75 exon > 573659 protein_match > 67 CDS > 1099 contig > 269298 expressed_sequence_match > 23 five_prime_UTR > 173844 match > 2221846 match_part > 29 mRNA > 23 three_prime_UTR > > The number of genes annotated using the Trinity assembly is lower than expected, so I > went the cufflinks route. I don't understand why, when using the cufflinks > transcripts, even fewer genes are being found. > > 3. Training SNAP: I used the results of maker from step 1 to train SNAP. I then > used that training set to rerun maker: > snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm > est2genome=0 > > And again I got results with no entries for gene, exon, CDS, etc.:
> 957 contig > 46555 expressed_sequence_match > 43651 match > 553633 match_part > 113738 protein_match > > As I mentioned in another email, CEGMA results indicated that the genome was > more than 90% complete. Any suggestions would be helpful. > > Thank you > Dhivya > > > > > On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: > >> Hi Dhivya, >> >> I think there are a few numbers that could be helpful to understand what's >> happening here. >> >> How many transcripts did Trinity assemble the RNA-seq data into? Also, you >> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it >> the cufflinks data. How many transcripts did MAKER identify with the >> cufflinks data? Did you still get more than the 10,000 transcripts that you >> found with just the Trinity data? >> >> A key part of MAKER's approach to genome annotation that might be affecting >> its performance is that it only annotates a gene where there is both >> evidence (like your RNA-seq data) and an ab initio prediction. If a >> prediction is unsupported by the evidence, then MAKER won't annotate a gene, >> and if evidence aligns where there's no prediction, MAKER won't annotate a >> gene either. What ab initio predictors are you using, and have they been >> trained on this specific genome? >> >> You can force MAKER to automatically promote evidence alignments to a gene >> model by setting the est2genome option to 1, but that will usually give you >> many false positives. >> >> Try rerunning it with either the Trinity data or the Cufflinks data and with >> est2genome set to 1, and let us know how that affects the MAKER results.
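Switching between the runs discussed in this thread comes down to a few key=value lines in maker_opts.ctl (est2genome, snaphmm, and, later in the thread, trna). Editing the file by hand works fine; the hypothetical helper below only illustrates the format, assuming each option sits on its own line with an optional trailing #comment. The comment text in any example is illustrative and may not match your own ctl file.

```python
import re

# Assumed ctl line shape: key=value, optionally followed by "#comment".
_CTL_LINE = re.compile(r"^(\w+)=([^#\n]*?)(\s*#.*)?(\n?)$")


def set_ctl_options(ctl_text, updates):
    """Return ctl_text with the given options changed, keeping any trailing
    '#comment'. Raises KeyError if an option is not present in the text."""
    seen = set()
    out = []
    for line in ctl_text.splitlines(keepends=True):
        m = _CTL_LINE.match(line)
        if m and m.group(1) in updates:
            key = m.group(1)
            comment = m.group(3) or ""
            line = f"{key}={updates[key]}{comment}{m.group(4)}"
            seen.add(key)
        out.append(line)
    missing = set(updates) - seen
    if missing:
        raise KeyError(f"options not found: {sorted(missing)}")
    return "".join(out)
```

For example, set_ctl_options(text, {"est2genome": 0, "snaphmm": "/path/to/RHA.hmm"}) would reproduce the settings used in step 3 above (the path is a placeholder).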
>> >> Thanks, >> Daniel >> >> Daniel Ence >> Graduate Student >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ________________________________________ >> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya >> arasappan [darasappan at gmail.com] >> Sent: Thursday, January 30, 2014 11:18 AM >> To: maker-devel at yandell-lab.org >> Subject: [maker-devel] maker annotation with cufflinks output >> >> Hello, >> >> I am trying to annotate a 200 Mb plant genome for which I have a very >> good assembly. >> >> I tried to de novo assemble the RNA-seq data using Trinity and ran maker >> using my genome assembly and the Trinity results. I did not get as >> many transcripts as expected, only around 10,000. >> >> So, I decided to try a different approach. I did a genome-assisted >> assembly of the RNA-seq data using TopHat/Cufflinks. This pipeline >> generated 21,000 genes and 29,000 transcripts. I then ran maker using my >> genome assembly and the cufflinks result. I get far fewer >> transcripts as a result. >> >> If cufflinks found 29,000 transcripts by mapping to the genome, I'm >> confused as to why maker is not finding the same number. >> >> Any suggestions would be appreciated. >> >> Thanks >> Dhivya > >
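Carson's diagnosis is that genome.dna came out empty because the FASTA sequence had been stripped from the GFF3 file given to maker2zff. A small sketch to check a GFF3 file for a ##FASTA section before training; it reads the whole file, since the sequence sits at the end and a head-trimmed copy would miss it. The function name is hypothetical.

```python
def gff3_fasta_status(lines):
    """Return (has_fasta_directive, n_sequences): whether a ##FASTA
    directive is present and how many '>' records follow it."""
    in_fasta = False
    n_seqs = 0
    for line in lines:
        if line.startswith("##FASTA"):
            in_fasta = True
        elif in_fasta and line.startswith(">"):
            n_seqs += 1
    return in_fasta, n_seqs
```

A result of (False, 0) for the file you plan to feed to maker2zff would suggest the sequence section is missing.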
From carsonhh at gmail.com Thu Feb 6 11:04:25 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Feb 2014 10:04:25 -0700 Subject: [maker-devel] maker annotation with cufflinks output In-Reply-To: <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com> References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com> Message-ID: Could you give me the file without using 'head' to trim it? It's cutting it off before it reaches the part I'm interested in. Carson From: dhivya arasappan Date: Thursday, February 6, 2014 at 10:01 AM To: Carson Holt Cc: Daniel Ence , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] maker annotation with cufflinks output Oh yes, I did. I took just the non-sequence entries in the GFF file and used that as my input. I will rerun SNAP with the GFF file containing the sequences as well. I'm attaching a snippet of the GFF file that I used as input to maker2zff. Thanks for your help Dhivya On Feb 6, 2014, at 10:05 AM, Carson Holt wrote: > Your genome.dna file has no sequence? Did you by any chance strip the FASTA > sequence from the GFF3 you are using as input to maker2zff? There should be > FASTA sequence at the end of that file. Also, can I see the GFF3 file you are > using as input to maker2zff? > > Thanks, > Carson > > From: dhivya arasappan > Date: Thursday, February 6, 2014 at 7:47 AM > To: Carson Holt > Cc: Daniel Ence , "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] maker annotation with cufflinks output > > Hello, > > It does appear that my genome.ann file from the maker2zff script has data in it. > However, the SNAP steps after that have created empty files.
The following > are all empty: > > alt.dna err.dna export.dna genome.dna olp.dna uni.dna wrn.dna > alt.ann err.ann export.ann genome.ann olp.ann uni.ann wrn.ann > > When I tried to get gene stats or validate genome.ann, I got errors like this > for all of them: > > fathom genome.ann genome.dna -gene-stats | more > MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds > exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds > exon-6:out_of_bounds > MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds > exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds > exon-1:out_of_bounds > MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds > exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds > MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds > exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds > exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds > exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds > exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds > exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds > exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds > exon-21:out_of_bounds > > I'm not sure why the annotations I'm seeing in genome.ann are all showing up as > errors. I realize this may be an issue with SNAP, but are you familiar with > anything like this? My genome.ann file is attached for reference. > > Thanks > Dhivya > > On Feb 5, 2014, at 12:38 PM, Carson Holt wrote: > >> Do you have any features of type snap in your results from step 3? We've had >> a couple of recent posts where, after training, snap was giving no results, and >> as a result maker couldn't give any genes. One cause of something like that >> may be your step 2. Make sure the ZFF you used to train with wasn't empty.
>> The maker2zff script uses filters to only put the best genes in the ZFF file, >> and if all your genes fail the filtering then you are training with an empty >> ZFF. >> >> Also, you should use proteins from a related species as your protein file. I >> see that your protein matches are varying wildly from run to run? So is your >> contig count? Were the subset of contigs you have results for long enough to >> contain genes? >> >> Carson >> >> From: dhivya arasappan >> Date: Monday, February 3, 2014 at 9:31 AM >> To: Daniel Ence >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] maker annotation with cufflinks output >> >> Hi Daniel, >> >> I was able to check on some of those questions. >> >> 1. From the Trinity assembly: I started with 102000 contigs. I used Trinotate to >> annotate proteins in this. >> >> I ran maker on this data with est2genome set to 1. The output looks like this >> (most important parts on top): >> >> 6653 gene >> 46675 exon >> 280534 protein_match >> 59934 CDS >> 969 contig >> 105388 expressed_sequence_match >> 12584 five_prime_UTR >> 78565 match >> 1401369 match_part >> 10180 mRNA >> 11545 three_prime_UTR >> >> 2. From the cufflinks assembly: I started with 133380 entries (out of which there >> are 29,000 transcripts). I used the protein sequences from the Trinity assembly. >> >> I ran maker on this data with est2genome set to 1. The output looks like >> this: >> 29 gene >> 75 exon >> 573659 protein_match >> 67 CDS >> 1099 contig >> 269298 expressed_sequence_match >> 23 five_prime_UTR >> 173844 match >> 2221846 match_part >> 29 mRNA >> 23 three_prime_UTR >> >> The number of genes annotated using the Trinity assembly is lower than expected, so I >> went the cufflinks route. I don't understand why, when using the cufflinks >> transcripts, even fewer genes are being found. >> >> 3. Training SNAP: I used the results of maker from step 1 to train SNAP.
I then >> used that training set to rerun maker: >> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm >> est2genome=0 >> >> And again I got results with no entries for gene, exon, CDS, etc.: >> 957 contig >> 46555 expressed_sequence_match >> 43651 match >> 553633 match_part >> 113738 protein_match >> >> As I mentioned in another email, CEGMA results indicated that the genome was >> more than 90% complete. Any suggestions would be helpful. >> >> Thank you >> Dhivya >> >> >> >> >> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: >> >>> Hi Dhivya, >>> >>> I think there are a few numbers that could be helpful to understand what's >>> happening here. >>> >>> How many transcripts did Trinity assemble the RNA-seq data into? Also, you >>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it >>> the cufflinks data. How many transcripts did MAKER identify with the >>> cufflinks data? Did you still get more than the 10,000 transcripts that you >>> found with just the Trinity data? >>> >>> A key part of MAKER's approach to genome annotation that might be affecting >>> its performance is that it only annotates a gene where there is both >>> evidence (like your RNA-seq data) and an ab initio prediction. If a >>> prediction is unsupported by the evidence, then MAKER won't annotate a gene, >>> and if evidence aligns where there's no prediction, MAKER won't annotate a >>> gene either. What ab initio predictors are you using, and have they been >>> trained on this specific genome? >>> >>> You can force MAKER to automatically promote evidence alignments to a gene >>> model by setting the est2genome option to 1, but that will usually give you >>> many false positives. >>> >>> Try rerunning it with either the Trinity data or the Cufflinks data and with >>> est2genome set to 1, and let us know how that affects the MAKER results.
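Daniel's description of when MAKER calls a gene can be illustrated with a toy model. This is a deliberate simplification for intuition only, not MAKER's actual algorithm: intervals stand in for ab initio predictions and evidence alignments, a prediction is kept only if some evidence overlaps it, and an est2genome flag additionally promotes evidence that no prediction covers. All names are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int
    end: int


def overlaps(a, b):
    # Closed intervals overlap when each starts before the other ends.
    return a.start <= b.end and b.start <= a.end


def toy_models(predictions, evidence, est2genome=False):
    """Toy version of the rule described above (NOT MAKER's real code):
    keep predictions supported by overlapping evidence; with est2genome,
    also promote evidence that no prediction covers."""
    models = [p for p in predictions if any(overlaps(p, e) for e in evidence)]
    if est2genome:
        models += [e for e in evidence
                   if not any(overlaps(e, p) for p in predictions)]
    return sorted(models)
```

In this toy model an empty prediction list, as can happen when SNAP is trained on an empty ZFF, yields no models unless est2genome is on, which is consistent with the empty est2genome=0 run reported in step 3.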
>>> >>> Thanks, >>> Daniel >>> >>> Daniel Ence >>> Graduate Student >>> Eccles Institute of Human Genetics >>> University of Utah >>> 15 North 2030 East, Room 2100 >>> Salt Lake City, UT 84112-5330 >>> ________________________________________ >>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya >>> arasappan [darasappan at gmail.com] >>> Sent: Thursday, January 30, 2014 11:18 AM >>> To: maker-devel at yandell-lab.org >>> Subject: [maker-devel] maker annotation with cufflinks output >>> >>> Hello, >>> >>> I am trying to annotate a 200 Mb plant genome for which I have a very >>> good assembly. >>> >>> I tried to de novo assemble the RNA-seq data using Trinity and ran maker >>> using my genome assembly and the Trinity results. I did not get as >>> many transcripts as expected, only around 10,000. >>> >>> So, I decided to try a different approach. I did a genome-assisted >>> assembly of the RNA-seq data using TopHat/Cufflinks. This pipeline >>> generated 21,000 genes and 29,000 transcripts. I then ran maker using my >>> genome assembly and the cufflinks result. I get far fewer >>> transcripts as a result. >>> >>> If cufflinks found 29,000 transcripts by mapping to the genome, I'm >>> confused as to why maker is not finding the same number. >>> >>> Any suggestions would be appreciated. >>> >>> Thanks >>> Dhivya >> >> >
From darasappan at gmail.com Thu Feb 6 11:01:44 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Thu, 6 Feb 2014 11:01:44 -0600 Subject: [maker-devel] maker annotation with cufflinks output In-Reply-To: References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> Message-ID: <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com> Oh yes, I did. I took just the non-sequence entries in the GFF file and used that as my input. I will rerun SNAP with the GFF file containing the sequences as well. I'm attaching a snippet of the GFF file that I used as input to maker2zff. Thanks for your help Dhivya On Feb 6, 2014, at 10:05 AM, Carson Holt wrote: > Your genome.dna file has no sequence? Did you by any chance strip > the FASTA sequence from the GFF3 you are using as input to > maker2zff? There should be FASTA sequence at the end of that file. > Also, can I see the GFF3 file you are using as input to maker2zff? > > Thanks, > Carson > > From: dhivya arasappan > Date: Thursday, February 6, 2014 at 7:47 AM > To: Carson Holt > Cc: Daniel Ence , "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] maker annotation with cufflinks output > > Hello, > > It does appear that my genome.ann file from the maker2zff script has data > in it. However, the SNAP steps after that have created empty files.
> The following are all empty: > > alt.dna err.dna export.dna genome.dna olp.dna uni.dna wrn.dna > alt.ann err.ann export.ann genome.ann olp.ann uni.ann wrn.ann > > When I tried to get gene stats or validate genome.ann, I got errors > like this for all of them: > > fathom genome.ann genome.dna -gene-stats | more > MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds > exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds > exon-5:out_of_bounds exon-6:out_of_bounds > MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds > exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds > exon-2:out_of_bounds exon-1:out_of_bounds > MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds > exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds > exon-5:out_of_bounds > MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds > exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds > exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds > exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds > exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds > exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds > exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds > exon-20:out_of_bounds exon-21:out_of_bounds > > I'm not sure why the annotations I'm seeing in genome.ann are all > showing up as errors. I realize this may be an issue with SNAP, but > are you familiar with anything like this? My genome.ann file is > attached for reference. > > Thanks > Dhivya > > On Feb 5, 2014, at 12:38 PM, Carson Holt wrote: > >> Do you have any features of type snap in your results from step 3? >> We've had a couple of recent posts where, after training, snap was >> giving no results, and as a result maker couldn't give any genes. >> One cause of something like that may be your step 2. Make sure the >> ZFF you used to train with wasn't empty.
The maker2zff script uses >> filters to only put the best genes in the ZFF file, and if all your >> genes fail the filtering then you are training with an empty ZFF. >> >> Also, you should use proteins from a related species as your protein >> file. I see that your protein matches are varying wildly from run >> to run? So is your contig count? Were the subset of contigs you >> have results for long enough to contain genes? >> >> Carson >> >> From: dhivya arasappan >> Date: Monday, February 3, 2014 at 9:31 AM >> To: Daniel Ence >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] maker annotation with cufflinks output >> >> Hi Daniel, >> >> I was able to check on some of those questions. >> >> 1. From the Trinity assembly: I started with 102000 contigs. I used >> Trinotate to annotate proteins in this. >> >> I ran maker on this data with est2genome set to 1. The output looks >> like this (most important parts on top): >> >> 6653 gene >> 46675 exon >> 280534 protein_match >> 59934 CDS >> 969 contig >> 105388 expressed_sequence_match >> 12584 five_prime_UTR >> 78565 match >> 1401369 match_part >> 10180 mRNA >> 11545 three_prime_UTR >> >> 2. From the cufflinks assembly: I started with 133380 entries (out of >> which there are 29,000 transcripts). I used the protein sequences >> from the Trinity assembly. >> >> I ran maker on this data with est2genome set to 1. The output looks >> like this: >> 29 gene >> 75 exon >> 573659 protein_match >> 67 CDS >> 1099 contig >> 269298 expressed_sequence_match >> 23 five_prime_UTR >> 173844 match >> 2221846 match_part >> 29 mRNA >> 23 three_prime_UTR >> >> The number of genes annotated using the Trinity assembly is lower than >> expected, so I went the cufflinks route. I don't understand why, when >> using the cufflinks transcripts, even fewer genes are being found. >> >> 3. Training SNAP: I used the results of maker from step 1 to train >> SNAP.
I then used that training set to rerun maker: >> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm >> est2genome=0 >> >> And again I got results with no entries for gene, exon, CDS, etc.: >> 957 contig >> 46555 expressed_sequence_match >> 43651 match >> 553633 match_part >> 113738 protein_match >> >> As I mentioned in another email, CEGMA results indicated that the >> genome was more than 90% complete. Any suggestions would be helpful. >> >> Thank you >> Dhivya >> >> >> >> >> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: >> >>> Hi Dhivya, >>> >>> I think there are a few numbers that could be helpful to understand >>> what's happening here. >>> >>> How many transcripts did Trinity assemble the RNA-seq data into? >>> Also, you had 29,000 transcripts from cufflinks, but fewer from >>> MAKER when you gave it the cufflinks data. How many transcripts >>> did MAKER identify with the cufflinks data? Did you still get more >>> than the 10,000 transcripts that you found with just the Trinity >>> data? >>> >>> A key part of MAKER's approach to genome annotation that might be >>> affecting its performance is that it only annotates a gene where >>> there is both evidence (like your RNA-seq data) and an ab initio >>> prediction. If a prediction is unsupported by the evidence, then >>> MAKER won't annotate a gene, and if evidence aligns where there's >>> no prediction, MAKER won't annotate a gene either. What ab initio >>> predictors are you using, and have they been trained on this specific genome? >>> >>> You can force MAKER to automatically promote evidence alignments >>> to a gene model by setting the est2genome option to 1, but that >>> will usually give you many false positives. >>> >>> Try rerunning it with either the Trinity data or the Cufflinks >>> data and with est2genome set to 1, and let us know how that >>> affects the MAKER results.
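Carson also asks whether the contigs with results were long enough to contain genes. A hypothetical helper to summarize contig lengths from a FASTA file (count, total length, N50, and how many contigs pass a length cutoff); the 10 kb default cutoff is an arbitrary illustration, not a MAKER setting.

```python
def fasta_lengths(lines):
    """Map each FASTA record name (first word after '>') to its length."""
    lengths, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def contig_length_summary(lengths, min_len=10_000):
    """Count, total, N50, and number of contigs at least min_len bp long.

    N50 is the length L such that contigs of length >= L together cover
    at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    n50, running = 0, 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "count": len(ordered),
        "total": total,
        "n50": n50,
        "at_least_min_len": sum(1 for x in ordered if x >= min_len),
    }
```

Typical use would be contig_length_summary(fasta_lengths(handle).values()) on the contigs that received results.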
>>> >>> Thanks, >>> Daniel >>> >>> Daniel Ence >>> Graduate Student >>> Eccles Institute of Human Genetics >>> University of Utah >>> 15 North 2030 East, Room 2100 >>> Salt Lake City, UT 84112-5330 >>> ________________________________________ >>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf >>> of dhivya arasappan [darasappan at gmail.com] >>> Sent: Thursday, January 30, 2014 11:18 AM >>> To: maker-devel at yandell-lab.org >>> Subject: [maker-devel] maker annotation with cufflinks output >>> >>> Hello, >>> >>> I am trying to annotate a 200 mb plant genome for which I have a >>> very >>> good assembly. >>> >>> I tried to denovo assemble RNA-seq data using trinity and ran maker >>> using my genome assembly and the trinity results. I did not get as >>> many transcripts as expected, around 10,000 transcripts. >>> >>> So, I decided to try a different approach. I did a genome assisted >>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline >>> generated 21,000 genes, 29,000 transcripts. I then ran maker >>> using my >>> genome assembly and the cufflinks result. I get much less number of >>> transcripts as a result. >>> >>> If cufflinks found 29000 transcripts by mapping to the genome, I'm >>> confused as to why maker is not finding the same. >>> >>> Any suggestions would be appreciated. >>> >>> Thanks >>> Dhivya >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> _______________________________________________ maker-devel mailing >> list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: head.cat.formatted.gff
Type: application/octet-stream
Size: 19905 bytes
Desc: not available
URL:
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sjackman at gmail.com Thu Feb 6 18:22:57 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Feb 2014 16:22:57 -0800
Subject: [maker-devel] Adding MAKER to Homebrew for ease of installation
Message-ID:

Hi MAKER developers,

I'd like to add MAKER to Homebrew to make the installation of MAKER and its dependencies as straightforward as brew install maker.

Homebrew is a system for installing software, originally developed for Mac OS, and now also for Linux through Linuxbrew. Homebrew/science is a collection of scientific software, which includes a lot of bioinformatics software.

I've created a prototype for the MAKER installation script (called a formula, in Homebrew parlance). Is there a static URL for the source code of MAKER? The current formula won't work out of the box, because part of the URL depends on the user's unique ID: http://yandell.topaz.genetics.utah.edu/maker_downloads/$key/maker-2.28.tgz.

Would you be interested in adding MAKER to Homebrew?

I know MAKER must be licensed for commercial use. It is possible for Homebrew to display a notice of the MAKER license when it's installed.

MAKER is not available for commercial use without a license. Those wishing to license MAKER for commercial use should contact Beth Drees at the University of Utah TCO to discuss your needs.

Cheers,
Shaun
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bioinformatics.umd at gmail.com Fri Feb 7 07:29:27 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Fri, 7 Feb 2014 08:29:27 -0500
Subject: [maker-devel] NCBI feature table
Message-ID: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>

Hello Maker Developers,

I have used this software with great success and I continue to look to it going forward.
However, as I'm getting ready to submit my annotations to NCBI with the genomes, I haven't found a straightforward method of turning the MAKER-produced GFF files into an NCBI feature table. What is the process for creating this table? It seems that the format NCBI is looking for is unique, and I haven't uncovered any scripts or tools to assist in the creation of this table from my annotation files. If anyone has any insight on this issue it would be greatly appreciated.

Cheers
Ian

From mike.thon at gmail.com Fri Feb 7 08:14:06 2014
From: mike.thon at gmail.com (Michael Thon)
Date: Fri, 7 Feb 2014 15:14:06 +0100
Subject: [maker-devel] NCBI feature table
In-Reply-To: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>
References: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>
Message-ID: <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com>

Hi Ian -

We've been struggling with this too, and I started developing a script to convert the maker gff into NCBI's .tbl format. However, we found that some of the gene models required manual editing, so what we do is import the gff into a commercial application called Geneious where we do the edits. From there we export the data in GenBank format and then convert it to .tbl format with a script. Our submission just passed the automated checks and we're waiting for the manual review. Probably none of my code will help you, and in any case it's kind of a mess. The only advice I can offer is to say that you'll probably need some manual editing in your workflow, if not Apollo, then some other app. In that case you'll need to convert the output of that app into .tbl format.

From cexzurjimenezjr at gmail.com Thu Feb 6 23:27:13 2014
From: cexzurjimenezjr at gmail.com (Cexzur Jimenez Jr.)
Date: Fri, 7 Feb 2014 13:27:13 +0800
Subject: [maker-devel] Testing MAKER After Installation
Message-ID:

Hello,

I have finished installing MAKER, marked by "PERL Dependencies: INSTALLED, External Programs: INSTALLED, MPI SUPPORT: NOT CONFIGURED, MAKER: INSTALLED", and it seems everything's fine. I'm using MAKER 2.10 and I have followed the installation instructions both in its corresponding "README" and "INSTALL" files and in the 2012 GMOD MAKER Tutorial.

After editing the three configuration files and running "maker", I saw the following error in my terminal. I have searched Google and tried the solutions offered there, but the error is still showing. Below is the error I got:

Can't locate package GDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package NDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package SDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287.
A data structure will be created for you at:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_master_datastore_index.log

--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------
running repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb -species all -dir /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500 -pa 1
#-------------------------------#
Building general libraries in: /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed
FATAL ERROR
ERROR: Failed while doing repeat masking!!
ERROR: Chunk failed at level 2 !!
FAILED CONTIG:contig-dpp-500-500

--Next Contig--

Processing run.log file...
MAKER WARNING: The file dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out did not finish on the last run and must be erased
#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: contig-dpp-500-500
Length: 32156
Retry: 1!!
#---------------------------------------------------------------------
running repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb -species all -dir /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500 -pa 1
#-------------------------------#
Building general libraries in: /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed
FATAL ERROR
ERROR: Failed while doing repeat masking!!
ERROR: Chunk failed at level 2 !!
FAILED CONTIG:contig-dpp-500-500

--Next Contig--

Processing run.log file...
MAKER WARNING: The file dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out did not finish on the last run and must be erased

Maker is now finished!!!

Can you tell me what the error is and where in the installation I went wrong? Your help will be very much appreciated. Thank you.

Attached are the configuration files I used for MAKER.

Sincerely,
CJ
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl
Type: application/octet-stream
Size: 1501 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl
Type: application/octet-stream
Size: 1319 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4540 bytes
Desc: not available
URL:

From carson.holt at genetics.utah.edu Fri Feb 7 12:11:44 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Fri, 7 Feb 2014 18:11:44 +0000
Subject: [maker-devel] Maker installation
In-Reply-To:
References:
Message-ID:

Hi Tracy,

The older Apollo is pretty much deprecated. There are still people who like to use it, though (myself among them). You can download and install it manually from here --> http://sourceforge.net/projects/gmod/files/Apollo/. If you want to let MAKER install it for you, you can edit the URL in the .../maker/src/locations file to be this --> http://weatherby.genetics.utah.edu/apollo/apollo.tar.gz

You can also use Web-Apollo for your data if you want, and that is what I would recommend.

On a side note, if you are trying to install the old Apollo as part of the optional web-based GUI, I'd recommend not doing that. The GUI is really only for demonstration purposes or very small datasets. It is not for production (that is why it is off by default).

Thanks,
Carson
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carson.holt at genetics.utah.edu Fri Feb 7 12:28:29 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Fri, 7 Feb 2014 18:28:29 +0000
Subject: [maker-devel] NCBI feature table
In-Reply-To: <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com>
References: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com> <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com>
Message-ID:

Yes. The non-web version of Apollo can open GFF3 and then save to table format --> http://sourceforge.net/projects/gmod/files/Apollo/

I've also attached a script made by a lab member that can convert MAKER-derived GFF3 gene entries into raw table format, and I've CC'd the script's author (Michael Campbell) in case you have any questions.

Thanks,
Carson
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gff32table
Type: application/octet-stream
Size: 7511 bytes
Desc: gff32table
URL:

From carson.holt at genetics.utah.edu Fri Feb 7 12:31:17 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Fri, 7 Feb 2014 18:31:17 +0000
Subject: [maker-devel] Testing MAKER After Installation
In-Reply-To:
References:
Message-ID:

That can happen on some systems with that very old version of MAKER. Use MAKER 2.28 or 2.30 instead --> http://www.yandell-lab.org/software/maker.html

Thanks,
Carson
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bhall7 at hawaii.edu Fri Feb 7 18:31:36 2014
From: bhall7 at hawaii.edu (Brian Hall)
Date: Fri, 07 Feb 2014 14:31:36 -1000
Subject: [maker-devel] NCBI feature table
In-Reply-To:
References:
Message-ID: <52F57AE8.5090002@hawaii.edu>

Hi Ian,

My colleagues are also working on preparing a genome for submission to the NCBI. The software we are developing for this task is still a work in progress, but you are welcome to give it a try: https://github.com/tedsta/GAG

It's a console-based application and it requires Python 2.6. Its strength is in filtering and modifying large segments of the genome at once -- where Apollo is good for removing a few erroneous exons, we are dealing with lists of dozens or more. This program seeks to make such changes as painless as possible.

My advice is to try the simplest gff3-to-tbl script you can find and then run tbl2asn. If it works out okay, great! If you get a massive error report, get in touch and we'll help you out if we can :)

--Brian

From tmsmith23 at wisc.edu Fri Feb 7 11:48:13 2014
From: tmsmith23 at wisc.edu (Tracy Smith)
Date: Fri, 7 Feb 2014 11:48:13 -0600
Subject: [maker-devel] Maker installation
Message-ID:

Hi,

I am trying to install Maker and am running into the same problem noted on this page, namely I cannot install Apollo.
https://groups.google.com/forum/#!msg/maker-devel/vrVa2mEsKbg/0e_25LvOvdEJ

I tried using the new URL you provided, "Here is a new location for the source --> http://sourceforge.net/code-snapshots/svn/g/gm/gmod/svn/gmod-svn-25291-apollo-trunk.zip", but that URL now points nowhere.

Is it possible to use WebApollo instead? Or do you know of another location where a copy of Apollo could be downloaded?

Thank you so much.

Best regards,
Tracy

--
Tracy Smith
University of Wisconsin-Madison
Pepperell Lab
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Feb 10 09:34:58 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 08:34:58 -0700
Subject: [maker-devel] MAKER presentation at PAG
In-Reply-To:
References:
Message-ID:

* maker_map_ids - Build shorter IDs/Names for MAKER genes and transcripts following the NCBI suggested naming format.
* map_fasta_ids - Maps short IDs/Names generated by maker_map_ids to MAKER fasta files.
* map_gff_ids - Maps short IDs/Names generated by maker_map_ids to MAKER GFF3 files; old IDs/Names are mapped to the Alias attribute.
* maker_functional_fasta - Maps putative functions identified from BLASTP against UniProt/SwissProt to the MAKER produced transcript and protein fasta files.
* maker_functional_gff - Maps putative functions identified from BLASTP against UniProt/SwissProt to the MAKER produced GFF3 files in the Note attribute.
* ipr_update_gff - Takes InterProScan (iprscan) output and maps domain IDs and GO terms to the Dbxref and Ontology_term attributes in the GFF3 file. This is metadata that shows up when you click on an annotation in JBrowse/GBrowse.
* iprscan2gff3 - Takes InterProScan (iprscan) output and generates GFF3 features representing domains. Interesting tier for GBrowse. These are visible feature tracks that can be seen in JBrowse/GBrowse.

Thanks,
Carson

From: Kevin Dorn
Date: Sunday, February 9, 2014 at 9:23 PM
To:
Subject: MAKER presentation at PAG

Hi Carson,

I saw your MAKER presentation at PAG this year and have a quick question. I've used MAKER to annotate the plant genome we're working on, and am mostly done. I had to step out for a second during your talk, and when I came back, you were talking about how you can transfer meaningful annotations (getting rid of the 'ugly MAKER names' for genes). Is there an accessory script to do this?

Thanks,
Kevin Dorn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From amitha at ccmb.res.in Mon Feb 10 01:04:37 2014
From: amitha at ccmb.res.in (AMITHA SAMPATH KUMAR)
Date: Mon, 10 Feb 2014 12:34:37 +0530 (IST)
Subject: [maker-devel] Falied to create new account
In-Reply-To:
Message-ID: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>

Hi,

I am interested in using the MAKER online version, for which I tried to create a profile using the email ID 'amitha at ccmb.res.in', but unfortunately I could not log in.
I am also pasting a link to the error here: http://weatherby.genetics.utah.edu/cgi-bin/mwas/maker.cgi.

The error mentioned is:
Error executing run mode 'forgot_login': Can't call method "MailMsg" without a package or object reference at /var/www/cgi-bin/mwas/lib/MWAS_util.pm line 529.
 at /var/www/cgi-bin/mwas/maker.cgi line 21.
Kindly help me through the registration asap.

Thanks
Amitha.

From listona at science.oregonstate.edu Sat Feb 8 20:08:42 2014
From: listona at science.oregonstate.edu (Aaron Liston)
Date: Sat, 08 Feb 2014 18:08:42 -0800
Subject: [maker-devel] Re-using repeat masking in SNAP training
Message-ID: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>

I am following the tutorial for training SNAP, and it works fine. However, the tutorial instructions have MAKER repeat the repeat masking. To avoid this, I concatenated my gff files from the first round of annotation and used maker_gff=round1.gff and rm_pass=1, but at the end of the process the repeat annotations were not there. Any suggestions?

Thanks,
Aaron

From caigh02 at gmail.com Sun Feb 9 21:26:57 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Sun, 9 Feb 2014 21:26:57 -0600
Subject: [maker-devel] Fwd: Functional annotation of MAKER gene models
In-Reply-To:
References:
Message-ID:

I sent the following message to Carson but forgot to send it to the maker-devel list.

Hi Carson,

Again I need your help! With your guidance, I have the gene models for my genomes. Now I am trying to assign functions to the gene models. I noticed that I can use maker_functional_gff/fasta or InterProScan. I dug out some old messages in the maker-devel Google group, but still have a few questions:

1. Will maker_functional_gff/fasta take NCBI blastp results, or only wu-blast results? I do not have wu-blast.
2. Do I have to use the Uniprot/Swiss_prot database, or can I use something else? For example, may I add a few high-quality genome annotations of related species to the swiss_prot database? Or may I use UniRef90 or the nr database instead of swiss_prot?
3. Do you have a script to integrate blast2go results into the maker gff/fasta?

Thanks.
Guohong
Rutgers University
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Feb 10 11:25:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 10:25:06 -0700
Subject: [maker-devel] Falied to create new account
In-Reply-To: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>
References: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>
Message-ID:

The SMTP server that sends e-mails out is just down, so when you said you forgot your login, it couldn't e-mail you. I switched to a different server for the time being.

--Carson

From carsonhh at gmail.com Mon Feb 10 11:26:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 10:26:06 -0700
Subject: [maker-devel] Fwd: Functional annotation of MAKER gene models
In-Reply-To:
References:
Message-ID:

1. Yes. It should take NCBI BLAST+ results.
2. It has to be UniProt/Swiss-Prot, or you can modify the comments of another database to look like UniProt/Swiss-Prot.
3. ipr_update_gff can also take BLAST2GO results as an undocumented feature (or at least it could last time I tested it, which was quite a long time ago).
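Carson's second answer, making another database's descriptions look like UniProt/Swiss-Prot, can be sketched in a few lines. This is a minimal Python sketch, not part of MAKER: it assumes the source database uses NCBI-style headers of the form ">ACCESSION description [Species]" and rewrites them into a Swiss-Prot-like ">sp|ACC|ACC description OS=Species" layout. Both the input pattern and the assumption that this layout is what maker_functional_gff/maker_functional_fasta read are guesses, so compare the output against a real Swiss-Prot header and the scripts' own parsing before relying on it.

```python
import re
from typing import Iterable, Iterator

# Assumed NCBI-style header, e.g. ">XP_012345.1 hypothetical protein [Oryza sativa]":
# accession, free-text description, species name in trailing square brackets.
NCBI_HEADER = re.compile(r"^>(\S+)\s+(.*?)\s*\[([^\]]+)\]\s*$")


def to_uniprot_style(header: str) -> str:
    """Rewrite one FASTA header into a Swiss-Prot-like ">sp|ACC|ACC desc OS=Species".

    Headers that do not match the assumed NCBI layout are returned unchanged
    (minus the trailing newline), so existing UniProt entries pass through.
    """
    header = header.rstrip("\n")
    match = NCBI_HEADER.match(header)
    if match is None:
        return header
    accession, description, species = match.groups()
    return f">sp|{accession}|{accession} {description} OS={species}"


def convert_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with headers rewritten; sequence lines pass through."""
    for line in lines:
        if line.startswith(">"):
            yield to_uniprot_style(line)
        else:
            yield line.rstrip("\n")
```

For example, writing "\n".join(convert_fasta(open("related_species.fa"))) to a new file (the file name is hypothetical) would give a protein set whose headers follow the Swiss-Prot pattern, which could then be used as the BLASTP database.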
Thanks,
Carson
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From barry.utah at gmail.com Mon Feb 10 13:21:31 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Mon, 10 Feb 2014 12:21:31 -0700
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
Message-ID: <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>

Hi Aaron,

If you re-run maker and don't change the details about the repeat library (i.e. you only update the SNAP HMM file), then MAKER shouldn't redo any of the repeat masking; it should reuse the work it has already done. Is this not what you are seeing?
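Barry's point is that a second round can reuse the existing repeat masking as long as only the gene-prediction settings change. A minimal Python sketch of that edit, assuming maker_opts.ctl uses the "key=value #comment" layout of MAKER's generated control files: it rewrites only the options it is given (for a SNAP round, snaphmm and est2genome) and copies every other line, including the repeat-masking options and the genome file name, through untouched.

```python
import re
from typing import Dict

# Assumed layout of a maker_opts.ctl option line: "key=value #optional comment".
CTL_LINE = re.compile(r"^(?P<key>[A-Za-z0-9_]+)=(?P<value>[^#]*?)(?P<comment>\s*#.*)?$")


def set_ctl_options(text: str, updates: Dict[str, str]) -> str:
    """Return the control-file text with only the keys in `updates` changed.

    Comment lines, section headers and every option not listed in `updates`
    (for example the repeat-masking settings) are copied through verbatim.
    """
    found = set()
    out_lines = []
    for line in text.splitlines():
        match = CTL_LINE.match(line)
        if match and match.group("key") in updates:
            key = match.group("key")
            comment = match.group("comment") or ""
            line = f"{key}={updates[key]}{comment}"
            found.add(key)
        out_lines.append(line)
    missing = set(updates) - found
    if missing:
        # Fail loudly instead of silently ignoring a misspelled option.
        raise KeyError(f"options not found in control file: {sorted(missing)}")
    return "\n".join(out_lines) + "\n"
```

For example, set_ctl_options(text, {"snaphmm": "round1.hmm", "est2genome": "0"}), where round1.hmm is a hypothetical path to the HMM trained on the first round. Raising KeyError for an absent key is a design choice so that a typo in an option name does not quietly leave the first-round settings in place.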
Barry

On Feb 8, 2014, at 7:08 PM, Aaron Liston wrote:

> I am following the tutorial for training SNAP, and it works fine. However, the tutorial instructions have MAKER redo the repeat masking. To avoid this, I concatenated my GFF files from the first round of annotation and used maker_gff=round1.gff and rm_pass=1, but at the end of the process the repeat annotations were not there. Any suggestions? Thanks, Aaron

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From listona at science.oregonstate.edu Mon Feb 10 13:46:06 2014
From: listona at science.oregonstate.edu (Aaron Liston)
Date: Mon, 10 Feb 2014 11:46:06 -0800
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu> <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>
Message-ID: <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>

Hi Barry: I changed the name of the genome file so that I could see the results at each step. However, it sounds like if I had kept the same name, MAKER would have used the information from the previous run. Is that correct?

Aaron

From barry.utah at gmail.com Mon Feb 10 13:56:26 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Mon, 10 Feb 2014 12:56:26 -0700
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu> <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu> <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>
Message-ID: <19FC4633-46F6-4B32-820A-A68C242A1E77@gmail.com>

Yep. If you want to keep the results from each step, just copy the GFF3 file from your first run to a new name and then redo your run.

B

From dence at genetics.utah.edu Tue Feb 11 12:37:36 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 11 Feb 2014 18:37:36 +0000
Subject: [maker-devel] Falied to create new account
In-Reply-To:
References: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1> ,
Message-ID:

Hossein,

Ok. Since this error came up on a local install, I'm going to need some more information to understand what went wrong.
Is it always the same contig that causes this error? If so, is this the only error or warning that MAKER encounters while running on that contig? Or, if multiple contigs fail, is it always the same error?

If you can narrow it down to the smallest possible dataset that consistently gives the same error, we can begin to understand what's wrong.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Tuesday, February 11, 2014 11:20 AM
To: Daniel Ence
Subject: Re: [maker-devel] Falied to create new account

Hi Daniel,

I am running it on the local server at my work.

M. Hossein Borhan, Ph.D.
Research Scientist/ Chercheur Scientifique
Saskatoon Research Centre/Centre de Recherches de Saskatoon
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
107 Science Place, Saskatoon, SK.,S7N 0X2
Telephone/Téléphone: (306) 385-9441
Facsimile/Télécopieur: (306) 385-9482
Hossein.borhan at agr.gc.ca

On 14-02-11 12:16 PM, "Daniel Ence" wrote:

>Hi Hossein,
>
>Did you encounter this error while you were running MAKER on your local
>machine or through the MAKER web annotation service?
>
>Thanks,
>Daniel
>________________________________________
>From: Carson Holt [carsonhh at gmail.com]
>Sent: Tuesday, February 11, 2014 10:18 AM
>To: Daniel Ence
>Cc: Mark Yandell
>Subject: FW: [maker-devel] Falied to create new account
>
>Hey Daniel, could you download his dataset and see if you can replicate
>the error? Also check whether this was an MWAS job or a local MAKER run (his
>dataset will already be there for MWAS; you just need the job ID).
>
>Thanks,
>Carson
>
>On 2/11/14, 10:16 AM, "Borhan, Hossein" wrote:
>
>>Hi Carson,
>>
>>I encountered this error while running MAKER:
>>
>>FATAL ERROR
>>ERROR: Failed while processing the chunk divide!!
>>
>>ERROR: Chunk failed at level 17
>>!!
>>FAILED CONTIG:PbPT3Sc00006
>>
>>HB

From darasappan at gmail.com Tue Feb 11 12:48:23 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 11 Feb 2014 12:48:23 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To:
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
Message-ID: <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>

With your suggested changes (using a protein file not derived from the RNA-seq data and fixing the GFF file used for training SNAP), I was able to increase the number of genes from 6000+ to 18116.

I'm now trying to evaluate the quality of the annotation, and I have a question about the usage of mpi_evaluator. In the MAKER tutorial, the usage is given as:

mpi_evaluator [options]

What files are being referred to by the input parameters eval_opts, eval_bopts and eval_exe?

Thanks
Dhivya

On Feb 6, 2014, at 11:47 AM, Carson Holt wrote:

> Ok. Content looks good. Just make sure to use gff3_merge to join the GFF3s without stripping out the FASTA sequence at the end when training SNAP.
>
> Thanks,
> Carson
>
> From: dhivya arasappan
> Date: Thursday, February 6, 2014 at 10:29 AM
> To: Carson Holt
> Cc: Daniel Ence
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Sorry, I was just trying to make it small enough to be approved by the mailing list.
>
> Here is the whole file:
>
> cat.formatted.gff.tgz
>
> On Thu, Feb 6, 2014 at 11:04 AM, Carson Holt wrote:
>> Could you give me the file without using 'head' to trim it? It's cutting it off before it reaches the part I'm interested in.
>>
>> -Carson
>>
>> From: dhivya arasappan
>> Date: Thursday, February 6, 2014 at 10:01 AM
>> To: Carson Holt
>> Cc: Daniel Ence, "maker-devel at yandell-lab.org"
>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>
>> Oh yes, I did. I took just the non-sequence entries in the GFF file and used that as my input. I will rerun SNAP with the GFF file containing the sequences as well.
>>
>> I'm attaching a snippet of the GFF file that I used as input to maker2zff.
>>
>> Thanks for your help
>> Dhivya
>>
>> On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:
>>
>>> Your genome.dna file has no sequence? Did you by any chance strip the FASTA sequence from the GFF3 you are using as input to maker2zff? There should be FASTA sequence at the end of that file. Also, can I see the GFF3 file you are using as input to maker2zff?
>>>
>>> Thanks,
>>> Carson
>>>
>>> From: dhivya arasappan
>>> Date: Thursday, February 6, 2014 at 7:47 AM
>>> To: Carson Holt
>>> Cc: Daniel Ence, "maker-devel at yandell-lab.org"
>>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>>
>>> Hello,
>>>
>>> It does appear that my genome.ann file from the maker2zff script has data in it. However, the SNAP steps after that have created empty files.
>>> The following are all empty:
>>>
>>> alt.dna err.dna export.dna genome.dna olp.dna uni.dna wrn.dna
>>> alt.ann err.ann export.ann genome.ann olp.ann uni.ann wrn.ann
>>>
>>> When I tried to get gene stats or validate genome.ann, I got errors like this for all of them:
>>>
>>> fathom genome.ann genome.dna -gene-stats |more
>>> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds
>>> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds exon-1:out_of_bounds
>>> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
>>> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds exon-21:out_of_bounds
>>>
>>> I'm not sure why the annotations I'm seeing in genome.ann are all showing up as errors. I realize this may be an issue with SNAP, but are you familiar with anything like this? My genome.ann file is attached for reference.
>>>
>>> Thanks
>>> Dhivya
>>>
>>> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
>>>
>>>> Do you have any features of type snap in your results from step 3? We've had a couple of recent posts where, after training, SNAP was giving no results, and as a result MAKER couldn't produce any genes. One possible cause is your step 2. Make sure the ZFF you used to train with wasn't empty.
>>>> The maker2zff script uses filters to put only the best genes in the ZFF file, and if all of your genes fail the filtering, then you are training with an empty ZFF.
>>>>
>>>> Also, you should use proteins from a related species as your protein file. I see that your protein matches are varying wildly from run to run, and so is your contig count. Were the subsets of contigs you have results for long enough to contain genes?
>>>>
>>>> -Carson

From carsonhh at gmail.com Tue Feb 11 12:55:38 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 11 Feb 2014 11:55:38 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com> <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>
Message-ID:

I wouldn't use mpi_evaluator. It is buggy and has virtually no documentation. The AED values are the best way to identify which genes are of higher and lower quality. You can also run InterProScan to identify protein domain content as an independent evaluation. Look at this paper here: http://www.biomedcentral.com/1471-2105/12/491

Figure 4 has a nice example of how AED, domain content, and gene orthology correlate to show the quality of different subsets of genes in seven ant genomes.

If you choose to try mpi_evaluator, it uses the -CTL option to generate empty files that you then fill in.

Thanks,
Carson

From carson.holt at genetics.utah.edu Tue Feb 11 14:52:05 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Tue, 11 Feb 2014 20:52:05 +0000
Subject: [maker-devel] New MAKER release
Message-ID:

Hello all,

MAKER has been updated to 2.31. There are no major new features over 2.30. It is primarily bug fixes and updates to the features that were added from MAKER-P, such as tRNAscan support. I was also able to remove the segfaults that sometimes happened on exit under OpenMPI.

Thanks,
Carson

From carson.holt at genetics.utah.edu Tue Feb 11 15:19:17 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Tue, 11 Feb 2014 21:19:17 +0000
Subject: [maker-devel] New MAKER release
In-Reply-To:
References:
Message-ID:

URLs can be edited manually in the .../maker/src/locations file. I've also updated that file in the latest MAKER download to point to the new RepBase URL.

Thanks,
Carson

From: Joanna Kelley
Date: Tuesday, February 11, 2014 at 2:00 PM
To: Carson Holt
Subject: Re: [maker-devel] New MAKER release

Hi Carson,

The RepBase step is failing; it seems to be looking for the wrong version. Where do I change the code to fix that?
Thanks,
Joanna

Downloading RepBase...
--2014-02-11 12:59:38-- http://www.girinst.org/server/RepBase/protected/repeatmaskerlibraries/repeatmaskerlibraries-20130422.tar.gz
Resolving www.girinst.org... 66.201.49.247
Connecting to www.girinst.org|66.201.49.247|:80... connected.
HTTP request sent, awaiting response... 401 Authorization Required
Connecting to www.girinst.org|66.201.49.247|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2014-02-11 12:59:38 ERROR 404: Not Found.

ERROR: Failed installing RepBase, now cleaning installation path...
You may need to install RepBase manually.

--
Please update your address book, my new email address is joanna.l.kelley at wsu.edu

From dence at genetics.utah.edu Tue Feb 11 16:59:57 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 11 Feb 2014 22:59:57 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To:
References:
Message-ID:

Hi Hossein,

I think the most helpful thing right now would be for you to run MAKER on only one of the contigs that are failing and send me the entire error output, along with the MAKER control files you are using. It looks like the error is coming from the GFF3 files you are using as input.
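Following Daniel's suggestion to isolate a single failing contig, a quick first step is to look at only that contig's records in the input GFF3. The sketch below is a hypothetical helper, not MAKER's own GFFDB validation: it stops at the `##FASTA` directive, keeps lines whose first column matches the contig name, and reports lines without 9 tab-separated columns, non-integer or reversed start/end coordinates, and `Parent` values that do not match any `ID` on that contig. It does not decode URL-escaped attribute values or check every GFF3 rule, so an empty result does not prove the file is valid.

```python
from typing import Iterable, List, Tuple


def check_gff3_contig(lines: Iterable[str], seqid: str) -> List[str]:
    """Return human-readable problems found in the GFF3 features on one contig.

    Hypothetical helper (not part of MAKER). Checks only: 9 tab-separated
    columns, integer start <= end, and that every Parent refers to an ID
    defined on the same contig.
    """
    problems: List[str] = []
    ids = set()
    parents: List[Tuple[int, str]] = []  # (line number, parent ID), resolved after the scan
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # the embedded FASTA section is sequence, not feature data
        if not line or line.startswith("#"):
            continue  # blank lines, directives and comments
        cols = line.split("\t")
        if cols[0] != seqid:
            continue  # feature on another contig
        if len(cols) != 9:
            problems.append(f"line {n}: expected 9 columns, found {len(cols)}")
            continue
        try:
            start, end = int(cols[3]), int(cols[4])
        except ValueError:
            problems.append(f"line {n}: non-integer start/end")
            continue
        if start > end:
            problems.append(f"line {n}: start {start} > end {end}")
        # Column 9 is a ;-separated list of tag=value pairs.
        attrs = dict(
            field.strip().split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if "ID" in attrs:
            ids.add(attrs["ID"])
        for parent in attrs.get("Parent", "").split(","):  # Parent may list several IDs
            if parent:
                parents.append((n, parent))
    for n, parent in parents:
        if parent not in ids:
            problems.append(f"line {n}: Parent {parent} not defined on {seqid}")
    return problems
```

For example, `check_gff3_contig(open("input.gff3"), "PbPT3Sc00006")` would list problems for the contig from the error quoted earlier; `input.gff3` is a placeholder for whichever GFF3 file is passed to MAKER.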
Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Tuesday, February 11, 2014 3:51 PM
To: Daniel Ence
Subject: ERROR: Failed while processing the chunk divide!!

Dear Daniel

I restarted MAKER and it is still running. But in the error output file that has been generated so far, it seems that smaller contigs are affected. There are contigs of 2-4 kb with this error, but I also noticed a 30 kb contig with this error.

I was wondering if I need to change the settings in the maker_opts file:

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

If I understand correctly, max_dna_len divides contigs of over 100 kb into smaller chunks. However, for the min_contig option it is not clear to me why, if the default contig length is 10 kb or less, I get error messages for 30 kb contigs. Should I change this to 0?

Here is an example of the error message for one of the contigs:

#--------- command -------------#
Widget::exonerate::est2genome:
/usr/local/exonerate-2.2.0-x86_64/bin/exonerate -q /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta -t /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta -Q dna -T dna --model est2genome --minintron 20 --showcigar --percent 20 > /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate
#-------------------------------#
cleaning blastn...
cleaning tblastx...
cleaning blastx...
ERROR: Failed on PbPT3Sc00001_S_0.8_1-mRNA-1
Check your input GFF3 file for errors!
(from GFFDB)

FATAL ERROR
ERROR: Failed while processing the chunk divide!!

ERROR: Chunk failed at level 17
!!
FAILED CONTIG:PbPT3Sc00001

--Next Contig--

Regards

HB

On 14-02-11 12:37 PM, "Daniel Ence" wrote:

>Hossein,
>
>Ok. So since this error came up on a local install, I'm going to need
>some more information to understand what went wrong. Is it the same
>contig that always causes this error? If it is, then is that the only
>error or warning that MAKER encounters while running on this contig? Or,
>if multiple contigs fail, then is it always the same error?
>
>If you can narrow it down to the smallest possible dataset that
>consistently gives the same error, then we can begin to understand what's
>wrong.
>
>Thanks,
>Daniel
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Tuesday, February 11, 2014 11:20 AM
>To: Daniel Ence
>Subject: Re: [maker-devel] Falied to create new account
>
>Hi Daniel
>
>I'm running it through the local server at my work
>
>M. Hossein Borhan, Ph.D.
>Research Scientist/ Chercheur Scientifique
>Saskatoon Research Centre/Centre de Recherches de Saskatoon
>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
>107 Science Place, Saskatoon, SK.,S7N 0X2
>Telephone/Téléphone: (306) 385-9441
>Facsimile/Télécopieur: (306) 385-9482
>Hossein.borhan at agr.gc.ca
>
>On 14-02-11 12:16 PM, "Daniel Ence" wrote:
>
>>Hi Hossein,
>>
>>Did you encounter this error while you were running MAKER on your local
>>machine or through the MAKER web annotation service?
>>
>>Thanks,
>>Daniel
>>
>>Daniel Ence
>>Graduate Student
>>Eccles Institute of Human Genetics
>>University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
>>________________________________________
>>From: Carson Holt [carsonhh at gmail.com]
>>Sent: Tuesday, February 11, 2014 10:18 AM
>>To: Daniel Ence
>>Cc: Mark Yandell
>>Subject: FW: [maker-devel] Falied to create new account
>>
>>Hey Daniel, could you download his dataset and see if you can replicate
>>the error? Also check if this was an MWAS job or a local maker run (his
>>dataset will already be there for MWAS, you just need the job ID).
>>
>>Thanks,
>>Carson
>>
>>On 2/11/14, 10:16 AM, "Borhan, Hossein" wrote:
>>
>>>Hi Carson
>>>
>>>I encountered this error while running maker
>>>
>>>FATAL ERROR
>>>ERROR: Failed while processing the chunk divide!!
>>>
>>>ERROR: Chunk failed at level 17
>>>!!
>>>FAILED CONTIG:PbPT3Sc00006
>>>
>>>HB

From marc.hoeppner at imbim.uu.se Wed Feb 12 02:34:12 2014 From: marc.hoeppner at imbim.uu.se (Marc P.
Hoeppner) Date: Wed, 12 Feb 2014 09:34:12 +0100 Subject: [maker-devel] Annotations from protein alignments Message-ID: <52FB3204.60606@imbim.uu.se>

Dear list,

I have an annotation project with both protein data (it's a bird, so I've been using both vertebrates in general and chicken in particular) and huge amounts of somewhat dodgy (as in lots of pre-mRNA) RNA-seq data. The chicken Augustus model seems to do a decent job of seeding gene loci, but it's not quite perfect. I want to use protein alignments to create a high-confidence set of exons and subsequently a set of gene loci to train e.g. SNAP, but when I test setting protein2genome=1 I never get any annotations. This is also true for the test data set that is delivered together with Maker (hsap_). Anything I should know about the use of proteins to generate annotations? I left all settings in the config file at their defaults (except protein2genome=1). I've tried this with both Maker 2.30 and 2.31.

All the best,

Marc

--
-----------
Marc P. Hoeppner, PhD
Group leader
BILS Genome annotation platform

Department of Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoepner at imbim.uu.se

From carsonhh at gmail.com Wed Feb 12 09:42:36 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 12 Feb 2014 08:42:36 -0700 Subject: [maker-devel] Annotations from protein alignments In-Reply-To: <52FB3204.60606@imbim.uu.se> References: <52FB3204.60606@imbim.uu.se> Message-ID:

I updated the 2.31 tarball. Go ahead and download it again. protein2genome had been turned off for eukaryotes and was only working for prokaryotic genomes.

Carson

From dence at genetics.utah.edu Wed Feb 12 12:59:11 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 12 Feb 2014 18:59:11 +0000 Subject: [maker-devel] ERROR: Failed while processing the chunk divide!! In-Reply-To: References: , Message-ID:

Hi Hossein,

So, after looking at the gff3 and your control files, I had an idea. Besides the part of the control file called "Re-annotation Using MAKER Derived GFF3", you can also pass features through from a gff3 using the "est_gff", "protein_gff", "rm_gff", "pred_gff", and "model_gff" lines.

Sometimes we encounter problems with the MAKER passthrough. Could you try dividing the gff3 file by feature source and passing the parts through the "est_gff" etc. options instead of the MAKER passthrough? That will tell us whether the problem is with the gff3 file or with how MAKER is processing it.

Another thing to check is that the contig names in the gff3 file match the contig names in the FASTA file that you're annotating.
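Both checks can be scripted. Below is a minimal Python sketch, assuming the passthrough file is ordinary 9-column GFF3 (possibly ending in a ##FASTA section). The helper names are hypothetical, and deciding which `*_gff` option each source group belongs in is left to the user.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def split_by_source(gff_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group GFF3 feature lines by column 2 (source), with basic sanity checks."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for n, line in enumerate(gff_lines, start=1):
        if line.startswith("##FASTA"):
            break  # stop before an embedded sequence section
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            raise ValueError(f"line {n}: expected 9 tab-separated columns, got {len(cols)}")
        start, end = int(cols[3]), int(cols[4])  # int() also rejects non-numeric coordinates
        if start > end:
            raise ValueError(f"line {n}: start {start} > end {end}")
        groups[cols[1]].append("\t".join(cols))
    return dict(groups)


def fasta_ids(fasta_lines: Iterable[str]) -> Set[str]:
    """Collect sequence IDs (first word after '>') from FASTA lines."""
    return {line[1:].split()[0] for line in fasta_lines if line.startswith(">")}


def missing_seqids(gff_lines: Iterable[str], ids: Set[str]) -> Set[str]:
    """Return GFF3 seqids (column 1) that have no matching FASTA header."""
    seen: Set[str] = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        seen.add(line.split("\t", 1)[0])
    return seen - ids
```

Each group returned by `split_by_source` could then be written to its own file and listed under the matching option, and a non-empty result from `missing_seqids` would point to a naming mismatch. The coordinate check is only a quick test, not a full GFF3 validator.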
Thanks,
Daniel

Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Wednesday, February 12, 2014 8:49 AM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Dear Daniel

I have generated the files that you requested. I chose Sc00009 from my genome, which is 30 kb and was one of the scaffolds coming up with the error. In addition to the ctl files and the error output file, I also attached the part of the gff file related to Sc00009 that is indicated in the error message.

Thanks for helping with this

Regards

HB

From dence at genetics.utah.edu Wed Feb 12 13:15:59 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 12 Feb 2014 19:15:59 +0000 Subject: [maker-devel] ERROR: Failed while processing the chunk divide!! In-Reply-To: References: , , Message-ID:

Hi Hossein,

One more question. How did you make the gff3 that you're passing through here?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From dence at genetics.utah.edu Wed Feb 12 14:42:03 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 12 Feb 2014 20:42:03 +0000 Subject: [maker-devel] ERROR: Failed while processing the chunk divide!! In-Reply-To: References: , Message-ID:

Hi Hossein,

So, those problems with passing through MAKER-derived gff3 have been addressed in newer versions of MAKER. The current version is 2.31 and is available for download now on our website. Try installing that version and running it with the same control files you started out using, and let me know if that fixes the problems.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Wednesday, February 12, 2014 12:55 PM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Hi Daniel

I am using maker 2.10. I also checked the naming of the scaffold in the genome file and the gff file for the failed example. The naming is the same.

Thanks
Hossein

M. Hossein Borhan, Ph.D.
Research Scientist/ Chercheur Scientifique
Saskatoon Research Centre/Centre de Recherches de Saskatoon
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
107 Science Place, Saskatoon, SK.,S7N 0X2
Telephone/Téléphone: (306) 385-9441
Facsimile/Télécopieur: (306) 385-9482
Hossein.borhan at agr.gc.ca

On 14-02-12 1:30 PM, "Daniel Ence" wrote:

>Hi Hossein,
>
>And which version of MAKER are you using?
>
>Thanks,
>Daniel
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Wednesday, February 12, 2014 12:25 PM
>To: Daniel Ence
>Subject: Re: ERROR: Failed while processing the chunk divide!!
>
>Hi Daniel
>
>The gff file was generated by the first run of maker
>
>HB

From masa at bioinfo.hr Thu Feb 13 04:17:11 2014 From: masa at bioinfo.hr (Masa Roller) Date: Thu, 13 Feb 2014 11:17:11 +0100 Subject: [maker-devel] SNAP scores and AED scores Message-ID: <52FC9BA7.6060505@bioinfo.hr>

Dear all,

I ran snap2-based gene prediction through MAKER.

In the resulting gff file, for the source "snap_masked", I can find a value in the score column for every SNAP prediction that did not get promoted to a MAKER gene. Is this the score of how well the prediction matches the HMM?

It seems to me that the SNAP models that are given gene status no longer appear with source "snap_masked" but only with source "maker". MAKER then removes the score column, giving AED and eAED scores instead (which are more about how well the model corresponds to the evidence). When viewing the MAKER transcripts and SNAP predictions in a browser, they do not match (mostly, the MAKER predictions are longer).

I am interested in the scores of the individual gene predictions that underlie the MAKER gene models. Where could I find that information?

Many thanks!

From carsonhh at gmail.com Thu Feb 13 14:11:22 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Feb 2014 13:11:22 -0700 Subject: [maker-devel] SNAP scores and AED scores In-Reply-To: <52FC9BA7.6060505@bioinfo.hr> References: <52FC9BA7.6060505@bioinfo.hr> Message-ID:

No. SNAP genes do not disappear. All SNAP ab initio calls will always be kept as reference features marked snap_masked (for the repeat-masked genome) and snap (for the unmasked genome). MAKER then runs SNAP another time, feeding it hints based on EST and protein alignment evidence.
These hint-based models can then compete against the ab initio SNAP models to be promoted to genes if their AED scores are better. Final models can also get UTRs added based on EST evidence. That is why you can get models from MAKER that do not match the original SNAP ab initio calls. So, in summary, all SNAP ab initio models will be in snap_masked. The MAKER models will consist of hint-based SNAP reruns plus SNAP ab initio models processed to add UTRs. Thanks, Carson On 2/13/14, 3:17 AM, "Masa Roller" wrote: >Dear all, > >I ran snap2 based gene prediction through maker. > >In the resulting gff file, in the source "snap_masked", I can find a score in the score column of every snap prediction that did not get promoted to a maker gene. Is this a score of how well the prediction matches the HMM? > >It seems to me that those snap models that are given gene status no longer appear as snap_masked source but only as source "maker". Maker then removes the score column, instead giving AED and eAED scores (which are more about how the model corresponds to the evidence). When viewing the maker transcripts and SNAP predictions in a browser, they do not match (mostly, maker predictions are longer). > >I am interested in the scores of the individual gene predictions that underlie the maker gene models. Where could I find that information? > >Many thanks! From carsonhh at gmail.com Thu Feb 13 14:23:07 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Feb 2014 13:23:07 -0700 Subject: [maker-devel] SNAP scores and AED scores In-Reply-To: References: <52FC9BA7.6060505@bioinfo.hr> Message-ID: On a side note.
Because the MAKER models involve either modifying the ab initio SNAP model or manipulating the underlying scoring scheme using hints, the SNAP score on those is virtually meaningless. However, Ian Korf has developed a tool that can take any gene structure and reverse-generate a score (i.e., what the score of this gene would have been if SNAP had called it that way in the first place). I believe the tool is called fathom and is part of the SNAP package. It is not well documented, so you might have to contact Ian Korf directly about it. You can use the maker2zff tool to generate the input to fathom. Thanks, Carson On 2/13/14, 1:11 PM, "Carson Holt" wrote: >No. SNAP genes do not disappear. All SNAP ab initio calls will always be kept as reference features marked snap_masked (for the repeat-masked genome) and snap (for the unmasked genome). MAKER then runs SNAP another time where it feeds hints to SNAP based on EST and protein alignment evidence. These hint-based models can then compete against the ab initio SNAP models to be promoted to genes if their AED scores are better. Final models can also get UTRs added based on EST evidence. That is why you can get models from MAKER that do not match the original SNAP ab initio calls. > >So, in summary, all SNAP ab initio models will be in snap_masked. The MAKER models will consist of hint-based SNAP reruns plus SNAP ab initio models processed to add UTRs. > >Thanks, >Carson > >On 2/13/14, 3:17 AM, "Masa Roller" wrote: > >>Dear all, >> >>I ran snap2 based gene prediction through maker. >> >>In the resulting gff file, in the source "snap_masked", I can find a score in the score column of every snap prediction that did not get promoted to a maker gene. Is this a score of how well the prediction matches the HMM? >> >>It seems to me that those snap models that are given gene status no longer appear as snap_masked source but only as source "maker".
Maker then removes the score column, instead giving AED and eAED scores (which are more about how the model corresponds to the evidence). When viewing the maker transcripts and SNAP predictions in a browser, they do not match (mostly, maker predictions are longer). >> >>I am interested in the scores of the individual gene predictions that underlie the maker gene models. Where could I find that information? >> >>Many thanks! From barry.utah at gmail.com Thu Feb 13 14:27:17 2014 From: barry.utah at gmail.com (Barry Moore) Date: Thu, 13 Feb 2014 13:27:17 -0700 Subject: [maker-devel] SNAP scores and AED scores In-Reply-To: References: <52FC9BA7.6060505@bioinfo.hr> Message-ID: <39AA5089-3E89-4067-A8DF-60B6716C98DF@genetics.utah.edu> Hi Masa, Also, if you want additional SNAP output that hasn't been passed forward in MAKER, you can always access the original SNAP output files in the MAKER datastore. This is a directory structure created by MAKER to store contig-specific data. There is a datastore directory (and a corresponding index file) in the maker output directory. The index file will provide the path to individual contigs, and in that contig-specific directory there is a directory called theVoid. This contains all of the output of each program that MAKER runs. B On Feb 13, 2014, at 1:11 PM, Carson Holt wrote: > No. SNAP genes do not disappear. All SNAP ab initio calls will always be kept as reference features marked snap_masked (for the repeat-masked genome) and snap (for the unmasked genome). MAKER then runs SNAP another time where it feeds hints to SNAP based on EST and protein alignment evidence. These hint-based models can then compete against the ab initio SNAP models to be promoted to genes if their AED scores are better.
Final models can also get UTRs added based on EST evidence. That is why you can get models from MAKER that do not match the original SNAP ab initio calls. > > So, in summary, all SNAP ab initio models will be in snap_masked. The MAKER models will consist of hint-based SNAP reruns plus SNAP ab initio models processed to add UTRs. > > Thanks, > Carson > > On 2/13/14, 3:17 AM, "Masa Roller" wrote: > >> Dear all, >> >> I ran snap2 based gene prediction through maker. >> >> In the resulting gff file, in the source "snap_masked", I can find a score in the score column of every snap prediction that did not get promoted to a maker gene. Is this a score of how well the prediction matches the HMM? >> >> It seems to me that those snap models that are given gene status no longer appear as snap_masked source but only as source "maker". Maker then removes the score column, instead giving AED and eAED scores (which are more about how the model corresponds to the evidence). When viewing the maker transcripts and SNAP predictions in a browser, they do not match (mostly, maker predictions are longer). >> >> I am interested in the scores of the individual gene predictions that underlie the maker gene models. Where could I find that information? >> >> Many thanks! Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543
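Barry's description of the datastore can be turned into a small lookup. The Python sketch below is hypothetical (`find_thevoid` is not a MAKER tool). It assumes that each line of the `*_master_datastore_index.log` is tab-separated as contig ID, datastore path relative to the directory holding the log, and a status such as STARTED or FINISHED, and that the per-contig folder contains `theVoid.<contig>`, as in the exonerate paths quoted later in this archive.

```python
import os


def find_thevoid(index_log, contig):
    """Return the theVoid directory for `contig`, or None if it is not listed.

    Assumption (not checked against MAKER's source): each index-log line is
    tab-separated as <contig ID> <datastore path> <status>, with the path
    relative to the directory that holds the log.
    """
    base = os.path.dirname(os.path.abspath(index_log))
    contig_dir = None
    with open(index_log) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            # A contig can be listed several times (STARTED, FINISHED, ...);
            # keep the last path recorded for it.
            if len(fields) >= 2 and fields[0] == contig:
                contig_dir = os.path.join(base, fields[1])
    if contig_dir is None:
        return None
    return os.path.join(contig_dir, "theVoid." + contig)
```

Listing `os.listdir(find_thevoid(log_path, contig_name))` would then show the raw per-program outputs (SNAP, BLAST, exonerate and so on) for that contig, provided the directory still exists.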
From mptrsen at uni-bonn.de Thu Feb 13 21:00:24 2014 From: mptrsen at uni-bonn.de (Malte Petersen) Date: Fri, 14 Feb 2014 04:00:24 +0100 Subject: [maker-devel] BLAST options error / should Maker check for file format? Message-ID: <52FD86C8.6040007@uni-bonn.de> Dear MAKER devs, I was running Maker version 2.30p-beta on an insect genome, and it didn't produce any output. I got these error messages: Widget::formater: /path/to/makeblastdb -dbtype nucl -in /tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0 #-------------------------------# BLAST options error: File /tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0 is empty ERROR: /path/to/makeblastdb failed in Widget::formater --> rank=NA, hostname=Jeanne-GBR ERROR: Failed while doing blastn of ESTs ERROR: Chunk failed at level:0, tier_type:3 FAILED CONTIG:scf7180005143343 ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:scf7180005143343 I figured out that this error is due to a non-FASTA file being fed to Maker as extrinsic evidence (I gave it a meta-info file). Now that I have the pipeline running with the correct file, I think that it should complain (a lot earlier) if any of the input files are in the wrong format. More people might run into this problem and have no idea where to look for a solution. What do you think? Best, Malte From carsonhh at gmail.com Thu Feb 13 21:11:22 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Feb 2014 20:11:22 -0700 Subject: [maker-devel] BLAST options error / should Maker check for file format? In-Reply-To: <52FD86C8.6040007@uni-bonn.de> References: <52FD86C8.6040007@uni-bonn.de> Message-ID: Hi Malte, Actually, there already is such a check. I'm very surprised your file made it that far; normally it fails right away. Example: STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files...
ERROR: The fasta file /Users/cholt/Developer/maker/trunk/data/test1 appears to be empty. Another test file: STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files... ERROR: The nucleotide sequence file '/Users/cholt/Developer/maker/trunk/data/test2' appears to contain protein sequence or unrecognized characters. Note the following nucleotides may be valid but are unsupported [RYKMSWBDHV] Please check/fix the file before continuing, or set -fix_nucleotides on the command line to fix this automatically. Invalid Character: 'M' You seem to have found just the right formula of improper input to get past the filters on your run :-) Thanks, Carson On 2/13/14, 8:00 PM, "Malte Petersen" wrote: >Dear MAKER devs, > >I was running Maker version 2.30p-beta on an insect genome, and it didn't produce any output. I got these error messages: > > >Widget::formater: >/path/to/makeblastdb -dbtype nucl -in >/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0 >#-------------------------------# >BLAST options error: File >/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0 >is empty >ERROR: /path/to/makeblastdb failed in Widget::formater >--> rank=NA, hostname=Jeanne-GBR >ERROR: Failed while doing blastn of ESTs >ERROR: Chunk failed at level:0, tier_type:3 >FAILED CONTIG:scf7180005143343 > >ERROR: Chunk failed at level:4, tier_type:0 >FAILED CONTIG:scf7180005143343 > > >I figured out that this error is due to a non-FASTA file being fed to Maker as extrinsic evidence (I gave it a meta-info file). Now that I have the pipeline running with the correct file, I think that it should complain (a lot earlier) if any of the input files are in the wrong format. More people might run into this problem and have no idea where to look for a solution. > >What do you think?
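As Carson notes, MAKER already validates FASTA input, so the Python sketch below is only a hypothetical pre-flight check that one could run on evidence files before starting a long job. `check_nucleotide_fasta` is an illustrative name, not a MAKER function. By default it also flags IUPAC ambiguity codes (RYKMSWBDHV), mirroring the "valid but unsupported" note in MAKER's error message.

```python
def check_nucleotide_fasta(path, allowed="ACGTNacgtn"):
    """Return a list of problems found in a nucleotide FASTA file.

    Hypothetical pre-flight check, not MAKER's own validator: an empty list
    means the file looks like plain-nucleotide FASTA.
    """
    allowed = set(allowed)
    problems = []
    n_records = 0
    with open(path) as fh:
        for lineno, raw in enumerate(fh, start=1):
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                n_records += 1
                continue
            if n_records == 0:
                # e.g. a tab-separated metadata file passed in by mistake
                problems.append(f"line {lineno}: sequence data before the first '>' header")
                break
            bad = sorted(set(line) - allowed)
            if bad:
                problems.append(f"line {lineno}: unsupported characters {''.join(bad)}")
    if n_records == 0 and not problems:
        problems.append("no FASTA records found (is the file empty?)")
    return problems
```

A meta-info file like the one Malte passed would be reported at its first line, before any BLAST database is built.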
> >Best, >Malte From dence at genetics.utah.edu Fri Feb 14 13:09:08 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 14 Feb 2014 19:09:08 +0000 Subject: [maker-devel] ERROR: Failed while processing the chunk divide!! In-Reply-To: References: , Message-ID: Hi Hossein, So, this is what is going on. The problem is with the GFF3 file: the exon features in that GFF3 should have the mRNA as their parent instead of the gene. When you deleted the "-mRNA-1", the Name of the mRNA became the same as the Name of the gene, which restored the proper relationship between the features. The same problem exists for the CDS features. The solution is to make the exon and CDS parents "point" to the mRNA and not the gene. Since MAKER has very regular rules for making names, this should be pretty straightforward. You should be OK just adding "-mRNA-1" to the end of all the exon and CDS lines. This will work unless there are some mRNAs with alternative splice forms, because then the mRNAs will end with something like "-mRNA-2". I've attached a script that should do this for you. Run it with this command "perl fix_gff3_script.pl > " and then run MAKER with the fixed gff3 file in place of the old gff3 file. Let me know if that works, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] Sent: Thursday, February 13, 2014 3:27 PM To: Daniel Ence Subject: Re: ERROR: Failed while processing the chunk divide!! Dear Daniel I downloaded maker 2.31 and ran the same scaffold. Again it gave an error on the gff file. I then removed the word mRNA-1 from my gff file and ran it again.
It seems to have worked this time. Attached are the stderr files for the first try, std-err (the one that failed), and the second one, named std-err-wo-mRNA (which apparently worked). Since the gff file is used as evidence only, I thought it should not matter to remove the mRNA-1 naming from the gff file. Cheers HB On 14-02-12 12:59 PM, "Daniel Ence" wrote: >Hi Hossein, > >So, after looking at the gff3 and your control files, I had an idea. There's the part of the control file called "Re-annotation Using MAKER Derived GFF3", but you can also pass through features from a gff3 using the "est_gff", "protein_gff", "rm_gff", "pred_gff", and "model_gff" lines. > >Sometimes we encounter problems with the MAKER passthrough. Could you try dividing the gff3 file into the different feature sources and passing them through the "est_gff" etc. options rather than the MAKER passthrough? That will tell us whether the problem is with the gff3 file or with how MAKER is processing it. > >Another thing to check is that the contig names in the gff3 file match the contig names in the fasta file that you're annotating. > >Thanks, >Daniel > >Graduate Student >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >________________________________________ >From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >Sent: Wednesday, February 12, 2014 8:49 AM >To: Daniel Ence >Subject: Re: ERROR: Failed while processing the chunk divide!! > >Dear Daniel > >I have generated the files that you requested. I chose Sc00009 from my genome, which is 30 kb and was one of the scaffolds coming up with the error. In addition to the ctl files and the error output file, I also attached the part of the gff file related to SC00009 that is indicated in the error message.
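Daniel's fix (make the exon and CDS Parent attributes point to the mRNA rather than the gene) can be sketched in Python. This is not the `fix_gff3_script.pl` attached to the thread, whose contents are not in the archive. It is a hypothetical alternative that looks up each gene's mRNA ID in the file instead of appending "-mRNA-1", and it deliberately leaves genes with more than one mRNA untouched, which is the alternative-splicing case Daniel warns about.

```python
def fix_parents(lines):
    """Re-point exon/CDS Parent attributes from a gene ID to that gene's mRNA.

    Hypothetical sketch: mRNA IDs are read from the file itself, and genes
    with more than one mRNA are skipped because their exons cannot be
    assigned unambiguously.
    """
    rows = [line.rstrip("\n") for line in lines]
    gene_ids = set()
    mrnas_of = {}  # gene ID -> list of child mRNA IDs
    for row in rows:
        cols = row.split("\t")
        if len(cols) != 9:
            continue  # comments, directives, ##FASTA sequence lines
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if cols[2] == "gene" and "ID" in attrs:
            gene_ids.add(attrs["ID"])
        elif cols[2] == "mRNA" and "ID" in attrs and "Parent" in attrs:
            mrnas_of.setdefault(attrs["Parent"], []).append(attrs["ID"])

    def remap(parent):
        children = mrnas_of.get(parent, [])
        return children[0] if parent in gene_ids and len(children) == 1 else parent

    fixed = []
    for row in rows:
        cols = row.split("\t")
        if len(cols) == 9 and cols[2] in ("exon", "CDS"):
            fields = cols[8].split(";")
            for i, kv in enumerate(fields):
                if kv.startswith("Parent="):
                    parents = kv[len("Parent="):].split(",")
                    fields[i] = "Parent=" + ",".join(remap(p) for p in parents)
            cols[8] = ";".join(fields)
            row = "\t".join(cols)
        fixed.append(row)
    return fixed
```

To use it, read the original GFF3 line by line, pass the lines to `fix_parents`, write the returned lines (joined with newlines) to a new file, and give MAKER the new file as Daniel describes.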
> >Thanks for helping with this > >Regards > >HB > >On 14-02-11 4:59 PM, "Daniel Ence" wrote: > >>Hi Hossein, >> >>I think that what would help most right now is if you ran MAKER on only one of those contigs that are failing and sent me the entire error output along with the maker control files that you are using. It looks like the error is coming from the gff3 files that you are using as input. >> >>Thanks, >>Daniel >> >>Daniel Ence >>Graduate Student >>Eccles Institute of Human Genetics >>University of Utah >>15 North 2030 East, Room 2100 >>Salt Lake City, UT 84112-5330 >>________________________________________ >>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >>Sent: Tuesday, February 11, 2014 3:51 PM >>To: Daniel Ence >>Subject: ERROR: Failed while processing the chunk divide!! >> >>Dear Daniel >> >>I re-started maker and it is still running. But in the error output file that has been generated so far, it seems that smaller contigs are affected. There are contigs of 2-4 kb with this error, but I also noticed a contig of 30 kb length having this error. >> >>I was wondering if I need to change the setting in the maker_opts file >> >>#-----MAKER Behavior Options >>max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage) >>min_contig=1 #skip genome contigs below this length (under 10kb are often useless) >> >>If I understand correctly, max_dna_len divides contigs of over 100 kb into smaller chunks. However, for the min_contig option, it is not clear to me why I get an error message for 30 kb long contigs if the default contig length is 10 kb or less. Should I change this to 0?
Should I change this to 0 >> >>Here is an example of the error message for one of the contigs >> >> >>#--------- command -------------# >>Widget::exonerate::est2genome: >>/usr/local/exonerate-2.2.0-x86_64/bin/exonerate -q >>/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.bra >>s >>s >>icae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35 >>/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta >>-t >>/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.bra >>s >>s >>icae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genom >>e_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-113 >>6 >>. >>fasta >>-Q dna -T dna --model est2genome >>--minintron 20 --showcigar --percent 20 > >>/raid01/projects/Plasmodiophora/brassica >>e/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.bras >>s >>i >>cae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT >>3 >>S >>c00001.235-1136.comp14545_c0_seq1.est_exonerate >>#-------------------------------# >>cleaning blastn... >>cleaning tblastx... >>cleaning blastx... >>ERROR: Failed on >>PbPT3Sc00001_S_0.8_1-mRNA-1 >>Check your input GFF3 file for errors! >>(from GFFDB) >> >>FATAL ERROR >>ERROR: Failed while processing the chunk >>divide!! >> >>ERROR: Chunk failed at level 17 >>!! >>FAILED CONTIG:PbPT3Sc00001 >> >> >> >> >>--Next Contig-- >> >> >> >> >> >> >>Regards >> >> >>HB >> >> >> >> >> >> >> >> >> >> >>On 14-02-11 12:37 PM, "Daniel Ence" wrote: >> >>>Hossein, >>> >>>Ok. So since this error came up on a local install, I'm going to need >>>some more information to understand what went wrong. Is it the same >>>contig that always causes this error? If it is, then is the the only >>>error or warning that MAKER encounters while running on this contig? Or, >>>if multiple contigs fail, then is it always the same error? 
>>> >>>If you can narrow it down to the smallest possible dataset that consistently gives the same error, then we can begin to understand what's wrong. >>> >>>Thanks, >>>Daniel >>> >>>Daniel Ence >>>Graduate Student >>>Eccles Institute of Human Genetics >>>University of Utah >>>15 North 2030 East, Room 2100 >>>Salt Lake City, UT 84112-5330 >>>________________________________________ >>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >>>Sent: Tuesday, February 11, 2014 11:20 AM >>>To: Daniel Ence >>>Subject: Re: [maker-devel] Falied to create new account >>> >>>Hi Daniel >>> >>>I am running it through the local server at my work. >>> >>>M. Hossein Borhan, Ph.D. >>>Research Scientist/ Chercheur Scientifique >>>Saskatoon Research Centre/Centre de Recherches de Saskatoon >>>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada >>>107 Science Place, Saskatoon, SK.,S7N 0X2 >>>Telephone/Téléphone: (306) 385-9441 >>>Facsimile/Télécopieur: (306) 385-9482 >>>Hossein.borhan at agr.gc.ca >>> >>>On 14-02-11 12:16 PM, "Daniel Ence" wrote: >>> >>>>Hi Hossein, >>>> >>>>Did you encounter this error while you were running MAKER on your local machine or through the MAKER web annotation service? >>>> >>>>Thanks, >>>>Daniel >>>> >>>>Daniel Ence >>>>Graduate Student >>>>Eccles Institute of Human Genetics >>>>University of Utah >>>>15 North 2030 East, Room 2100 >>>>Salt Lake City, UT 84112-5330 >>>>________________________________________ >>>>From: Carson Holt [carsonhh at gmail.com] >>>>Sent: Tuesday, February 11, 2014 10:18 AM >>>>To: Daniel Ence >>>>Cc: Mark Yandell >>>>Subject: FW: [maker-devel] Falied to create new account >>>> >>>>Hey Daniel, could you download his dataset and see if you can replicate the error? Also check whether this was an MWAS job or a local maker run (his dataset will already be there for MWAS; you just need the job ID).
>>>> >>>>Thanks, >>>>Carson >>>> >>>>On 2/11/14, 10:16 AM, "Borhan, Hossein" wrote: >>>> >>>>>Hi Carson >>>>> >>>>>I encountered this error while running maker >>>>> >>>>>FATAL ERROR >>>>>ERROR: Failed while processing the chunk divide!! >>>>> >>>>>ERROR: Chunk failed at level 17 >>>>>!! >>>>>FAILED CONTIG:PbPT3Sc00006 >>>>> >>>>>HB -------------- next part -------------- A non-text attachment was scrubbed... Name: fix_gff3_script.pl Type: application/octet-stream Size: 349 bytes Desc: fix_gff3_script.pl URL: From claudio.valero at wur.nl Mon Feb 17 03:23:21 2014 From: claudio.valero at wur.nl (Valero Jimenez, Claudio) Date: Mon, 17 Feb 2014 09:23:21 +0000 Subject: [maker-devel] Maker not predicting many genes Message-ID: Dear list, I'm trying to annotate a fungal genome, and I'm surprised that Maker does not predict many genes (3697). I have trained SNAP and followed all the tutorials available. Ab initio predictors are able to predict between 8000 and 10000 genes. Is there something wrong in my configuration file? I attach the opts file and the SOBA summary of the annotation. Regards, Claudio -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.log Type: application/octet-stream Size: 4776 bytes Desc: maker_opts.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: SOBA.pdf Type: application/pdf Size: 210262 bytes Desc: SOBA.pdf URL: From carson.holt at genetics.utah.edu Mon Feb 17 13:22:13 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Mon, 17 Feb 2014 19:22:13 +0000 Subject: [maker-devel] Maker not predicting many genes In-Reply-To: References: Message-ID: You also need to look at the contigs in a browser like Apollo.
That will allow you to see both the predictions and the evidence in context. You can then see whether genes are being dropped because they are supported only by single-exon evidence, because they have no evidence support whatsoever, or because they are being excluded due to UTR overlap. That last one is a common problem for fungi when using assembled mRNA-seq reads. Fungal genes are so close together that they often overlap in the UTR. As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts. The result is really long UTRs on some of your gene models that force other models to be excluded. If this is the case, rerun something like Trinity with the Jaccard clip option set to avoid transcript fusion. Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off. If it is a lack of evidence overlap, make sure you provided at least one proteome from a related species to the protein= option. At least two proteomes are recommended, though (these are not proteins from the same species but rather complete proteomes from related species). Also, comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but they can supplement the other proteome data. Also, are you providing EST data? Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient, because both the quality and the completeness of EST/mRNA-seq databases vary widely, and they may capture as little as 30% of the genes. Another factor is single-exon evidence. In anything but fungi, single-exon evidence is mostly caused by spurious alignments, but fungi have so many single-exon genes that this is not the case for them. Make sure single_exon=1 is set to allow that evidence to be kept, and set the length of single-exon evidence to keep to something like 250 bp.
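Carson's settings can be applied to a control file with a small script. The Python sketch below is hypothetical: `set_ctl_options` is an illustrative helper, it assumes the `key=value #comment` layout of the maker_opts.ctl template, and it assumes `single_length` is the option that holds the minimum single-exon evidence length (check the name in your own maker_opts.ctl). It only edits keys already present in the file and reports any that are missing.

```python
import re

# Matches "key=value #optional comment" lines in a MAKER control file.
_CTL_LINE = re.compile(r"^(\w+)=([^#]*?)(\s*#.*)?$")


def set_ctl_options(text, updates):
    """Apply key=value settings to the text of a maker_opts.ctl file.

    Returns (new_text, missing), where `missing` lists requested keys that
    were not found in the file (they are not appended). Trailing '#'
    comments are preserved.
    """
    remaining = {key: str(value) for key, value in updates.items()}
    out = []
    for line in text.splitlines():
        m = _CTL_LINE.match(line)
        if m and m.group(1) in remaining:
            key = m.group(1)
            line = f"{key}={remaining.pop(key)}{m.group(3) or ''}"
        out.append(line)
    return "\n".join(out) + "\n", sorted(remaining)
```

For example, read maker_opts.ctl into a string, call `set_ctl_options(text, {"single_exon": 1, "single_length": 250, "correct_est_fusion": 1})`, check that the returned `missing` list is empty, and write the new text back. Editing only existing keys avoids silently adding an option name that your MAKER version does not recognize.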
Thanks, Carson From: "Valero Jimenez, Claudio" Date: Monday, February 17, 2014 at 2:23 AM To: "'maker-devel at yandell-lab.org'" Subject: Maker not predicting many genes Dear list, I'm trying to annotate a fungal genome, and I'm surprised that Maker does not predict many genes (3697). I have trained SNAP and followed all the tutorials available. Ab initio predictors are able to predict between 8000 and 10000 genes. Is there something wrong in my configuration file? I attach the opts file and the SOBA summary of the annotation. Regards, Claudio From carsonhh at gmail.com Mon Feb 17 13:26:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 17 Feb 2014 12:26:05 -0700 Subject: [maker-devel] Maker not predicting many genes Message-ID: From your control file, it looks like your primary shortcomings are not setting single_exon=1 and using only UniProt rather than supplying complete proteomes of a related species. I'd set correct_est_fusion=1 as well. -Carson From: Carson Holt Date: Monday, February 17, 2014 at 12:22 PM To: "Valero Jimenez, Claudio" , "'maker-devel at yandell-lab.org'" Subject: Re: [maker-devel] Maker not predicting many genes You also need to look at the contigs in a browser like Apollo. That will allow you to see both the predictions and the evidence in context. You can then see whether genes are being dropped because they are supported only by single-exon evidence, because they have no evidence support whatsoever, or because they are being excluded due to UTR overlap. That last one is a common problem for fungi when using assembled mRNA-seq reads. Fungal genes are so close together that they often overlap in the UTR. As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts. The result is really long UTRs on some of your gene models that force other models to be excluded.
If this is the case, rerun something like Trinity with the Jaccard clip option set to avoid transcript fusion. Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off. If it is a lack of evidence overlap, make sure you provided at least one proteome from a related species to the protein= option. At least two proteomes are recommended, though (these are not proteins from the same species but rather complete proteomes from related species). Also, comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but they can supplement the other proteome data. Also, are you providing EST data? Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient, because both the quality and the completeness of EST/mRNA-seq databases vary widely, and they may capture as little as 30% of the genes. Another factor is single-exon evidence. In anything but fungi, single-exon evidence is mostly caused by spurious alignments, but fungi have so many single-exon genes that this is not the case for them. Make sure single_exon=1 is set to allow that evidence to be kept, and set the length of single-exon evidence to keep to something like 250 bp. Thanks, Carson From: "Valero Jimenez, Claudio" Date: Monday, February 17, 2014 at 2:23 AM To: "'maker-devel at yandell-lab.org'" Subject: Maker not predicting many genes Dear list, I'm trying to annotate a fungal genome, and I'm surprised that Maker does not predict many genes (3697). I have trained SNAP and followed all the tutorials available. Ab initio predictors are able to predict between 8000 and 10000 genes. Is there something wrong in my configuration file? I attach the opts file and the SOBA summary of the annotation.
Regards, Claudio From claudio.valero at wur.nl Wed Feb 19 02:20:04 2014 From: claudio.valero at wur.nl (Valero Jimenez, Claudio) Date: Wed, 19 Feb 2014 08:20:04 +0000 Subject: Re: [maker-devel] Maker not predicting many genes In-Reply-To: References: Message-ID: Hi Carson, Thank you for your suggestions. I ran Maker again, and it was able to predict many more genes. However, I have a different problem now. When I try to run gff3_merge, I get the following error: Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67. A similar thing happens when I try fasta_merge: Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52. I never had this problem before with these commands. Regards, Claudio From: Carson Holt [mailto:carsonhh at gmail.com] Sent: Monday, 17 February 2014 20:26 To: Carson Holt; Valero Jimenez, Claudio; 'maker-devel at yandell-lab.org' Subject: Re: [maker-devel] Maker not predicting many genes From your control file, it looks like your primary shortcomings are not setting single_exon=1 and using only UniProt rather than supplying complete proteomes of a related species. I'd set correct_est_fusion=1 as well. -Carson From: Carson Holt Date: Monday, February 17, 2014 at 12:22 PM To: "Valero Jimenez, Claudio", "'maker-devel at yandell-lab.org'" Subject: Re: [maker-devel] Maker not predicting many genes You also need to look at the contigs in a browser like Apollo. That will allow you to see both the predictions and the evidence in context. You can then see whether genes are being dropped because they are supported only by single-exon evidence, because they have no evidence support whatsoever, or because they are being excluded due to UTR overlap.
That last one is a common problem for fungi when using assembled mRNA-seq reads. Fungal genes are so close together that they often overlap in the UTR. As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts. The result is really long UTRs on some of your gene models that force other models to be excluded. If this is the case, rerun something like Trinity with the Jaccard clip option set to avoid transcript fusion. Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off. If it is a lack of evidence overlap, make sure you provided at least one proteome from a related species to the protein= option. At least two proteomes are recommended, though (these are not proteins from the same species but rather complete proteomes from related species). Also, comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but they can supplement the other proteome data. Also, are you providing EST data? Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient, because both the quality and the completeness of EST/mRNA-seq databases vary widely, and they may capture as little as 30% of the genes. Another factor is single-exon evidence. In anything but fungi, single-exon evidence is mostly caused by spurious alignments, but fungi have so many single-exon genes that this is not the case for them. Make sure single_exon=1 is set to allow that evidence to be kept, and set the length of single-exon evidence to keep to something like 250 bp. Thanks, Carson From: "Valero Jimenez, Claudio" Date: Monday, February 17, 2014 at 2:23 AM To: "'maker-devel at yandell-lab.org'" Subject: Maker not predicting many genes Dear list, I'm trying to annotate a fungal genome, and I'm surprised that Maker does not predict many genes (3697). I have trained SNAP and followed all the tutorials available.
Ab initio predictors are able to predict between 8000-10000 genes. It is something that I have in the configuration file that is wrong?? I attach the ops file and the SOBA summary of the annotation. Regards, Claudio _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Feb 19 09:34:33 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 19 Feb 2014 08:34:33 -0700 Subject: [maker-devel] Maker not predicting many genes In-Reply-To: References: Message-ID: You provided a directory rather than a file to the -d option (?d' stands for datastore log). You must provide the location of the datastore index log file and not the datastore directory. Example ?> ./dpp_contig.maker.output/dpp_contig_master_datastore_index.log Thanks, Carson From: "Valero Jimenez, Claudio" Date: Wednesday, February 19, 2014 at 1:20 AM To: Carson Holt , Carson Holt , "'maker-devel at yandell-lab.org'" Subject: RE: [maker-devel] Maker not predicting many genes Hi Carson, Thank you for your suggestions. I ran again Maker and it was able to predict many more genes. Although I have a different problem now. I try to run gff3_merge and get the following error: Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67. Similar thing happens when I try fasta_merge: Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52. I never had this problem before with these commands. 
Regards, Claudio From: Carson Holt [mailto:carsonhh at gmail.com] Sent: maandag 17 februari 2014 20:26 To: Carson Holt; Valero Jimenez, Claudio; 'maker-devel at yandell-lab.org' Subject: Re: [maker-devel] Maker not predicting many genes >From your control file, it looks like not setting single_exon=1, and only using UniProt rather than supplying complete proteomes of a related species are your primary shortcomings. I?d set correct_est_fusion=1 as well. ?Carson From: Carson Holt Date: Monday, February 17, 2014 at 12:22 PM To: "Valero Jimenez, Claudio" , "'maker-devel at yandell-lab.org'" Subject: Re: [maker-devel] Maker not predicting many genes You also need to look at the contigs in a browser like apollo. That will allow you to see both the predictions and the evidence in context. You can then see if genes are being dropped because they are only being supported by single exon evidence, they have no evidence support whatsoever, or if they are being excluded because of UTR overlap. That last one is a common problem for fungi when using assembled mRNA-seq reads. Fungi genes are so close that they often overlap in the UTR. As a result, mRNA-seq assemblers falsely asseble neighboring genes into single transcripts. The result is really long UTR on some of your gene models that force other models to be excluded. If this is the case, rerun something like trinity with the jacquard clip option set to avoid transcript fusion. Then set correct_est_fusion=1 in the MAKER control files to get those long false UTR?s clipped off. If it is a lack of evidence overlap, make sure you provided minimum 1 proteome from a related species to the protein= option. At least 2 proteomes are recommended though (these are not proteins from the same species but rather complete proteomes from related species). Also comprehensive databases like UniProt/Swiss-prot are not sufficient on their own, but can supplement the other proteome data. Also are you providing EST data? 
Note that EST/mRNA-seq data without a proteome from a related species is also not siufficient (because both quality and how comprehensive EST/mRNA-seq databsases are can vary so widely, and may only capture as little as 30% of the genes). Another thing that comes into play are single exon evidence. In anything but fungi, single exon evidence is mostly caused by spurious alignments. But fungi have so many single exon genes, that this is not the case for them. Make sure single_exon=1 is set to allow that evidence to be kept, and set the length of single exon evidence to keep to something like 250 bp. Thanks, Carson From: "Valero Jimenez, Claudio" Date: Monday, February 17, 2014 at 2:23 AM To: "'maker-devel at yandell-lab.org'" Subject: Maker not predicting many genes Dear list, I?m trying to annotate a fungal genome, and I?m surprised that Maker does not predict many genes (3697). I have trained SNAP and followed all the tutorials available. Ab initio predictors are able to predict between 8000-10000 genes. It is something that I have in the configuration file that is wrong?? I attach the ops file and the SOBA summary of the annotation. Regards, Claudio _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Wed Feb 19 10:04:08 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 19 Feb 2014 16:04:08 +0000 Subject: [maker-devel] Maker not predicting many genes In-Reply-To: References: , Message-ID: Hi Claudio, What was the command line you used for gff3_merge? 
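Since the error above came from passing the datastore directory to -d, here is a minimal Python sketch (not part of MAKER, whose merge scripts are Perl) that resolves the index log before building the command. The helper names `index_log_path` and `merge_command` are hypothetical, and the `<base>.maker.output/<base>_master_datastore_index.log` layout is assumed from Carson's example path; only the -d option discussed in this thread is used.

```python
import os

SUFFIX = ".maker.output"  # output-directory suffix seen in Carson's example path


def index_log_path(path):
    """Return the master datastore index log for a MAKER run.

    Accepts either the log file itself or a '<base>.maker.output'
    directory. The '<base>_master_datastore_index.log' naming is an
    assumption taken from the example path in this thread.
    """
    if os.path.isfile(path):
        return path  # already the index log
    name = os.path.basename(os.path.normpath(path))
    if os.path.isdir(path) and name.endswith(SUFFIX):
        base = name[: -len(SUFFIX)]
        candidate = os.path.join(path, base + "_master_datastore_index.log")
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(f"no datastore index log found for {path!r}")


def merge_command(tool, path):
    """Build an argument list for gff3_merge or fasta_merge using -d."""
    if tool not in ("gff3_merge", "fasta_merge"):
        raise ValueError(f"unexpected tool: {tool!r}")
    return [tool, "-d", index_log_path(path)]
```

For example, `subprocess.run(merge_command("gff3_merge", "dpp_contig.maker.output"), check=True)` would hand gff3_merge the index log rather than the directory. Failing early with a named file gives a clearer message than the Perl warning about an uninitialized `$outfile`.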
Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From claudio.valero at wur.nl Wed Feb 19 10:33:36 2014
From: claudio.valero at wur.nl (Valero Jimenez, Claudio)
Date: Wed, 19 Feb 2014 16:33:36 +0000
Subject: [maker-devel] Maker not predicting many genes

Hi,

Thanks, I had a mistake in the command line!!!

Regards,
Claudio

From barry.utah at gmail.com Wed Feb 19 12:03:47 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Wed, 19 Feb 2014 11:03:47 -0700
Subject: [maker-devel] Maker not predicting many genes
Message-ID: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>

Hi Daniel,

Could you add an error message to those two scripts that detects when a filename is missing or a directory was given instead, and gives the user a suggested solution?
Thanks,
B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From carson.holt at genetics.utah.edu Wed Feb 19 12:06:52 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 19 Feb 2014 18:06:52 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>
References: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>

You only need to swap a single character in the script. Just change the -e (exists) test to a -f (is file) test.
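Carson's one-character fix works because Perl's -e file test is true for any existing path, directories included, while -f is true only for a plain file. Python's `os.path.exists` and `os.path.isfile` draw the same distinction, so the sketch below uses them to illustrate the check together with the kind of hint Barry asked for. It is an illustration in Python, not a patch to the Perl scripts. `validate_datastore_arg` is a hypothetical name, and globbing for `*_master_datastore_index.log` inside a directory is an assumption based on the file name used in this thread.

```python
import glob
import os


def validate_datastore_arg(path):
    """Return path if it names a regular file, else raise with a hint.

    os.path.exists() behaves like Perl's -e (true for directories too),
    while os.path.isfile() behaves like Perl's -f (regular files only).
    """
    if not path:
        raise ValueError("ERROR: no datastore index log given (use -d <file>)")
    if os.path.isfile(path):
        return path
    if os.path.isdir(path):
        # A directory would pass an exists-style (-e) check, which is the bug.
        pattern = os.path.join(path, "*_master_datastore_index.log")
        matches = sorted(glob.glob(pattern))
        hint = f" Try: -d {matches[0]}" if matches else ""
        raise ValueError(f"ERROR: {path} is a directory, not a datastore index log.{hint}")
    raise ValueError(f"ERROR: {path} does not exist")
```

Checking for a regular file up front, rather than only for existence, is what turns a confusing uninitialized-value warning into a message that names the file the user probably meant.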
Thanks,
Carson

From gtaylor at bcgsc.ca Fri Feb 21 12:48:42 2014
From: gtaylor at bcgsc.ca (Greg Taylor)
Date: Fri, 21 Feb 2014 10:48:42 -0800
Subject: [maker-devel] Maker jobs hanging

Hello,

I'm having a problem with Maker_2.28 jobs hanging. I am annotating a 3 Gb genome with the predictors SNAP and GeneMark, using ABySS-assembled RNA-seq data, on 480 processors of our local cluster. Once a run begins, 479 contigs are started, as noted in the *_master_datastore_index.log file. The standard error log for the whole job looks normal, as do run.log and run.log.child.0 for the daughter processes.
This seems to be sequence dependent, since re-running the contigs that hang doesn't help: the same contigs always hang. I'm still looking into this myself, but it seems that most, if not all, of the jobs are stuck at the BLASTX stage. If you have any suggestions, your help would be greatly appreciated.

Sincerely,
Greg Taylor

From dence at genetics.utah.edu Fri Feb 21 12:54:17 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 21 Feb 2014 18:54:17 +0000
Subject: [maker-devel] Maker jobs hanging

Hi Greg,

Since this is probably going to be a more complicated situation, would you upload your data and control files at this URL so that we can try to replicate the error on our machines?

http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=166

Also, which version of MPI are you using? You might also want to try updating MAKER; I think version 2.31 was updated just a few weeks ago.

Thanks,
Daniel

From carsonhh at gmail.com Fri Feb 21 12:56:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 21 Feb 2014 11:56:50 -0700
Subject: [maker-devel] Maker jobs hanging

Use 2.31. It has been tested to work without issue on several thousand CPUs. Also, use OpenMPI for any job larger than 100 CPUs. In addition, OpenMPI can freeze on some systems when running Perl-based MPI programs unless you add the following flag:

-mca btl ^openib

Example:

mpiexec -mca btl ^openib -n 200 maker

Finally, never use MVAPICH2. It doesn't play well with Perl, and it freezes whenever Perl-based MPI jobs extend across nodes (they run fine within a single node, though).

-Carson

From dence at genetics.utah.edu Fri Feb 21 16:04:34 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 21 Feb 2014 22:04:34 +0000
Subject: [maker-devel] FW: Maker jobs hanging

Hi Greg,

You should be able to have the new version of MAKER work on the old datastore. Note the following advice from the main MAKER developer, Carson Holt.

Thanks,
Daniel

________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com]
Sent: Friday, February 21, 2014 11:56 AM
To: Greg Taylor; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Maker jobs hanging

Use 2.31. It has been tested to work without issue on several thousand CPUs. Also, use OpenMPI for any job larger than 100 CPUs. In addition, OpenMPI can freeze on some systems when running Perl-based MPI programs unless you add the following flag:

-mca btl ^openib

Example:

mpiexec -mca btl ^openib -n 200 maker

Finally, never use MVAPICH2. It doesn't play well with Perl, and it freezes whenever Perl-based MPI jobs extend across nodes (they run fine within a single node, though).

-Carson

From dence at genetics.utah.edu Fri Feb 21 20:38:59 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 02:38:59 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2
In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>, <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>

Hi Joe,

Will you upload your control files and data at this URL?

http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169

Also, which versions of MAKER and BLAST are you using? And which file are you using for the known Arabidopsis gene? I've copied this email to the maker-development list, which is a really good resource for troubleshooting MAKER issues.

Thanks,
Daniel

________________________________________
From: Mark Yandell
Sent: Friday, February 21, 2014 7:32 PM
To: Daniel Ence
Subject: FW: I am a PhD candidate at NMSU and have a question about maker2

Mark Yandell
Professor of Human Genetics
H.A.
& Edna Benning Presidential Endowed Chair Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:801-587-7707 ________________________________________ From: Joseph Said [joesaid at nmsu.edu] Sent: Friday, February 21, 2014 5:18 PM To: Mark Yandell Subject: I am a PhD candidate at NMSU and have a question about maker2 Dear Dr. Yandell, I am a molecular biologist at NMSU. I am trying to use maker2 with the cotton genome, and search an Arabidopsis gene against it. I think there is a problem with the blast component because zero results are returned. I tried troubleshooting by searching a known gene and still returned zero results. Is this a common problem maybe with the pipeline? I would appreciate any ideas you might have to help me. Thank you, Joe Sent from my iPad From dence at genetics.utah.edu Fri Feb 21 22:27:10 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Sat, 22 Feb 2014 04:27:10 +0000 Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2 In-Reply-To: References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>, <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>, , Message-ID: Hi Joe, MAKER runs blast from your local system (or the server where MAKER is installed), and it blasts the evidence that you supply in the "est" and "protein" settings. The est and protein settings are set in the maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file, and the specific blast settings are in the "maker_bopts.ctl" file. Will you attach those files to your reply, so we can make sure that the settings are set up correctly?
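Because a zero-hit result often comes down to an empty evidence setting or a wrong executable path, it can help to sanity-check the three control files before re-running. The sketch below is a minimal Python illustration, not part of MAKER. It assumes the plain key=value #comment layout of the .ctl files. The keys genome, est and protein (maker_opts.ctl) come from this thread. The keys blastn, blastx, tblastx and blastall (maker_exe.ctl), and blast_type with the values ncbi+ or ncbi (maker_bopts.ctl), are assumptions to check against your own files. parse_ctl and check_blast_setup are hypothetical helper names.

```python
import os
from typing import Dict, List

# Assumed layout of MAKER .ctl files: one "key=value #comment" entry per line.
# The key names and blast_type values used below are assumptions to verify.


def parse_ctl(text: str) -> Dict[str, str]:
    """Parse 'key=value #comment' lines into a dict (empty values kept as '')."""
    opts: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop the trailing comment
        if "=" not in line:
            continue  # blank or comment-only line
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def check_blast_setup(opts: Dict[str, str], exe: Dict[str, str],
                      bopts: Dict[str, str]) -> List[str]:
    """Return likely misconfigurations; an empty list means none were found."""
    problems: List[str] = []
    if not opts.get("genome"):
        problems.append("maker_opts.ctl: 'genome' is empty")
    if not (opts.get("est") or opts.get("protein")):
        problems.append("maker_opts.ctl: neither 'est' nor 'protein' is set")
    # Which executables matter depends on blast_type (assumed: ncbi+ or ncbi).
    blast_type = bopts.get("blast_type", "ncbi+")
    needed = {"ncbi+": ["blastn", "blastx", "tblastx"],
              "ncbi": ["blastall"]}.get(blast_type, [])
    for key in needed:
        path = exe.get(key, "")
        if not path:
            problems.append(f"maker_exe.ctl: '{key}' has no path")
        elif not os.path.isfile(path):
            problems.append(f"maker_exe.ctl: '{key}' not found at {path}")
    return problems
```

For example, parse each file with parse_ctl(open("maker_opts.ctl").read()) and print whatever check_blast_setup returns. An empty list only means none of these particular problems were found, not that the BLAST settings are correct.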
Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: Joseph Said [joesaid at nmsu.edu] Sent: Friday, February 21, 2014 7:44 PM To: Daniel Ence Subject: RE: I am a PhD candidate at NMSU and have a question about maker2 Hi Daniel, Thank you for getting back to me so quickly. I am using the cotton Gossypium raimondii D genome from NCBI, and the arabidopsis gene is the GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I believe maker2 just calls BLAST from NCBI's page. So when I search the cotton genome it returns zero hits. But then I used a known cotton gene as a test and ran a search, and that also returned zero hits. I am not sure what the problem is, but it seems like the protocol that should be returning the results of NCBI's BLAST is returning 0 to Maker2, which is reporting 0 hits. I ran a standalone BLAST and got hits for both my gene of interest and the control test gene. Thanks, Joe ________________________________________ From: Daniel Ence Sent: Friday, February 21, 2014 7:38 PM To: Joseph Said Cc: maker-devel at yandell-lab.org Subject: RE: I am a PhD candidate at NMSU and have a question about maker2 Hi Joe, Will you upload your control files and data at this URL? http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169 Also, what version of MAKER and blast are you using? And which file are you using for the known arabidopsis gene? I've copied this email to the maker-development list, which is a really good resource for trouble-shooting MAKER issues.
Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: Mark Yandell Sent: Friday, February 21, 2014 7:32 PM To: Daniel Ence Subject: FW: I am a PhD candidate at NMSU and have a question about maker2 Mark Yandell Professor of Human Genetics H.A. & Edna Benning Presidential Endowed Chair Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:801-587-7707 ________________________________________ From: Joseph Said [joesaid at nmsu.edu] Sent: Friday, February 21, 2014 5:18 PM To: Mark Yandell Subject: I am a PhD candidate at NMSU and have a question about maker2 Dear Dr. Yandell, I am a molecular biologist at NMSU. I am trying to use maker2 with the cotton genome, and search an Arabidopsis gene against it. I think there is a problem with the blast component because zero results are returned. I tried troubleshooting by searching a known gene and still returned zero results. Is this a common problem maybe with the pipeline? I would appreciate any ideas you might have to help me. Thank you, Joe Sent from my iPad From dence at genetics.utah.edu Sat Feb 22 16:51:48 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Sat, 22 Feb 2014 22:51:48 +0000 Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2 In-Reply-To: References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu> <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu> <6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu>, Message-ID: Hi, Will you send me the long file that you were trying to blast against? 
Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: Hua Zhong [zh9118 at gmail.com] Sent: Saturday, February 22, 2014 10:46 AM To: Daniel Ence Cc: Joe Song; Joseph Said Subject: Re: I am a PhD candidate at NMSU and have a question about maker2 hi all, Attached are the three configuration files and two input files, which are used to predict something between the genome and protein. For a simple test, we used one short sequence about 60bp and its translated protein sequence as inputs. But got nothing returned. What's more, we did test long genome sequence as one input as well, but still got nothing. I am not sure what's the reason cause this result. Thanks a lot for help. Hua On Fri, Feb 21, 2014 at 9:31 PM, Joseph Said > wrote: Hi Daniel, I do not have the exact files with me right now, but my coauthors on the paper I am working on have been copied on this email. Hua can send you those files. Thank you for being very helpful especially on a Friday night. Thanks, Joe Sent from my iPad > On Feb 21, 2014, at 9:27 PM, "Daniel Ence" > wrote: > > Hi Joe, > > MAKER runs blast from your local system (or your server where MAKER is installed), and it blasts evidence that the user supplies in the "est" and "protein" settings. The est and protein settings are set in the maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file and the specific blast settings are in the "maker_bopts.ctl" file. > > Will you attach those file to your reply, so we can make sure that the settings are set up correctly? 
> > Thanks, > Daniel > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ________________________________________ > From: Joseph Said [joesaid at nmsu.edu] > Sent: Friday, February 21, 2014 7:44 PM > To: Daniel Ence > Subject: RE: I am a PhD candidate at NMSU and have a question about maker2 > > Hi Daniel, > > Thank you for getting back to me so quickly. I am using the cotton Gossypium raimondii D genome from NCBI, and the arabidopsis gene is the GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I believe maker2 just calls BLAST from NCBI's page. So when I search the cotton genome it returns zero hits. But then I used a known cotton gene as a test and ran a search and also returned zero hits. I am not sure what the problem is but it seems like the protocol that should be returning the results of NCBI's BLAST is returning 0 to Maker2 which is reporting 0 hits. I can a BLAST standalone and came up with hits for both my gene of interest and the control test gene and came up with results. > > Thanks, > Joe > ________________________________________ > From: Daniel Ence > > Sent: Friday, February 21, 2014 7:38 PM > To: Joseph Said > Cc: maker-devel at yandell-lab.org > Subject: RE: I am a PhD candidate at NMSU and have a question about maker2 > > Hi Joe, > > Will you upload your control files and data at this URL? > http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169 > > Also, what version of MAKER and blast are you using? And which file are you using for the known arabidopsis gene? > > I've copied this email to the maker-development list, which is a really good resource for trouble-shooting MAKER issues. 
> > Thanks, > Daniel > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ________________________________________ > From: Mark Yandell > Sent: Friday, February 21, 2014 7:32 PM > To: Daniel Ence > Subject: FW: I am a PhD candidate at NMSU and have a question about maker2 > > Mark Yandell > Professor of Human Genetics > H.A. & Edna Benning Presidential Endowed Chair > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ph:801-587-7707 > > ________________________________________ > From: Joseph Said [joesaid at nmsu.edu] > Sent: Friday, February 21, 2014 5:18 PM > To: Mark Yandell > Subject: I am a PhD candidate at NMSU and have a question about maker2 > > Dear Dr. Yandell, > > I am a molecular biologist at NMSU. I am trying to use maker2 with the cotton genome, and search an Arabidopsis gene against it. I think there is a problem with the blast component because zero results are returned. I tried troubleshooting by searching a known gene and still returned zero results. Is this a common problem maybe with the pipeline? I would appreciate any ideas you might have to help me. > > Thank you, > Joe > > Sent from my iPad -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Sat Feb 22 17:21:51 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Sat, 22 Feb 2014 23:21:51 +0000 Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2 In-Reply-To: References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu> <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu> <6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu> , Message-ID: Hi Hua, will you upload the genome file to this URL? 
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=170 I am more concerned that MAKER didn't find the gene in the whole genome than in the 60bp substring. I think that MAKER needs more sequence than that to annotate a gene model. Will you also upload the MAKER output and datastore from the MAKER run? Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: Hua Zhong [zh9118 at gmail.com] Sent: Saturday, February 22, 2014 4:00 PM To: Daniel Ence Cc: maker-devel at yandell-lab.org; Joseph Said; Joe Song Subject: RE: I am a PhD candidate at NMSU and have a question about maker2 The long file we used is a whole genome. Quite huge a file. I am not able to send that. Sorry. But in the simple test i told you, the nucleotide sequence sent you is consider to be the genome file, and protein sequence is another input. There two are what we want to blast against to each other to see if Maker2 works well. Thanks. On Feb 22, 2014 3:51 PM, "Daniel Ence" > wrote: Hi, Will you send me the long file that you were trying to blast against? Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: Hua Zhong [zh9118 at gmail.com] Sent: Saturday, February 22, 2014 10:46 AM To: Daniel Ence Cc: Joe Song; Joseph Said Subject: Re: I am a PhD candidate at NMSU and have a question about maker2 hi all, Attached are the three configuration files and two input files, which are used to predict something between the genome and protein. For a simple test, we used one short sequence about 60bp and its translated protein sequence as inputs. But got nothing returned. What's more, we did test long genome sequence as one input as well, but still got nothing. I am not sure what's the reason cause this result. 
Thanks a lot for help. Hua On Fri, Feb 21, 2014 at 9:31 PM, Joseph Said > wrote: Hi Daniel, I do not have the exact files with me right now, but my coauthors on the paper I am working on have been copied on this email. Hua can send you those files. Thank you for being very helpful especially on a Friday night. Thanks, Joe Sent from my iPad > On Feb 21, 2014, at 9:27 PM, "Daniel Ence" > wrote: > > Hi Joe, > > MAKER runs blast from your local system (or your server where MAKER is installed), and it blasts evidence that the user supplies in the "est" and "protein" settings. The est and protein settings are set in the maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file and the specific blast settings are in the "maker_bopts.ctl" file. > > Will you attach those file to your reply, so we can make sure that the settings are set up correctly? > > Thanks, > Daniel > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ________________________________________ > From: Joseph Said [joesaid at nmsu.edu] > Sent: Friday, February 21, 2014 7:44 PM > To: Daniel Ence > Subject: RE: I am a PhD candidate at NMSU and have a question about maker2 > > Hi Daniel, > > Thank you for getting back to me so quickly. I am using the cotton Gossypium raimondii D genome from NCBI, and the arabidopsis gene is the GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I believe maker2 just calls BLAST from NCBI's page. So when I search the cotton genome it returns zero hits. But then I used a known cotton gene as a test and ran a search and also returned zero hits. I am not sure what the problem is but it seems like the protocol that should be returning the results of NCBI's BLAST is returning 0 to Maker2 which is reporting 0 hits. I can a BLAST standalone and came up with hits for both my gene of interest and the control test gene and came up with results. 
> > Thanks, > Joe > ________________________________________ > From: Daniel Ence > > Sent: Friday, February 21, 2014 7:38 PM > To: Joseph Said > Cc: maker-devel at yandell-lab.org > Subject: RE: I am a PhD candidate at NMSU and have a question about maker2 > > Hi Joe, > > Will you upload your control files and data at this URL? > http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169 > > Also, what version of MAKER and blast are you using? And which file are you using for the known arabidopsis gene? > > I've copied this email to the maker-development list, which is a really good resource for trouble-shooting MAKER issues. > > Thanks, > Daniel > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ________________________________________ > From: Mark Yandell > Sent: Friday, February 21, 2014 7:32 PM > To: Daniel Ence > Subject: FW: I am a PhD candidate at NMSU and have a question about maker2 > > Mark Yandell > Professor of Human Genetics > H.A. & Edna Benning Presidential Endowed Chair > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ph:801-587-7707 > > ________________________________________ > From: Joseph Said [joesaid at nmsu.edu] > Sent: Friday, February 21, 2014 5:18 PM > To: Mark Yandell > Subject: I am a PhD candidate at NMSU and have a question about maker2 > > Dear Dr. Yandell, > > I am a molecular biologist at NMSU. I am trying to use maker2 with the cotton genome, and search an Arabidopsis gene against it. I think there is a problem with the blast component because zero results are returned. I tried troubleshooting by searching a known gene and still returned zero results. Is this a common problem maybe with the pipeline? I would appreciate any ideas you might have to help me. 
> > Thank you, > Joe > > Sent from my iPad From mikael.durling at slu.se Sun Feb 23 10:57:09 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Sun, 23 Feb 2014 16:57:09 +0000 Subject: [maker-devel] Maker predicting fusion genes? Message-ID: <4CFD158A-DE75-4756-AD05-4CBF99BAF72D@slu.se> Dear list and maker developers, I was browsing the results of a recent maker run, focusing on differences between this run with a recent maker (svn r1067) and a previous run with svn revision 1022 (I recall). One of the differences I found was a gene lost in the new prediction set, but replaced by an extended version of a previous neighbor (see http://figshare.com/articles/Maker_prediction_comparison/942300). As you can see, there is no support for the join in the evidence. Do you have any clue to what might cause this? Best regards, Mikael Durling From carsonhh at gmail.com Sun Feb 23 14:00:50 2014 From: carsonhh at gmail.com (Carson Holt) Date: Sun, 23 Feb 2014 13:00:50 -0700 Subject: [maker-devel] Maker predicting fusion genes? Message-ID: The image doesn't show all evidence sources, but the short answer is that one of your evidence sources (est2genome, protein2genome, or blastx) bridges the two regions, and when provided the bridged hint one of the gene predictors thinks it makes sense to create a single model instead. My guess is that it's blastx evidence. -Carson On 2/23/14, 9:57 AM, "Mikael Brandström Durling" wrote: >Dear list and maker developers, > >I was browsing the results of a recent maker run, focusing on differences >between this run with a recent maker (svn r1067) and a previous run >with svn revision 1022 (I recall). One of the differences I found was a >gene lost in the new prediction set, but replaced by an extended version >of a previous neighbor (see >http://figshare.com/articles/Maker_prediction_comparison/942300).
As you >can see, there is no support for the join in the evidence. Do you have >any clue to what might cause this? > >Best regards, >Mikael Durling From mikael.durling at slu.se Sun Feb 23 15:14:00 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Sun, 23 Feb 2014 21:14:00 +0000 Subject: [maker-devel] Maker predicting fusion genes? In-Reply-To: References: Message-ID: <7CCC5270-93B9-4E5A-9687-26A1BF0EB1F8@slu.se> OK, do you mean that the predictions that end up in the gff3 output from the ab initio predictors (snap_masked, augustus_masked, and genemark) are not the final hinted predictions? Otherwise, I'm sorry, but I can't follow your reasoning. I checked my gff file, and as far as I can tell there is no evidence in it to support the bridge (see the attached gff of the region, or http://figshare.com/articles/Maker_prediction/942301 where all evidence is plotted). Mikael On 23 Feb 2014, at 21:00, Carson Holt wrote: > The image doesn't show all evidence sources, but the short answer is that > one of your evidence sources (est2genome, protein2genome, or blastx) > bridges the two regions, and when provided the bridged hint one of the > gene predictors thinks it makes sense to create a single model instead. > My guess is that it's blastx evidence. > > -Carson > > > On 2/23/14, 9:57 AM, "Mikael Brandström Durling" > wrote: > >> Dear list and maker developers, >> >> I was browsing the results of a recent maker run, focusing on differences >> between this run with a recent maker (svn r1067) and a previous run >> with svn revision 1022 (I recall).
One of the differences I found was a >> gene lost in the new prediction set, but replaced by an extended version >> of a previous neighbor (see >> http://figshare.com/articles/Maker_prediction_comparison/942300). As you >> can see, there is no support for the join in the evidence. Do you have >> any clue to what might cause this? >> >> Best regards, >> Mikael Durling >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: region.gff3 Type: application/octet-stream Size: 19612 bytes Desc: region.gff3 URL: From hedgyx at yahoo.com Mon Feb 24 01:02:41 2014 From: hedgyx at yahoo.com (Megan) Date: Sun, 23 Feb 2014 23:02:41 -0800 (PST) Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides Message-ID: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com> Maker folks, I am re-annotating a single contig and I am having a few problems. First, I am having trouble passing through a Maker derived gff (from Maker 2.09, with some modifications to gene names and functional information added). The gff file passes the modencode validator but Maker always fails on the first gene in the file, regardless of which gene comes first. So it appears to be a systematic error across the entire file. The Maker error is "Check your input GFF3 file for errors! (from GFFDB)". I have tried Maker 2.10 and 2.31, using both genome_gff with model_pass=1 and pred_gff. Attached is a gff with the first 2 genes. Second, when I updated to Maker 2.31, Maker now complains that my EST fasta file has nucleotides that are not supported [RYKMSWBDHV]. It suggests "set -fix_nucleotides on the command line to fix this automatically". Is the -fix_nucleotides a Maker flag? 
What exactly does it do? Does it remove the entire sequence or replace ambiguous bases with a randomly selected one? Half of my 20k ESTs contain these characters, so I don't want to throw them out entirely. Also, just curious, has Maker never supported these characters but just never complained? I used this EST data set with Maker 2.09. I did note poor EST coverage, but thought it was an issue with the EST data itself. I appreciate any suggestions. Thanks, Megan -------------- next part -------------- A non-text attachment was scrubbed... Name: part_passthru.gff Type: application/octet-stream Size: 4363 bytes Desc: not available URL: From zh9118 at gmail.com Sat Feb 22 17:00:28 2014 From: zh9118 at gmail.com (Hua Zhong) Date: Sat, 22 Feb 2014 16:00:28 -0700 Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2 In-Reply-To: References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu> <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu> <6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu> Message-ID: The long file we used is a whole genome. Quite huge a file. I am not able to send that. Sorry. But in the simple test i told you, the nucleotide sequence sent you is consider to be the genome file, and protein sequence is another input. There two are what we want to blast against to each other to see if Maker2 works well. Thanks. On Feb 22, 2014 3:51 PM, "Daniel Ence" wrote: > Hi, > > Will you send me the long file that you were trying to blast against? 
> > Thanks, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ------------------------------ > *From:* Hua Zhong [zh9118 at gmail.com] > *Sent:* Saturday, February 22, 2014 10:46 AM > *To:* Daniel Ence > *Cc:* Joe Song; Joseph Said > *Subject:* Re: I am a PhD candidate at NMSU and have a question about > maker2 > > hi all, > Attached are the three configuration files and two input files, which are > used to predict something between the genome and protein. For a simple > test, we used one short sequence about 60bp and its translated protein > sequence as inputs. But got nothing returned. What's more, we did test long > genome sequence as one input as well, but still got nothing. I am not sure > what's the reason cause this result. > Thanks a lot for help. > > Hua > > > > > On Fri, Feb 21, 2014 at 9:31 PM, Joseph Said wrote: > >> Hi Daniel, >> >> I do not have the exact files with me right now, but my coauthors on the >> paper I am working on have been copied on this email. Hua can send you >> those files. Thank you for being very helpful especially on a Friday night. >> >> Thanks, >> Joe >> >> Sent from my iPad >> >> > On Feb 21, 2014, at 9:27 PM, "Daniel Ence" >> wrote: >> > >> > Hi Joe, >> > >> > MAKER runs blast from your local system (or your server where MAKER is >> installed), and it blasts evidence that the user supplies in the "est" and >> "protein" settings. The est and protein settings are set in the >> maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file >> and the specific blast settings are in the "maker_bopts.ctl" file. >> > >> > Will you attach those file to your reply, so we can make sure that the >> settings are set up correctly? 
>> > >> > Thanks, >> > Daniel >> > >> > >> > Daniel Ence >> > Graduate Student >> > Eccles Institute of Human Genetics >> > University of Utah >> > 15 North 2030 East, Room 2100 >> > Salt Lake City, UT 84112-5330 >> > ________________________________________ >> > From: Joseph Said [joesaid at nmsu.edu] >> > Sent: Friday, February 21, 2014 7:44 PM >> > To: Daniel Ence >> > Subject: RE: I am a PhD candidate at NMSU and have a question about >> maker2 >> > >> > Hi Daniel, >> > >> > Thank you for getting back to me so quickly. I am using the cotton >> Gossypium raimondii D genome from NCBI, and the arabidopsis gene is the >> GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I >> believe maker2 just calls BLAST from NCBI's page. So when I search the >> cotton genome it returns zero hits. But then I used a known cotton gene as >> a test and ran a search and also returned zero hits. I am not sure what the >> problem is but it seems like the protocol that should be returning the >> results of NCBI's BLAST is returning 0 to Maker2 which is reporting 0 hits. >> I can a BLAST standalone and came up with hits for both my gene of interest >> and the control test gene and came up with results. >> > >> > Thanks, >> > Joe >> > ________________________________________ >> > From: Daniel Ence >> > Sent: Friday, February 21, 2014 7:38 PM >> > To: Joseph Said >> > Cc: maker-devel at yandell-lab.org >> > Subject: RE: I am a PhD candidate at NMSU and have a question about >> maker2 >> > >> > Hi Joe, >> > >> > Will you upload your control files and data at this URL? >> > http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169 >> > >> > Also, what version of MAKER and blast are you using? And which file are >> you using for the known arabidopsis gene? >> > >> > I've copied this email to the maker-development list, which is a really >> good resource for trouble-shooting MAKER issues. 
>> > >> > Thanks, >> > Daniel >> > >> > >> > Daniel Ence >> > Graduate Student >> > Eccles Institute of Human Genetics >> > University of Utah >> > 15 North 2030 East, Room 2100 >> > Salt Lake City, UT 84112-5330 >> > ________________________________________ >> > From: Mark Yandell >> > Sent: Friday, February 21, 2014 7:32 PM >> > To: Daniel Ence >> > Subject: FW: I am a PhD candidate at NMSU and have a question about >> maker2 >> > >> > Mark Yandell >> > Professor of Human Genetics >> > H.A. & Edna Benning Presidential Endowed Chair >> > Eccles Institute of Human Genetics >> > University of Utah >> > 15 North 2030 East, Room 2100 >> > Salt Lake City, UT 84112-5330 >> > ph:801-587-7707 >> > >> > ________________________________________ >> > From: Joseph Said [joesaid at nmsu.edu] >> > Sent: Friday, February 21, 2014 5:18 PM >> > To: Mark Yandell >> > Subject: I am a PhD candidate at NMSU and have a question about maker2 >> > >> > Dear Dr. Yandell, >> > >> > I am a molecular biologist at NMSU. I am trying to use maker2 with the >> cotton genome, and search an Arabidopsis gene against it. I think there is >> a problem with the blast component because zero results are returned. I >> tried troubleshooting by searching a known gene and still returned zero >> results. Is this a common problem maybe with the pipeline? I would >> appreciate any ideas you might have to help me. >> > >> > Thank you, >> > Joe >> > >> > Sent from my iPad >> > > From carsonhh at gmail.com Mon Feb 24 12:18:18 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 24 Feb 2014 11:18:18 -0700 Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides In-Reply-To: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com> References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com> Message-ID: The -fix_nucleotides flag is added to the command line (i.e. maker -fix_nucleotides).
It is there so you are aware that there is an issue with your fasta file that will cause things downstream to fail. MAKER can fix the errors for you, but first it gives a warning designed to make you look at the file and validate it. Why would you want to do this? For example, if you accidentally provided protein sequence to the EST option, you wouldn't want MAKER to just proceed. You want a warning so you can check first. If your file is in fact EST data, then set the flag and those characters will be changed to N's in the fixed fasta sequence. Otherwise those characters will cause errors in downstream tools like exonerate, and even in some downstream GMOD tools, so they can't be allowed to remain as is. For the GFF3 file, there is almost certainly a logic issue in the file (the modENCODE validator won't check for those). This can come from prior manipulation of the GFF3 file, for example IDs for a gene that are the same across two contigs (technically valid, but a logic error). The GFF3 error message will normally give the ID of the feature causing the issue. I could also take a look for you. You can upload the GFF3 file here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi Click on 'new guest account', then e-mail me back your guest ID, so I know which files to review. Thanks, Carson On 2/24/14, 12:02 AM, "Megan" wrote: >Maker folks, >I am re-annotating a single contig and I am having a few problems. > >First, I am having trouble passing through a Maker derived gff (from >Maker 2.09, with some modifications to gene names and functional >information added). The gff file passes the modencode validator but >Maker always fails on the first gene in the file, regardless of which >gene comes first. So it appears to be a systematic error across the >entire file. The Maker error is "Check your input GFF3 file for errors! >(from GFFDB)". I have tried Maker 2.10 and 2.31, using both genome_gff >with model_pass=1 and pred_gff.
>Attached is a gff with the first 2 genes.
>
>Second, when I updated to Maker 2.31, Maker now complains that my EST
>fasta file has nucleotides that are not supported [RYKMSWBDHV]. It
>suggests "set -fix_nucleotides on the command line to fix this
>automatically". Is the -fix_nucleotides a Maker flag? What exactly does
>it do? Does it remove the entire sequence or replace ambiguous bases
>with a randomly selected one? Half of my 20k ESTs contain these
>characters, so I don't want to throw them out entirely.
>
>Also, just curious, has Maker never supported these characters but just
>never complained? I used this EST data set with Maker 2.09. I did note
>poor EST coverage, but thought it was an issue with the EST data itself.
>
>I appreciate any suggestions.
>Thanks,
>Megan
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From dence at genetics.utah.edu Mon Feb 24 12:31:47 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 24 Feb 2014 18:31:47 +0000
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
In-Reply-To: 
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>,
Message-ID: 

Hi Megan,

One problem with the GFF3 that you attached is that the IDs for the CDS features are being made wrong. All of the CDS features for a given mRNA or transcript should have the same ID. The CDS features in your GFF3 have IDs that use the exon name.

You can fix it with this command-line perl:

cat part_passthru.gff | perl -ne 'if(/\tCDS\t/){ chomp; /Parent=([^;\s]+)/; my $parent=$1; s/ID=([^\;]+)/ID=$parent-cds/; print "$_\n"}else{print $_}' > fixed.gff3

It just fixes the ID attributes in all of the CDS features. Try it on the test gff3 you sent and let me know if it works. I can't test it myself without the fasta file that you are annotating.
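For readers who would rather not maintain a Perl one-liner, the same CDS ID rewrite can be sketched in Python. This is a minimal sketch under stated assumptions: a tab-delimited GFF3 file and a single Parent value per CDS line; `rename_cds_ids` is a hypothetical helper, not part of MAKER. As Carson points out later in this thread, MAKER accepts CDS IDs that are either shared or distinct, so this step is optional.

```python
import re


def rename_cds_ids(gff_text):
    """Give every CDS line the ID '<Parent>-cds', mirroring the Perl one-liner.

    Assumes tab-delimited GFF3 and one Parent value per CDS line.
    Non-CDS lines, and CDS lines without Parent, pass through unchanged.
    """
    out = []
    for line in gff_text.splitlines(keepends=True):
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "CDS":
            match = re.search(r"Parent=([^;\s]+)", cols[8])
            if match:
                new_id = "ID=" + match.group(1) + "-cds"
                # Replace only a real ID= attribute (start of column 9 or after ';').
                cols[8] = re.sub(r"(^|;)ID=[^;]*",
                                 lambda m: m.group(1) + new_id,
                                 cols[8], count=1)
                line = "\t".join(cols) + "\n"
        out.append(line)
    return "".join(out)
```

Anchoring the ID match to the start of the attribute column or a preceding ';' keeps it from touching other attributes that merely end in "ID".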
Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Mon Feb 24 12:34:28 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 11:34:28 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
In-Reply-To: 
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>
Message-ID: 

Actually, that is not true. CDS IDs can be the same or different; MAKER doesn't care either way, and both are valid in GFF3. Having the same ID just allows them to be put together by some GMOD viewers without having to go through a container feature.

--Carson

From carsonhh at gmail.com Mon Feb 24 14:59:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 13:59:12 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
In-Reply-To: <1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
References: <1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
Message-ID: 

I found the issue. You have non-printing characters at the end of almost every line. Because they fall within the Parent= tag, they become part of the Parent ID when the file is read.

So instead of "HERA000031-RA" you get -> "HERA000031-RA\cM" as the Parent ID.

"\cM" is a carriage return (Ctrl-M, displayed as ^M).

I ran the attached script to remove these characters (perl purify ), and then it works. Make sure to remove the .../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db file to force the GFF3 database to be rebuilt after fixing the file when you rerun MAKER.

Thanks,
Carson

On 2/24/14, 1:32 PM, "Megan" wrote:

>Hi Carson and Daniel,
>
>Thanks for your suggestions. I have looked at the gff file, but I do not
>see any obvious errors. I have uploaded the files to your website. The
>reference fasta is there, the full gff, and a single gene gff that also
>causes an error. If I remove that gene from the full gff, then the error
>is on the next gene in the file, so it appears to be a systematic problem
>throughout the gff. The gff was generated by Maker, but I may have
>messed it up when I modified it to rename genes and add functional
>information. I checked with cat -te, but don't see any obvious
>formatting errors.
>
>Thanks!
>Megan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: purify
Type: application/octet-stream
Size: 1965 bytes

From carsonhh at gmail.com Mon Feb 24 15:03:00 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 14:03:00 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
In-Reply-To: 
References: <1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
Message-ID: 

One more thing. You must give the file to pred_gff or model_gff. It is no longer strictly a MAKER file, as many of the source columns read ".", meaning it has been edited by Apollo or another editor. So it is not guaranteed to be recognized by genome_gff, because many of the source tags have changed.

Thanks,
Carson

From rbharris at uw.edu Tue Feb 25 15:49:57 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Tue, 25 Feb 2014 13:49:57 -0800
Subject: [maker-devel] error in snap training
Message-ID: 

Hey -

I'm trying to train SNAP and am running into errors. I don't have any EST evidence, just protein. My .gff file reports 10865 genes, but when I run maker2zff -c0 -e0 I get back empty genome files. When I run maker2zff -n, a ton of overlap_prev_exon errors get written to the screen, and then when I get to the forge step I get an "impossible error5". Any help would be greatly appreciated.

Thanks!
Rebecca

From carsonhh at gmail.com Tue Feb 25 16:12:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 15:12:14 -0700
Subject: [maker-devel] error in snap training
In-Reply-To: 
References: 
Message-ID: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>

Make sure you are using 2.31, and then try the maker2zff filters individually. If the protein models are not working well, use CEGMA to generate models. It's from the same group as SNAP. Use cegma2zff for the conversion.

--Carson

Sent from my iPhone

> On Feb 25, 2014, at 2:49 PM, Rebecca Harris wrote:
>
> Hey -
>
> I'm trying to train SNAP and am running into errors. I don't have any EST evidence, just protein. My .gff file reports 10865 genes, but when I run maker2zff -c0 -e0 I get back empty genome files. When I run maker2zff -n, a ton of overlap_prev_exon errors get written to the screen, and then when I get to the forge step I get an "impossible error5". Any help would be greatly appreciated.
>
> Thanks!
> Rebecca

From sjackman at gmail.com Tue Feb 25 18:06:03 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Tue, 25 Feb 2014 16:06:03 -0800
Subject: [maker-devel] Mapping gene names
Message-ID: 

Hi,

I'm annotating a genome using a closely related genome from GenBank, using the .frn (RNA) and .faa (protein) files from GenBank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation?

I see the *map_forward* option, which applies to the *model_gff* parameter. Is there a similar option for *est* and *protein*?

*maker_opts.ctl*
est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

From hedgyx at yahoo.com Tue Feb 25 18:26:11 2014
From: hedgyx at yahoo.com (Megan)
Date: Tue, 25 Feb 2014 16:26:11 -0800 (PST)
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
In-Reply-To: 
Message-ID: <1393374371.45210.YahooMailBasic@web162201.mail.bf1.yahoo.com>

Carson,

Everything ran through smoothly after removing the ^Ms. Thanks for the help.

Megan

From carsonhh at gmail.com Tue Feb 25 18:58:08 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 17:58:08 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: 
References: 
Message-ID: 

There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, it will do just that. The option won't already be there, so you'll have to type it in.

There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure that different isoforms sharing a common gene_id get clustered into the same gene. Putting maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to be mapped only against chr1 within the range of 1-10000 bp, and using just maker_coor=chr1 will force it to be mapped only against chr1. This is an undocumented way to remap genes onto new assemblies using BLAST alignments of earlier transcript or protein annotations as a guide.
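Because the header-tag syntax is undocumented, the exact layout below is a guess. This minimal Python sketch appends gene_id= and maker_coor= tags to existing FASTA headers, assuming (not confirmed above) that the tags go after the sequence ID as space-separated key=value words. `tag_fasta_headers` is a hypothetical helper; check the result on a small test run before tagging a whole evidence file.

```python
def tag_fasta_headers(fasta_text, tags_by_id):
    """Append key=value tags to matching FASTA header lines.

    tags_by_id maps a sequence ID (the first word after '>') to a dict of
    tags, e.g. {"gene_id": "geneA", "maker_coor": "chr1:1-10000"}.
    ASSUMPTION: tags are placed after the ID as space-separated key=value words.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            seq_id = words[0] if words else ""
            tags = tags_by_id.get(seq_id, {})
            if tags:
                line += " " + " ".join(f"{k}={v}" for k, v in tags.items())
        out.append(line)  # sequence lines are copied unchanged
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    fasta = ">tx1 isoform A\nACGTACGT\n>tx2\nGGCCGGCC\n"
    tags = {"tx1": {"gene_id": "geneA", "maker_coor": "chr1:1-10000"}}
    print(tag_fasta_headers(fasta, tags), end="")
```

Sequences without an entry in tags_by_id keep their original headers, so the function can be run over a whole .frn or .faa file while only tagging selected records.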
--Carson

From carsonhh at gmail.com Tue Feb 25 19:04:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 18:04:48 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: 
References: 
Message-ID: 

One more note. When using this option, the score column of mRNA features will represent how completely the gene matches the source EST/protein (fraction coverage multiplied by % identity), so a value of 100 means there is a perfect match. This way, if the same transcript maps to multiple locations, you can identify which location is the closest match (this also works for identifying likely orthologs vs. paralogs).

--Carson

From weckalba at asu.edu Tue Feb 25 19:36:21 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Tue, 25 Feb 2014 17:36:21 -0800
Subject: [maker-devel] invalid gff3 format issues
Message-ID: 

Hi all,

I am trying to update maker annotations with PASA and encountered errors stemming from file format issues in the gff3 file. I put a few lines from the gff3 below to highlight the issue. Basically, the problem is that there are non-unique IDs for a number of the annotations.
Is there anything that can be done to right this problem?

Thanks,

Walter

Lines from GFF3 file, repeated IDs are highlighted:

chr1 maker gene 9377440 9432028 . - . ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16
chr1 maker mRNA 9377440 9432028 . - . ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234
chr1 maker exon 9431899 9432028 . - . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1
chr1 maker exon 9431698 9431808 . - . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1

chr1 maker gene 8894975 9021577 . + . ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53
chr1 maker mRNA 8894975 9021577 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007
chr1 maker exon 8894975 8895153 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
chr1 maker exon 8942215 8942531 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
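One way to locate non-unique IDs like these before handing a file to PASA is to scan column 9 of every feature line and report any `ID=` value that appears on more than one line. The Python sketch below is a hypothetical helper, not a MAKER or PASA tool. It assumes a tab-delimited GFF3 file and stops at a `##FASTA` section. GFF3 does legitimately let one feature span several lines that share an ID (CDS segments are the usual case), so repeated IDs on CDS lines are not necessarily errors.

```python
import sys
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_attributes(col9: str) -> Dict[str, str]:
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs: Dict[str, str] = {}
    for field in col9.strip().split(";"):
        field = field.strip()
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def find_duplicate_ids(lines: Iterable[str]) -> Dict[str, List[Tuple[int, str]]]:
    """Map each ID seen on more than one feature line to [(line_no, feature_type), ...]."""
    seen: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # blank line, directive, or comment
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        feature_id = parse_attributes(cols[8]).get("ID")
        if feature_id is not None:
            seen[feature_id].append((line_no, cols[2]))
    return {fid: hits for fid, hits in seen.items() if len(hits) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for fid, hits in find_duplicate_ids(handle).items():
            where = ", ".join(f"line {n} ({ftype})" for n, ftype in hits)
            print(f"{fid}: {where}")
```

Saved under a name such as `find_dup_ids.py` (hypothetical), it would be run as `python find_dup_ids.py annotations.gff3`. On the excerpt above it would flag `maker-chr1-snap-gene-4.53-mRNA-1`, which is used as the ID of two different mRNA features.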
From dence at genetics.utah.edu Tue Feb 25 20:02:04 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 26 Feb 2014 02:02:04 +0000
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To:
References:
Message-ID:

Hi Walter,

Will you upload the full GFF3 and the control files that you used to this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189
Also, what version of MAKER are you running this with?

Thanks,
Daniel

From weckalba at asu.edu Tue Feb 25 20:11:12 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Tue, 25 Feb 2014 18:11:12 -0800
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To:
References:
Message-ID:

Hi Daniel, those have been uploaded and I'm using version 2.28.

Walter

From carsonhh at gmail.com Tue Feb 25 22:10:27 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 21:10:27 -0700
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To:
References:
Message-ID:

Could you try version 2.31 (the current version)? I believe this is happening because you are passing in MAKER genes as pred_gff, so the transcripts ended up with the same Names and IDs as the genes being generated by the MAKER run via SNAP, etc. This shouldn't happen with model_gff, and shouldn't happen in 2.31 (IDs and names are generated slightly differently in 2.30+).

Thanks,
Carson

From marc.hoeppner at imbim.uu.se Wed Feb 26 02:26:35 2014
From: marc.hoeppner at imbim.uu.se (Marc Höppner)
Date: Wed, 26 Feb 2014 08:26:35 +0000
Subject: [maker-devel] Functional annotation options
Message-ID: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>

Dear List,

I have finished a gene build now, and I would like to move on to functional annotation. I understand that maker includes a few scripts to facilitate such analyses. However, I have a few questions about this:

1) iprscan
It seems maker includes an MPI wrapper for InterProScan, but requires 'iprscan' to be in $PATH.
The latest versions of InterProScan I have worked with are Java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else. But what?

2) maker_functional_gff
This script seems to be very useful, but the description suggests that it requires WU-BLAST tabular output '2', which I think looks quite different from the NCBI BLAST tabular output. Since WU-BLAST is not really available anymore (except as a very old, frozen binary bundle), I was wondering how to address this issue.

3) maker_functional
This just throws an error about a missing Job ID, so I have no clue what it would be used for.

I guess what I am after is some suggestion as to how to use the scripts included with Maker to achieve a reasonable functional annotation.

With kind regards,

Marc Hoeppner

Marc P. Hoeppner, PhD
Team Leader
BILS Genome Annotation Platform
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at imbim.uu.se

From mikael.durling at slu.se Wed Feb 26 03:43:43 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 09:43:43 +0000
Subject: [maker-devel] Functional annotation options
In-Reply-To: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
Message-ID: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>

On 26 Feb 2014, at 09:26, Marc Höppner wrote:

> 1) iprscan
> It seems maker includes an MPI wrapper for InterProScan, but requires 'iprscan' to be in $PATH. The latest versions of InterProScan I have worked with are Java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else. But what?

I don't believe it works with InterProScan 5.
What I usually do is split the maker protein file into chunks, run these chunks as separate jobs on our cluster, and finally merge the results. The TSV file from iprscan5 can be input into the maker tool ipr_update_gff. I have not tried iprscan2gff3, as I haven't figured out how to get an iprscan4 raw file from iprscan5.

> 2) maker_functional_gff
> This script seems to be very useful, but the description suggests that it requires WU-BLAST tabular output '2', which I think looks quite different from the NCBI BLAST tabular output. Since WU-BLAST is not really available anymore (except as a very old, frozen binary bundle), I was wondering how to address this issue.

It works fine with NCBI BLAST+ and the blastp command with -outfmt 6.

cheers,
Mikael

P.S. You're welcome to visit me at SLU if you would like to discuss experiences of genome annotation.

From mikael.durling at slu.se Wed Feb 26 03:55:56 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 09:55:56 +0000
Subject: [maker-devel] Functional annotation options
In-Reply-To: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se> <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
Message-ID: <29357689-D616-465F-BCC4-66AF5B1D5D2E@slu.se>

On 26 Feb 2014, at 10:43, Mikael Brandström Durling wrote:

> I don't believe it works with InterProScan 5.

I should clarify this and say that mpi_iprscan doesn't seem to work with iprscan5. ipr_update_gff3 does, however.

Mikael
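The split-run-merge workflow Mikael describes can be sketched in Python as follows. The helper names (`read_fasta`, `split_fasta`, `merge_tsv`), the chunk naming scheme, and the chunk size are hypothetical. Submitting each chunk to InterProScan 5 depends on your cluster scheduler and is not shown. The merge step simply concatenates the per-chunk TSV files, on the assumption (worth confirming for your InterProScan version) that its TSV output carries no header line.

```python
from pathlib import Path
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (header_line, sequence_text) records from a FASTA file."""
    header: Optional[str] = None
    seq_lines: List[str] = []
    with path.open() as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "\n".join(seq_lines)
                header, seq_lines = line, []
            elif line:
                seq_lines.append(line)
    if header is not None:
        yield header, "\n".join(seq_lines)


def split_fasta(path: Path, out_dir: Path, per_chunk: int) -> List[Path]:
    """Write at most per_chunk records per file (hypothetical names chunk_0000.fasta, ...)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks: List[Path] = []
    handle: Optional[TextIO] = None
    for i, (header, seq) in enumerate(read_fasta(path)):
        if i % per_chunk == 0:  # start a new chunk file
            if handle is not None:
                handle.close()
            chunk_path = out_dir / f"chunk_{len(chunks):04d}.fasta"
            chunks.append(chunk_path)
            handle = chunk_path.open("w")
        handle.write(f"{header}\n{seq}\n")
    if handle is not None:
        handle.close()
    return chunks


def merge_tsv(parts: List[Path], merged: Path) -> None:
    """Concatenate per-chunk TSV results (assumed header-less) into one file."""
    with merged.open("w") as out:
        for part in parts:
            text = part.read_text()
            if text and not text.endswith("\n"):
                text += "\n"  # keep rows from adjacent chunks on separate lines
            out.write(text)
```

For example, `split_fasta(Path("proteins.fasta"), Path("ipr_chunks"), 1000)` (hypothetical paths) writes chunks of at most 1,000 proteins. Once each chunk has been processed, `merge_tsv` gathers the per-chunk results into a single TSV for ipr_update_gff.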
From mikael.durling at slu.se Wed Feb 26 06:30:44 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 12:30:44 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References:
Message-ID:

Can this use of maker_coor serve only as a hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs for which I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?

Thanks,
Mikael

From carsonhh at gmail.com Wed Feb 26 07:22:34 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 06:22:34 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References:
Message-ID: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>

Yes. That should work as well, as an accidental feature.

--Carson

Sent from my iPhone
From mikael.durling at slu.se Wed Feb 26 07:37:29 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 13:37:29 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
Message-ID: <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>

That might be a useful and time-saving accidental feature. But, reading the code, it seems that for this to work I need to supply maker_coor (but not gene_id) as well as the configuration option est_forward. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

From nextgen.usfs at gmail.com Wed Feb 26 10:21:33 2014
From: nextgen.usfs at gmail.com (USFS Ion PGM)
Date: Wed, 26 Feb 2014 10:21:33 -0600
Subject: [maker-devel] change program locations in maker_exe
Message-ID:

Hello,

I was wondering if there is a way to make permanent changes to the maker_exe.ctl file. It seems that on install maker didn't find the GeneMark or probuild locations correctly, which means that I have to manually edit the maker_exe.ctl file every time and add that information. Where can I modify this permanently so that the maker -CTL command creates the appropriate maker_exe.ctl file?

Thank you.
- Jon

From carsonhh at gmail.com Wed Feb 26 09:38:47 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 08:38:47 -0700
Subject: [maker-devel] Functional annotation options
In-Reply-To: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se> <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
Message-ID:

maker_functional is a script that gets called by another script; it is not meant to be called directly by the user, so ignore it.

Just run iprscan directly; it already works pretty well. The mpi_iprscan and iprscan_wrap scripts just add some logging functionality by wrapping the iprscan call. In most cases there is no advantage over running iprscan directly.

--Carson

From carsonhh at gmail.com Wed Feb 26 10:09:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 09:09:14 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

It will still work without est_forward; it just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline.
Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but it can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. it allows for single-base-pair introns, perhaps caused by assembly errors). The logic here is that, since I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match. Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also, the match parameters for exonerate will not be relaxed as they were with est_forward. As you can see, the behavior is slightly different (because it's an accidental feature). Thanks, Carson From: Mikael Brandström Durling Date: Wednesday, February 26, 2014 at 6:37 AM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward, for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on set_forward=1, right? Mikael On 26 Feb 2014, at 14:22, Carson Holt wrote: > Yes. That should work as well as an accidental feature.
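Carson describes the behavior without est_forward: seeds completely outside maker_coor are ignored, and any seed that overlaps it has its polishing window set to maker_coor exactly. The small Python model below illustrates that rule. It is not MAKER's code (MAKER is Perl, and the real logic lives in GI.pm). The `Region` and `Seed` types are hypothetical, as is the assumption that a bare `maker_coor=chr1` means the whole sequence from 1 to its length. Sequence names that contain ':' are not handled.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    seqid: str
    start: Optional[int] = None  # None means "the whole sequence"
    end: Optional[int] = None


@dataclass(frozen=True)
class Seed:
    """A BLAST hit on the genome; start > end is allowed (minus strand)."""
    seqid: str
    start: int
    end: int


# Matches maker_coor=chr1 or maker_coor=chr1:1-10000 inside a FASTA header.
_MAKER_COOR = re.compile(r"maker_coor=([^\s:]+)(?::(\d+)-(\d+))?")


def parse_maker_coor(header: str) -> Optional[Region]:
    """Return the maker_coor region from a FASTA header, or None if absent."""
    match = _MAKER_COOR.search(header)
    if match is None:
        return None
    seqid, start, end = match.groups()
    if start is None:
        return Region(seqid)
    return Region(seqid, int(start), int(end))


def restrict_seeds(seeds: List[Seed], region: Region,
                   seq_len: int) -> List[Tuple[Seed, Region]]:
    """Drop seeds outside the region; pair each kept seed with its polishing window."""
    lo = region.start if region.start is not None else 1
    hi = region.end if region.end is not None else seq_len
    window = Region(region.seqid, lo, hi)
    kept = []
    for seed in seeds:
        s_lo, s_hi = min(seed.start, seed.end), max(seed.start, seed.end)
        if seed.seqid != region.seqid or s_hi < lo or s_lo > hi:
            continue  # completely outside maker_coor: ignored
        kept.append((seed, window))  # search space set to maker_coor exactly
    return kept
```

For a header tagged `maker_coor=chr1:1-10000`, a hit on chr1 at 20000-20500 is dropped, while a hit at 500-900 is kept and polished against the full 1-10000 window.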
> > --Carson > > Sent from my iPhone > > On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling > wrote: > >> Can this use of maker_coor be used only to hint about the placement of the >> ESTs, without affecting the naming of the final genes? I.e., if I have a >> database of ESTs where I have a priori knowledge of their rough placement, can >> this placement be given to maker without providing est_forward=1? >> >> Thanks, >> Mikael >> >> On 26 Feb 2014, at 01:58, Carson Holt wrote: >> >>> There is a way. It's not a standard option and it's undocumented, but if >>> you add est_forward=1 to the maker_opts.ctl file, then it will do just that. >>> The option won't already be there, so you'll have to type it in. >>> >>> There is also a feature designed to work with this option. If you add tags >>> to your fasta headers, those can be used to guide the mapping and naming. >>> For example, gene_id= will ensure different isoforms that share >>> a common gene_id get clustered into the same gene, and >>> maker_coor=chr1:1-10000 in the fasta header will force a particular sequence >>> to only be mapped against chr1 within the range of 1-10000 bp, and just >>> using maker_coor=chr1 will force it to only be mapped against chr1. >>> >>> This is an undocumented way to remap genes onto new assemblies using blast >>> alignments of earlier transcript or protein annotations as a guide. >>> >>> --Carson >>> >>> >>> >>> >>> From: Shaun Jackman >>> Reply-To: Shaun Jackman >>> Date: Tuesday, February 25, 2014 at 5:06 PM >>> To: >>> Subject: [maker-devel] Mapping gene names >>> >>> Hi, >>> >>> I'm annotating a genome using a closely related genome from Genbank, using >>> the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate >>> my genome. I've run Maker, and the annotation seems to have worked well. Is >>> it possible to map the names of the genes from the related species to my >>> annotation? I see the map_forward option, which applies to the model_gff >>> parameter.
Is there a similar option for est and protein? >>> >>> maker_opts.ctl >>> est=NC_123456.frn >>> protein=NC_123456.faa >>> est2genome=1 >>> protein2genome=1 >>> Thanks, >>> Shaun From carson.holt at genetics.utah.edu Wed Feb 26 10:38:37 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Wed, 26 Feb 2014 16:38:37 +0000 Subject: [maker-devel] change program locations in maker_exe In-Reply-To: References: Message-ID: MAKER first looks inside of .../maker/exe/ for any executables. Then it uses the system's 'which' command to identify executables in your PATH environment variable. If MAKER is not finding the one you want, then you can either put the program in the .../maker/exe/ folder (i.e. create .../maker/exe/bin/ and then put soft links there to the executables you want to be used first), or you can rearrange the order of directories in your PATH environment variable so that 'which' returns the location you want. If MAKER is always leaving the locations to those programs empty, it is because you need to add them to your PATH environment variable. Thanks, Carson On 2/26/14, 9:21 AM, "USFS Ion PGM" wrote: >Hello, >I was wondering if there is a way to make permanent changes to the >maker_exe.ctl file, as it seems on the install that maker didn't find the >gene mark or pro build locations correctly, which means that I have to >manually edit the maker_exe.ctl file every time and add that information.
> Where can I modify this permanently so that the maker -CTL command >creates the appropriate maker_exe file? Thank you. > >- Jon > > From nextgen.usfs at gmail.com Wed Feb 26 10:58:11 2014 From: nextgen.usfs at gmail.com (USFS Ion PGM) Date: Wed, 26 Feb 2014 10:58:11 -0600 Subject: [maker-devel] change program locations in maker_exe In-Reply-To: References: Message-ID: <2FA61AAE-0548-4030-9F4A-6964A631703C@gmail.com> Hi Carson, Thank you, that did it; I didn't have them in the PATH. All working now. Cheers, Jon On Feb 26, 2014, at 10:38 AM, Carson Holt wrote: > MAKER first looks inside of .../maker/exe/ for any executables. Then it > uses the system's 'which' command to identify executables in your PATH > environment variable. If MAKER is not finding the one you want, then > you can either put the program in the .../maker/exe/ folder (i.e. create > .../maker/exe/bin/ and then put soft links there to the executables you want to > be used first), or you can rearrange the order of directories in your PATH > environment variable so that 'which' returns the location > you want. If MAKER is always leaving the locations to those programs > empty, it is because you need to add them to your PATH environment > variable. > > Thanks, > Carson > > On 2/26/14, 9:21 AM, "USFS Ion PGM" wrote: > >> Hello, >> I was wondering if there is a way to make permanent changes to the >> maker_exe.ctl file, as it seems on the install that maker didn't find the >> gene mark or pro build locations correctly, which means that I have to >> manually edit the maker_exe.ctl file every time and add that information. >> Where can I modify this permanently so that the maker -CTL command >> creates the appropriate maker_exe file? Thank you.
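Carson describes a lookup order: MAKER's own exe directory first, then whatever `which` finds on PATH. Python's `shutil.which` accepts an explicit search path, so the sketch below can reproduce that order and show which binary a given arrangement would pick. It only checks resolution and does not call into MAKER. The directory you pass in stands for a hypothetical `.../maker/exe/bin/` folder of soft links, and the program names in the usage note are examples.

```python
import os
import shutil
from typing import Dict, Iterable, List, Optional


def locate(program: str, preferred_dirs: Iterable[str],
           env_path: Optional[str] = None) -> Optional[str]:
    """Return the first executable named `program`, searching preferred_dirs before PATH."""
    if env_path is None:
        env_path = os.environ.get("PATH", "")
    # Preferred directories come first, mirroring "exe folder, then PATH".
    search_path = os.pathsep.join([*preferred_dirs, env_path])
    return shutil.which(program, path=search_path)


def report(programs: List[str], preferred_dirs: List[str]) -> Dict[str, Optional[str]]:
    """Map each program name to the path this ordering would pick (None if not found)."""
    return {name: locate(name, preferred_dirs) for name in programs}
```

For example, `report(["snap", "exonerate", "probuild"], ["/path/to/maker/exe/bin"])` (placeholder path) lists what each name resolves to. A `None` entry means the program is in neither the preferred directory nor PATH, which matches the empty-location case Carson describes.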
>> >> - Jon >> >> > From weckalba at asu.edu Wed Feb 26 14:05:05 2014 From: weckalba at asu.edu (Walter Eckalbar) Date: Wed, 26 Feb 2014 12:05:05 -0800 Subject: [maker-devel] invalid gff3 format issues In-Reply-To: References: Message-ID: Hi Carson, Thanks, that seems to have mostly resolved the issue. Oddly enough though, PASA still complains about the GFF3 file directly from gff3_merge, but if I first transform it with maker2eval_gtf, then use PASA's gtf_to_gff3_format.pl script, everything seems to run fine. On 25 February 2014 20:10, Carson Holt wrote: > Could you try version 2.31 (the current version)? I believe this is > happening because you are passing in MAKER genes as pred_gff the > transcripts thus ended up with the same Names and IDs as the genes being > generated by the MAKER run via SNAP etc. This shouldn't happen with > model_gff, and shouldn't happen in 2.31 (IDs and names are generated > slightly differently in 2.30+). > > Thanks, > Carson > > From: Walter Eckalbar > Date: Tuesday, February 25, 2014 at 7:11 PM > To: Daniel Ence > Cc: "" > Subject: Re: [maker-devel] invalid gff3 format issues > > Hi Daniel, those have been uploaded and I'm using version 2.28. > > Walter > > > On 25 February 2014 18:02, Daniel Ence wrote: > >> Hi Walter, >> >> Will you upload the full GFF3 and the control files that you used to this >> URL? >> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189 >> Also, what version of MAKER are you running this with? >> >> Thanks, >> Daniel >> >> >> >> On Feb 25, 2014, at 6:36 PM, Walter Eckalbar >> wrote: >> >> Hi all, >> >> I am trying to update maker annotations with PASA and encountered errors >> stemming from file format issues in the gff3 file. >> >> I put a few lines from the gff3 to highlight the issue below. Basically, >> the problem is that there are non-unique IDs for a number of the >> annotations. >> >> Is there anything that can be done to right this problem? 
>> >> Thanks, >> >> Walter >> >> Lines from GFF3 file, repeated IDs are highlighted: >> >> >> chr1 maker gene 9377440 9432028 . - . >> ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16 >> chr1 maker mRNA 9377440 9432028 . - . >> ID=maker-chr1-snap-gene-4.53-mRNA-1; >> Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234 >> chr1 maker exon 9431899 9432028 . - . >> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1 >> chr1 maker exon 9431698 9431808 . - . >> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1 >> >> chr1 maker gene 8894975 9021577 . + . >> ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53 >> chr1 maker mRNA 8894975 9021577 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1; >> Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007 >> chr1 maker exon 8894975 8895153 . + . >> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11 >> chr1 maker exon 8942215 8942531 . + . 
>> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11 From carsonhh at gmail.com Wed Feb 26 15:12:23 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 26 Feb 2014 14:12:23 -0700 Subject: [maker-devel] invalid gff3 format issues In-Reply-To: References: Message-ID: Could you put the file in this GFF3 validator to see if anything comes up? http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online Maybe it's just PASA, but I'd like to know there's no issue being caused by something else. Thanks, Carson From: Walter Eckalbar Date: Wednesday, February 26, 2014 at 1:05 PM To: Carson Holt Cc: Daniel Ence , "" Subject: Re: [maker-devel] invalid gff3 format issues Hi Carson, Thanks, that seems to have mostly resolved the issue. Oddly enough though, PASA still complains about the GFF3 file directly from gff3_merge, but if I first transform it with maker2eval_gtf, then use PASA's gtf_to_gff3_format.pl script, everything seems to run fine. On 25 February 2014 20:10, Carson Holt wrote: > Could you try version 2.31 (the current version)?
I believe this is happening > because you are passing in MAKER genes as pred_gff the transcripts thus ended > up with the same Names and IDs as the genes being generated by the MAKER run > via SNAP etc. This shouldn't happen with model_gff, and shouldn't happen in > 2.31 (IDs and names are generated slightly differently in 2.30+). > > Thanks, > Carson > > From: Walter Eckalbar > Date: Tuesday, February 25, 2014 at 7:11 PM > To: Daniel Ence > Cc: "" > Subject: Re: [maker-devel] invalid gff3 format issues > > Hi Daniel, those have been uploaded and I'm using version 2.28. > > Walter > > > On 25 February 2014 18:02, Daniel Ence wrote: >> Hi Walter, >> >> Will you upload the full GFF3 and the control files that you used to this >> URL? >> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189 >> Also, what version of MAKER are you running this with? >> >> Thanks, >> Daniel >> >> >> >> On Feb 25, 2014, at 6:36 PM, Walter Eckalbar >> wrote: >> >>> Hi all, >>> >>> I am trying to update maker annotations with PASA and encountered errors >>> stemming from file format issues in the gff3 file. >>> >>> I put a few lines from the gff3 to highlight the issue below. Basically, >>> the problem is that there are non-unique IDs for a number of the >>> annotations. >>> >>> Is there anything that can be done to right this problem? >>> >>> Thanks, >>> >>> Walter >>> >>> Lines from GFF3 file, repeated IDs are highlighted: >>> >>> >>> chr1 maker gene 9377440 9432028 . - . >>> ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16 >>> chr1 maker mRNA 9377440 9432028 . - . >>> ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234 >>> chr1 maker exon 9431899 9432028 . - . >>> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1 >>> chr1 maker exon 9431698 9431808 . - .
>>> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1 >>> >>> chr1 maker gene 8894975 9021577 . + . >>> ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53 >>> chr1 maker mRNA 8894975 9021577 . + . >>> ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007 >>> chr1 maker exon 8894975 8895153 . + . >>> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11 >>> chr1 maker exon 8942215 8942531 . + . >>> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
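Walter's symptom, where the same ID appears on two different mRNA features, can be located with a short scan before a merged GFF3 file goes to PASA. The Python sketch below is not part of MAKER or PASA. It reads tab-separated GFF3, stops at a `##FASTA` section, and reports every ID found on more than one feature line. GFF3 does allow a single discontinuous feature (for example, a CDS split across exons) to reuse one ID on several lines, so treat the output as a list to inspect rather than a list of errors. A full validator, such as the one Carson links, remains the more complete check.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[int, str, int, int]  # (line number, feature type, start, end)


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse GFF3 column 9 ('key=value;key=value') into a dict."""
    attrs: Dict[str, str] = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def duplicate_ids(lines: Iterable[str]) -> Dict[str, List[Hit]]:
    """Map every ID found on more than one feature line to where it occurs."""
    seen: Dict[str, List[Hit]] = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a tab-separated feature line
        feature_id = parse_attributes(cols[8]).get("ID")
        if feature_id:
            seen[feature_id].append((line_no, cols[2], int(cols[3]), int(cols[4])))
    return {fid: hits for fid, hits in seen.items() if len(hits) > 1}
```

Applied to the lines Walter posted, this would flag `maker-chr1-snap-gene-4.53-mRNA-1`, which occurs on two mRNA lines with different coordinates and different parent genes.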
From mikael.durling at slu.se Wed Feb 26 16:04:37 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Wed, 26 Feb 2014 22:04:37 +0000 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I noticed that est_forward implicitly turns on the est2genome predictor. Is this necessary for this to work? I'm after the behavior you describe below, where exonerate is made to try really hard within a limited region to align an EST, but I would not like maker to produce est2genome predictions. In general, I think this maker_coor and est_forward feature set is worthy of being promoted into a documented feature. Thanks, Mikael On 26 Feb 2014, at 17:09, Carson Holt wrote: It will still work without est_forward; it just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome. If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but it can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. it allows for single-base-pair introns, perhaps caused by assembly errors).
The logic here is that, since I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match. Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also, the match parameters for exonerate will not be relaxed as they were with est_forward. As you can see, the behavior is slightly different (because it's an accidental feature). Thanks, Carson From: Mikael Brandström Durling Date: Wednesday, February 26, 2014 at 6:37 AM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward, for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on set_forward=1, right? Mikael On 26 Feb 2014, at 14:22, Carson Holt wrote: Yes. That should work as well as an accidental feature. --Carson Sent from my iPhone On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling wrote: Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1? Thanks, Mikael On 26 Feb 2014, at 01:58, Carson Holt wrote: There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.
The option won't already be there, so you'll have to type it in. There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1. This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide. --Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Tuesday, February 25, 2014 at 5:06 PM To: Subject: [maker-devel] Mapping gene names Hi, I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein? maker_opts.ctl est=NC_123456.frn protein=NC_123456.faa est2genome=1 protein2genome=1 Thanks, Shaun
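The `gene_id=` and `maker_coor=` tags Carson describes have to be written into the evidence FASTA headers before the run. The thread does not give the exact header layout, so this Python sketch assumes space-separated `key=value` tags after the sequence ID. The function name and example IDs are hypothetical. The sketch rejects a half-specified range, because `maker_coor` takes either a bare sequence name or `name:start-end`.

```python
from typing import Optional


def tag_header(seq_id: str, gene_id: Optional[str] = None,
               seqid: Optional[str] = None, start: Optional[int] = None,
               end: Optional[int] = None) -> str:
    """Build a FASTA header carrying optional gene_id= and maker_coor= tags.

    Assumption: tags follow the ID as space-separated key=value pairs.
    """
    parts = [f">{seq_id}"]
    if gene_id:
        parts.append(f"gene_id={gene_id}")  # isoforms sharing this get clustered
    if seqid:
        if (start is None) != (end is None):
            raise ValueError("give both start and end, or neither")
        if start is None:
            parts.append(f"maker_coor={seqid}")  # restrict to the whole sequence
        else:
            if not 1 <= start <= end:
                raise ValueError("expected 1 <= start <= end")
            parts.append(f"maker_coor={seqid}:{start}-{end}")
    elif start is not None or end is not None:
        raise ValueError("start/end need a sequence name")
    return " ".join(parts)
```

For example, two isoforms of one old gene would both get the same `gene_id` value, and each can carry its own `maker_coor` window from the earlier assembly's BLAST placement.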
From carsonhh at gmail.com Wed Feb 26 16:50:30 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 26 Feb 2014 15:50:30 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff, and use the map_forward option to then filter the results based on mRNA score; that would copy names onto new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP, I can start to get some very weird behaviors. Thanks, Carson From: Mikael Brandström Durling Date: Wednesday, February 26, 2014 at 3:04 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I noticed that est_forward implicitly turns on the est2genome predictor. Is this necessary for this to work? I'm after the behavior you describe below, where exonerate is made to try really hard within a limited region to align an EST, but I would not like maker to produce est2genome predictions. In general, I think this maker_coor and est_forward feature set is worthy of being promoted into a documented feature. Thanks, Mikael On 26 Feb 2014, at 17:09, Carson Holt wrote: > It will still work without est_forward; it just works a little differently. > Keep in mind this was a hidden feature I used to find stubborn or hard-to-find > missing genes after reassembly of a genome.
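In a follow-up, Carson corrects his suggestion: the first-pass GFF3 should be prefiltered on the mRNA score before it is passed to model_gff. The Python filter below is one minimal way to do that, and several of its choices are assumptions the thread does not confirm. The score is read from column 6 of each mRNA line, and a '.' score counts as failing. The threshold is left to you. Only gene and mRNA lines, plus child features of retained mRNAs, are written out, so evidence alignments such as match/match_part lines are dropped. Exon lines shared by several isoforms keep only the Parent IDs that survive, so no line points at a removed mRNA. If the value you want to filter on is stored in an attribute instead, change `mrna_score`.

```python
import re
from typing import Dict, Iterable, List, Optional, Set


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse GFF3 column 9 ('key=value;key=value') into a dict."""
    attrs: Dict[str, str] = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def mrna_score(cols: List[str]) -> Optional[float]:
    """Assumption: the score is GFF3 column 6; '.' means no score."""
    return None if cols[5] == "." else float(cols[5])


def filter_models(lines: Iterable[str], min_score: float) -> List[str]:
    """Keep genes/mRNAs whose mRNA score >= min_score, plus their child features."""
    rows = [line.rstrip("\n") for line in lines]
    fasta_at = next((i for i, r in enumerate(rows) if r.startswith("##FASTA")), len(rows))
    features, fasta = rows[:fasta_at], rows[fasta_at:]  # sequences pass through untouched

    keep_mrna: Set[str] = set()
    keep_gene: Set[str] = set()
    for row in features:  # pass 1: decide which mRNAs (and their genes) survive
        cols = row.split("\t")
        if row.startswith("#") or len(cols) != 9 or cols[2] != "mRNA":
            continue
        score = mrna_score(cols)
        if score is None or score < min_score:
            continue
        attrs = parse_attributes(cols[8])
        keep_mrna.add(attrs.get("ID", ""))
        keep_gene.update(p for p in attrs.get("Parent", "").split(",") if p)

    out: List[str] = []
    for row in features:  # pass 2: write the survivors
        cols = row.split("\t")
        if row.startswith("#") or len(cols) != 9:
            out.append(row)  # pragmas, comments, blank lines
            continue
        attrs = parse_attributes(cols[8])
        if cols[2] == "gene":
            if attrs.get("ID") in keep_gene:
                out.append(row)
        elif cols[2] == "mRNA":
            if attrs.get("ID") in keep_mrna:
                out.append(row)
        else:
            parents = [p for p in attrs.get("Parent", "").split(",") if p]
            kept = [p for p in parents if p in keep_mrna]
            if not kept:
                continue  # child of a dropped mRNA, or a non-gene feature
            if len(kept) != len(parents):  # trim Parent to surviving mRNAs
                new_parent = "Parent=" + ",".join(kept)
                cols[8] = re.sub(r"(^|;)Parent=[^;]*",
                                 lambda m: m.group(1) + new_parent, cols[8], count=1)
            out.append("\t".join(cols))
    return out + fasta
```

The result is a list of lines without newlines. Write it with `"\n".join(...) + "\n"` to a new file, and use that file as model_gff.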
> > If est_forward is provided, MAKER will parse the database to look for the > maker_coor tags early in the pipeline. Then it will create a list of > locations to search, and it will search them even if there are no BLAST > results to seed the search (normally MAKER gets a BLAST result first and then > polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for > a match using all of chr1 as the input to exonerate even when BLAST finds > nothing (this is a very, very slow search, but it can help pick up one or two > stubborn genes that don't remap well). To allow this, MAKER gives exonerate > looser matching parameters (i.e. it allows for single-base-pair introns, perhaps > caused by assembly errors). The logic here is that, since I > already told MAKER that with some degree of confidence I expect sequence A to > map to location X, it will try its hardest to make it match. > > Without est_forward set, the maker_coor= flag still gets read in GI.pm at line > 1563, but only after a BLAST alignment has already seeded it to the region > (that BLAST result has the information in its description parameter). MAKER > will then ignore seeds completely outside of maker_coor. In addition, any BLAST > seeds that overlap maker_coor will get the search space for alignment > polishing adjusted to match maker_coor exactly. Also, the match parameters for > exonerate will not be relaxed as they were with est_forward. > > As you can see, the behavior is slightly different (because it's an accidental > feature). > > Thanks, > Carson > > > > From: Mikael Brandström Durling > Date: Wednesday, February 26, 2014 at 6:37 AM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Mapping gene names > > That might be a useful and time-saving accidental feature. But, reading the > code, it seems that I need to supply maker_coor but not gene_id, as well as > the configuration option est_forward, for this to work.
Any occurrences of > maker_coor in GI.pm seem to be conditioned on set_forward=1, right? > > Mikael > > On 26 Feb 2014, at 14:22, Carson Holt wrote: > >> Yes. That should work as well as an accidental feature. >> >> --Carson >> >> Sent from my iPhone >> >> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling >> wrote: >> >>> Can this use of maker_coor be used only to hint about the placement of the >>> ESTs, without affecting the naming of the final genes? I.e., if I have a >>> database of ESTs where I have a priori knowledge of their rough placement, >>> can this placement be given to maker without providing est_forward=1? >>> >>> Thanks, >>> Mikael >>> >>> On 26 Feb 2014, at 01:58, Carson Holt wrote: >>> >>>> There is a way. It's not a standard option and it's undocumented, but if >>>> you add est_forward=1 to the maker_opts.ctl file, then it will do just >>>> that. The option won't already be there, so you'll have to type it in. >>>> >>>> There is also a feature designed to work with this option. If you add tags >>>> to your fasta headers, those can be used to guide the mapping and naming. >>>> For example, gene_id= will ensure different isoforms that share >>>> a common gene_id get clustered into the same gene, and >>>> maker_coor=chr1:1-10000 in the fasta header will force a particular >>>> sequence to only be mapped against chr1 within the range of 1-10000 bp, and >>>> just using maker_coor=chr1 will force it to only be mapped against chr1. >>>> >>>> This is an undocumented way to remap genes onto new assemblies using blast >>>> alignments of earlier transcript or protein annotations as a guide.
>>>> >>>> --Carson >>>> >>>> >>>> >>>> >>>> From: Shaun Jackman >>>> Reply-To: Shaun Jackman >>>> Date: Tuesday, February 25, 2014 at 5:06 PM >>>> To: >>>> Subject: [maker-devel] Mapping gene names >>>> >>>> Hi, >>>> >>>> I'm annotating a genome using a closely related genome from Genbank, using >>>> the .frn (RNA) and .faa (protein) files from Genbank as evidence to >>>> annotate my genome. I've run Maker, and the annotation seems to have worked >>>> well. Is it possible to map the names of the genes from the related species >>>> to my annotation? I see the map_forward option, which applies to the >>>> model_gff parameter. Is there a similar option for est and protein? >>>> >>>> maker_opts.ctl >>>> est=NC_123456.frn >>>> protein=NC_123456.faa >>>> est2genome=1 >>>> protein2genome=1 >>>> Thanks, >>>> Shaun From carsonhh at gmail.com Wed Feb 26 17:45:30 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 26 Feb 2014 16:45:30 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Sorry, I meant to say: prefilter on the score in the mRNA column before passing the gff3 to model_gff. --Carson Sent from my iPhone > On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: > > What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1.
Then take those results, pass them in as model_gff, and use the map_forward option to then filter the results based on mRNA score; that would copy names onto new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP, I can start to get some very weird behaviors. > > Thanks, > Carson > > From: Mikael Brandström Durling > Date: Wednesday, February 26, 2014 at 3:04 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Mapping gene names > > It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I noticed that est_forward implicitly turns on the est2genome predictor. Is this necessary for this to work? I'm after the behavior you describe below, where exonerate is made to try really hard within a limited region to align an EST, but I would not like maker to produce est2genome predictions. > > In general, I think this maker_coor and est_forward feature set is worthy of being promoted into a documented feature. > > Thanks, > Mikael > >> On 26 Feb 2014, at 17:09, Carson Holt wrote: >> >> It will still work without est_forward; it just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome. >> >> If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate).
So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but it can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. it allows for single-base-pair introns, perhaps caused by assembly errors). The logic here is that, since I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match. >> >> Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also, the match parameters for exonerate will not be relaxed as they were with est_forward. >> >> As you can see, the behavior is slightly different (because it's an accidental feature). >> >> Thanks, >> Carson >> >> >> >> From: Mikael Brandström Durling >> Date: Wednesday, February 26, 2014 at 6:37 AM >> To: Carson Holt >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] Mapping gene names >> >> That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward, for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on set_forward=1, right? >> >> Mikael >> >>> On 26 Feb 2014, at 14:22, Carson Holt wrote: >>> >>> Yes. That should work as well as an accidental feature.
>>> >>> --Carson >>> >>> Sent from my iPhone >>> >>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandstr?m Durling wrote: >>>> >>>> Can this use of maker_coor be used only to hint about the placement of the ests, without affecting the naming of the final genes? Ie if I have a database of EST where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1? >>>> >>>> Thanks, >>>> Mikael >>>> >>>>> 26 feb 2014 kl. 01:58 skrev Carson Holt : >>>>> >>>>> There is a way. It?s not a standard option and it?s undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won?t already be there so you?ll have to type it in. >>>>> >>>>> There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp and just using maker_coor=chr1 will force it to only be mapped against chr1. >>>>> >>>>> This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide. >>>>> >>>>> ?Carson >>>>> >>>>> >>>>> >>>>> >>>>> From: Shaun Jackman >>>>> Reply-To: Shaun Jackman >>>>> Date: Tuesday, February 25, 2014 at 5:06 PM >>>>> To: >>>>> Subject: [maker-devel] Mapping gene names >>>>> >>>>> Hi, >>>>> >>>>> I?m annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I?ve run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. 
>>>>> Is there a similar option for est and protein?
>>>>>
>>>>> maker_opts.ctl
>>>>>
>>>>> est=NC_123456.frn
>>>>> protein=NC_123456.faa
>>>>> est2genome=1
>>>>> protein2genome=1
>>>>> Thanks,
>>>>> Shaun
>>>>>
>>>>> _______________________________________________
>>>>> maker-devel mailing list
>>>>> maker-devel at box290.bluehost.com
>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From bioinformatics.umd at gmail.com Thu Feb 27 10:46:44 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Thu, 27 Feb 2014 11:46:44 -0500
Subject: [maker-devel] Problem with OpenFabrics and infiniband
Message-ID: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>

Hello,

I've had my IT folks install maker on our cluster at UMD. I'm getting a SEGFAULT error when running maker on InfiniBand nodes, but not on gigE nodes. According to the logs this appears to be an issue with forks, but I'm not sure how to fix it. I would simply use the gigE nodes, but we are in the process of moving everything to InfiniBand, so I'll need to address this issue at some point. I've attached the error log from the MPI run as well as commentary from my HPCC team.

IT suggestions

If you look at the top of the error log for the problematic job, it clearly warns of an issue with calling fork() within the Open MPI/OpenFabrics framework.

In particular, the fork system call is only partially supported in the OpenFabrics software (the drivers, etc. for the InfiniBand connections). See e.g. http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork for more information, in particular the paragraphs starting with the red-highlighted sentence "it does not mean that your fork()-calling application is safe". (The kernel, Open MPI version, and OFED version are sufficiently recent to mean that there is _some_ fork support.)

The fact that the job runs over gigE but not IB, in conjunction with the warning from Open MPI, strongly suggests that this is the issue you are encountering. I suspect that maker touches registered memory before the fork, which would result in a segfault (matching what was observed).

You can try adding the arguments

--mca mpi_warn_on_fork 0

to the mpirun command, just in case the crash was somehow caused by Open MPI's warning, but I would not hold out much hope for that.

###UPDATE### This does not fix the problem.

Basically, it looks like maker uses some system calls like fork in a manner which is incompatible with the current OpenFabrics software, and thus will not work with InfiniBand. This situation is likely to remain until either maker changes to be compatible with OFED, or OFED's support for the fork system call is broadened.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: maker_error_openfabrics.txt

From carson.holt at genetics.utah.edu Thu Feb 27 12:09:21 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 27 Feb 2014 18:09:21 +0000
Subject: Re: [maker-devel] Problem with OpenFabrics and infiniband
In-Reply-To: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
References: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>

It's a little more complicated than that. MAKER is written in Perl, and Perl doesn't give me the low-level access that a language like C would for controlling memory access (I don't control that).
All I get is Perl's standard implementation of forks. So it's not really a matter of MAKER changing; it would be a matter of changing Perl itself (which I have no power over, and I don't think will be changing anytime soon).

For now you just have to add this flag to OpenMPI when running MAKER with mpiexec:

-mca btl ^openib

Example:

mpiexec -mca btl ^openib -n 20 maker

Thanks,
Carson

From bioinformatics.umd at gmail.com Thu Feb 27 12:55:34 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Thu, 27 Feb 2014 13:55:34 -0500
Subject: Re: [maker-devel] Problem with OpenFabrics and infiniband
References: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
Message-ID: <2840BC1C-70CC-4A0D-AB44-AEFD718C7B8C@gmail.com>

Hi Carson,

Thanks, that fixed the issue.

Cheers
Ian
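Carson's workaround can also be scripted. Below is a minimal Python sketch, assuming mpiexec and maker are on PATH and are run from the directory holding the MAKER control files; the helper names are hypothetical. The "^openib" value tells Open MPI to use every byte-transfer layer except openib, which keeps MPI traffic off the OpenFabrics transport whose fork() support the IT team flagged. Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<param>, so setting OMPI_MCA_btl=^openib should be equivalent to the command-line flag.

```python
"""Sketch: launch MAKER under Open MPI with the openib BTL excluded.

Assumptions: mpiexec and maker are on PATH; function names are hypothetical.
"""
import os
import subprocess


def maker_mpi_command(n_procs, extra_maker_args=()):
    """Build: mpiexec -mca btl ^openib -n N maker [extra args]."""
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    # "^openib" = use every BTL except openib, so Open MPI does not use the
    # OpenFabrics transport whose fork() support is only partial.
    return ["mpiexec", "-mca", "btl", "^openib", "-n", str(n_procs),
            "maker", *extra_maker_args]


def maker_mpi_env(base=None):
    """Alternative: set the same MCA parameter through the environment."""
    env = dict(os.environ if base is None else base)
    env["OMPI_MCA_btl"] = "^openib"
    return env


def run_maker(n_procs):
    """Launch MAKER (not called below); raises CalledProcessError on failure."""
    subprocess.run(maker_mpi_command(n_procs), check=True)


if __name__ == "__main__":
    print(" ".join(maker_mpi_command(20)))  # mpiexec -mca btl ^openib -n 20 maker
```

The environment-variable form is convenient when the mpiexec line is generated by a scheduler wrapper that you cannot easily edit.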
From sjackman at gmail.com Thu Feb 27 17:17:22 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 27 Feb 2014 15:17:22 -0800
Subject: Re: [maker-devel] Mapping gene names
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>

Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?

Cheers,
Shaun

On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com) wrote:
> Sorry, I meant to say: prefilter on the score in the mRNA column before passing the GFF3 to model_gff.
>
> --Carson
>
> Sent from my iPhone
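The two-pass recipe in this thread (run once with est_forward=1, prefilter the resulting GFF3 on the score in the mRNA column, then pass it to model_gff with map_forward) needs a small filtering step in between. The Python sketch below keeps a gene only if at least one of its mRNAs scores at or above a threshold, together with that mRNA's child features. It assumes the score sits in GFF3 column 6 of the mRNA lines, as Carson's wording suggests, treats "." as failing, and drops evidence features such as match/match_part; the script name and threshold are placeholders, not values from the thread.

```python
"""Sketch: prefilter a MAKER GFF3 on mRNA score before reusing it as model_gff.

Assumptions: the score is GFF3 column 6 of each mRNA line; '.' counts as a
fail; evidence features (match, match_part, ...) are dropped. The script
name and threshold are placeholders.
"""
import sys
from urllib.parse import unquote


def parse_attributes(col9):
    """Parse GFF3 column 9 into {key: [values]}."""
    attrs = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = [unquote(v) for v in value.split(",")]
    return attrs


def filter_gff3(lines, min_score):
    """Keep genes with >= 1 mRNA scoring >= min_score, plus that mRNA's children."""
    records = [line.rstrip("\n") for line in lines]

    # Pass 1: find passing mRNAs and the genes that own them.
    kept_mrna, kept_gene = set(), set()
    for rec in records:
        if rec.startswith("##FASTA"):
            break
        cols = rec.split("\t")
        if rec.startswith("#") or len(cols) != 9 or cols[2] != "mRNA":
            continue
        if cols[5] != "." and float(cols[5]) >= min_score:
            attrs = parse_attributes(cols[8])
            kept_mrna.update(attrs.get("ID", []))
            kept_gene.update(attrs.get("Parent", []))

    # Pass 2: copy comments, any ##FASTA section, and the kept features in order.
    out, in_fasta = [], False
    for rec in records:
        in_fasta = in_fasta or rec.startswith("##FASTA")
        cols = rec.split("\t")
        if in_fasta or rec.startswith("#") or len(cols) != 9:
            out.append(rec)
            continue
        attrs = parse_attributes(cols[8])
        ids = set(attrs.get("ID", []))
        parents = set(attrs.get("Parent", []))
        if ((cols[2] == "gene" and ids & kept_gene)
                or (cols[2] == "mRNA" and ids & kept_mrna)
                or parents & kept_mrna):
            out.append(rec)
    return out


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python filter_mrna_score.py maker_output.gff 0.5 > prefiltered.gff
    with open(sys.argv[1]) as fh:
        for rec in filter_gff3(fh, float(sys.argv[2])):
            print(rec)
```

The filtered file would then go to model_gff with map_forward=1, which, per Carson's description, copies the names onto the new genes under the standard pipeline.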
From sjackman at gmail.com Thu Feb 27 18:27:30 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 27 Feb 2014 16:27:30 -0800
Subject: Re: [maker-devel] Mapping gene names
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>

Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!

The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and a sufficient E-value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?

organism_type=prokaryotic
est2genome=1
protein2genome=1
est_forward=1

Cheers,
Shaun
URL: From carsonhh at gmail.com Thu Feb 27 19:13:06 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 27 Feb 2014 18:13:06 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Set single_exon=1, and the minimum size to a smaller value. I think it's set to 250 right now. Also est2genome is looking for ORF, so if there is none (as with tRNAs) they probably won't get picked up. --Carson Sent from my iPhone > On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: > > Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you! > > The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn?t make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits? > > organism_type=prokaryotic > est2genome=1 > protein2genome=1 > est_forward=1 > Cheers, > Shaun > > > >> On 27 February 2014 15:17, Shaun Jackman wrote: >> Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome? >> >> Cheers, >> Shaun >> >>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote: >>> >>> Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: >>> >>>> What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score and that would copy names onto new gene under the standard MAKER pipeline. 
Eventually it?s really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors. >>>> >>>> Thanks, >>>> Carson >>>> >>>> From: Mikael Brandstr?m Durling >>>> Date: Wednesday, February 26, 2014 at 3:04 PM >>>> To: Carson Holt >>>> Cc: "maker-devel at yandell-lab.org" >>>> Subject: Re: [maker-devel] Mapping gene names >>>> >>>> It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I?m after the behavior you describe below where exonerate is made to try really hard within a limited region to align an est, but I would not like maker to produce est2genome predictions. >>>> >>>> In general, I think this maker_coor and est_forward is a feature set that is worthy to be promoted into a documented feature. >>>> >>>> THanks, >>>> Mikael >>>> >>>>> 26 feb 2014 kl. 17:09 skrev Carson Holt : >>>>> >>>>> It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome. >>>>> >>>>> If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). 
So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very very slow search, but can help pick up one or two stubborn genes that don?t remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). The logic here is that given the fact that I already told MAKER that with some degree of confidence I expect sequence A to map to to location X, it will try its hardest to make it match. >>>>> >>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also match parameters for exonerate will not be relaxed as they were with est_forward. >>>>> >>>>> As you can see the behavior, is slightly different (because it?s an accidental feature). >>>>> >>>>> Thanks, >>>>> Carson >>>>> >>>>> >>>>> >>>>> From: Mikael Brandstr?m Durling >>>>> Date: Wednesday, February 26, 2014 at 6:37 AM >>>>> To: Carson Holt >>>>> Cc: "maker-devel at yandell-lab.org" >>>>> Subject: Re: [maker-devel] Mapping gene names >>>>> >>>>> That might be a useful and time saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seems to be conditioned on set_forward=1 right? >>>>> >>>>> Mikael >>>>> >>>>>> 26 feb 2014 kl. 14:22 skrev Carson Holt : >>>>>> >>>>>> Yes. That should work as well as an accidental feature. 
>>>>>> >>>>>> --Carson >>>>>> >>>>>> Sent from my iPhone >>>>>> >>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling wrote: >>>>>> >>>>>>> Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1? >>>>>>> >>>>>>> Thanks, >>>>>>> Mikael >>>>>>> >>>>>>>> On 26 Feb 2014 at 01:58, Carson Holt wrote: >>>>>>>> >>>>>>>> There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there, so you'll have to type it in. >>>>>>>> >>>>>>>> There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure that different isoforms sharing a common gene_id get clustered into the same gene; maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range 1-10000 bp; and just using maker_coor=chr1 will force it to only be mapped against chr1. >>>>>>>> >>>>>>>> This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide. >>>>>>>> >>>>>>>> --Carson >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> From: Shaun Jackman >>>>>>>> Reply-To: Shaun Jackman >>>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM >>>>>>>> To: >>>>>>>> Subject: [maker-devel] Mapping gene names >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well.
Is it possible to map the names of the genes from the related species onto my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein? >>>>>>>> >>>>>>>> maker_opts.ctl >>>>>>>> >>>>>>>> est=NC_123456.frn >>>>>>>> protein=NC_123456.faa >>>>>>>> est2genome=1 >>>>>>>> protein2genome=1 >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Shaun >>>>>>>> >>>>>>>> _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From mikael.durling at slu.se Fri Feb 28 04:40:30 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Fri, 28 Feb 2014 10:40:30 +0000 Subject: [maker-devel] maker_coor behaviour Message-ID: <8CA99854-CF5B-4533-B625-0EDD5DFFCE8B@slu.se> Hi, in a previous thread, the maker_coor feature for ESTs was mentioned. I have been trying it out, without using it for mapping gene names. I have placed these ESTs by other means, and thought the maker_coor feature would be a good use of this a priori knowledge. The main problem I am trying to solve is that some ESTs whose correct location I know are not recruited to that position by maker's blastn->exonerate method (I find them on other scaffolds).
So I thought maker_coor with the est_forward behavior (as described) would be a good option to force my evidence onto the correct position, instead of having it end up supporting or breaking other models. However, as soon as I run with maker_coor-tagged EST sequences, no est2genome evidence appears in the final GFF3 file. The blastn evidence is there when est_forward is disabled, but, as expected, there is no blastn evidence when est_forward is turned on. It seems, though, that the evidence is used, as the QI lines indicate EST support for both splice sites and exon alignments, but I have no way to visualize and/or evaluate the congruence of evidence and models. Would it be possible to tweak Maker into outputting the est2genome alignments when est_forward/maker_coor is used? I couldn't figure out where in the code this is handled. I could of course do my own exonerate alignments of these ESTs and feed them into maker as est_gff, but if maker already has the machinery to do this, I thought it would be a good idea to use it. Thanks, Mikael From carsonhh at gmail.com Fri Feb 28 08:09:09 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 28 Feb 2014 07:09:09 -0700 Subject: [maker-devel] maker_coor behaviour Message-ID: I wouldn't use those options for standard de novo annotation. There are other, more appropriate things that should be used instead. Both maker_coor and est_forward are destined to be part of a separate tool that will secretly just call MAKER, but will allow me to control what other parameters MAKER sees, to avoid certain logic incompatibilities: settings that make sense when mapping entire genes onto a new assembly, but not for de novo annotation using ESTs.
You should instead try modifying these options in the maker_bopts.ctl file:

pcov_blastn= #Blastn Percent Coverage Threhold EST-Genome Alignments
pid_blastn= #Blastn Percent Identity Threshold EST-Genome Aligments
eval_blastn= #Blastn eval cutoff
bit_blastn= #Blastn bit cutoff
depth_blastn= #Blastn depth cutoff (0 to disable cutoff). For trimming high evidence overlap regions
en_score_limit= #Exonerate nucleotide percent of maximal score threshold

If either blastn or est2genome results disappear, it is because they don't meet one of these thresholds (borderline blastn results that miss the thresholds are kept if the exonerate alignment does meet them, but if exonerate misses a threshold they will be thrown out). That is why the EST in question gets thrown out, and it's why the blastn result disappears when you try to anchor it with maker_coor. You can visualize everything with a browser when you're done. I still recommend the old version of Apollo for this (it's just easier). You can try to install it using the './Build apollo' option from the .../maker/src/ directory, and it will be installed in .../maker/exe/apollo. This requires Apache Ant to be installed. Otherwise, just download it from the GMOD SourceForge page and install it manually. Thanks, Carson On 2/28/14, 3:40 AM, "Mikael Brandström Durling" wrote: >Hi, > >in a previous thread, the maker_coor feature for ESTs was mentioned. I >have been trying it out, without using it for mapping gene names. I have >placed these ESTs by other means, and thought the maker_coor feature would >be a good use of this a priori knowledge. The main problem I am trying to solve >is that some ESTs whose correct location I know >are not recruited to that position by maker's blastn->exonerate method (I >find them on other scaffolds).
So I thought maker_coor with the >est_forward behavior (as described) would be a good option to force my >evidence onto the correct position, instead of having it end up supporting or >breaking other models. However, as soon as I run with maker_coor-tagged >EST sequences, no est2genome evidence appears in the final GFF3 file. The >blastn evidence is there when est_forward is disabled, but, as expected, >there is no blastn evidence when est_forward is turned on. It seems, >though, that the evidence is used, as the QI lines indicate EST support for >both splice sites and exon alignments, but I have no way to >visualize and/or evaluate the congruence of evidence and models. Would it >be possible to tweak Maker into outputting the est2genome alignments when >est_forward/maker_coor is used? I couldn't figure out where in the >code this is handled. > >I could of course do my own exonerate alignments of these ESTs and feed >them into maker as est_gff, but if maker already has the machinery to do >this, I thought it would be a good idea to use it. > >Thanks, >Mikael From rbharris at uw.edu Fri Feb 28 14:14:55 2014 From: rbharris at uw.edu (Rebecca Harris) Date: Fri, 28 Feb 2014 12:14:55 -0800 Subject: [maker-devel] error in snap training In-Reply-To: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com> References: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com> Message-ID: Hi - I tried this and ran cegma --genome on my original fasta file. I then used cegma2zff to convert, then ran fathom and forge. However, when I try to generate new parameters with forge, I get the same error that I got when trying to train SNAP without CEGMA: "ZOE ERROR (from forge): impossible error5 KOG1342.20". Any suggestions would be great, thanks!
Cheers, Rebecca On Tue, Feb 25, 2014 at 2:12 PM, Carson Holt wrote: > Make sure you are using 2.31, and then try the maker2zff filters > individually. If the protein models are not working well, use CEGMA to > generate models. It's from the same group as SNAP. Use cegma2zff for the > conversion. > > --Carson > > Sent from my iPhone > > > On Feb 25, 2014, at 2:49 PM, Rebecca Harris wrote: > > > > Hey - > > > > I'm trying to train SNAP and am running into errors. I don't have any > EST evidence, just protein. My .gff file reports 10865 genes, but when I run > maker2zff -c0 -e0 I get back empty genome files. When I run maker2zff -n, > a ton of overlap_prev_exon errors get written to the screen, and then when I > get to the forge step I get an "impossible error5". Any help would be > greatly appreciated. > > > > Thanks! > > Rebecca From carsonhh at gmail.com Fri Feb 28 14:22:12 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 28 Feb 2014 13:22:12 -0700 Subject: [maker-devel] error in snap training In-Reply-To: References: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com> Message-ID: If it's failing both ways, I'm thinking this may be SNAP itself. Try these two different versions of SNAP.
--> http://korflab.ucdavis.edu/Software/snap-2013-02-16.tar.gz and --> http://korflab.ucdavis.edu/Software/snap-2013-11-29.tar.gz If they both fail, then contact the SNAP development group --> korflab AT ucdavis DOT edu Thanks, Carson From: Rebecca Harris Date: Friday, February 28, 2014 at 1:14 PM To: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] error in snap training Hi - I tried this and ran cegma --genome on my original fasta file. I then used cegma2zff to convert, then ran fathom and forge. However, when I try to generate new parameters with forge, I get the same error that I got when trying to train SNAP without CEGMA: "ZOE ERROR (from forge): impossible error5 KOG1342.20". Any suggestions would be great, thanks! Cheers, Rebecca On Tue, Feb 25, 2014 at 2:12 PM, Carson Holt wrote: > Make sure you are using 2.31, and then try the maker2zff filters > individually. If the protein models are not working well, use CEGMA to > generate models. It's from the same group as SNAP. Use cegma2zff for the > conversion. > > --Carson > > Sent from my iPhone > >> > On Feb 25, 2014, at 2:49 PM, Rebecca Harris wrote: >> > >> > Hey - >> > >> > I'm trying to train SNAP and am running into errors. I don't have any EST >> evidence, just protein. My .gff file reports 10865 genes, but when I run >> maker2zff -c0 -e0 I get back empty genome files. When I run maker2zff -n, a >> ton of overlap_prev_exon errors get written to the screen, and then when I get >> to the forge step I get an "impossible error5". Any help would be greatly >> appreciated. >> > >> > Thanks!
>> > Rebecca From darasappan at gmail.com Mon Feb 3 09:31:16 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Mon, 3 Feb 2014 10:31:16 -0600 Subject: [maker-devel] maker annotation with cufflinks output In-Reply-To: References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> Message-ID: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> Hi Daniel, I was able to check on some of those questions. 1. From trinity assembly: I started with 102000 contigs. I used trinotate to annotate proteins in this. I ran maker on this data with est2genome set to 1. The output looks like this (most important parts on top): 6653 gene 46675 exon 280534 protein_match 59934 CDS 969 contig 105388 expressed_sequence_match 12584 five_prime_UTR 78565 match 1401369 match_part 10180 mRNA 11545 three_prime_UTR 2. From cufflinks assembly: I started with 133380 entries (out of which there are 29,000 transcripts). I used the protein sequences from trinity assembly. I ran maker on this data with est2genome set to 1. The output looks like this: 29 gene 75 exon 573659 protein_match 67 CDS 1099 contig 269298 expressed_sequence_match 23 five_prime_UTR 173844 match 2221846 match_part 29 mRNA 23 three_prime_UTR The number of genes annotated using the trinity assembly is lower than expected, so I went the cufflinks route.
I don't understand why, when using the cufflinks transcripts, even fewer genes are being found. 3. Training SNAP: I used the results of maker from step 1 to train SNAP. I then used that training set to rerun maker: snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm est2genome=0 And again I got results with no entries for gene, exon, CDS, etc.: 957 contig 46555 expressed_sequence_match 43651 match 553633 match_part 113738 protein_match As I mentioned in another email, cegma results indicated that the genome was more than 90% complete. Any suggestions would be helpful. Thank you Dhivya On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: > Hi Dhivya, > > I think there are a few numbers that could help us understand > what's happening here. > > How many transcripts did Trinity assemble the RNA-seq data into? > Also, you had 29,000 transcripts from cufflinks, but fewer from > MAKER when you gave it the cufflinks data. How many transcripts did > MAKER identify with the cufflinks data? Did you still get more than > the 10,000 transcripts that you found with just the Trinity data? > > A key part of MAKER's approach to genome annotation that might be > affecting its performance is that it only annotates a gene where > there is both evidence (like your RNA-seq data) and an ab-initio > prediction. If a prediction is unsupported by the evidence, then > MAKER won't annotate a gene, and if evidence aligns where there's no > prediction, MAKER won't annotate a gene either. What ab-initio > predictors are you using, and have they been trained for your specific genome? > > You can force MAKER to automatically promote evidence alignments to > a gene model by setting the est2genome option to 1, but that will > usually give you many false positives. > > Try rerunning it with either the Trinity data or the Cufflinks data > and with est2genome set to 1, and let us know how that affects the > MAKER results.
> > Thanks, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ________________________________________ > From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of > dhivya arasappan [darasappan at gmail.com] > Sent: Thursday, January 30, 2014 11:18 AM > To: maker-devel at yandell-lab.org > Subject: [maker-devel] maker annotation with cufflinks output > > Hello, > > I am trying to annotate a 200 Mb plant genome for which I have a very > good assembly. > > I tried to de novo assemble RNA-seq data using trinity and ran maker > using my genome assembly and the trinity results. I did not get as > many transcripts as expected (around 10,000 transcripts). > > So, I decided to try a different approach. I did a genome-assisted > assembly of the RNA-seq data using tophat/cufflinks. This pipeline > generated 21,000 genes and 29,000 transcripts. I then ran maker using my > genome assembly and the cufflinks result. I get far fewer > transcripts as a result. > > If cufflinks found 29000 transcripts by mapping to the genome, I'm > confused as to why maker is not finding the same. > > Any suggestions would be appreciated. > > Thanks > Dhivya From rebzi87 at gmail.com Tue Feb 4 15:29:41 2014 From: rebzi87 at gmail.com (Rebecca Harris) Date: Tue, 4 Feb 2014 14:29:41 -0800 Subject: [maker-devel] maker output Message-ID: Hi, I'm running maker on a cluster and am having some problems with the run ending prematurely. I would like to know if there is a straightforward way to figure out whether maker has completed.
I've tried: 1) counting the number of run.log files in the datastore directly, and 2) counting the instances of "FINISHED" in the master_datastore_index.log. These numbers are inconsistent. I have 200,000 contigs in my fasta file - do I expect 200,000 run.log files? I've had to restart maker a few times - it appears that maker is appending to the master_datastore_index.log, as I find multiple instances of the same contig being finished. Thanks! Cheers, Rebecca From darasappan at gmail.com Tue Feb 4 15:43:19 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Tue, 4 Feb 2014 16:43:19 -0600 Subject: [maker-devel] Fwd: maker annotation with cufflinks output References: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> Message-ID: Resending this since it didn't make it to the mailing list before. > > I was able to check on some of those questions. > > 1. From trinity assembly: I started with 102000 contigs. I used > trinotate to annotate proteins in this. > > I ran maker on this data with est2genome set to 1. The output looks > like this (most important parts on top): > > 6653 gene > 46675 exon > 280534 protein_match > 59934 CDS > 969 contig > 105388 expressed_sequence_match > 12584 five_prime_UTR > 78565 match > 1401369 match_part > 10180 mRNA > 11545 three_prime_UTR > > 2. From cufflinks assembly: I started with 133380 entries (out of > which there are 29,000 transcripts). I used the protein sequences > from trinity assembly. > > I ran maker on this data with est2genome set to 1. The output looks > like this: > 29 gene > 75 exon > 573659 protein_match > 67 CDS > 1099 contig > 269298 expressed_sequence_match > 23 five_prime_UTR > 173844 match > 2221846 match_part > 29 mRNA > 23 three_prime_UTR > > The number of genes annotated using the trinity assembly is lower than > expected, so I went the cufflinks route.
I don't understand why, when > using the cufflinks transcripts, even fewer genes are being found. > > 3. Training SNAP: I used the results of maker from step 1 to train > SNAP. I then used that training set to rerun maker: > snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm > est2genome=0 > > And again I got results with no entries for gene, exon, CDS, etc. > 957 contig > 46555 expressed_sequence_match > 43651 match > 553633 match_part > 113738 protein_match > > As I mentioned in another email, cegma results indicated that the > genome was more than 90% complete. Any suggestions would be helpful. > > Thank you > Dhivya > > > > > On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: > >> Hi Dhivya, >> >> I think there are a few numbers that could help us understand >> what's happening here. >> >> How many transcripts did Trinity assemble the RNA-seq data into? >> Also, you had 29,000 transcripts from cufflinks, but fewer from >> MAKER when you gave it the cufflinks data. How many transcripts did >> MAKER identify with the cufflinks data? Did you still get more than >> the 10,000 transcripts that you found with just the Trinity data? >> >> A key part of MAKER's approach to genome annotation that might be >> affecting its performance is that it only annotates a gene where >> there is both evidence (like your RNA-seq data) and an ab-initio >> prediction. If a prediction is unsupported by the evidence, then >> MAKER won't annotate a gene, and if evidence aligns where there's no >> prediction, MAKER won't annotate a gene either. What ab-initio >> predictors are you using, and have they been trained for your specific genome? >> >> You can force MAKER to automatically promote evidence alignments to >> a gene model by setting the est2genome option to 1, but that will >> usually give you many false positives.
>> >> Try rerunning it with either the Trinity data or the Cufflinks data >> and with est2genome set to 1, and let us know how that affects the >> MAKER results. >> >> Thanks, >> Daniel >> >> Daniel Ence >> Graduate Student >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ________________________________________ >> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf >> of dhivya arasappan [darasappan at gmail.com] >> Sent: Thursday, January 30, 2014 11:18 AM >> To: maker-devel at yandell-lab.org >> Subject: [maker-devel] maker annotation with cufflinks output >> >> Hello, >> >> I am trying to annotate a 200 Mb plant genome for which I have a very >> good assembly. >> >> I tried to de novo assemble RNA-seq data using trinity and ran maker >> using my genome assembly and the trinity results. I did not get as >> many transcripts as expected (around 10,000 transcripts). >> >> So, I decided to try a different approach. I did a genome-assisted >> assembly of the RNA-seq data using tophat/cufflinks. This pipeline >> generated 21,000 genes and 29,000 transcripts. I then ran maker using >> my genome assembly and the cufflinks result. I get far fewer >> transcripts as a result. >> >> If cufflinks found 29000 transcripts by mapping to the genome, I'm >> confused as to why maker is not finding the same. >> >> Any suggestions would be appreciated. >> >> Thanks >> Dhivya
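The feature tallies in this thread (e.g. "6653 gene", "10180 mRNA") appear to be per-type counts of column 3 of MAKER's GFF3 output, and comparing the gene/mRNA rows between runs is a quick way to see how much each evidence set actually produced. Below is a minimal Python sketch of such a tally, not a MAKER tool: it assumes a standard nine-column, tab-separated GFF3 file and stops at a `##FASTA` directive in case sequence data is appended after the features. The function name and file path are hypothetical.

```python
import sys
from collections import Counter


def count_feature_types(lines):
    """Tally GFF3 feature types (column 3), ignoring comments and any ##FASTA section."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:  # only well-formed nine-column feature lines
            counts[cols[2]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        # print "count type", largest first, like the tables in the thread
        for ftype, n in count_feature_types(fh).most_common():
            print(n, ftype)
```

For example, `python count_gff_types.py genome.all.gff` (hypothetical file names) prints one `count type` pair per line, largest first.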
From dence at genetics.utah.edu Tue Feb 4 15:42:52 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 4 Feb 2014 22:42:52 +0000 Subject: [maker-devel] maker output In-Reply-To: References: Message-ID: Hi Rebecca, If you're looking at the master_datastore_index.log, then you're looking for lines with the "FINISHED" status. If you count those (with "grep -c", for example), that will tell you how many contigs have finished. If you have 200,000 contigs that you're trying to annotate, you might also consider setting the "min_contig" parameter in the maker_opts.ctl file. This parameter sets the minimum length a contig must have before MAKER tries to annotate it. Usually 5000 bp or larger is what you want. That will save you some time in the long run. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Rebecca Harris [rebzi87 at gmail.com] Sent: Tuesday, February 04, 2014 3:29 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] maker output Hi, I'm running maker on a cluster and am having some problems with the run ending prematurely. I would like to know if there is a straightforward way to figure out whether maker has completed. I've tried: 1) counting the number of run.log files in the datastore directly, and 2) counting the instances of "FINISHED" in the master_datastore_index.log. These numbers are inconsistent. I have 200,000 contigs in my fasta file - do I expect 200,000 run.log files? I've had to restart maker a few times - it appears that maker is appending to the master_datastore_index.log, as I find multiple instances of the same contig being finished. Thanks! Cheers, Rebecca
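Because restarted runs append to master_datastore_index.log, a raw `grep -c FINISHED` can over-count. Below is a minimal Python sketch that counts distinct finished contigs and compares them with the FASTA records long enough to pass a min_contig-style cutoff. It assumes each log line is tab-separated, with the contig ID in the first column and the status in the last; that is an assumption about the log layout, not something stated in this thread. The script and function names are hypothetical, and rebuilding the log with `-dsindex`, as described in this thread, remains the way to clean up the file itself.

```python
import sys


def finished_contigs(log_lines):
    """Return the set of contig IDs that have at least one FINISHED entry.

    Assumes tab-separated lines with the contig ID first and the status last.
    A set collapses the duplicate entries that restarted runs append.
    """
    done = set()
    for line in log_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 2 and cols[-1] == "FINISHED":
            done.add(cols[0])
    return done


def contigs_at_least(fasta_lines, min_contig=1):
    """Return FASTA record IDs (first word of the header) with length >= min_contig."""
    keep, name, length = [], None, 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None and length >= min_contig:
                keep.append(name)
            words = line[1:].split()
            name, length = (words[0] if words else ""), 0
        elif line:
            length += len(line)
    if name is not None and length >= min_contig:  # flush the last record
        keep.append(name)
    return keep


if __name__ == "__main__" and len(sys.argv) >= 3:
    min_len = int(sys.argv[3]) if len(sys.argv) > 3 else 1
    with open(sys.argv[1]) as log_fh, open(sys.argv[2]) as fa_fh:
        done = finished_contigs(log_fh)
        expected = contigs_at_least(fa_fh, min_len)
    missing = [c for c in expected if c not in done]
    print(f"{len(done)} finished, {len(expected)} expected, {len(missing)} missing")
```

Example: `python check_finished.py master_datastore_index.log genome.fasta 5000`. Contig IDs in the log may not match the FASTA headers character for character, so a large "missing" count is a prompt to inspect the IDs rather than proof that the run failed.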
From mikael.durling at slu.se Tue Feb 4 15:49:46 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Tue, 4 Feb 2014 22:49:46 +0000 Subject: [maker-devel] maker output In-Reply-To: References: Message-ID: > On 4 Feb 2014 at 23:32, "Rebecca Harris" wrote: > > Hi, > > I'm running maker on a cluster and am having some problems with the run ending prematurely. I would like to know if there is a straightforward way to figure out whether maker has completed. I've tried: 1) counting the number of run.log files in the datastore directly, and 2) counting the instances of "FINISHED" in the master_datastore_index.log. This is usually what I do to check if maker has finished all scaffolds. There should be one FINISHED statement for each entry in the fasta file (or possibly one for every scaffold longer than the given minimum length). > These numbers are inconsistent. I have 200,000 contigs in my fasta file - do I expect 200,000 run.log files? I've had to restart maker a few times - it appears that maker is appending to the master_datastore_index.log, as I find multiple instances of the same contig being finished. Run maker -dsindex to rebuild the file if you like. The number of FINISHED entries should not change, though. Mikael > > Thanks! > > Cheers, > Rebecca From carsonhh at gmail.com Tue Feb 4 15:50:10 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 04 Feb 2014 15:50:10 -0700 Subject: [maker-devel] maker output In-Reply-To: References: Message-ID: Clusters are notoriously flaky, so maker is restartable (hence the need for the log file). Also, since multiple nodes may write to the log simultaneously, they can munge its contents.
You can rerun maker with the -dsindex flag to regenerate the master_datastore_index.log without processing anything else. You can even delete it before rebuilding it if you want to ensure all entries are unique (run on a single CPU when you do this). Then count the number of FINISHED entries in the log. Thanks, Carson From: Rebecca Harris Date: Tuesday, February 4, 2014 at 3:29 PM To: Subject: [maker-devel] maker output Hi, I'm running maker on a cluster and am having some problems with the run ending prematurely. I would like to know if there is a straightforward way to figure out whether maker has completed. I've tried: 1) counting the number of run.log files in the datastore directly, and 2) counting the instances of "FINISHED" in the master_datastore_index.log. These numbers are inconsistent. I have 200,000 contigs in my fasta file - do I expect 200,000 run.log files? I've had to restart maker a few times - it appears that maker is appending to the master_datastore_index.log, as I find multiple instances of the same contig being finished. Thanks! Cheers, Rebecca From carsonhh at gmail.com Wed Feb 5 11:38:50 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Feb 2014 11:38:50 -0700 Subject: [maker-devel] maker annotation with cufflinks output In-Reply-To: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> Message-ID: Do you have any features of type snap in your results from step 3? We've had a couple of recent posts where, after training, snap was giving no results, and as a result maker couldn't give any genes. One cause of something like that may be your step 2.
Make sure the ZFF you used to train with wasn't empty. The maker2zff script uses filters to only put the best genes in the ZFF file, and if all your genes fail the filtering, then you are training with an empty ZFF. Also, you should use proteins from a related species as your protein file. I see that your protein matches are varying wildly from run to run, and so is your contig count. Was the subset of contigs you have results for long enough to contain genes? --Carson From: dhivya arasappan Date: Monday, February 3, 2014 at 9:31 AM To: Daniel Ence Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] maker annotation with cufflinks output Hi Daniel, I was able to check on some of those questions. 1. From trinity assembly: I started with 102000 contigs. I used trinotate to annotate proteins in this. I ran maker on this data with est2genome set to 1. The output looks like this (most important parts on top): 6653 gene 46675 exon 280534 protein_match 59934 CDS 969 contig 105388 expressed_sequence_match 12584 five_prime_UTR 78565 match 1401369 match_part 10180 mRNA 11545 three_prime_UTR 2. From cufflinks assembly: I started with 133380 entries (out of which there are 29,000 transcripts). I used the protein sequences from trinity assembly. I ran maker on this data with est2genome set to 1. The output looks like this: 29 gene 75 exon 573659 protein_match 67 CDS 1099 contig 269298 expressed_sequence_match 23 five_prime_UTR 173844 match 2221846 match_part 29 mRNA 23 three_prime_UTR The number of genes annotated using the trinity assembly is lower than expected, so I went the cufflinks route. I don't understand why, when using the cufflinks transcripts, even fewer genes are being found. 3. Training SNAP: I used the results of maker from step 1 to train SNAP. I then used that training set to rerun maker: snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm est2genome=0 And again I got results with no entries for gene, exon, CDS, etc.
957 contig 46555 expressed_sequence_match 43651 match 553633 match_part 113738 protein_match As I mentioned in another email, cegma results indicated that the genome was more than 90% complete. Any suggestions would be helpful. Thank you Dhivya On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: > Hi Dhivya, > > I think there are a few numbers that could help us understand what's > happening here. > > How many transcripts did Trinity assemble the RNA-seq data into? Also, you had > 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it the > cufflinks data. How many transcripts did MAKER identify with the cufflinks > data? Did you still get more than the 10,000 transcripts that you found with > just the Trinity data? > > A key part of MAKER's approach to genome annotation that might be affecting > its performance is that it only annotates a gene where there is both evidence > (like your RNA-seq data) and an ab-initio prediction. If a prediction is > unsupported by the evidence, then MAKER won't annotate a gene, and if evidence > aligns where there's no prediction, MAKER won't annotate a gene either. What > ab-initio predictors are you using, and have they been trained for your specific genome? > > You can force MAKER to automatically promote evidence alignments to a gene > model by setting the est2genome option to 1, but that will usually give you > many false positives. > > Try rerunning it with either the Trinity data or the Cufflinks data and with > est2genome set to 1, and let us know how that affects the MAKER results.
> > Thanks, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ________________________________________ > From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya > arasappan [darasappan at gmail.com] > Sent: Thursday, January 30, 2014 11:18 AM > To: maker-devel at yandell-lab.org > Subject: [maker-devel] maker annotation with cufflinks output > > Hello, > > I am trying to annotate a 200 mb plant genome for which I have a very > good assembly. > > I tried to denovo assemble RNA-seq data using trinity and ran maker > using my genome assembly and the trinity results. I did not get as > many transcripts as expected, around 10,000 transcripts. > > So, I decided to try a different approach. I did a genome assisted > assembly of the RNA-seq data using tophat/cufflinks. This pipeline > generated 21,000 genes, 29,000 transcripts. I then ran maker using my > genome assembly and the cufflinks result. I get much less number of > transcripts as a result. > > If cufflinks found 29000 transcripts by mapping to the genome, I'm > confused as to why maker is not finding the same. > > Any suggestions would be appreciated. > > Thanks > Dhivya > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
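One quick way to act on Carson's first suggestion is to count the gene models in the ZFF annotation file before training SNAP. The sketch below is a hypothetical helper, not part of MAKER or SNAP. It assumes only that each exon line in the ZFF starts with one of the labels Einit, Exon, Eterm or Esngl and ends with the gene-model (group) name, which as far as we understand holds for both the short and the long ZFF layouts.

```python
# Hypothetical helper: count gene models in a ZFF file (e.g. one written by
# maker2zff) to check that SNAP is not being trained on an empty set.
EXON_LABELS = {"Einit", "Exon", "Eterm", "Esngl"}


def count_zff_models(lines):
    """Return (number_of_sequences, number_of_gene_models) for ZFF text.

    Sequence records start with a '>' definition line. On exon lines the
    group (gene-model) name is the last whitespace-separated field in both
    the short (4-field) and long (9-field) ZFF layouts.
    """
    sequences = 0
    models = set()
    current_seq = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current_seq = parts[0] if parts else ""
            sequences += 1
            continue
        fields = line.split()
        if fields[0] in EXON_LABELS:
            # Key by sequence as well, in case group names repeat across sequences.
            models.add((current_seq, fields[-1]))
    return sequences, len(models)
```

For example, calling count_zff_models(open("genome.ann")) on the annotation file produced by maker2zff (genome.ann is the name we have seen it use; adjust to your run) and getting zero models back would mean SNAP was trained on nothing, which would be consistent with the step 3 run producing no gene, exon or CDS features.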
From dence at genetics.utah.edu Wed Feb 5 12:28:48 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 5 Feb 2014 19:28:48 +0000
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To:
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>,
Message-ID:

Hi Dhivya,

Are the protein matches in your results coming from your annotations of the transcriptome? You should really use amino-acid sequences from related organisms and some kind of omnibus source like SwissProt.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From darasappan at gmail.com Wed Feb 5 13:13:57 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Wed, 5 Feb 2014 14:13:57 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To:
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>,
Message-ID: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>

Hello Daniel and Carson,

Thanks for your replies.

Yes, I used the protein sequences resulting from the annotation of the Trinity assembly (using Trinotate). I'll try using protein sequences from related species (though there aren't sequences from closely related organisms). Could you tell me a little about why protein data from annotating my RNA-seq data would not work best here?

Thanks
Dhivya
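Daniel's question (where did the protein_match rows come from?), and the related question of whether the step 3 run produced any SNAP features at all, can both be answered from the GFF3 by counting features per source (column 2) as well as per type (column 3); the per-type totals are the kind of table posted earlier in this thread. A minimal sketch, assuming a tab-separated GFF3 file that may end in a ##FASTA section. The source names used in the test (blastx, snap_masked) are illustrative assumptions and may differ in your MAKER version.

```python
from collections import Counter


def tally_gff3(lines):
    """Count GFF3 features by (source, type), i.e. columns 2 and 3.

    Comment and directive lines are skipped, and reading stops at a
    '##FASTA' line because everything after it is sequence, not features.
    """
    counts = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a well-formed feature line
        counts[(fields[1], fields[2])] += 1
    return counts


def split_totals(counts):
    """Collapse (source, type) counts into per-type and per-source totals."""
    by_type, by_source = Counter(), Counter()
    for (source, feature_type), n in counts.items():
        by_type[feature_type] += n
        by_source[source] += n
    return by_type, by_source
```

Opening the merged GFF3 and calling split_totals(tally_gff3(handle)) gives both views; an empty result for any source containing "snap" would point at the SNAP training problem, and the sources paired with protein_match show which protein file the matches came from.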
From dence at genetics.utah.edu Wed Feb 5 13:36:26 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 5 Feb 2014 20:36:26 +0000
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>, , <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
Message-ID:

Hi Dhivya,

In genome annotation, you often want to use as many sources of evidence as is reasonable, but those sources should be distinct. It will confuse downstream annotation if your protein evidence is actually based on the RNA-seq data. Using the Trinotate results as protein evidence restricts you, first, to the proteins encoded by the transcripts in the RNA-seq data, which may be incomplete, and second, to the proteins that Trinotate could annotate from among those transcripts.

The problem Carson mentioned with the SNAP HMM file is also a real possibility.

Thanks,
Daniel
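To make Daniel's point concrete: protein evidence obtained by translating assembled transcripts is computed entirely from those transcripts. The sketch below is a deliberately simplified stand-in (longest ATG-initiated open reading frame on either strand, standard genetic code); in the Trinotate workflow, coding regions are typically predicted with TransDecoder, which is considerably more elaborate. Whatever the method, the resulting protein can only describe sequence the transcript already contains, so in this simplified picture it cannot point MAKER at a locus the transcript alignment does not already cover.

```python
# Simplified illustration (not Trinotate/TransDecoder): translate the longest
# ATG-initiated open reading frame of a transcript with the standard code.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(nt):
    """Translate complete codons; codons with non-ACGT characters become 'X'."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def longest_orf_protein(transcript):
    """Return the longest Met-initiated peptide over all six reading frames.

    An ORF that runs off the end of the transcript (no stop codon) is kept,
    mirroring partial transcripts.
    """
    seq = transcript.upper().replace("U", "T")
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best
```

The returned peptide is a pure function of the transcript string: the same transcript always yields the same protein, so a protein_match built from it repeats the transcript evidence rather than adding an independent line of support.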
From carsonhh at gmail.com Wed Feb 5 13:38:44 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Feb 2014 13:38:44 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
Message-ID:

Protein data doesn't have to come from that closely related a species. This is because genes maintain homology at the amino-acid level across even very large evolutionary distances. Having a more closely related species just ensures that the genome contents are similar (fewer losses/gains relative to each other). And use the entire proteome of at least one related species (using only a database like Swiss-Prot is not sufficient).

Using translated mRNA-seq data will not give you any new information that was not already available from the untranslated sequence. Plus, it will introduce the complicating artifacts that mRNA-seq generates into the protein part of the pipeline (gene merging, incorrect assembly, and false calls caused by background transcription). A big gotcha with mRNA-seq is that all of your genome gets transcribed at a low level, not just the genes, so you will always have contamination that does not represent real gene models. Also, in the end you should only expect to capture about 50% of the genes with mRNA-seq (maybe 70% if you are fortunate, and most of those will be partial). So using the proteins from another species is important to improve sensitivity and to fix many of the issues that arise from the noisy nature of mRNA-seq.

In fact, if you were forced to use only one (either protein evidence or mRNA-seq), you would actually get better annotations from the protein evidence in most cases. You get better annotations when you use both, but if you use only one of them, the proteins from another species are better, and noisy mRNA-seq will be the primary source of annotation error.

Thanks,
Carson
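A rough diagnostic for Carson's argument, offered as our own suggestion rather than a MAKER feature: collect the genomic intervals of protein evidence (for example the protein_match rows) and of transcript evidence (the expressed_sequence_match rows) from the GFF3, and count protein-evidence loci that no transcript evidence overlaps. If the protein file is just translated mRNA-seq, that count would be expected to be close to zero, meaning the protein evidence adds nothing independent; proteome-scale evidence from another species should yield some loci with no transcript support. The sketch assumes 1-based inclusive (seqid, start, end) tuples taken from GFF3 columns 1, 4 and 5.

```python
def merge_intervals(intervals):
    """Merge overlapping (seqid, start, end) intervals; coordinates are 1-based inclusive."""
    merged = []
    for seqid, start, end in sorted(intervals):
        if merged and merged[-1][0] == seqid and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (seqid, prev_start, max(prev_end, end))
        else:
            merged.append((seqid, start, end))
    return merged


def protein_only_loci(protein_hits, transcript_hits):
    """Return merged protein-evidence loci not overlapped by any transcript evidence."""
    transcripts_by_seq = {}
    for seqid, start, end in merge_intervals(transcript_hits):
        transcripts_by_seq.setdefault(seqid, []).append((start, end))
    unsupported = []
    for seqid, start, end in merge_intervals(protein_hits):
        overlaps = any(
            t_start <= end and start <= t_end
            for t_start, t_end in transcripts_by_seq.get(seqid, [])
        )
        if not overlaps:
            unsupported.append((seqid, start, end))
    return unsupported
```

The overlap test is a plain linear scan per locus, which keeps the sketch short; for a whole 200 Mb genome, sorted start lists with bisect or an interval tree would be the more practical choice.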
From darasappan at gmail.com Wed Feb 5 22:16:43 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Wed, 5 Feb 2014 23:16:43 -0600 Subject: [maker-devel] maker annotation with cufflinks output In-Reply-To: References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com> Message-ID: <1188173E-53C1-4FFE-B790-B710C3A55B86@gmail.com> Thank you both for those explanations. I'll get back to you after I try rerunning maker. Dhivya On Feb 5, 2014, at 2:38 PM, Carson Holt wrote: > Protein data doesn't have to be from that closely related a > species. This is because genes maintain homology at the amino acid > level across even very large evolutionary distances. Having a > more closely related species just ensures that genome contents are similar > (fewer losses/gains relative to each other). And use the entire > proteome of at least one related species (just using a database like > swiss-prot is not sufficient). > > Using translated mRNA-seq data will not give you any new information > that was not already available from the untranslated sequence. Plus > it will introduce the complicating artifacts that mRNA-seq generates > into the protein part of the pipeline (gene merging, incorrect > assembly, and false calls caused by background transcription). A > big gotcha with mRNA-seq is that all of your genome gets transcribed > at a low level, not just the genes, so you will always have > contamination that does not represent real gene models. Also in the > end you really only expect to capture about 50% of the genes with > mRNA-seq (maybe 70% if you are fortunate - and most of those will be > partial). So using the proteins from another species is important > to improve sensitivity, and fix many of the issues that arise from > the noisy nature of mRNA-seq.
In fact if you were forced > to use only one (either protein evidence or mRNA-seq) you will actually get > better annotations from the protein evidence in most cases. You get > better annotations when you use both, but if using only one of them, > the proteins from another species are better, and noisy mRNA-seq > will be the primary source of annotation error. > > Thanks, > Carson > > > From: dhivya arasappan > Date: Wednesday, February 5, 2014 at 1:13 PM > To: Daniel Ence > Cc: Carson Holt , "maker-devel at yandell-lab.org" > > Subject: Re: [maker-devel] maker annotation with cufflinks output > > Hello Daniel and Carson, > > Thanks for your replies. > > Yes I used the protein sequences resulting from annotation of > trinity assembly (using trinotate). I'll try using protein > sequences from related species (though there aren't sequences from > closely related orgs). Could you tell me a little about why protein > data from annotating my rnaseq data would not work best here? > > Thanks > Dhivya > > On Feb 5, 2014, at 1:28 PM, Daniel Ence wrote: > >> Hi Dhivya, Are the protein matches in your results coming from your >> annotations of the transcriptome? You should really use amino-acid >> sequences from related organisms and some kind of omnibus source >> like SwissProt. >> >> Thanks, >> Daniel >> >> Daniel Ence >> Graduate Student >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> From: Carson Holt [carsonhh at gmail.com] >> Sent: Wednesday, February 05, 2014 11:38 AM >> To: dhivya arasappan; Daniel Ence >> Cc: maker-devel at yandell-lab.org >> Subject: Re: [maker-devel] maker annotation with cufflinks output >> >> Do you have any features of type snap in your results from step 3? >> We've had a couple of recent posts where after training snap was >> giving no results, and as a result maker couldn't give any genes. >> One cause of something like that may be your step 2.
Make sure the >> ZFF wasn't empty you used to train with. The maker2zff script uses >> filters to only put the best genes in the ZFF file, and if all your >> genes fail the filtering then you are training with an empty ZFF. >> >> Also you should use proteins from a related species as your protein >> file. I see that your protein matches are varying wildly from run >> to run? So is your contig count? Were the subset of contigs you >> have results for long enough to contain genes? >> >> -Carson >> >> From: dhivya arasappan >> Date: Monday, February 3, 2014 at 9:31 AM >> To: Daniel Ence >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] maker annotation with cufflinks output >> >> Hi Daniel, >> >> I was able to check on some of those questions. >> >> 1. From trinity assembly: I started with 102000 contigs. I used >> trinotate to annotate proteins in this. >> >> I ran maker on this data with est2genome set to 1. The output looks >> like this (most important parts on top): >> >> 6653 gene >> 46675 exon >> 280534 protein_match >> 59934 CDS >> 969 contig >> 105388 expressed_sequence_match >> 12584 five_prime_UTR >> 78565 match >> 1401369 match_part >> 10180 mRNA >> 11545 three_prime_UTR >> >> 2. From cufflinks assembly: I started with 133380 entries (out of >> which there are 29,000 transcripts). I used the protein sequences >> from trinity assembly. >> >> I ran maker on this data with est2genome set to 1. The output looks >> like this: >> 29 gene >> 75 exon >> 573659 protein_match >> 67 CDS >> 1099 contig >> 269298 expressed_sequence_match >> 23 five_prime_UTR >> 173844 match >> 2221846 match_part >> 29 mRNA >> 23 three_prime_UTR >> >> The number of genes annotated using the trinity assembly is lower than >> expected, so I went the cufflinks route. I don't understand why when >> using the cufflinks transcripts, even fewer genes are being found. >> >> 3. Training SNAP: I used the results of maker from 1 to train >> SNAP.
I then used that training set to rerun maker: >> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm >> est2genome=0 >> >> And again I got results with no entries for gene, exon, CDS etc. >> 957 contig >> 46555 expressed_sequence_match >> 43651 match >> 553633 match_part >> 113738 protein_match >> >> As I mentioned in another email, cegma results indicated that the >> genome was more than 90% complete. Any suggestions would be helpful. >> >> Thank you >> Dhivya >> >> >> >> >> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: >> >>> Hi Dhivya, >>> >>> I think there are a few numbers that could be helpful to understand >>> what's happening here. >>> >>> How many transcripts did Trinity assemble the RNA-seq data into? >>> Also, you had 29,000 transcripts from cufflinks, but fewer from >>> MAKER when you gave it the cufflinks data. How many transcripts >>> did MAKER identify with the cufflinks data? Did you still get more >>> than the 10,000 transcripts that you found with just the Trinity >>> data? >>> >>> A key part of MAKER's approach to genome annotation that might be >>> affecting its performance is that it only annotates a gene where >>> there is both evidence (like your RNA-seq data) and an ab-initio >>> prediction. If a prediction is unsupported by the evidence, then >>> MAKER won't annotate a gene and if evidence aligns where there's >>> no prediction, MAKER won't annotate a gene either. What ab-initio >>> predictors are you using and have they been trained for your specific genome? >>> >>> You can force MAKER to automatically promote evidence alignments >>> to a gene model by setting the est2genome option to 1, but that >>> will usually give you many false positives. >>> >>> Try rerunning it with either the Trinity data or the Cufflinks >>> data and with est2genome set to 1, and let us know how that >>> affects the MAKER results.
>>> >>> Thanks, >>> Daniel >>> >>> Daniel Ence >>> Graduate Student >>> Eccles Institute of Human Genetics >>> University of Utah >>> 15 North 2030 East, Room 2100 >>> Salt Lake City, UT 84112-5330 >>> ________________________________________ >>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf >>> of dhivya arasappan [darasappan at gmail.com] >>> Sent: Thursday, January 30, 2014 11:18 AM >>> To: maker-devel at yandell-lab.org >>> Subject: [maker-devel] maker annotation with cufflinks output >>> >>> Hello, >>> >>> I am trying to annotate a 200 Mb plant genome for which I have a >>> very good assembly. >>> >>> I tried to de novo assemble RNA-seq data using trinity and ran maker >>> using my genome assembly and the trinity results. I did not get as >>> many transcripts as expected, around 10,000 transcripts. >>> >>> So, I decided to try a different approach. I did a genome assisted >>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline >>> generated 21,000 genes, 29,000 transcripts. I then ran maker >>> using my genome assembly and the cufflinks result. I get far fewer >>> transcripts as a result. >>> >>> If cufflinks found 29000 transcripts by mapping to the genome, I'm >>> confused as to why maker is not finding the same. >>> >>> Any suggestions would be appreciated. >>> >>> Thanks >>> Dhivya >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> _______________________________________________ maker-devel mailing >> list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >
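Carson's warning about training SNAP on an empty ZFF can be checked before training. The sketch below counts distinct gene models per sequence in ZFF text. It assumes the layout that maker2zff output appears to have in this thread: a `>name` header per sequence, then one exon per line with the exon label first and the model name (such as `MODEL5547`) last. The function name `count_zff_models` is hypothetical, and the sample records are made up.

```python
from typing import Dict, Set


def count_zff_models(zff_text: str) -> Dict[str, int]:
    """Count distinct gene models per sequence in ZFF-formatted text.

    Assumed layout: '>seqname' header lines, each followed by feature
    lines such as 'Einit 201 325 + MODEL1', where the first column is the
    exon label and the last column names the gene model.
    """
    models: Dict[str, Set[str]] = {}
    current = None
    for raw in zff_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()
            current = name[0] if name else ""
            models.setdefault(current, set())  # keep sequences with no models
            continue
        fields = line.split()
        if current is None or len(fields) < 5:
            continue  # skip features before any header, or malformed lines
        models[current].add(fields[-1])
    return {seq: len(groups) for seq, groups in models.items()}


sample = """>contig_1
Einit 201 325 + MODEL1
Eterm 1010 1255 + MODEL1
Esngl 3000 3500 - MODEL2
>contig_2
"""

counts = count_zff_models(sample)
print(counts)  # {'contig_1': 2, 'contig_2': 0}
if sum(counts.values()) == 0:
    print("No gene models: SNAP would be trained on an empty ZFF")
```

A total of zero across all sequences corresponds to the case Carson describes, in which every gene failed maker2zff's filters.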
From mikael.durling at slu.se Thu Feb 6 04:02:37 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Thu, 6 Feb 2014 11:02:37 +0000 Subject: [maker-devel] ncRNA support in maker In-Reply-To: References: Message-ID: Hi Carson, it's nice to see all these new features in maker. I gave the trnascan option a try by enabling it in the config file for one of my fungal genomes. It failed though, with this error message: ERROR: You found a tRNA with an intron! This should not happen --> rank=12, hostname=my-mgrid6 ERROR: Failed while gathering ab-init output files ERROR: Chunk failed at level:1, tier_type:2 FAILED CONTIG:scf_013 ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:scf_013 I checked the trnascan output (scf_013.abinit_nomask.0.eukaryotic.trnascan) in theVoid for that contig, and the output seems valid to me: scf_013 1 189339 189410 Thr AGT 0 0 82.83 scf_013 2 510381 510462 Ser AGA 0 0 67.09 scf_013 3 586886 587000 Leu CAA 586924 586956 57.97 scf_013 4 942166 942069 Leu AAG 942128 942113 57.48 scf_013 5 169102 168993 Leu TAA 169065 169037 56.49 Hope this can be of some help while debugging. I'll leave trnascan off for now. thanks, Mikael On 10 Jan 2014, at 22:03, Carson Holt wrote: > Hi Mikael, > > The options are part of the new MAKER-P integration > (http://www.plantphysiol.org/content/early/2013/12/06/pp.113.230144.abstract). Additional documentation/tutorials will be forthcoming - probably in > a nice wiki page as part of the upcoming GMOD Malaysia courses in February > or alternatively with the annual GMOD summer school. The tRNA option is > easy enough to turn on (just set trna=1 in the maker_opts.ctl file). > > Thanks, > Carson > > > > On 1/10/14, 2:48 AM, "Mikael Brandström Durling" > wrote: > >> Hi Carson and other maker developers, >> >> I was reading the source code of the latest maker release and noted >> several references to ncRNAs, snoscan and trnascan. Can these be
Can these be >> incorporated into the normal annotation workflow? If so, are there any >> instructions available for that? >> >> best regards, >> Mikael Durling >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > From darasappan at gmail.com Thu Feb 6 07:52:12 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Thu, 6 Feb 2014 08:52:12 -0600 Subject: [maker-devel] maker annotation with cufflinks output In-Reply-To: References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> Message-ID: <73AFCD9F-3B60-4C9C-9E03-35BC682E14ED@gmail.com> Hello, I does appear than my genome.ann file from maker2zff script has data in it. However, the SNAP steps after that have created empty files. The following are all empty: alt.dna err.dna export.dna genome.dna olp.dna uni.dna wrn.dna alt.ann err.ann export.ann genome.ann olp.ann uni.ann wrn.ann When I tried to get gene stats or validate genome.ann, I get errors like this for all of them: fathom genome.ann genome.dna -gene-stats |more MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds exon-1:out_of_bounds MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds 
exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds exon-21:out_of_bounds I'm not sure why the annotations I'm seeing in genome.ann are all showing up as errors. I realize this may be an issue with snap, but are you familiar with anything like this? A snippet of my genome.ann file is attached (since it's too big for the list) for reference. Thanks Dhivya On Feb 5, 2014, at 12:38 PM, Carson Holt wrote: > Do you have any features of type snap in your results from step 3? > We've had a couple of recent posts where after training snap was > giving no results, and as a result maker couldn't give any genes. > One cause of something like that may be your step 2. Make sure the > ZFF wasn't empty you used to train with. The maker2zff script uses > filters to only put the best genes in the ZFF file, and if all your > genes fail the filtering then you are training with an empty ZFF. > > Also you should use proteins from a related species as your protein > file. I see that your protein matches are varying wildly from run to > run? So is your contig count? Were the subset of contigs you have > results for long enough to contain genes? > > -Carson > > From: dhivya arasappan > Date: Monday, February 3, 2014 at 9:31 AM > To: Daniel Ence > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] maker annotation with cufflinks output > > Hi Daniel, > > I was able to check on some of those questions. > > 1. From trinity assembly: I started with 102000 contigs. I used > trinotate to annotate proteins in this. > > I ran maker on this data with est2genome set to 1. The output looks > like this (most important parts on top): > > 6653 gene > 46675 exon > 280534 protein_match > 59934 CDS > 969 contig > 105388 expressed_sequence_match > 12584 five_prime_UTR > 78565 match > 1401369 match_part > 10180 mRNA > 11545 three_prime_UTR > > 2. From cufflinks assembly: I started with 133380 entries (out of > which there are 29,000 transcripts).
I used the protein sequences > from trinity assembly. > > I ran maker on this data with est2genome set to 1. The output looks > like this: > 29 gene > 75 exon > 573659 protein_match > 67 CDS > 1099 contig > 269298 expressed_sequence_match > 23 five_prime_UTR > 173844 match > 2221846 match_part > 29 mRNA > 23 three_prime_UTR > > The number of genes annotated using the trinity assembly is lower than > expected, so I went the cufflinks route. I don't understand why when > using the cufflinks transcripts, even fewer genes are being found. > > 3. Training SNAP: I used the results of maker from 1 to train > SNAP. I then used that training set to rerun maker: > snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm > est2genome=0 > > And again I got results with no entries for gene, exon, CDS etc. > 957 contig > 46555 expressed_sequence_match > 43651 match > 553633 match_part > 113738 protein_match > > As I mentioned in another email, cegma results indicated that the > genome was more than 90% complete. Any suggestions would be helpful. > > Thank you > Dhivya > > > > > On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: > >> Hi Dhivya, >> >> I think there are a few numbers that could be helpful to understand >> what's happening here. >> >> How many transcripts did Trinity assemble the RNA-seq data into? >> Also, you had 29,000 transcripts from cufflinks, but fewer from >> MAKER when you gave it the cufflinks data. How many transcripts did >> MAKER identify with the cufflinks data? Did you still get more than >> the 10,000 transcripts that you found with just the Trinity data? >> >> A key part of MAKER's approach to genome annotation that might be >> affecting its performance is that it only annotates a gene where >> there is both evidence (like your RNA-seq data) and an ab-initio >> prediction.
If a prediction is unsupported by the evidence, then >> MAKER won't annotate a gene and if evidence aligns where there's no >> prediction, MAKER won't annotate a gene either. What ab-initio >> predictors are you using and have they been trained for your specific genome? >> >> You can force MAKER to automatically promote evidence alignments to >> a gene model by setting the est2genome option to 1, but that will >> usually give you many false positives. >> >> Try rerunning it with either the Trinity data or the Cufflinks data >> and with est2genome set to 1, and let us know how that affects the >> MAKER results. >> >> Thanks, >> Daniel >> >> Daniel Ence >> Graduate Student >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ________________________________________ >> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf >> of dhivya arasappan [darasappan at gmail.com] >> Sent: Thursday, January 30, 2014 11:18 AM >> To: maker-devel at yandell-lab.org >> Subject: [maker-devel] maker annotation with cufflinks output >> >> Hello, >> >> I am trying to annotate a 200 Mb plant genome for which I have a very >> good assembly. >> >> I tried to de novo assemble RNA-seq data using trinity and ran maker >> using my genome assembly and the trinity results. I did not get as >> many transcripts as expected, around 10,000 transcripts. >> >> So, I decided to try a different approach. I did a genome assisted >> assembly of the RNA-seq data using tophat/cufflinks. This pipeline >> generated 21,000 genes, 29,000 transcripts. I then ran maker using >> my genome assembly and the cufflinks result. I get far fewer >> transcripts as a result. >> >> If cufflinks found 29000 transcripts by mapping to the genome, I'm >> confused as to why maker is not finding the same. >> >> Any suggestions would be appreciated.
>> >> Thanks >> Dhivya >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ maker-devel mailing > list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- A non-text attachment was scrubbed... Name: head.genome.ann Type: application/octet-stream Size: 15761 bytes -------------- next part -------------- A non-text attachment was scrubbed... Name: head.genome.dna Type: application/octet-stream Size: 3075 bytes From carsonhh at gmail.com Thu Feb 6 09:01:04 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Feb 2014 09:01:04 -0700 Subject: [maker-devel] ncRNA support in maker In-Reply-To: References: Message-ID: I'm making a new release this weekend, but if you have access to the devel version, you can test now. All changes have been committed to the subversion repository. Thanks, Carson On 2/6/14, 4:02 AM, "Mikael Brandström Durling" wrote: >Hi Carson, > >it's nice to see all these new features in maker. > >I gave the trnascan option a try by enabling it in the config file for >one of my fungal genomes. It failed though, with this error message: > >ERROR: You found a tRNA with an intron! This should not happen
This should not happen >--> rank=12, hostname=my-mgrid6 >ERROR: Failed while gathering ab-init output files >ERROR: Chunk failed at level:1, tier_type:2 >FAILED CONTIG:scf_013 > >ERROR: Chunk failed at level:4, tier_type:0 >FAILED CONTIG:scf_013 > >I checked the trnascan output >(scf_013.abinit_nomask.0.eukaryotic.trnascan) in theVoid for that contig, >and the output seems valid to me: > >scf_013 1 189339 189410 Thr AGT 0 0 >82.83 >scf_013 2 510381 510462 Ser AGA 0 0 >67.09 >scf_013 3 586886 587000 Leu CAA 586924 586956 >57.97 >scf_013 4 942166 942069 Leu AAG 942128 942113 >57.48 >scf_013 5 169102 168993 Leu TAA 169065 169037 >56.49 > > >Hope this can be of some help while debugging. I?ll leave trnascan off >for now. > >thanks, > >Mikael > > >10 jan 2014 kl. 22:03 skrev Carson Holt : > >> Hi Mikael, >> >> The options are part of the new MAKER-P integration >> >>(http://www.plantphysiol.org/content/early/2013/12/06/pp.113.230144.abstr >>ac >> t). Additional documentation/tutorials will be forthcoming - probably >>in >> a nice wiki page as part of the upcoming GMOD Malaysia courses in >>February >> or alternatively with the annual GMOD summer school. The tRNA option is >> easy enough to turn on (just set trna=1 in the maker_opts.ctl file). >> >> Thanks, >> Carson >> >> >> >> On 1/10/14, 2:48 AM, "Mikael Brandstr?m Durling" >> wrote: >> >>> Hi Carson and other maker developers, >>> >>> I was reading the source code of the latest maker release and noted >>> several references to ncRNAs, snoscan and trnascan. Can these be >>> incorporated into the normal annotation workflow? If so, are there any >>> instructions available for that? 
>>> >>> best regards, >>> Mikael Durling >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > From carsonhh at gmail.com Thu Feb 6 09:05:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Feb 2014 09:05:05 -0700 Subject: [maker-devel] maker annotation with cufflinks output In-Reply-To: References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> Message-ID: Your genome.dna file has no sequence? Did you by any chance strip the fasta sequence from the GFF3 you are using as input to maker2zff? There should be fasta sequence at the end of that file. Also, can I see the GFF3 file you are using as input to maker2zff? Thanks, Carson From: dhivya arasappan Date: Thursday, February 6, 2014 at 7:47 AM To: Carson Holt Cc: Daniel Ence , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] maker annotation with cufflinks output Hello, It does appear that my genome.ann file from the maker2zff script has data in it. However, the SNAP steps after that have created empty files.
The following are all empty: alt.dna err.dna export.dna genome.dna olp.dna uni.dna wrn.dna alt.ann err.ann export.ann genome.ann olp.ann uni.ann wrn.ann When I tried to get gene stats or validate genome.ann, I get errors like this for all of them: fathom genome.ann genome.dna -gene-stats |more MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds exon-1:out_of_bounds MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds exon-21:out_of_bounds I'm not sure why the annotations I'm seeing in genome.ann are all showing up as errors. I realize this may be an issue with snap, but are you familiar with anything like this? My genome.ann file is attached for reference. Thanks Dhivya On Feb 5, 2014, at 12:38 PM, Carson Holt wrote: > Do you have any features of type snap in your results from step 3? We've had > a couple of recent posts where after training snap was giving no results, and > as a result maker couldn't give any genes. One cause of something like that > may be your step 2. Make sure the ZFF wasn't empty you used to train with. > The maker2zff script uses filters to only put the best genes in the ZFF file, > and if all your genes fail the filtering then you are training with an empty > ZFF.
> > Also you should use proteins from a related species as your protein file. I > see that your protein matches are varying wildly from run to run? So is your > contig count? Were the subset of contigs you have results for long enough to > contain genes? > > -Carson > > From: dhivya arasappan > Date: Monday, February 3, 2014 at 9:31 AM > To: Daniel Ence > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] maker annotation with cufflinks output > > Hi Daniel, > > I was able to check on some of those questions. > > 1. From trinity assembly: I started with 102000 contigs. I used trinotate to > annotate proteins in this. > > I ran maker on this data with est2genome set to 1. The output looks like this > (most important parts on top): > > 6653 gene > 46675 exon > 280534 protein_match > 59934 CDS > 969 contig > 105388 expressed_sequence_match > 12584 five_prime_UTR > 78565 match > 1401369 match_part > 10180 mRNA > 11545 three_prime_UTR > > 2. From cufflinks assembly: I started with 133380 entries (out of which there > are 29,000 transcripts). I used the protein sequences from trinity assembly. > > I ran maker on this data with est2genome set to 1. The output looks like this: > 29 gene > 75 exon > 573659 protein_match > 67 CDS > 1099 contig > 269298 expressed_sequence_match > 23 five_prime_UTR > 173844 match > 2221846 match_part > 29 mRNA > 23 three_prime_UTR > > The number of genes annotated using the trinity assembly is lower than expected, so I > went the cufflinks route. I don't understand why when using the cufflinks > transcripts, even fewer genes are being found. > > 3. Training SNAP: I used the results of maker from 1 to train SNAP. I then > used that training set to rerun maker: > snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm > est2genome=0 > > And again I got results with no entries for gene, exon, CDS etc.
> 957 contig > 46555 expressed_sequence_match > 43651 match > 553633 match_part > 113738 protein_match > > As I mentioned in another email, cegma results indicated that the genome was > more than 90% complete. Any suggestions would be helpful. > > Thank you > Dhivya > > > > > On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: > >> Hi Dhivya, >> >> I think there are a few numbers that could be helpful to understand what's >> happening here. >> >> How many transcripts did Trinity assemble the RNA-seq data into? Also, you >> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it >> the cufflinks data. How many transcripts did MAKER identify with the >> cufflinks data? Did you still get more than the 10,000 transcripts that you >> found with just the Trinity data? >> >> A key part of MAKER's approach to genome annotation that might be affecting >> its performance is that it only annotates a gene where there is both >> evidence (like your RNA-seq data) and an ab-initio prediction. If a >> prediction is unsupported by the evidence, then MAKER won't annotate a gene >> and if evidence aligns where there's no prediction, MAKER won't annotate a >> gene either. What ab-initio predictors are you using and have they been >> trained for your specific genome? >> >> You can force MAKER to automatically promote evidence alignments to a gene >> model by setting the est2genome option to 1, but that will usually give you >> many false positives. >> >> Try rerunning it with either the Trinity data or the Cufflinks data and with >> est2genome set to 1, and let us know how that affects the MAKER results.
>> >> Thanks, >> Daniel >> >> Daniel Ence >> Graduate Student >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ________________________________________ >> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya >> arasappan [darasappan at gmail.com] >> Sent: Thursday, January 30, 2014 11:18 AM >> To: maker-devel at yandell-lab.org >> Subject: [maker-devel] maker annotation with cufflinks output >> >> Hello, >> >> I am trying to annotate a 200 Mb plant genome for which I have a very >> good assembly. >> >> I tried to de novo assemble RNA-seq data using trinity and ran maker >> using my genome assembly and the trinity results. I did not get as >> many transcripts as expected, around 10,000 transcripts. >> >> So, I decided to try a different approach. I did a genome assisted >> assembly of the RNA-seq data using tophat/cufflinks. This pipeline >> generated 21,000 genes, 29,000 transcripts. I then ran maker using my >> genome assembly and the cufflinks result. I get far fewer >> transcripts as a result. >> >> If cufflinks found 29000 transcripts by mapping to the genome, I'm >> confused as to why maker is not finding the same. >> >> Any suggestions would be appreciated. >> >> Thanks >> Dhivya >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
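Carson's diagnosis was that the sequence had been stripped from the GFF3 fed to maker2zff, which would leave genome.dna empty; with no sequence to check against, every exon in genome.ann would plausibly be reported as out_of_bounds by fathom. In GFF3, sequences embedded in the file come after a `##FASTA` directive at the end. The sketch below is a minimal pre-flight check, assuming plain tab-separated GFF3 text; the function name `check_gff3_fasta` is hypothetical and is not part of MAKER or SNAP.

```python
from typing import Set, Tuple


def check_gff3_fasta(gff3_text: str) -> Tuple[bool, Set[str]]:
    """Return (has_fasta, seqids_without_sequence) for GFF3 text.

    has_fasta is True only if a '##FASTA' directive is followed by at least
    one '>' header. The second value lists feature seqids (column 1) that
    have no matching FASTA record.
    """
    feature_ids: Set[str] = set()
    fasta_ids: Set[str] = set()
    in_fasta = False
    for line in gff3_text.splitlines():
        if not line.strip():
            continue
        if line.startswith("##FASTA"):
            in_fasta = True
            continue
        if in_fasta:
            if line.startswith(">"):
                name = line[1:].split()
                if name:
                    fasta_ids.add(name[0])
            continue  # sequence lines are not features
        if line.startswith("#"):
            continue  # other directives and comments
        cols = line.split("\t")
        if len(cols) == 9:
            feature_ids.add(cols[0])
    return in_fasta and bool(fasta_ids), feature_ids - fasta_ids


header = "##gff-version 3\nctg1\tmaker\tgene\t1\t10\t.\t+\t.\tID=g1\n"
with_seq = header + "##FASTA\n>ctg1\nACGTACGTAC\n"

print(check_gff3_fasta(with_seq))  # (True, set())
print(check_gff3_fasta(header))    # (False, {'ctg1'})
```

Run against the GFF3 passed to maker2zff, a `False` first value would point to the same problem seen in this thread.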
From carsonhh at gmail.com Thu Feb 6 10:04:25 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Feb 2014 10:04:25 -0700 Subject: [maker-devel] maker annotation with cufflinks output In-Reply-To: <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com> References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com> Message-ID: Could you give me the file without using 'head' to trim it? It's cutting it before it reaches the part I'm interested in. -Carson From: dhivya arasappan Date: Thursday, February 6, 2014 at 10:01 AM To: Carson Holt Cc: Daniel Ence , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] maker annotation with cufflinks output Oh yes, I did. I took just the non-sequence entries in the gff file and used that as my input. I will rerun snap with the gff file containing the sequences as well. I'm attaching a snippet of the gff file that I used as input to maker2zff. Thanks for your help Dhivya On Feb 6, 2014, at 10:05 AM, Carson Holt wrote: > Your genome.dna file has no sequence? Did you by any chance strip the fasta > sequence from the GFF3 you are using as input to maker2zff? There should be > fasta sequence at the end of that file. Also, can I see the GFF3 file you are > using as input to maker2zff? > > Thanks, > Carson > > From: dhivya arasappan > Date: Thursday, February 6, 2014 at 7:47 AM > To: Carson Holt > Cc: Daniel Ence , "maker-devel at yandell-lab.org" > > Subject: Re: [maker-devel] maker annotation with cufflinks output > > Hello, > > It does appear that my genome.ann file from the maker2zff script has data in it. > However, the SNAP steps after that have created empty files.
The following > are all empty: > > alt.dna err.dna export.dna genome.dna olp.dna uni.dna wrn.dna > alt.ann err.ann export.ann genome.ann olp.ann uni.ann wrn.ann > > When I tried to get gene stats or validate genome.ann, I get errors like this > for all of them: > > fathom genome.ann genome.dna -gene-stats |more > MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds > exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds > exon-6:out_of_bounds > MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds > exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds > exon-1:out_of_bounds > MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds > exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds > MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds > exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds > exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds > exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds > exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds > exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds > exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds > exon-21:out_of_bounds > > I'm not sure why the annotations I'm seeing in genome.ann are all showing up as > errors. I realize this may be an issue with snap, but are you familiar with > anything like this? My genome.ann file is attached for reference. > > Thanks > Dhivya > > On Feb 5, 2014, at 12:38 PM, Carson Holt wrote: > >> Do you have any features of type snap in your results from step 3? We've had >> a couple of recent posts where, after training, snap was giving no results, and >> as a result maker couldn't give any genes. One cause of something like that >> may be your step 2. Make sure the ZFF you used to train with wasn't empty.
>> The maker2zff script uses filters to only put the best genes in the ZFF file, >> and if all your genes fail the filtering then you are training with an empty >> ZFF. >> >> Also, you should use proteins from a related species as your protein file. I >> see that your protein matches are varying wildly from run to run? So is your >> contig count? Was the subset of contigs you have results for long enough to >> contain genes? >> >> --Carson >> >> From: dhivya arasappan >> Date: Monday, February 3, 2014 at 9:31 AM >> To: Daniel Ence >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] maker annotation with cufflinks output >> >> Hi Daniel, >> >> I was able to check on some of those questions. >> >> 1. From trinity assembly: I started with 102000 contigs. I used trinotate to >> annotate proteins in this. >> >> I ran maker on this data with est2genome set to 1. The output looks like this >> (most important parts on top): >> >> 6653 gene >> 46675 exon >> 280534 protein_match >> 59934 CDS >> 969 contig >> 105388 expressed_sequence_match >> 12584 five_prime_UTR >> 78565 match >> 1401369 match_part >> 10180 mRNA >> 11545 three_prime_UTR >> >> 2. From cufflinks assembly: I started with 133380 entries (out of which there >> are 29,000 transcripts). I used the protein sequences from trinity assembly. >> >> I ran maker on this data with est2genome set to 1. The output looks like >> this: >> 29 gene >> 75 exon >> 573659 protein_match >> 67 CDS >> 1099 contig >> 269298 expressed_sequence_match >> 23 five_prime_UTR >> 173844 match >> 2221846 match_part >> 29 mRNA >> 23 three_prime_UTR >> >> The number of genes annotated using the trinity assembly is lower than expected, so I >> went the cufflinks route. I don't understand why, when using the cufflinks >> transcripts, even fewer genes are being found. >> >> 3. Training SNAP: I used the results of maker from 1 to train SNAP.
I then >> used that training set to rerun maker: >> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm >> est2genome=0 >> >> And again I got results with no entries for gene, exon, CDS etc. >> 957 contig >> 46555 expressed_sequence_match >> 43651 match >> 553633 match_part >> 113738 protein_match >> >> As I mentioned in another email, cegma results indicated that the genome was >> more than 90% complete. Any suggestions would be helpful. >> >> Thank you >> Dhivya >> >> >> >> >> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: >> >>> Hi Dhivya, >>> >>> I think there are a few numbers that could be helpful to understand what's >>> happening here. >>> >>> How many transcripts did Trinity assemble the RNA-seq data into? Also, you >>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it >>> the cufflinks data. How many transcripts did MAKER identify with the >>> cufflinks data? Did you still get more than the 10,000 transcripts that you >>> found with just the Trinity data? >>> >>> A key part of MAKER's approach to genome annotation that might be affecting >>> its performance is that it only annotates a gene where there is both >>> evidence (like your RNA-seq data) and an ab-initio prediction. If a >>> prediction is unsupported by the evidence, then MAKER won't annotate a gene, >>> and if evidence aligns where there's no prediction, MAKER won't annotate a >>> gene either. What ab-initio predictors are you using, and have they been >>> trained for your specific genome? >>> >>> You can force MAKER to automatically promote evidence alignments to a gene >>> model by setting the est2genome option to 1, but that will usually give you >>> many false positives. >>> >>> Try rerunning it with either the Trinity data or the Cufflinks data and with >>> est2genome set to 1, and let us know how that affects the MAKER results.
>>> >>> Thanks, >>> Daniel >>> >>> Daniel Ence >>> Graduate Student >>> Eccles Institute of Human Genetics >>> University of Utah >>> 15 North 2030 East, Room 2100 >>> Salt Lake City, UT 84112-5330 >>> ________________________________________ >>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya >>> arasappan [darasappan at gmail.com] >>> Sent: Thursday, January 30, 2014 11:18 AM >>> To: maker-devel at yandell-lab.org >>> Subject: [maker-devel] maker annotation with cufflinks output >>> >>> Hello, >>> >>> I am trying to annotate a 200 Mb plant genome for which I have a very >>> good assembly. >>> >>> I tried to de novo assemble RNA-seq data using trinity and ran maker >>> using my genome assembly and the trinity results. I did not get as >>> many transcripts as expected, around 10,000 transcripts. >>> >>> So, I decided to try a different approach. I did a genome-assisted >>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline >>> generated 21,000 genes, 29,000 transcripts. I then ran maker using my >>> genome assembly and the cufflinks result. I get far fewer >>> transcripts as a result. >>> >>> If cufflinks found 29,000 transcripts by mapping to the genome, I'm >>> confused as to why maker is not finding the same. >>> >>> Any suggestions would be appreciated. >>> >>> Thanks >>> Dhivya
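Carson's diagnosis in this exchange is that the GFF3 given to maker2zff had its FASTA section stripped, so genome.dna ended up with no sequence. With no sequence to check against, every exon would plausibly fall out of bounds, which is consistent with the fathom errors Dhivya posted. A quick check before training SNAP is to confirm that the GFF3 still carries a ##FASTA directive followed by actual sequence. The following Python sketch does only that. The function names are hypothetical, and it does not call maker2zff or fathom.

```python
def fasta_section_stats(lines):
    """Check whether a GFF3 stream carries an embedded ##FASTA section.

    Returns (has_fasta_directive, sequence_count, total_bases).
    """
    in_fasta = False
    n_seqs = 0
    n_bases = 0
    for line in lines:
        line = line.strip()
        if not in_fasta:
            if line.startswith("##FASTA"):
                in_fasta = True  # everything after this directive is FASTA
            continue
        if line.startswith(">"):
            n_seqs += 1  # a new sequence header
        elif line:
            n_bases += len(line)  # residues on a sequence line
    return in_fasta, n_seqs, n_bases


def looks_ready_for_maker2zff(lines):
    """Heuristic (an assumption, not a maker2zff rule): require non-empty embedded sequence."""
    has_fasta, n_seqs, n_bases = fasta_section_stats(lines)
    return has_fasta and n_seqs > 0 and n_bases > 0
```

For example, `looks_ready_for_maker2zff(open(path))` returning False on the file used as maker2zff input would match the situation described above, where only the non-sequence entries were kept.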
From darasappan at gmail.com Thu Feb 6 10:01:44 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Thu, 6 Feb 2014 11:01:44 -0600 Subject: [maker-devel] maker annotation with cufflinks output In-Reply-To: References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> Message-ID: <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com> Oh yes, I did: I took just the non-sequence entries in the gff file and used that as my input. I will rerun snap with the gff file containing the sequences as well. I'm attaching a snippet of the gff file that I used as input to maker2zff. Thanks for your help Dhivya A non-text attachment was scrubbed...
Name: head.cat.formatted.gff Type: application/octet-stream Size: 19905 bytes From sjackman at gmail.com Thu Feb 6 17:22:57 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Thu, 6 Feb 2014 16:22:57 -0800 Subject: [maker-devel] Adding MAKER to Homebrew for ease of installation Message-ID: Hi MAKER developers, I'd like to add MAKER to Homebrew to make the installation of MAKER and its dependencies as straightforward as 'brew install maker'. Homebrew is a system for installing software, originally developed for Mac OS, and now also available for Linux through Linuxbrew. Homebrew/science is a collection of scientific software, which includes a lot of bioinformatics software. I've created a prototype of the MAKER installation script (called a formula, in Homebrew parlance). Is there a static URL for the source code of MAKER? The current formula won't work out of the box, because part of the URL depends on the user's unique ID: http://yandell.topaz.genetics.utah.edu/maker_downloads/$key/maker-2.28.tgz. Would you be interested in adding MAKER to Homebrew? I know MAKER must be licensed for commercial use. It is possible for Homebrew to display a notice of the MAKER license when it's installed. MAKER is not available for commercial use without a license. Those wishing to license MAKER for commercial use should contact Beth Drees at the University of Utah TCO to discuss your needs. Cheers, Shaun From bioinformatics.umd at gmail.com Fri Feb 7 06:29:27 2014 From: bioinformatics.umd at gmail.com (UMD Bioinformatics) Date: Fri, 7 Feb 2014 08:29:27 -0500 Subject: [maker-devel] NCBI feature table Message-ID: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com> Hello Maker Developers, I have used this software with great success and I continue to look to it going forward.
However, as I'm getting ready to submit my annotations to NCBI with the genomes, I haven't found a straightforward method of turning the MAKER-produced GFF files into an NCBI feature table. What is the process for creating this table? It seems that the format NCBI is looking for is unique, and I haven't uncovered any scripts or tools to assist in the creation of this table from my annotation files. If anyone has any insight on this issue, it would be greatly appreciated. Cheers Ian From mike.thon at gmail.com Fri Feb 7 07:14:06 2014 From: mike.thon at gmail.com (Michael Thon) Date: Fri, 7 Feb 2014 15:14:06 +0100 Subject: [maker-devel] NCBI feature table In-Reply-To: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com> References: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com> Message-ID: <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com> Hi Ian - We've been struggling with this too, and I started developing a script to convert the maker gff into ncbi's .tbl format. However, we found that some of the gene models required manual editing, so what we do is import the gff into a commercial application called Geneious, where we do the edits. From there we export the data in genbank format and then convert it to .tbl format with a script. Our submission just passed the automated checks and we're waiting for the manual review. Probably none of my code will help you, and in any case it's kind of a mess. The only advice I can offer is that you'll probably need some manual editing in your workflow, if not in Apollo, then in some other app. In that case you'll need to convert the output of that app into .tbl format. > On Feb 7, 2014, at 2:29 PM, UMD Bioinformatics wrote: > > Hello Maker Developers, > > I have used this software with great success and I continue to look to it going forward. However, as I'm getting ready to submit my annotations to NCBI with the genomes, I haven't found a straightforward method of turning the MAKER-produced GFF files into an NCBI feature table.
What is the process for creating this table? It seems that the format NCBI is looking for is unique, and I haven't uncovered any scripts or tools to assist in the creation of this table from my annotation files. If anyone has any insight on this issue, it would be greatly appreciated. > > Cheers > Ian From cexzurjimenezjr at gmail.com Thu Feb 6 22:27:13 2014 From: cexzurjimenezjr at gmail.com (Cexzur Jimenez Jr.) Date: Fri, 7 Feb 2014 13:27:13 +0800 Subject: [maker-devel] Testing MAKER After Installation Message-ID: Hello, I have finished installing MAKER, marked by "PERL Dependencies: INSTALLED, External Programs: INSTALLED, MPI SUPPORT: NOT CONFIGURED, MAKER: INSTALLED", and it seems everything's fine. I'm using MAKER 2.10 and I have followed the installation instructions in both its "README" and "INSTALL" files and the 2012 GMOD MAKER Tutorial. After editing the three configuration files and running "maker", I saw the following error in my terminal. I have searched Google and tried the solutions offered there, but the error is still showing. Below is the error I got: Can't locate package GDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287. Can't locate package NDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287. Can't locate package SDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287. A data structure will be created for you at: /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore To access files for individual sequences use the datastore index: /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_master_datastore_index.log --Next Contig-- #--------------------------------------------------------------------- Now starting the contig!!
SeqID: contig-dpp-500-500 Length: 32156 #--------------------------------------------------------------------- running repeat masker. #--------- command -------------# Widget::RepeatMasker: /usr/local/maker/exe/RepeatMasker/RepeatMasker /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb -species all -dir /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500 -pa 1 #-------------------------------# Building general libraries in: /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib. ERROR: RepeatMasker failed FATAL ERROR ERROR: Failed while doing repeat masking!! ERROR: Chunk failed at level 2 !! FAILED CONTIG:contig-dpp-500-500 --Next Contig-- Processing run.log file... MAKER WARNING: The file dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out did not finish on the last run and must be erased #--------------------------------------------------------------------- Now retrying the contig!! SeqID: contig-dpp-500-500 Length: 32156 Retry: 1!! #--------------------------------------------------------------------- running repeat masker. 
#--------- command -------------# Widget::RepeatMasker: /usr/local/maker/exe/RepeatMasker/RepeatMasker /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb -species all -dir /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500 -pa 1 #-------------------------------# Building general libraries in: /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib. ERROR: RepeatMasker failed FATAL ERROR ERROR: Failed while doing repeat masking!! ERROR: Chunk failed at level 2 !! FAILED CONTIG:contig-dpp-500-500 --Next Contig-- Processing run.log file... MAKER WARNING: The file dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out did not finish on the last run and must be erased Maker is now finished!!! Can you tell me what the error is and which part of the installation I got wrong? Your help will be very much appreciated. Thank you. Attached are the configuration files I used for MAKER. Sincerely, CJ A non-text attachment was scrubbed... Name: maker_bopts.ctl Type: application/octet-stream Size: 1501 bytes A non-text attachment was scrubbed... Name: maker_exe.ctl Type: application/octet-stream Size: 1319 bytes A non-text attachment was scrubbed...
Name: maker_opts.ctl Type: application/octet-stream Size: 4540 bytes From carson.holt at genetics.utah.edu Fri Feb 7 11:11:44 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Fri, 7 Feb 2014 18:11:44 +0000 Subject: [maker-devel] Maker installation In-Reply-To: References: Message-ID: Hi Tracy, The older apollo is pretty much deprecated. There are still people who like to use it, though (myself among them). You can download and install it manually from here --> http://sourceforge.net/projects/gmod/files/Apollo/. If you want to let MAKER install it for you, you can edit the URL in the .../maker/src/locations file to be this --> http://weatherby.genetics.utah.edu/apollo/apollo.tar.gz You can also use Web-Apollo for your data if you want, and that is what I would recommend. On a side note, if you are trying to install the old Apollo as part of the optional web-based GUI, I'd recommend not doing that. The GUI is really only for demonstration purposes or very small datasets. It is not for production (that is why it is off by default). Thanks, Carson From: Tracy Smith > Date: Friday, February 7, 2014 at 10:48 AM To: Carson Holt > Cc: > Subject: Maker installation Hi, I am trying to install Maker and am running into the same problem noted on this page, namely I cannot install Apollo. https://groups.google.com/forum/#!msg/maker-devel/vrVa2mEsKbg/0e_25LvOvdEJ I tried using the new url you provided, "Here is a new location for the source --> http://sourceforge.net/code-snapshots/svn/g/gm/gmod/svn/gmod-svn-25291-apollo-trunk.zip" but that url now points nowhere. Is it possible to use WebApollo instead? Or do you know of another location where a copy of Apollo could be downloaded? Thank you so much. Best regards, Tracy -- Tracy Smith University of Wisconsin-Madison Pepperell Lab
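Several messages in this thread ask how to turn MAKER GFF3 into NCBI's five-column feature table (.tbl), the input to tbl2asn. To illustrate that format only, here is a minimal Python sketch for simple gene -> mRNA -> exon/CDS hierarchies on one sequence. It is not the gff32table script Carson attached, nor the GAG tool Brian mentions. It emits no partial-end markers and no product, locus_tag, or protein_id qualifiers, which a real submission will typically need. It also uses the GFF3 gene ID as a placeholder `gene` qualifier. All function names are hypothetical.

```python
from collections import defaultdict


def parse_attrs(col9):
    """Parse a GFF3 attribute column such as 'ID=m1;Parent=g1' into a dict."""
    return dict(kv.split("=", 1) for kv in col9.strip().split(";") if "=" in kv)


def oriented(intervals, strand):
    """Order intervals 5' to 3'; minus-strand intervals are written stop-first."""
    if strand == "-":
        return [(end, start) for start, end in sorted(intervals, reverse=True)]
    return sorted(intervals)


def gff3_to_tbl(lines, seqid):
    """Emit a minimal five-column feature table for one sequence (sketch only)."""
    genes = {}                                      # gene ID -> (start, end, strand)
    mrna_parent = {}                                # mRNA ID -> gene ID
    parts = defaultdict(lambda: defaultdict(list))  # mRNA ID -> type -> intervals
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) != 9 or f[0] != seqid:
            continue
        ftype, start, end, strand = f[2], int(f[3]), int(f[4]), f[6]
        attrs = parse_attrs(f[8])
        if ftype == "gene":
            genes[attrs["ID"]] = (start, end, strand)
        elif ftype == "mRNA":
            mrna_parent[attrs["ID"]] = attrs["Parent"]
        elif ftype in ("exon", "CDS"):
            for parent in attrs["Parent"].split(","):
                parts[parent][ftype].append((start, end))

    out = [f">Feature {seqid}"]
    for gene_id, (gstart, gend, strand) in genes.items():
        a, b = oriented([(gstart, gend)], strand)[0]
        out.append(f"{a}\t{b}\tgene")
        out.append(f"\t\t\tgene\t{gene_id}")  # placeholder qualifier
        for mrna_id, parent in mrna_parent.items():
            if parent != gene_id:
                continue
            # The mRNA feature spans the exons; the CDS feature spans the CDS pieces.
            for key, child in (("mRNA", "exon"), ("CDS", "CDS")):
                ivs = oriented(parts[mrna_id][child], strand)
                if not ivs:
                    continue
                (s0, e0), rest = ivs[0], ivs[1:]
                out.append(f"{s0}\t{e0}\t{key}")
                out.extend(f"{s}\t{e}" for s, e in rest)
    return "\n".join(out)
```

On the minus strand, the sketch writes each interval with start greater than stop and lists the intervals from the 5' end. That is how the feature table encodes reverse-strand features. As Brian suggests, running tbl2asn on the result is the real test of whether a conversion is acceptable.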
From carson.holt at genetics.utah.edu Fri Feb 7 11:28:29 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Fri, 7 Feb 2014 18:28:29 +0000 Subject: [maker-devel] NCBI feature table In-Reply-To: <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com> References: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com> <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com> Message-ID: Yes. The non-web version of apollo can open GFF3 and then save to table format --> http://sourceforge.net/projects/gmod/files/Apollo/ I've also attached a script made by a lab member that can convert MAKER-derived GFF3 gene entries into raw table format, and I've CC'd the script's author (Michael Campbell) in case you have any questions. Thanks, Carson On 2/7/14, 7:14 AM, "Michael Thon" wrote: >Hi Ian - > >We've been struggling with this too, and I started developing a script to >convert the maker gff into ncbi's .tbl format. However, we found that >some of the gene models required manual editing, so what we do is import >the gff into a commercial application called Geneious, where we do the >edits. From there we export the data in genbank format and then convert >it to .tbl format with a script. Our submission just passed the automated >checks and we're waiting for the manual review. Probably none of my code >will help you, and in any case it's kind of a mess. The only advice I can >offer is that you'll probably need some manual editing in your >workflow, if not in Apollo, then in some other app. In that case you'll need >to convert the output of that app into .tbl format. > >> On Feb 7, 2014, at 2:29 PM, UMD Bioinformatics >> wrote: >> >> Hello Maker Developers, >> >> I have used this software with great success and I continue to look to >>it going forward. However, as I'm getting ready to submit my annotations >>to NCBI with the genomes, I haven't found a straightforward method of >>turning the MAKER-produced GFF files into an NCBI feature table. What is >>the process for creating this table?
It seem that the format NCBI is >>looking for is unique and I haven?t uncovered any scripts or tools to >>assist in the creation of this table from my annotation files. If anyone >>has any insight on this issue it would be greatly appreciated. >> >> Cheers >> Ian >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- A non-text attachment was scrubbed... Name: gff32table Type: application/octet-stream Size: 7511 bytes Desc: gff32table URL: From carson.holt at genetics.utah.edu Fri Feb 7 11:31:17 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Fri, 7 Feb 2014 18:31:17 +0000 Subject: [maker-devel] Testing MAKER After Installation In-Reply-To: References: Message-ID: That can happen on some systems with that very old version of MAKER. Use MAKER 2.28 or 2.30 instead ?> http://www.yandell-lab.org/software/maker.html Thanks, Carson From: "Cexzur Jimenez Jr." > Date: Thursday, February 6, 2014 at 10:27 PM To: > Subject: [maker-devel] Testing MAKER After Installation Hello, I have finished installing MAKER marked by "PERL Dependencies: INSTALLED, External Programs: INSTALLED, MPI SUPPORT: NOT CONFIGURED, MAKER: INSTALLED" and it seems everything's fine. I'm using MAKER 2.10 and I have followed the installation instructions both in its corresponding "README" and "INSTALL" files and the 2012 GMOD MAKER Tutorial. After editing the three configuration files and run with "maker", I saw the following error in my terminal. I have searched Google and tried the solutions offered there but the error is still showing. 
Below is the error I got: Can't locate package GDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287. Can't locate package NDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287. Can't locate package SDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287. A data structure will be created for you at: /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore To access files for individual sequences use the datastore index: /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_master_datastore_index.log --Next Contig-- #--------------------------------------------------------------------- Now starting the contig!! SeqID: contig-dpp-500-500 Length: 32156 #--------------------------------------------------------------------- running repeat masker. #--------- command -------------# Widget::RepeatMasker: /usr/local/maker/exe/RepeatMasker/RepeatMasker /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb -species all -dir /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500 -pa 1 #-------------------------------# Building general libraries in: /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib. ERROR: RepeatMasker failed FATAL ERROR ERROR: Failed while doing repeat masking!! ERROR: Chunk failed at level 2 !! FAILED CONTIG:contig-dpp-500-500 --Next Contig-- Processing run.log file... 
MAKER WARNING: The file dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out did not finish on the last run and must be erased

#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: contig-dpp-500-500
Length: 32156
Retry: 1!!
#---------------------------------------------------------------------

running repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb -species all -dir /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500 -pa 1
#-------------------------------#
Building general libraries in: /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed
FATAL ERROR
ERROR: Failed while doing repeat masking!!
ERROR: Chunk failed at level 2
!!
FAILED CONTIG:contig-dpp-500-500

--Next Contig--

Processing run.log file...
MAKER WARNING: The file dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out did not finish on the last run and must be erased

Maker is now finished!!!

Can you tell me what this error means and which part of the installation I got wrong? Your help would be very much appreciated. Thank you.

Attached are the configuration files I used for MAKER.
Sincerely,
CJ

From bhall7 at hawaii.edu Fri Feb 7 17:31:36 2014
From: bhall7 at hawaii.edu (Brian Hall)
Date: Fri, 07 Feb 2014 14:31:36 -1000
Subject: [maker-devel] NCBI feature table
In-Reply-To:
References:
Message-ID: <52F57AE8.5090002@hawaii.edu>

Hi Ian,

My colleagues are also working on preparing a genome for submission to the NCBI. The software we are developing for this task is still a work in progress, but you are welcome to give it a try: https://github.com/tedsta/GAG

It's a console-based application and it requires Python 2.6. Its strength is in filtering and modifying large segments of the genome at once. Apollo is good for removing a few erroneous exons, but we are dealing with lists of dozens or more. This program seeks to make such changes as painless as possible.

My advice is to try the simplest gff3-to-tbl script you can find and then run tbl2asn. If it works out okay, great! If you get a massive error report, get in touch and we'll help you out if we can :)

--Brian

On 02/07/2014 05:16 AM, maker-devel-request at yandell-lab.org wrote:
> Date: Fri, 7 Feb 2014 08:29:27 -0500
> From: UMD Bioinformatics
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] NCBI feature table
> Message-ID: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8 at gmail.com>
> Content-Type: text/plain; charset=windows-1252
>
> Hello Maker Developers,
>
> I have used this software with great success and I continue to look to it going forward. However, as I'm getting ready to submit my annotations to NCBI with the genomes, I haven't found a straightforward method of turning the MAKER-produced GFF files into an NCBI feature table. What is the process for creating this table?
> It seems that the format NCBI is looking for is unique and I haven't uncovered any scripts or tools to assist in the creation of this table from my annotation files. If anyone has any insight on this issue it would be greatly appreciated.
>
> Cheers
> Ian
>

From tmsmith23 at wisc.edu Fri Feb 7 10:48:13 2014
From: tmsmith23 at wisc.edu (Tracy Smith)
Date: Fri, 7 Feb 2014 11:48:13 -0600
Subject: [maker-devel] Maker installation
Message-ID:

Hi,

I am trying to install Maker and am running into the same problem noted on this page, namely that I cannot install Apollo.

https://groups.google.com/forum/#!msg/maker-devel/vrVa2mEsKbg/0e_25LvOvdEJ

I tried using the new URL you provided, "Here is a new location for the source --> http://sourceforge.net/code-snapshots/svn/g/gm/gmod/svn/gmod-svn-25291-apollo-trunk.zip", but that URL now points nowhere. Is it possible to use WebApollo instead? Or do you know of another location where a copy of Apollo could be downloaded?

Thank you so much.

Best regards,
Tracy

--
Tracy Smith
University of Wisconsin-Madison
Pepperell Lab

From carsonhh at gmail.com Mon Feb 10 08:34:58 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 08:34:58 -0700
Subject: [maker-devel] MAKER presentation at PAG
In-Reply-To:
References:
Message-ID:

* maker_map_ids - Build shorter IDs/Names for MAKER genes and transcripts following the NCBI suggested naming format.
* map_fasta_ids - Maps short IDs/Names generated by maker_map_ids to MAKER fasta files.
* map_gff_ids - Maps short IDs/Names generated by maker_map_ids to MAKER GFF3 files; old IDs/Names are mapped to the Alias attribute.
* maker_functional_fasta - Maps putative functions identified from BLASTP against UniProt/SwissProt to the MAKER-produced transcript and protein fasta files.
* maker_functional_gff - Maps putative functions identified from BLASTP against UniProt/SwissProt to the MAKER-produced GFF3 files in the Note attribute.
* ipr_update_gff - Takes InterProScan (iprscan) output and maps domain IDs and GO terms to the Dbxref and Ontology_term attributes in the GFF3 file. This is metadata that shows up when you click on an annotation in JBrowse/GBrowse.
* iprscan2gff3 - Takes InterProScan (iprscan) output and generates GFF3 features representing domains. Interesting tier for GBrowse. These are visible feature tracks that can be seen in JBrowse/GBrowse.

Thanks,
Carson

From: Kevin Dorn
Date: Sunday, February 9, 2014 at 9:23 PM
To:
Subject: MAKER presentation at PAG

Hi Carson,

I saw your MAKER presentation at PAG this year and have a quick question. I've used MAKER to annotate the plant genome we're working on, and am mostly done. I had to step out for a second during your talk, and when I came back, you were talking about how you can transfer meaningful annotations (getting rid of the 'ugly MAKER names' for genes). Is there an accessory script to do this?

Thanks,
Kevin Dorn

From amitha at ccmb.res.in Mon Feb 10 00:04:37 2014
From: amitha at ccmb.res.in (AMITHA SAMPATH KUMAR)
Date: Mon, 10 Feb 2014 12:34:37 +0530 (IST)
Subject: [maker-devel] Falied to create new account
In-Reply-To:
Message-ID: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>

Hi,

I am interested in using the Maker online version, for which I tried to create a profile using the email id 'amitha at ccmb.res.in', but unfortunately, I could not log in.
I am also pasting a link to the error here, http://weatherby.genetics.utah.edu/cgi-bin/mwas/maker.cgi.

The error mentioned is:
Error executing run mode 'forgot_login': Can't call method "MailMsg" without a package or object reference at /var/www/cgi-bin/mwas/lib/MWAS_util.pm line 529.
 at /var/www/cgi-bin/mwas/maker.cgi line 21.
Kindly help me through the registration asap.

Thanks
Amitha.

From listona at science.oregonstate.edu Sat Feb 8 19:08:42 2014
From: listona at science.oregonstate.edu (Aaron Liston)
Date: Sat, 08 Feb 2014 18:08:42 -0800
Subject: [maker-devel] Re-using repeat masking in SNAP training
Message-ID: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>

I am following the tutorial for training SNAP, and it works fine. However, the tutorial instructions have MAKER repeat the repeat masking. To avoid this, I concatenated my gff files from the first round of annotation and used maker_gff=round1.gff and rm_pass=1, but at the end of the process the repeat annotations were not there. Any suggestions? Thanks, Aaron

From caigh02 at gmail.com Sun Feb 9 20:26:57 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Sun, 9 Feb 2014 21:26:57 -0600
Subject: [maker-devel] Fwd: Functional annotation of MAKER gene models
In-Reply-To:
References:
Message-ID:

I sent the following message to Carson but forgot to send it to the maker-devel list.

Hi Carson,

I need your help again! With your guidance, I have the gene models for my genomes. Now I am trying to assign functions to the gene models. I noticed that I can use maker_functional_gff/fasta or InterProScan. I dug up some old messages in the maker-devel Google group, but still have a few questions:

1. Will maker_functional_gff/fasta take NCBI blastp results, or only wu-blast results? I do not have wu-blast.
2. Do I have to use the Uniprot/Swiss_prot database, or can I use something else? For example, may I add a few high-quality genome annotations of related species to the swiss_prot database? Or may I use the Uniref90 or nr database instead of swiss_prot?
3. Do you have a script to integrate blast2go results into the maker gff/fasta?

Thanks.
Guohong
Rutgers University
From carsonhh at gmail.com Mon Feb 10 10:25:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 10:25:06 -0700
Subject: [maker-devel] Falied to create new account
In-Reply-To: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>
References: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>
Message-ID:

The SMTP server that sends e-mails out is down. So when you said you forgot your login, it couldn't e-mail you. I switched to a different server for the time being.

--Carson

On 2/10/14, 12:04 AM, "AMITHA SAMPATH KUMAR" wrote:

>Hi,
>
>I am interested in using the Maker online version, for which I tried to
>create a profile using the email id 'amitha at ccmb.res.in', but
>unfortunately, I could not log in.
>I am also pasting a link to the error here,
>http://weatherby.genetics.utah.edu/cgi-bin/mwas/maker.cgi.
>
>The error mentioned is:
>Error executing run mode 'forgot_login': Can't call method "MailMsg"
>without a package or object reference at
>/var/www/cgi-bin/mwas/lib/MWAS_util.pm line 529.
> at /var/www/cgi-bin/mwas/maker.cgi line 21.
>
>Kindly help me through the registration asap.
>
>Thanks
>Amitha.

From carsonhh at gmail.com Mon Feb 10 10:26:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 10:26:06 -0700
Subject: [maker-devel] Fwd: Functional annotation of MAKER gene models
In-Reply-To:
References:
Message-ID:

1. Yes. It should take NCBI BLAST+ results.
2. It has to be UniProt/SwissProt, or you can modify the comments of another database to look like UniProt/SwissProt.
3. ipr_update_gff can also take BLAST2GO results as an undocumented feature (or at least it could the last time I tested it, which was quite a long time ago).
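Carson's first answer says maker_functional_gff/fasta can work from NCBI BLAST+ results. As an illustration only, the Python sketch below shows the core of such a step. It reads BLAST+ tabular output and keeps the best UniProt/SwissProt hit for each MAKER protein. It assumes blastp was run with `-outfmt 6`, the standard 12-column tab-separated layout. The names `parse_outfmt6` and `best_hits` are hypothetical, and this is not MAKER's own code, so check the script's usage message for the exact BLAST format it expects.

```python
"""Best BLAST+ hit per MAKER protein: an illustrative sketch.

Assumes blastp was run with -outfmt 6 (12 tab-separated columns).
parse_outfmt6/best_hits are hypothetical helpers, not MAKER tools.
"""
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str      # MAKER protein/transcript ID
    subject: str    # database hit, e.g. a UniProt/SwissProt entry
    pident: float
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield hits from -outfmt 6 lines, skipping blank and comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        yield Hit(cols[0], cols[1], float(cols[2]),
                  float(cols[10]), float(cols[11]))


def best_hits(lines: Iterable[str]) -> Dict[str, Hit]:
    """Keep the highest-bitscore hit per query (ties: lowest e-value)."""
    best: Dict[str, Hit] = {}
    for hit in parse_outfmt6(lines):
        current = best.get(hit.query)
        if current is None or (hit.bitscore, -hit.evalue) > (
            current.bitscore, -current.evalue
        ):
            best[hit.query] = hit
    return best
```

For example, `with open("blastp.out") as fh: hits = best_hits(fh)` gives one hit per query ID (the file name is hypothetical). Ranking by bitscore first is one reasonable choice. Ranking by e-value alone would work similarly for most hits.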

Thanks,
Carson

From: Guohong Cai
Date: Sunday, February 9, 2014 at 8:26 PM
To:
Subject: [maker-devel] Fwd: Functional annotation of MAKER gene models

I sent the following message to Carson but forgot to send it to the maker-devel list.

Hi Carson,

I need your help again! With your guidance, I have the gene models for my genomes. Now I am trying to assign functions to the gene models. I noticed that I can use maker_functional_gff/fasta or InterProScan. I dug up some old messages in the maker-devel Google group, but still have a few questions:

1. Will maker_functional_gff/fasta take NCBI blastp results, or only wu-blast results? I do not have wu-blast.
2. Do I have to use the Uniprot/Swiss_prot database, or can I use something else? For example, may I add a few high-quality genome annotations of related species to the swiss_prot database? Or may I use the Uniref90 or nr database instead of swiss_prot?
3. Do you have a script to integrate blast2go results into the maker gff/fasta?

Thanks.
Guohong
Rutgers University

From barry.utah at gmail.com Mon Feb 10 12:21:31 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Mon, 10 Feb 2014 12:21:31 -0700
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
Message-ID: <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>

Hi Aaron,

If you re-run maker and don't change the details about the repeat library (i.e. you only update the SNAP HMM file), then MAKER shouldn't redo any of the repeat masking; it should reuse the work it has already done. Is this not what you are seeing?
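Barry's point is that MAKER reuses finished repeat-masking results when a rerun keeps the same genome file name and repeat settings. If you also want to keep each round's results, one option is to copy the previous round's GFF3 aside before rerunning in place. The minimal Python sketch below does that copy step. It assumes a GFF3 you have already assembled (for example with gff3_merge). The helper name `snapshot_gff` and the `<name>.<round>.gff` naming scheme are hypothetical.

```python
"""Keep a copy of each MAKER round's GFF3 before rerunning in place.

Illustrative sketch; snapshot_gff and the naming scheme are assumptions.
"""
import shutil
from pathlib import Path


def snapshot_gff(gff_path: str, round_label: str) -> Path:
    """Copy gff_path to '<stem>.<round_label><suffix>' next to the original.

    Raises FileExistsError rather than overwriting an earlier snapshot.
    """
    src = Path(gff_path)
    dest = src.with_name(f"{src.stem}.{round_label}{src.suffix}")
    if dest.exists():
        raise FileExistsError(f"snapshot already exists: {dest}")
    shutil.copy2(src, dest)  # copies file contents and timestamps
    return dest
```

For instance, `snapshot_gff("genome.all.gff", "round1")` writes `genome.all.round1.gff`. You would then point `snaphmm` at the new HMM and rerun in the same directory. Refusing to overwrite guards against losing an earlier round by accident.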

Barry

On Feb 8, 2014, at 7:08 PM, Aaron Liston wrote:

> I am following the tutorial for training SNAP, and it works fine. However, the tutorial instructions have MAKER repeat the repeat masking. To avoid this, I concatenated my gff files from the first round of annotation and used maker_gff=round1.gff and rm_pass=1, but at the end of the process the repeat annotations were not there. Any suggestions? Thanks, Aaron

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From listona at science.oregonstate.edu Mon Feb 10 12:46:06 2014
From: listona at science.oregonstate.edu (Aaron Liston)
Date: Mon, 10 Feb 2014 11:46:06 -0800
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu> <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>
Message-ID: <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>

Hi Barry: I changed the name of the genome file so that I could see the results at each step. However, it sounds like if I had kept the same name, MAKER would use the info from the previous run. Is that correct? Aaron

From: Barry Moore [mailto:barry.utah at gmail.com]
Sent: Monday, February 10, 2014 11:22 AM
To: Aaron Liston
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Re-using repeat masking in SNAP training

Hi Aaron,

If you re-run maker and don't change the details about the repeat library (i.e. you only update the SNAP HMM file), then MAKER shouldn't redo any of the repeat masking; it should reuse the work it has already done.
Is this not what you are seeing?

Barry

On Feb 8, 2014, at 7:08 PM, Aaron Liston wrote:

I am following the tutorial for training SNAP, and it works fine. However, the tutorial instructions have MAKER repeat the repeat masking. To avoid this, I concatenated my gff files from the first round of annotation and used maker_gff=round1.gff and rm_pass=1, but at the end of the process the repeat annotations were not there. Any suggestions? Thanks, Aaron

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From barry.utah at gmail.com Mon Feb 10 12:56:26 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Mon, 10 Feb 2014 12:56:26 -0700
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu> <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu> <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>
Message-ID: <19FC4633-46F6-4B32-820A-A68C242A1E77@gmail.com>

Yep. If you want to keep the results from each step, just copy the GFF3 file from your first run to a new name and then redo your run.

B

On Feb 10, 2014, at 12:46 PM, Aaron Liston wrote:

> Hi Barry: I changed the name of the genome file so that I could see the results at each step. However, it sounds like if I had kept the same name, MAKER would use the info from the previous run. Is that correct?
> Aaron
>
> From: Barry Moore [mailto:barry.utah at gmail.com]
> Sent: Monday, February 10, 2014 11:22 AM
> To: Aaron Liston
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Re-using repeat masking in SNAP training
>
> Hi Aaron,
>
> If you re-run maker and don't change the details about the repeat library (i.e. you only update the SNAP HMM file), then MAKER shouldn't redo any of the repeat masking; it should reuse the work it has already done. Is this not what you are seeing?
>
> Barry
>
>
> On Feb 8, 2014, at 7:08 PM, Aaron Liston wrote:
>
>
> I am following the tutorial for training SNAP, and it works fine. However, the tutorial instructions have MAKER repeat the repeat masking. To avoid this, I concatenated my gff files from the first round of annotation and used maker_gff=round1.gff and rm_pass=1, but at the end of the process the repeat annotations were not there. Any suggestions? Thanks, Aaron
>
> Barry Moore
> Research Scientist
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT 84112
> --------------------------------------------
> (801) 585-3543

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From dence at genetics.utah.edu Tue Feb 11 11:37:36 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 11 Feb 2014 18:37:36 +0000
Subject: [maker-devel] Falied to create new account
In-Reply-To:
References: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>
Message-ID:

Hossein,

OK. Since this error came up on a local install, I'm going to need some more information to understand what went wrong.
Is it the same contig that always causes this error? If it is, then is that the only error or warning that MAKER encounters while running on this contig? Or, if multiple contigs fail, is it always the same error? If you can narrow it down to the smallest possible dataset that consistently gives the same error, then we can begin to understand what's wrong.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Tuesday, February 11, 2014 11:20 AM
To: Daniel Ence
Subject: Re: [maker-devel] Falied to create new account

Hi Daniel

I'm running it through the local server at my work.

M. Hossein Borhan, Ph.D.
Research Scientist/ Chercheur Scientifique
Saskatoon Research Centre/Centre de Recherches de Saskatoon
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
107 Science Place, Saskatoon, SK, S7N 0X2
Telephone/Téléphone: (306) 385-9441
Facsimile/Télécopieur: (306) 385-9482
Hossein.borhan at agr.gc.ca

On 14-02-11 12:16 PM, "Daniel Ence" wrote:

>Hi Hossein,
>
>Did you encounter this error while you were running MAKER on your local
>machine or through the MAKER web annotation service?
>
>Thanks,
>Daniel
>
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Carson Holt [carsonhh at gmail.com]
>Sent: Tuesday, February 11, 2014 10:18 AM
>To: Daniel Ence
>Cc: Mark Yandell
>Subject: FW: [maker-devel] Falied to create new account
>
>Hey Daniel, could you download his dataset and see if you can replicate
>the error? Also check if this was an MWAS job or a local maker run (his
>dataset will already be there for MWAS, you just need the job ID).
>
>Thanks,
>Carson
>
>On 2/11/14, 10:16 AM, "Borhan, Hossein" wrote:
>
>>Hi Carson
>>
>>
>>I encountered this error while running maker
>>
>>FATAL ERROR
>>ERROR: Failed while processing the chunk divide!!
>>
>>ERROR: Chunk failed at level 17
>>!!
>>FAILED CONTIG:PbPT3Sc00006
>>
>>HB

From darasappan at gmail.com Tue Feb 11 11:48:23 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 11 Feb 2014 12:48:23 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To:
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
Message-ID: <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>

With your suggested changes (using a protein file not derived from the RNA-seq data and fixing the gff file for training SNAP), I was able to increase the number of genes from 6000+ to 18116.

I'm now trying to evaluate the quality of the annotation. I have a question about the usage for mpi_evaluator. In the maker tutorial, the usage is given as:

mpi_evaluator [options]

What files are being referred to in the input parameters: eval_opts, eval_bopts and eval_exe?

Thanks
Dhivya

On Feb 6, 2014, at 11:47 AM, Carson Holt wrote:

> Ok. Content looks good. Just make sure to use gff3_merge to join
> the GFF3s without stripping out the fasta sequence at the end when
> training SNAP.
>
> Thanks,
> Carson
>
>
> From: dhivya arasappan
> Date: Thursday, February 6, 2014 at 10:29 AM
> To: Carson Holt
> Cc: Daniel Ence
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Sorry, I was just trying to make it small enough to be approved by
> the mailing list.
>
> Here is the whole file:
>
> cat.formatted.gff.tgz
>
> On Thu, Feb 6, 2014 at 11:04 AM, Carson Holt
> wrote:
>> Could you give me the file without using 'head' to trim it? It's
>> cutting it before it reaches the part I'm interested in.
>>
>> --Carson
>>
>>
>> From: dhivya arasappan
>> Date: Thursday, February 6, 2014 at 10:01 AM
>> To: Carson Holt
>> Cc: Daniel Ence, "maker-devel at yandell-lab.org"
>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>
>> Oh yes, I did. I took just the non-sequence entries in the gff file
>> and used that as my input. I will rerun snap with the gff file
>> containing the sequences as well.
>>
>> I'm attaching a snippet of the gff file that I used as input to
>> maker2zff.
>>
>> Thanks for your help
>> Dhivya
>>
>>
>> On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:
>>
>>> Your genome.dna file has no sequence? Did you by any chance strip
>>> the fasta sequence from the GFF3 you are using as input to
>>> maker2zff? There should be fasta sequence at the end of that
>>> file. Also, can I see the GFF3 file you are using as input to
>>> maker2zff?
>>>
>>> Thanks,
>>> Carson
>>>
>>> From: dhivya arasappan
>>> Date: Thursday, February 6, 2014 at 7:47 AM
>>> To: Carson Holt
>>> Cc: Daniel Ence, "maker-devel at yandell-lab.org"
>>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>>
>>> Hello,
>>>
>>> It does appear that my genome.ann file from the maker2zff script has
>>> data in it. However, the SNAP steps after that have created empty
>>> files.
>>> The following are all empty:
>>>
>>> alt.dna err.dna export.dna genome.dna olp.dna uni.dna wrn.dna
>>> alt.ann err.ann export.ann genome.ann olp.ann uni.ann wrn.ann
>>>
>>> When I tried to get gene stats or validate genome.ann, I get
>>> errors like this for all of them:
>>>
>>> fathom genome.ann genome.dna -gene-stats |more
>>> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds
>>> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
>>> exon-5:out_of_bounds exon-6:out_of_bounds
>>> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds
>>> exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds
>>> exon-2:out_of_bounds exon-1:out_of_bounds
>>> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds
>>> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
>>> exon-5:out_of_bounds
>>> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds
>>> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds
>>> exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds
>>> exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds
>>> exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds
>>> exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds
>>> exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds
>>> exon-20:out_of_bounds exon-21:out_of_bounds
>>>
>>> I'm not sure why the annotations I'm seeing in genome.ann are all
>>> showing up as errors. I realize this may be an issue with snap,
>>> but are you familiar with anything like this? My genome.ann file
>>> is attached for reference.
>>>
>>> Thanks
>>> Dhivya
>>>
>>> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
>>>
>>>> Do you have any features of type snap in your results from step
>>>> 3? We've had a couple of recent posts where, after training, snap
>>>> was giving no results, and as a result maker couldn't give any
>>>> genes. One cause of something like that may be your step 2.
>>>> Make sure the ZFF you used to train with wasn't empty.
>>>> The maker2zff script uses filters to put only the best genes in the
>>>> ZFF file, and if all your genes fail the filtering then you are
>>>> training with an empty ZFF.
>>>>
>>>> Also, you should use proteins from a related species as your
>>>> protein file. I see that your protein matches are varying wildly
>>>> from run to run? So is your contig count? Was the subset of
>>>> contigs you have results for long enough to contain genes?
>>>>
>>>> --Carson
>>>>
>>>> From: dhivya arasappan
>>>> Date: Monday, February 3, 2014 at 9:31 AM
>>>> To: Daniel Ence
>>>> Cc: "maker-devel at yandell-lab.org"
>>>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>>>
>>>> Hi Daniel,
>>>>
>>>> I was able to check on some of those questions.
>>>>
>>>> 1. From trinity assembly: I started with 102000 contigs. I used
>>>> trinotate to annotate proteins in this.
>>>>
>>>> I ran maker on this data with est2genome set to 1. The output
>>>> looks like this (most important parts on top):
>>>>
>>>> 6653 gene
>>>> 46675 exon
>>>> 280534 protein_match
>>>> 59934 CDS
>>>> 969 contig
>>>> 105388 expressed_sequence_match
>>>> 12584 five_prime_UTR
>>>> 78565 match
>>>> 1401369 match_part
>>>> 10180 mRNA
>>>> 11545 three_prime_UTR
>>>>
>>>> 2. From cufflinks assembly: I started with 133380 entries (out of
>>>> which there are 29,000 transcripts). I used the protein
>>>> sequences from trinity assembly.
>>>>
>>>> I ran maker on this data with est2genome set to 1. The output
>>>> looks like this:
>>>> 29 gene
>>>> 75 exon
>>>> 573659 protein_match
>>>> 67 CDS
>>>> 1099 contig
>>>> 269298 expressed_sequence_match
>>>> 23 five_prime_UTR
>>>> 173844 match
>>>> 2221846 match_part
>>>> 29 mRNA
>>>> 23 three_prime_UTR
>>>>
>>>> The number of genes annotated using the trinity assembly is lower
>>>> than expected, so I went the cufflinks route. I don't understand why
>>>> even fewer genes are found when using the cufflinks transcripts.
>>>>
>>>> 3.
>>>> Training SNAP: I used the results of maker from 1 to train
>>>> SNAP. I then used that training set to rerun maker:
>>>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
>>>> est2genome=0
>>>>
>>>> And again I got results with no entries for gene, exon, CDS etc.
>>>> 957 contig
>>>> 46555 expressed_sequence_match
>>>> 43651 match
>>>> 553633 match_part
>>>> 113738 protein_match
>>>>
>>>> As I mentioned in another email, CEGMA results indicated that the
>>>> genome was more than 90% complete. Any suggestions would be
>>>> helpful.
>>>>
>>>> Thank you
>>>> Dhivya
>>>>
>>>>
>>>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>>>>
>>>>> Hi Dhivya,
>>>>>
>>>>> I think there are a few numbers that could be helpful to understand
>>>>> what's happening here.
>>>>>
>>>>> How many transcripts did Trinity assemble the RNA-seq data into?
>>>>> Also, you had 29,000 transcripts from cufflinks, but fewer from
>>>>> MAKER when you gave it the cufflinks data. How many transcripts
>>>>> did MAKER identify with the cufflinks data? Did you still get
>>>>> more than the 10,000 transcripts that you found with just the
>>>>> Trinity data?
>>>>>
>>>>> A key part of MAKER's approach to genome annotation that might
>>>>> be affecting its performance is that it only annotates a gene
>>>>> where there is both evidence (like your RNA-seq data) and an
>>>>> ab-initio prediction. If a prediction is unsupported by the
>>>>> evidence, then MAKER won't annotate a gene, and if evidence
>>>>> aligns where there's no prediction, MAKER won't annotate a gene
>>>>> either. What ab-initio predictors are you using, and have they
>>>>> been trained for this specific genome?
>>>>>
>>>>> You can force MAKER to automatically promote evidence alignments
>>>>> to a gene model by setting the est2genome option to 1, but that
>>>>> will usually give you many false positives.
>>>>>
>>>>> Try rerunning it with either the Trinity data or the Cufflinks
>>>>> data and with est2genome set to 1, and let us know how that
>>>>> affects the MAKER results.
>>>>>
>>>>> Thanks,
>>>>> Daniel
>>>>>
>>>>> Daniel Ence
>>>>> Graduate Student
>>>>> Eccles Institute of Human Genetics
>>>>> University of Utah
>>>>> 15 North 2030 East, Room 2100
>>>>> Salt Lake City, UT 84112-5330
>>>>> ________________________________________
>>>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on
>>>>> behalf of dhivya arasappan [darasappan at gmail.com]
>>>>> Sent: Thursday, January 30, 2014 11:18 AM
>>>>> To: maker-devel at yandell-lab.org
>>>>> Subject: [maker-devel] maker annotation with cufflinks output
>>>>>
>>>>> Hello,
>>>>>
>>>>> I am trying to annotate a 200 Mb plant genome for which I have a
>>>>> very good assembly.
>>>>>
>>>>> I tried to de novo assemble RNA-seq data using trinity and ran
>>>>> maker using my genome assembly and the trinity results. I did not
>>>>> get as many transcripts as expected; I got around 10,000 transcripts.
>>>>>
>>>>> So, I decided to try a different approach. I did a genome-assisted
>>>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline
>>>>> generated 21,000 genes, 29,000 transcripts. I then ran maker using
>>>>> my genome assembly and the cufflinks result. I get far fewer
>>>>> transcripts as a result.
>>>>>
>>>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm
>>>>> confused as to why maker is not finding the same.
>>>>>
>>>>> Any suggestions would be appreciated.
>>>>>
>>>>> Thanks
>>>>> Dhivya

From carsonhh at gmail.com Tue Feb 11 11:55:38 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 11 Feb 2014 11:55:38 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com> <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>
Message-ID:

I wouldn't use mpi_evaluator. It is buggy and has virtually no documentation. The AED values are the best way to identify which genes are higher and lower quality. You can also run InterProScan to identify protein domain content as an independent evaluation.

Look at this paper here --> http://www.biomedcentral.com/1471-2105/12/491

Figure 4 has a nice example of how AED, domain content, and gene orthology correlate to show the quality of different subsets of genes in seven ant genomes.

If you choose to try mpi_evaluator, it uses the -CTL option to generate empty files that you then fill in.

Thanks,
Carson

From: dhivya arasappan
Date: Tuesday, February 11, 2014 at 11:48 AM
To: Carson Holt
Cc: Daniel Ence
Subject: Re: [maker-devel] maker annotation with cufflinks output

With your suggested changes (using a protein file not derived from the RNA-seq data and fixing the gff file for training SNAP), I was able to increase the number of genes from 6000+ to 18116.
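Carson's reply above recommends AED rather than mpi_evaluator for judging annotation quality. As a rough illustration, the Python sketch below reads the `_AED` attribute that MAKER writes on mRNA features in its GFF3 output. It then reports the cumulative fraction of transcripts at or below a few AED thresholds. Lower AED means closer agreement with the evidence. The helper names `mrna_aed` and `aed_cdf` and the example thresholds are assumptions for illustration, not part of MAKER.

```python
"""Summarise MAKER AED scores from a GFF3 file: an illustrative sketch.

Assumes mRNA features carry an _AED=<value> attribute, as in MAKER GFF3
output. Helper names and thresholds are hypothetical.
"""
from typing import Dict, Iterable, List


def mrna_aed(lines: Iterable[str]) -> Dict[str, float]:
    """Return {mRNA ID: AED} for mRNA lines that have an _AED attribute."""
    aed: Dict[str, float] = {}
    for line in lines:
        if line.startswith("##FASTA"):
            break  # only sequence follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "_AED" in attrs:
            aed[attrs.get("ID", f"mRNA_{len(aed) + 1}")] = float(attrs["_AED"])
    return aed


def aed_cdf(values: Iterable[float], thresholds: List[float]) -> List[float]:
    """Fraction of AED values <= each threshold (lower AED = better support)."""
    vals = list(values)
    if not vals:
        return [0.0 for _ in thresholds]
    return [sum(v <= t for v in vals) / len(vals) for t in thresholds]
```

With a merged GFF3 (for example from gff3_merge), `aed_cdf(mrna_aed(fh).values(), [0.25, 0.5, 0.75, 1.0])` gives one cumulative curve per annotation run. Here `fh` is an open file handle. Runs whose curves rise faster have more transcripts well supported by evidence.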
I'm now trying to evaluate the quality of the annotation. I have a question about the usage of mpi_evaluator. In the MAKER tutorial, the usage is given as: mpi_evaluator [options] What files are being referred to in the input parameters eval_opts, eval_bopts and eval_exe? Thanks Dhivya On Feb 6, 2014, at 11:47 AM, Carson Holt wrote: > Ok. Content looks good. Just make sure to use gff3_merge to join the GFF3 files > without stripping out the FASTA sequence at the end when training SNAP. > > Thanks, > Carson > > > From: dhivya arasappan > Date: Thursday, February 6, 2014 at 10:29 AM > To: Carson Holt > Cc: Daniel Ence > Subject: Re: [maker-devel] maker annotation with cufflinks output > > Sorry, I was just trying to make it small enough to be approved by the mailing > list. > > Here is the whole file: > > > cat.formatted.gff.tgz > > > > > On Thu, Feb 6, 2014 at 11:04 AM, Carson Holt wrote: >> Could you give me the file without using 'head' to trim it? It's cutting it >> off before it reaches the part I'm interested in. >> >> Carson >> >> >> From: dhivya arasappan >> Date: Thursday, February 6, 2014 at 10:01 AM >> >> To: Carson Holt >> Cc: Daniel Ence , "maker-devel at yandell-lab.org" >> >> Subject: Re: [maker-devel] maker annotation with cufflinks output >> >> Oh yes, I did. I took just the non-sequence entries in the GFF file and used >> that as my input. I will rerun SNAP with the GFF file containing the >> sequences as well. >> >> I'm attaching a snippet of the GFF file that I used as input to maker2zff. >> >> Thanks for your help >> Dhivya >> >> >> >> >> On Feb 6, 2014, at 10:05 AM, Carson Holt wrote: >> >>> Your genome.dna file has no sequence? Did you by any chance strip the FASTA >>> sequence from the GFF3 you are using as input to maker2zff? There should be >>> FASTA sequence at the end of that file. Also, can I see the GFF3 file you >>> are using as input to maker2zff?
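Carson's diagnosis is that the GFF3 given to maker2zff had its trailing FASTA stripped, which left genome.dna without sequence. Below is a pre-flight check sketched in Python. It relies on the GFF3 convention that a `##FASTA` directive ends the feature section and is followed by FASTA records. The function name and toy data are hypothetical. The check reports which sequence IDs referenced by features have no embedded sequence.

```python
def embedded_sequence_report(gff3_text):
    """Return (seqids used by features, seqids with embedded FASTA, missing ones)."""
    feature_ids, fasta_ids = set(), set()
    in_fasta = False
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            in_fasta = True  # per GFF3, everything after this is FASTA
            continue
        if in_fasta:
            if line.startswith(">"):
                header = line[1:].split()
                if header:
                    fasta_ids.add(header[0])  # ID = first token of the header
            continue
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 9:
            feature_ids.add(cols[0])
    return feature_ids, fasta_ids, feature_ids - fasta_ids

features = "\n".join([
    "##gff-version 3",
    "ctg1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1",
    "ctg1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1",
])
stripped = features                               # FASTA removed
merged = features + "\n##FASTA\n>ctg1\nACGTACGT"  # FASTA kept at the end

print(embedded_sequence_report(stripped)[2])  # IDs lacking sequence
print(embedded_sequence_report(merged)[2])
```

If the third set is non-empty, an empty genome.dna like the one reported here is the expected symptom.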
>>> >>> Thanks, >>> Carson >>> >>> From: dhivya arasappan >>> Date: Thursday, February 6, 2014 at 7:47 AM >>> To: Carson Holt >>> Cc: Daniel Ence , "maker-devel at yandell-lab.org" >>> >>> Subject: Re: [maker-devel] maker annotation with cufflinks output >>> >>> Hello, >>> >>> It does appear that my genome.ann file from the maker2zff script has data in it. >>> However, the SNAP steps after that have created empty files. The following >>> are all empty: >>> >>> alt.dna err.dna export.dna genome.dna olp.dna uni.dna wrn.dna >>> alt.ann err.ann export.ann genome.ann olp.ann uni.ann wrn.ann >>> >>> When I try to get gene stats or validate genome.ann, I get errors like >>> this for all of them: >>> >>> fathom genome.ann genome.dna -gene-stats |more >>> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds >>> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds >>> exon-6:out_of_bounds >>> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds >>> exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds >>> exon-1:out_of_bounds >>> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds >>> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds >>> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds >>> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds >>> exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds >>> exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds >>> exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds >>> exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds >>> exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds >>> exon-21:out_of_bounds >>> >>> I'm not sure why the annotations I'm seeing in genome.ann are all showing up >>> as errors. I realize this may be an issue with SNAP, but are you familiar >>> with anything like this? My genome.ann file is attached for reference.
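fathom's `out_of_bounds` complaints fit Carson's reading. If genome.dna holds no sequence, every exon in genome.ann lies past the end of a zero-length contig. The Python sketch below reproduces that check under stated assumptions. It is not fathom's code. It assumes ZFF records consisting of a `>seqid` header followed by lines whose first three fields are `label start end` and whose last field is the model group, with minus-strand exons written as start > end. `seq_lengths` is a hypothetical dict you would build from the FASTA. A genome.ann with no exon lines at all (the empty-ZFF case Carson warns about) also returns an empty list, so check the model count separately.

```python
def out_of_bounds_exons(zff_text, seq_lengths):
    """List exons whose coordinates fall outside their sequence.

    zff_text: '>seqid' headers followed by 'label start end ... group' lines.
    seq_lengths: sequence ID -> length in bp; a missing ID counts as length 0,
    which is what an empty genome.dna amounts to.
    """
    problems = []
    current = None
    for raw in zff_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not an exon record
        label, start, end, group = fields[0], int(fields[1]), int(fields[2]), fields[-1]
        lo, hi = min(start, end), max(start, end)  # minus-strand exons have start > end
        if lo < 1 or hi > seq_lengths.get(current, 0):
            problems.append((current, group, label, start, end))
    return problems

# Toy short-format ZFF (hypothetical models)
ann = "\n".join([
    ">ctg1",
    "Einit 201 325 MODEL1",
    "Eterm 400 520 MODEL1",
    ">ctg2",
    "Esngl 900 100 MODEL2",
])

print(out_of_bounds_exons(ann, {"ctg1": 1000, "ctg2": 1000}))  # sequence present
print(out_of_bounds_exons(ann, {}))                            # empty genome.dna
```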
>>> >>> Thanks >>> Dhivya >>> >>> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote: >>> >>>> Do you have any features of type snap in your results from step 3? We've >>>> had a couple of recent posts where, after training, SNAP was giving no >>>> results, and as a result MAKER couldn't give any genes. One cause of >>>> something like that may be your step 2. Make sure the ZFF you used to train >>>> with wasn't empty. The maker2zff script uses filters to only put the best >>>> genes in the ZFF file, and if all your genes fail the filtering then you >>>> are training with an empty ZFF. >>>> >>>> Also, you should use proteins from a related species as your protein file. >>>> I see that your protein matches are varying wildly from run to run, and so is >>>> your contig count. Was the subset of contigs you have results for long >>>> enough to contain genes? >>>> >>>> Carson >>>> >>>> From: dhivya arasappan >>>> Date: Monday, February 3, 2014 at 9:31 AM >>>> To: Daniel Ence >>>> Cc: "maker-devel at yandell-lab.org" >>>> Subject: Re: [maker-devel] maker annotation with cufflinks output >>>> >>>> Hi Daniel, >>>> >>>> I was able to check on some of those questions. >>>> >>>> 1. From trinity assembly: I started with 102000 contigs. I used trinotate >>>> to annotate proteins in this. >>>> >>>> I ran maker on this data with est2genome set to 1. The output looks like >>>> this (most important parts on top): >>>> >>>> 6653 gene >>>> 46675 exon >>>> 280534 protein_match >>>> 59934 CDS >>>> 969 contig >>>> 105388 expressed_sequence_match >>>> 12584 five_prime_UTR >>>> 78565 match >>>> 1401369 match_part >>>> 10180 mRNA >>>> 11545 three_prime_UTR >>>> >>>> 2. From cufflinks assembly: I started with 133380 entries (out of which >>>> there are 29,000 transcripts). I used the protein sequences from trinity >>>> assembly. >>>> >>>> I ran maker on this data with est2genome set to 1.
The output looks like >>>> this: >>>> 29 gene >>>> 75 exon >>>> 573659 protein_match >>>> 67 CDS >>>> 1099 contig >>>> 269298 expressed_sequence_match >>>> 23 five_prime_UTR >>>> 173844 match >>>> 2221846 match_part >>>> 29 mRNA >>>> 23 three_prime_UTR >>>> >>>> The genes annotated using the trinity assembly is lower than expected, so I >>>> went the cufflinks route. I dont understand why when using the cufflinks >>>> transcripts, even less genes are being found. >>>> >>>> 3. Training SNAP: I used the results of maker from 1 to train SNAP. I >>>> then used that training set to rerun maker: >>>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/s >>>> nap/RHA.hmm >>>> est2genome=0 >>>> >>>> And again I got results with no entries for gene, exon, CDS etc. >>>> 957 contig >>>> 46555 expressed_sequence_match >>>> 43651 match >>>> 553633 match_part >>>> 113738 protein_match >>>> >>>> As I mentioned in another email, cegma results indicated that the genome >>>> was more than 90% complete. Any suggestions would be helpful. >>>> >>>> Thank you >>>> Dhivya >>>> >>>> >>>> >>>> >>>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: >>>> >>>>> Hi Dhivya, >>>>> >>>>> I think there a few numbers that could be helpful to understand what's >>>>> happening here. >>>>> >>>>> How many transcripts did Trinity assembly the RNA-seq data into? Also, you >>>>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave >>>>> it the cufflinks data. How many transcripts did MAKER identify with the >>>>> cufflinks data? Did you still get more than the 10,000 transcripts that >>>>> you found with just the Trinity data? >>>>> >>>>> A key part of MAKER's approach to genome annotation that might be >>>>> affecting it's performance is that it only annotates a gene where there is >>>>> both evidence (like your RNA-seq data) and an ab-initio prediction. 
If a >>>>> prediction is unsupported by the evidence, then MAKER won't annotate a >>>>> gene and if evidence aligns where there's no prediction, MAKER won't >>>>> annotate a gene either. What ab-initio predictors are you using and have >>>>> they been trained specific genome? >>>>> >>>>> You can force MAKER to automatically promote evidence alignments to a gene >>>>> model by setting the est2genome option to 1, but that will usually give >>>>> you many false positives. >>>>> >>>>> Try rerunning it with either the Trinity data or the Cufflinks data and >>>>> with est2genome set to 1, and let us know how that affects the MAKER >>>>> results. >>>>> >>>>> Thanks, >>>>> Daniel >>>>> >>>>> Daniel Ence >>>>> Graduate Student >>>>> Eccles Institute of Human Genetics >>>>> University of Utah >>>>> 15 North 2030 East, Room 2100 >>>>> Salt Lake City, UT 84112-5330 >>>>> ________________________________________ >>>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>>>> dhivya arasappan [darasappan at gmail.com] >>>>> Sent: Thursday, January 30, 2014 11:18 AM >>>>> To: maker-devel at yandell-lab.org >>>>> Subject: [maker-devel] maker annotation with cufflinks output >>>>> >>>>> Hello, >>>>> >>>>> I am trying to annotate a 200 mb plant genome for which I have a very >>>>> good assembly. >>>>> >>>>> I tried to denovo assemble RNA-seq data using trinity and ran maker >>>>> using my genome assembly and the trinity results. I did not get as >>>>> many transcripts as expected, around 10,000 transcripts. >>>>> >>>>> So, I decided to try a different approach. I did a genome assisted >>>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline >>>>> generated 21,000 genes, 29,000 transcripts. I then ran maker using my >>>>> genome assembly and the cufflinks result. I get much less number of >>>>> transcripts as a result. 
>>>>> >>>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm >>>>> confused as to why maker is not finding the same. >>>>> >>>>> Any suggestions would be appreciated. >>>>> >>>>> Thanks >>>>> Dhivya >>>>> >>>>> >>>>> _______________________________________________ >>>>> maker-devel mailing list >>>>> maker-devel at box290.bluehost.com >>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>>> _______________________________________________ maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carson.holt at genetics.utah.edu Tue Feb 11 13:52:05 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Tue, 11 Feb 2014 20:52:05 +0000 Subject: [maker-devel] New MAKER release Message-ID: Hello all, MAKER has been updated to 2.31. There are no major new features over 2.30. It is primarily bug fixes and updates to the features that were added from MAKER-P, like tRNAscan support. I was also able to eliminate the segfaults that sometimes happened on exit under OpenMPI. Thanks, Carson -------------- next part -------------- An HTML attachment was scrubbed... URL: From carson.holt at genetics.utah.edu Tue Feb 11 14:19:17 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Tue, 11 Feb 2014 21:19:17 +0000 Subject: [maker-devel] New MAKER release In-Reply-To: References: Message-ID: URLs can be manually edited in the .../maker/src/locations file. I've also updated that file in the latest MAKER download to point to the new RepBase URL. Thanks, Carson From: Joanna Kelley > Date: Tuesday, February 11, 2014 at 2:00 PM To: Carson Holt > Subject: Re: [maker-devel] New MAKER release Hi Carson, The RepBase step is failing; it seems to be looking for the wrong version. Where do I change the code to solve that?
Thanks, Joanna Downloading RepBase... --2014-02-11 12:59:38-- http://www.girinst.org/server/RepBase/protected/repeatmaskerlibraries/repeatmaskerlibraries-20130422.tar.gz Resolving www.girinst.org... 66.201.49.247 Connecting to www.girinst.org|66.201.49.247|:80... connected. HTTP request sent, awaiting response... 401 Authorization Required Connecting to www.girinst.org|66.201.49.247|:80... connected. HTTP request sent, awaiting response... 404 Not Found 2014-02-11 12:59:38 ERROR 404: Not Found. ERROR: Failed installing RepBase, now cleaning installation path... You may need to install RepBase manually. On Tue, Feb 11, 2014 at 12:52 PM, Carson Holt > wrote: Hello all, MAKER has been updated to 2.31. There are no major new features over 2.30. It is primarily just bug fixes, and updates to the features that were added from MAKER-P like tRNAscan support. I also was able to remove the seg faults that sometimes happened on exit under OpenMPI. Thanks, Carson _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -- Please update your address book, my new email address is joanna.l.kelley at wsu.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Tue Feb 11 15:59:57 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 11 Feb 2014 22:59:57 +0000 Subject: [maker-devel] ERROR: Failed while processing the chunk divide!! In-Reply-To: References: Message-ID: Hi Hossein, I think what would help most right now is if you ran MAKER on only one of the contigs that are failing and sent me the entire error output along with the MAKER control files you are using. It looks like the error is coming from the GFF3 files that you are using as input.
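Daniel suspects the input GFF3. That can be probed before rerunning MAKER with a few structural rules from the GFF3 specification:

- nine tab-separated columns;
- a 1-based start no greater than the end;
- a strand of `+`, `-`, `.` or `?`;
- every `Parent` naming an `ID` defined in the file.

The Python sketch below checks only those rules. It is not MAKER's GFFDB validation, and we do not know which rule, if any, the failing PbPT3Sc00001_S_0.8_1-mRNA-1 record breaks. Function and variable names are hypothetical.

```python
VALID_STRANDS = {"+", "-", ".", "?"}

def gff3_problems(gff3_text):
    """Return (line_number, message) pairs for basic GFF3 rule violations."""
    problems = []
    ids = set()
    parents = []  # (line_number, parent_id), checked once all IDs are known
    for n, line in enumerate(gff3_text.splitlines(), start=1):
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append((n, f"expected 9 tab-separated columns, got {len(cols)}"))
            continue
        start, end, strand = cols[3], cols[4], cols[6]
        if not (start.isdigit() and end.isdigit()):
            problems.append((n, "start/end are not positive integers"))
        elif int(start) < 1 or int(start) > int(end):
            problems.append((n, f"bad coordinates {start}..{end}"))
        if strand not in VALID_STRANDS:
            problems.append((n, f"invalid strand {strand!r}"))
        for field in cols[8].split(";"):
            key, _, value = field.partition("=")
            if key == "ID":
                ids.add(value)
            elif key == "Parent":
                parents.extend((n, p) for p in value.split(","))
    for n, parent in parents:
        if parent not in ids:
            problems.append((n, f"Parent {parent!r} has no matching ID"))
    return problems

good = "\n".join([
    "ctg1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1",
    "ctg1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1",
])
bad = "\n".join([
    "ctg1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1",
    "ctg1\tmaker\tmRNA\t900\t1\t.\t+\t.\tID=g1-mRNA-1;Parent=g9",
])

for n, msg in gff3_problems(bad):
    print(n, msg)
```

Running it on just the records for one failing contig keeps the output small, in the spirit of Daniel's request.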
Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] Sent: Tuesday, February 11, 2014 3:51 PM To: Daniel Ence Subject: ERROR: Failed while processing the chunk divide!! Dear Daniel, I restarted MAKER and it is still running. But in the error output file that has been generated so far, it seems that smaller contigs are affected. There are contigs of 2-4 kb with this error, but I also noticed a 30 kb contig with this error. I was wondering if I need to change the settings in the maker_opts file: #-----MAKER Behavior Options max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage) min_contig=1 #skip genome contigs below this length (under 10kb are often useless) If I understand correctly, max_dna_len divides contigs of over 100 kb into smaller chunks. However, for the min_contig option, it is not clear to me why I get error messages for 30 kb contigs if the default contig length cutoff is 10 kb or less. Should I change this to 0? Here is an example of the error message for one of the contigs: #--------- command -------------# Widget::exonerate::est2genome: /usr/local/exonerate-2.2.0-x86_64/bin/exonerate -q /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta -t /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta -Q dna -T dna --model est2genome --minintron 20 --showcigar --percent 20 > /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate #-------------------------------# cleaning blastn... cleaning tblastx... cleaning blastx... ERROR: Failed on PbPT3Sc00001_S_0.8_1-mRNA-1 Check your input GFF3 file for errors! (from GFFDB) FATAL ERROR ERROR: Failed while processing the chunk divide!! ERROR: Chunk failed at level 17 !! FAILED CONTIG:PbPT3Sc00001 --Next Contig-- Regards HB On 14-02-11 12:37 PM, "Daniel Ence" wrote: >Hossein, > >Ok. So since this error came up on a local install, I'm going to need >some more information to understand what went wrong. Is it the same >contig that always causes this error? If it is, then is that the only >error or warning that MAKER encounters while running on this contig? Or, >if multiple contigs fail, then is it always the same error? > >If you can narrow it down to the smallest possible dataset that >consistently gives the same error, then we can begin to understand what's >wrong. > >Thanks, >Daniel > > >Daniel Ence >Graduate Student >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >________________________________________ >From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >Sent: Tuesday, February 11, 2014 11:20 AM >To: Daniel Ence >Subject: Re: [maker-devel] Falied to create new account > >Hi Daniel > >I'm running it through the local server at my work > > > > > > >M. Hossein Borhan, Ph.D.
>Research Scientist/ Chercheur Scientifique >Saskatoon Research Centre/Centre de Recherches de Saskatoon >Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada >107 Science Place, Saskatoon, SK.,S7N 0X2 >Telephone/Téléphone: (306) 385-9441 >Facsimile/Télécopieur: (306) 385-9482 >Hossein.borhan at agr.gc.ca > > > > > > > > >On 14-02-11 12:16 PM, "Daniel Ence" wrote: > >>Hi Hossein, >> >>Did you encounter this error while you were running MAKER on your local >>machine or through the MAKER web annotation service? >> >>Thanks, >>Daniel >> >> >>Daniel Ence >>Graduate Student >>Eccles Institute of Human Genetics >>University of Utah >>15 North 2030 East, Room 2100 >>Salt Lake City, UT 84112-5330 >>________________________________________ >>From: Carson Holt [carsonhh at gmail.com] >>Sent: Tuesday, February 11, 2014 10:18 AM >>To: Daniel Ence >>Cc: Mark Yandell >>Subject: FW: [maker-devel] Falied to create new account >> >>Hey Daniel could you download his dataset, and see if you can replicate >>the error. Also check if this was an MWAS job or a local maker run (his >>dataset will already be there for MWAS, you just need the job ID). >> >>Thanks, >>Carson >> >>On 2/11/14, 10:16 AM, "Borhan, Hossein" wrote: >> >>>Hi Carson >>> >>> >>>I encountered this error while running maker >>> >>>FATAL ERROR >>>ERROR: Failed while processing the chunk divide!! >>> >>>ERROR: Chunk failed at level 17 >>>!! >>>FAILED CONTIG:PbPT3Sc00006 >>> >>> >>> >>> >>> >>>HB >>> >>> >>> >>> >>> >>> >>> >>>> >>> >> >> > From marc.hoeppner at imbim.uu.se Wed Feb 12 01:34:12 2014 From: marc.hoeppner at imbim.uu.se (Marc P.
Hoeppner) Date: Wed, 12 Feb 2014 09:34:12 +0100 Subject: [maker-devel] Annotations from protein alignments Message-ID: <52FB3204.60606@imbim.uu.se> Dear list, I have an annotation project with both protein data (it's a bird, so I've been using both vertebrates in general and chicken in particular) and huge amounts of somewhat dodgy (as in lots of pre-mRNA) RNA-seq data. The chicken Augustus model seems to do a decent job of seeding gene loci, but it's not quite perfect. I want to use protein alignments to create a high-confidence set of exons and subsequently a set of gene loci (e.g. to train SNAP), but when I set protein2genome=1 I never get any annotations. This is also true for the test data set that is delivered together with Maker (hsap_). Is there anything I should know about using proteins to generate annotations? I left all settings in the config file at their defaults (except protein2genome=1). I've tried this with both Maker 2.30 and 2.31. All the best, Marc -- ----------- Marc P. Hoeppner, PhD Group leader BILS Genome annotation platform Department of Medical Biochemistry and Microbiology Uppsala University, Sweden marc.hoepner at imbim.uu.se From carsonhh at gmail.com Wed Feb 12 08:42:36 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 12 Feb 2014 08:42:36 -0700 Subject: [maker-devel] Annotations from protein alignments In-Reply-To: <52FB3204.60606@imbim.uu.se> References: <52FB3204.60606@imbim.uu.se> Message-ID: I updated the 2.31 tarball. Go ahead and download it again. protein2genome had been turned off for eukaryotes and was only working for prokaryotic genomes. Carson On 2/12/14, 1:34 AM, "Marc P. Hoeppner" wrote: >Dear list, > >I have an annotation project with both protein data (it's a bird, so >I've been using both vertebrates in general and chicken in specific), >and huge amounts of somewhat dodgy (as in lot's of pre-mRNA) RNA-seq >data. The chicken augustus model seems to do a decent job in seeding >gene loci, but it's not quite perfect.
I want to use protein alignments >to create a high-confidence set of exons and subsequently a set of gene >loci to train e.g. snap), but when testing to set protein2genome=1 I >never get any annotations. This is also true for the test data set that >is delivered together with Maker (hsap_). Anything I should know about >the use of proteins to generate annotations? I left all settings in the >config file at their defaults (except protein2genome=1). I've tried this >with both Maker 2.30 and 2.31. > >All the best, > >Marc > >-- >----------- >Marc P. Hoeppner, PhD >Group leader >BILS Genome annotation platform > >Department of Medical Biochemistry and Microbiology >Uppsala University, Sweden >marc.hoepner at imbim.uu.se > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dence at genetics.utah.edu Wed Feb 12 11:59:11 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 12 Feb 2014 18:59:11 +0000 Subject: [maker-devel] ERROR: Failed while processing the chunk divide!! In-Reply-To: References: , Message-ID: Hi Hossein, So, after looking at the gff3 and your control files, I had an idea. There's a part of the control file called "Re-annotation Using MAKER Derived GFF3", but you can also pass features through from a GFF3 using the "est_gff", "protein_gff", "rm_gff", "pred_gff" and "model_gff" lines. Sometimes we encounter problems with the MAKER passthrough. Could you try dividing the GFF3 file by feature source and passing the pieces through the "est_gff" etc. options instead of the MAKER passthrough? That will tell us whether the problem is with the GFF3 file or with how MAKER is processing it. Another thing to check is that the contig names in the GFF3 file match the contig names in the FASTA file that you're annotating.
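Daniel makes two suggestions: split the GFF3 by feature source, and confirm that its contig names match the FASTA headers. Both can be sketched in Python as follows. The source is taken from column 2 and the contig name from column 1, per the GFF3 layout. The function names and toy records are hypothetical, including the deliberately mismatched `scaffold_9`.

```python
from collections import defaultdict

def split_by_source(gff3_text):
    """Group GFF3 feature lines by column 2 (source), ignoring ##FASTA."""
    groups = defaultdict(list)
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 9:
            groups[cols[1]].append(line)
    return dict(groups)

def fasta_ids(fasta_text):
    """FASTA IDs: the first whitespace-delimited token after '>'."""
    return {line[1:].split()[0] for line in fasta_text.splitlines()
            if line.startswith(">") and line[1:].split()}

def unmatched_seqids(gff3_text, fasta_text):
    """Column-1 contig names in the GFF3 that never appear as a FASTA ID."""
    seqids = {line.split("\t")[0]
              for lines in split_by_source(gff3_text).values() for line in lines}
    return seqids - fasta_ids(fasta_text)

gff = "\n".join([
    "##gff-version 3",
    "PbPT3Sc00009\test2genome\texpressed_sequence_match\t10\t500\t.\t+\t.\tID=e1",
    "PbPT3Sc00009\tprotein2genome\tprotein_match\t50\t400\t.\t-\t.\tID=p1",
    "scaffold_9\tsnap\tmatch\t1\t300\t.\t+\t.\tID=s1",
])
fasta = ">PbPT3Sc00009 length=30000\nACGT"

for source, lines in sorted(split_by_source(gff).items()):
    print(source, len(lines))
print(unmatched_seqids(gff, fasta))
```

Each group could then be written to its own file and listed under the matching `*_gff` option in maker_opts.ctl. Which source belongs under `est_gff`, `protein_gff`, `pred_gff` and so on depends on how the GFF3 was produced, so the sketch leaves that mapping to you.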
Thanks, Daniel Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] Sent: Wednesday, February 12, 2014 8:49 AM To: Daniel Ence Subject: Re: ERROR: Failed while processing the chunk divide!! Dear Daniel I have generated the files that you requested. I choose Sc00009 from my genome which is 30 kb and was one of the scaffolds coming up with error. In addition to Ctl files and error output file I also attached a part of the gff file related to SC00009 that is indicated in the error message. Thanks for helping with this Regards HB On 14-02-11 4:59 PM, "Daniel Ence" wrote: >Hi Hossen, > >I think that what would be the most help right now is if you ran MAKER on >only one of those contigs that are failing and send me the entire error >output along with the maker control files that you are using. It looks >like the error is coming from the gff3 files that you are using as input. > >Thanks, >Daniel > > > >Daniel Ence >Graduate Student >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >________________________________________ >From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >Sent: Tuesday, February 11, 2014 3:51 PM >To: Daniel Ence >Subject: ERROR: Failed while processing the chunk divide!! > >Dear Daniel > >I re-started maker and it is still running. But in error our file that has >been generated so far it seems that smaller conitgs are affected. 
There >are contigs of 2-4 kb with this error but also I noticed a contig of 30kb >length having this error > >I was wondering if I need to change the setting in the maker_opt file > >#-----MAKER Behavior Options >max_dna_len=100000 #length for dividing up contigs into chunks >(increases/decreases memory usage) >min_contig=1 #skip genome contigs below this length (under 10kb are often >useless) > > >If I understand correctly max_dna_len divide conitgs of over 100kb to >smaller chucks. However it is not clear to me that for the min_contig >option if the default contig length is 10kb or less, then why I have error >message for 30kb long contigs. Should I change this to 0 > >Here is an example of the error message for one of the contigs > > >#--------- command -------------# >Widget::exonerate::est2genome: >/usr/local/exonerate-2.2.0-x86_64/bin/exonerate -q >/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.bras >s >icae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35 >/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta >-t >/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.bras >s >icae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genom >e_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136 >. >fasta >-Q dna -T dna --model est2genome >--minintron 20 --showcigar --percent 20 > >/raid01/projects/Plasmodiophora/brassica >e/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brass >i >cae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3 >S >c00001.235-1136.comp14545_c0_seq1.est_exonerate >#-------------------------------# >cleaning blastn... >cleaning tblastx... >cleaning blastx... >ERROR: Failed on >PbPT3Sc00001_S_0.8_1-mRNA-1 >Check your input GFF3 file for errors! >(from GFFDB) > >FATAL ERROR >ERROR: Failed while processing the chunk >divide!! > >ERROR: Chunk failed at level 17 >!! 
>FAILED CONTIG:PbPT3Sc00001 > > > > >--Next Contig-- > > > > > > >Regards > > >HB > > > > > > > > > > >On 14-02-11 12:37 PM, "Daniel Ence" wrote: > >>Hossein, >> >>Ok. So since this error came up on a local install, I'm going to need >>some more information to understand what went wrong. Is it the same >>contig that always causes this error? If it is, then is that the only >>error or warning that MAKER encounters while running on this contig? Or, >>if multiple contigs fail, then is it always the same error? >> >>If you can narrow it down to the smallest possible dataset that >>consistently gives the same error, then we can begin to understand what's >>wrong. >> >>Thanks, >>Daniel >> >> >>Daniel Ence >>Graduate Student >>Eccles Institute of Human Genetics >>University of Utah >>15 North 2030 East, Room 2100 >>Salt Lake City, UT 84112-5330 >>________________________________________ >>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >>Sent: Tuesday, February 11, 2014 11:20 AM >>To: Daniel Ence >>Subject: Re: [maker-devel] Falied to create new account >> >>Hi Daniel >> >>I'm running it through the local server at my work >> >> >> >> >> >> >>M. Hossein Borhan, Ph.D. >>Research Scientist/ Chercheur Scientifique >>Saskatoon Research Centre/Centre de Recherches de Saskatoon >>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada >>107 Science Place, Saskatoon, SK.,S7N 0X2 >>Telephone/Téléphone: (306) 385-9441 >>Facsimile/Télécopieur: (306) 385-9482 >>Hossein.borhan at agr.gc.ca >> >> >> >> >> >> >> >> >>On 14-02-11 12:16 PM, "Daniel Ence" wrote: >> >>>Hi Hossein, >>> >>>Did you encounter this error while you were running MAKER on your local >>>machine or through the MAKER web annotation service?
>>> >>>Thanks, >>>Daniel >>> >>> >>>Daniel Ence >>>Graduate Student >>>Eccles Institute of Human Genetics >>>University of Utah >>>15 North 2030 East, Room 2100 >>>Salt Lake City, UT 84112-5330 >>>________________________________________ >>>From: Carson Holt [carsonhh at gmail.com] >>>Sent: Tuesday, February 11, 2014 10:18 AM >>>To: Daniel Ence >>>Cc: Mark Yandell >>>Subject: FW: [maker-devel] Falied to create new account >>> >>>Hey Daniel could you download his dataset, and see if you can replicate >>>the error. Also check if this was an MWAS job or a local maker run (his >>>dataset will already be there for MWAS, you just need the job ID). >>> >>>Thanks, >>>Carson >>> >>>On 2/11/14, 10:16 AM, "Borhan, Hossein" >>>wrote: >>> >>>>Hi Carson >>>> >>>> >>>>I encountered this error while running maker >>>> >>>>FATAL ERROR >>>>ERROR: Failed while processing the chunk divide!! >>>> >>>>ERROR: Chunk failed at level 17 >>>>!! >>>>FAILED CONTIG:PbPT3Sc00006 >>>> >>>> >>>> >>>> >>>> >>>>HB >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>>> >>>> >>> >>> >> > From dence at genetics.utah.edu Wed Feb 12 12:15:59 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 12 Feb 2014 19:15:59 +0000 Subject: [maker-devel] ERROR: Failed while processing the chunk divide!! In-Reply-To: References: , , Message-ID: Hi Hossein, One more question. How did you make the gff3 that you're passing through here? Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Daniel Ence [dence at genetics.utah.edu] Sent: Wednesday, February 12, 2014 11:59 AM To: Borhan, Hossein Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] ERROR: Failed while processing the chunk divide!! Hi Hossein, So, after looking at the gff3 and your control files, I had an idea. 
There's the part of the control file called "Re-annotation Using MAKER Derived GFF3", but you can also pass through features from a gff3 using the "est_gff", "protein_gff", "rm_gff", "pred_gff", and "model_gff" lines.

Sometimes we encounter problems with the MAKER passthrough. Could you try dividing the gff3 file into the different feature sources and passing them through the "est_gff" etc. options instead of the MAKER passthrough? That will tell us whether the problem is with the gff3 file or with how MAKER is processing it.

Another thing to check is that the contig names in the gff3 file match the contig names in the fasta file that you're annotating.

Thanks,
Daniel

Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Wednesday, February 12, 2014 8:49 AM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Dear Daniel

I have generated the files that you requested. I chose Sc00009 from my genome, which is 30 kb and was one of the scaffolds coming up with the error. In addition to the ctl files and the error output file, I also attached the part of the gff file related to SC00009 that is indicated in the error message.

Thanks for helping with this

Regards

HB

On 14-02-11 4:59 PM, "Daniel Ence" wrote:

>Hi Hossein,
>
>I think what would help most right now is if you ran MAKER on only one of the contigs that are failing and sent me the entire error output along with the maker control files that you are using. It looks like the error is coming from the gff3 files that you are using as input.
>
>Thanks,
>Daniel
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Tuesday, February 11, 2014 3:51 PM
>To: Daniel Ence
>Subject: ERROR: Failed while processing the chunk divide!!
>
>Dear Daniel
>
>I re-started maker and it is still running. But in the error output file that has been generated so far, it seems that smaller contigs are affected. There are contigs of 2-4 kb with this error, but I also noticed a contig of 30 kb length having this error.
>
>I was wondering if I need to change the setting in the maker_opt file:
>
>#-----MAKER Behavior Options
>max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
>min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
>
>If I understand correctly, max_dna_len divides contigs of over 100 kb into smaller chunks. However, for the min_contig option, if the default contig length is 10 kb or less, it is not clear to me why I get an error message for 30 kb long contigs. Should I change this to 0?
>
>Here is an example of the error message for one of the contigs:
>
>#--------- command -------------#
>Widget::exonerate::est2genome:
>/usr/local/exonerate-2.2.0-x86_64/bin/exonerate -q /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta -t /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta -Q dna -T dna --model est2genome --minintron 20 --showcigar --percent 20 > /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate
>#-------------------------------#
>cleaning blastn...
>cleaning tblastx...
>cleaning blastx...
>ERROR: Failed on PbPT3Sc00001_S_0.8_1-mRNA-1
>Check your input GFF3 file for errors!
>(from GFFDB)
>
>FATAL ERROR
>ERROR: Failed while processing the chunk divide!!
>
>ERROR: Chunk failed at level 17
>!!
>FAILED CONTIG:PbPT3Sc00001
>
>--Next Contig--
>
>Regards
>
>HB
>
>On 14-02-11 12:37 PM, "Daniel Ence" wrote:
>
>>Hossein,
>>
>>Ok. So since this error came up on a local install, I'm going to need some more information to understand what went wrong. Is it the same contig that always causes this error? If it is, then is this the only error or warning that MAKER encounters while running on this contig? Or, if multiple contigs fail, then is it always the same error?
>>
>>If you can narrow it down to the smallest possible dataset that consistently gives the same error, then we can begin to understand what's wrong.
>>
>>Thanks,
>>Daniel
>>
>>Daniel Ence
>>Graduate Student
>>Eccles Institute of Human Genetics
>>University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
>>________________________________________
>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>>Sent: Tuesday, February 11, 2014 11:20 AM
>>To: Daniel Ence
>>Subject: Re: [maker-devel] Falied to create new account
>>
>>Hi Daniel
>>
>>I am running it through the local server at my work
>>
>>M. Hossein Borhan, Ph.D.
>>Research Scientist/ Chercheur Scientifique
>>Saskatoon Research Centre/Centre de Recherches de Saskatoon
>>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
>>107 Science Place, Saskatoon, SK.,S7N 0X2
>>Telephone/Téléphone: (306) 385-9441
>>Facsimile/Télécopieur: (306) 385-9482
>>Hossein.borhan at agr.gc.ca
>>
>>On 14-02-11 12:16 PM, "Daniel Ence" wrote:
>>
>>>Hi Hossein,
>>>
>>>Did you encounter this error while you were running MAKER on your local machine or through the MAKER web annotation service?
>>>
>>>Thanks,
>>>Daniel
>>>
>>>Daniel Ence
>>>Graduate Student
>>>Eccles Institute of Human Genetics
>>>University of Utah
>>>15 North 2030 East, Room 2100
>>>Salt Lake City, UT 84112-5330
>>>________________________________________
>>>From: Carson Holt [carsonhh at gmail.com]
>>>Sent: Tuesday, February 11, 2014 10:18 AM
>>>To: Daniel Ence
>>>Cc: Mark Yandell
>>>Subject: FW: [maker-devel] Falied to create new account
>>>
>>>Hey Daniel, could you download his dataset and see if you can replicate the error? Also check if this was an MWAS job or a local maker run (his dataset will already be there for MWAS, you just need the job ID).
>>>
>>>Thanks,
>>>Carson
>>>
>>>On 2/11/14, 10:16 AM, "Borhan, Hossein" wrote:
>>>
>>>>Hi Carson
>>>>
>>>>I encountered this error while running maker
>>>>
>>>>FATAL ERROR
>>>>ERROR: Failed while processing the chunk divide!!
>>>>
>>>>ERROR: Chunk failed at level 17
>>>>!!
>>>>FAILED CONTIG:PbPT3Sc00006
>>>>
>>>>HB

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From dence at genetics.utah.edu Wed Feb 12 13:42:03 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 12 Feb 2014 20:42:03 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!

Hi Hossein,

So, those problems with passing through MAKER-derived gff3 have been addressed in newer versions of MAKER. The current version is 2.31 and is available for download now on our website. Try installing that version and running it with the same control files you started out using, and let me know if that fixes the problems.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Wednesday, February 12, 2014 12:55 PM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Hi Daniel

I am using maker 2.10. I also checked the naming of the scaffold in the genome file and the gff file for the failed example. The naming is the same.

Thanks

Hossein

M. Hossein Borhan, Ph.D.
Research Scientist/ Chercheur Scientifique
Saskatoon Research Centre/Centre de Recherches de Saskatoon
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
107 Science Place, Saskatoon, SK.,S7N 0X2
Telephone/Téléphone: (306) 385-9441
Facsimile/Télécopieur: (306) 385-9482
Hossein.borhan at agr.gc.ca

On 14-02-12 1:30 PM, "Daniel Ence" wrote:

>Hi Hossein,
>
>And which version of MAKER are you using?
>
>Thanks,
>Daniel
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Wednesday, February 12, 2014 12:25 PM
>To: Daniel Ence
>Subject: Re: ERROR: Failed while processing the chunk divide!!
>
>Hi Daniel
>
>The gff file was generated by the 1st run of maker
>
>HB
>
>On 14-02-12 1:15 PM, "Daniel Ence" wrote:
>
>>Hi Hossein,
>>
>>One more question. How did you make the gff3 that you're passing through here?
>>
>>Thanks,
>>Daniel
>>
>>Daniel Ence
>>Graduate Student
>>Eccles Institute of Human Genetics
>>University of Utah
>>15 North 2030 East, Room 2100
>>Salt Lake City, UT 84112-5330
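Daniel's earlier suggestion in this thread was to divide the MAKER-derived gff3 into its feature sources and pass each part through the matching control-file option ("est_gff", "protein_gff", "rm_gff", "pred_gff", "model_gff") instead of the MAKER passthrough. The Python sketch below shows one way to do the splitting. It is not a MAKER tool: it groups feature lines only by the source column (column 2), skips comments, stops at a `##FASTA` section, and writes files named `<prefix>.<source>.gff`, a hypothetical naming scheme. Which control-file option each resulting file belongs to is left to the user.

```python
import sys
from collections import defaultdict


def split_by_source(lines):
    """Group GFF3 feature lines by their source column (column 2).

    Comment and directive lines are skipped, and reading stops at a ##FASTA
    directive because the rest of the file is sequence, not features.
    """
    groups = defaultdict(list)
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:  # only well-formed nine-column feature lines
            groups[cols[1]].append("\t".join(cols))
    return dict(groups)


def write_groups(groups, prefix):
    """Write one GFF3 file per source, named <prefix>.<source>.gff (hypothetical naming)."""
    paths = []
    for source, rows in sorted(groups.items()):
        path = f"{prefix}.{source}.gff"
        with open(path, "w") as out:
            out.write("##gff-version 3\n")
            out.write("\n".join(rows) + "\n")
        paths.append(path)
    return paths


if __name__ == "__main__" and len(sys.argv) > 2:
    with open(sys.argv[1]) as handle:
        for written in write_groups(split_by_source(handle), sys.argv[2]):
            print(written)
```

For example, `python split_gff_by_source.py round1.gff round1` (script and file names hypothetical) would print one path per source found in the file. Because a MAKER gene, its mRNAs, exons and CDSs share one source, splitting by source keeps each model's hierarchy together in a single file.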
From masa at bioinfo.hr Thu Feb 13 03:17:11 2014
From: masa at bioinfo.hr (Masa Roller)
Date: Thu, 13 Feb 2014 11:17:11 +0100
Subject: [maker-devel] SNAP scores and AED scores
Message-ID: <52FC9BA7.6060505@bioinfo.hr>

Dear all,

I ran snap2-based gene prediction through maker.

In the resulting gff file, in the source "snap_masked" I can find a score in the score column of every snap prediction that did not get promoted to a maker gene. Would this be the score of how well the prediction matches the HMM?

It seems to me that those snap models that are given gene status no longer appear under the snap_masked source but only under the source "maker". Maker then removes the score column, instead giving AED and eAED scores (which are more about how the model corresponds to the evidence). When viewing the maker transcripts and SNAP predictions in a browser, they do not match (mostly, maker predictions are longer).

I am interested in the score of the individual gene predictions that underlie maker gene models. Where could I find that information?

Many thanks!

From carsonhh at gmail.com Thu Feb 13 13:11:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Feb 2014 13:11:22 -0700
Subject: [maker-devel] SNAP scores and AED scores
In-Reply-To: <52FC9BA7.6060505@bioinfo.hr>
References: <52FC9BA7.6060505@bioinfo.hr>

No. SNAP genes do not disappear. All SNAP ab initio calls will always be kept as reference features marked snap_masked (for the repeat-masked genome) and snap (for the unmasked genome). MAKER then runs SNAP another time, where it feeds hints to SNAP based on EST and protein alignment evidence.
These hint-based models can then compete against the ab initio SNAP models to be promoted to genes if their AED scores are better. Final models can also get UTRs added based on EST evidence. That is why you can get models from MAKER that do not match the original SNAP ab initio calls.

So, in summary, all SNAP ab initio models will be in snap_masked. The MAKER models will consist of hint-based SNAP reruns plus SNAP ab initio models processed to add UTRs.

Thanks,
Carson

From carsonhh at gmail.com Thu Feb 13 13:23:07 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Feb 2014 13:23:07 -0700
Subject: [maker-devel] SNAP scores and AED scores
References: <52FC9BA7.6060505@bioinfo.hr>

On a side note.
Because the MAKER models involve either modifying the ab initio SNAP model or manipulating the underlying scoring scheme using hints, the SNAP score on those is virtually meaningless. However, Ian Korf has developed a tool that can take any gene structure and reverse-generate a score (i.e. what the score of this gene would have been if SNAP had called it that way in the first place). I believe the tool is called fathom and is part of the SNAP package. It is not well documented, so you might have to contact Ian Korf directly for that. You can use the maker2zff tool to generate the input to fathom.

Thanks,
Carson
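Carson's replies hinge on AED (Annotation Edit Distance), the score MAKER uses when hint-based and ab initio models compete for promotion. To make the quantity concrete, here is a minimal Python sketch of the commonly cited definition AED = 1 - (SN + SP)/2, where SN is the fraction of evidence bases covered by the model and SP is the fraction of model bases supported by evidence. It assumes exons and evidence are 1-based inclusive (start, end) pairs on one strand and pools all evidence into a single set; the helper names are hypothetical, and this does not reproduce MAKER's own implementation or the eAED value it also reports.

```python
def merge(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def length(intervals):
    """Number of bases covered by a set of intervals."""
    return sum(end - start + 1 for start, end in merge(intervals))


def shared_bases(a, b):
    """Bases covered by both interval sets (each merged, so no double counting)."""
    total = 0
    for s1, e1 in merge(a):
        for s2, e2 in merge(b):
            total += max(0, min(e1, e2) - max(s1, s2) + 1)
    return total


def aed(model_exons, evidence):
    """Annotation Edit Distance: 1 - (SN + SP) / 2, computed on bases."""
    model_len, evidence_len = length(model_exons), length(evidence)
    if model_len == 0 or evidence_len == 0:
        return 1.0  # nothing to compare: treat as completely unsupported
    shared = shared_bases(model_exons, evidence)
    sensitivity = shared / evidence_len  # SN: evidence bases covered by the model
    specificity = shared / model_len  # SP: model bases supported by evidence
    return 1 - (sensitivity + specificity) / 2
```

For example, `aed([(1, 100)], [(1, 50)])` gives 0.25: the evidence is fully covered (SN = 1.0) but only half of the model is supported (SP = 0.5). Lower is better, so 0 means perfect agreement and 1 means no overlap at all.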
From barry.utah at gmail.com Thu Feb 13 13:27:17 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Thu, 13 Feb 2014 13:27:17 -0700
Subject: [maker-devel] SNAP scores and AED scores
References: <52FC9BA7.6060505@bioinfo.hr>
Message-ID: <39AA5089-3E89-4067-A8DF-60B6716C98DF@genetics.utah.edu>

Hi Masa,

Also, if you want additional SNAP output that hasn't been passed forward in MAKER, you can always access the original SNAP output files in the MAKER datastore. This is a directory structure created by MAKER to store contig-specific data. There is a datastore directory (and a corresponding index file) in the maker output directory. The index file will provide the path to individual contigs, and in each contig-specific directory there is a directory called theVoid. This contains all of the output of each program that MAKER runs.

B
Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
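Barry's pointer to the datastore can be scripted. The Python sketch below is a hypothetical helper, not part of MAKER. It assumes the datastore index file is tab-delimited with the contig ID, the contig's datastore directory (relative to the maker output directory), and a status such as STARTED, FINISHED or FAILED, with later lines superseding earlier ones. It builds the theVoid path using the `theVoid.<contig>` naming visible in the exonerate command quoted earlier in this thread. Selecting SNAP files by the substring "snap" is likewise an assumption about file naming.

```python
import os
import sys


def latest_status(index_path):
    """Map contig ID -> (datastore directory, last status seen in the index).

    Assumed line format (tab-delimited): contig ID, directory, status.
    A contig usually appears more than once (e.g. STARTED then FINISHED),
    so later lines overwrite earlier ones.
    """
    status = {}
    with open(index_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3:
                status[fields[0]] = (fields[1], fields[2])
    return status


def void_dir(output_dir, directory, contig):
    """Path to the contig's theVoid folder, e.g. .../PbPT3Sc00001/theVoid.PbPT3Sc00001."""
    return os.path.join(output_dir, directory, "theVoid." + contig)


def snap_files(void_path):
    """Files in theVoid whose names mention 'snap' (a naming assumption)."""
    return sorted(name for name in os.listdir(void_path) if "snap" in name.lower())


if __name__ == "__main__" and len(sys.argv) > 3:
    output_dir, index_file, contig = sys.argv[1:4]
    directory, state = latest_status(index_file)[contig]
    void_path = void_dir(output_dir, directory, contig)
    print(f"{contig}: {state} -> {void_path}")
    for name in snap_files(void_path):
        print(os.path.join(void_path, name))
```

The script takes the maker output directory, the index file and a contig ID as arguments. Reading the last status per contig also gives a quick way to list every contig whose final status is FAILED, such as the PbPT3Sc00001 failures discussed above.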
From mptrsen at uni-bonn.de Thu Feb 13 20:00:24 2014
From: mptrsen at uni-bonn.de (Malte Petersen)
Date: Fri, 14 Feb 2014 04:00:24 +0100
Subject: [maker-devel] BLAST options error / should Maker check for file format?
Message-ID: <52FD86C8.6040007@uni-bonn.de>

Dear MAKER devs,

I was running Maker version 2.30p-beta on an insect genome, and it didn't produce any output. I got these error messages:

Widget::formater:
/path/to/makeblastdb -dbtype nucl -in /tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0
#-------------------------------#
BLAST options error: File /tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0 is empty
ERROR: /path/to/makeblastdb failed in Widget::formater
--> rank=NA, hostname=Jeanne-GBR
ERROR: Failed while doing blastn of ESTs
ERROR: Chunk failed at level:0, tier_type:3
FAILED CONTIG:scf7180005143343

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scf7180005143343

I figured out that this error is due to a non-FASTA file being fed to Maker as extrinsic evidence (I gave it a meta-info file). While I got the pipeline running now with the correct file, I think that it should complain (a lot earlier) if any of the input files are in the wrong format. More people might run into this problem and have no idea where to look for a solution.

What do you think?

Best,
Malte

From carsonhh at gmail.com Thu Feb 13 20:11:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Feb 2014 20:11:22 -0700
Subject: [maker-devel] BLAST options error / should Maker check for file format?
In-Reply-To: <52FD86C8.6040007@uni-bonn.de>
References: <52FD86C8.6040007@uni-bonn.de>

Hi Malte,

Actually, there already is such a check. I'm very surprised your file made it that far. Normally it fails right away.

Example ->
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
ERROR: The fasta file /Users/cholt/Developer/maker/trunk/data/test1 appears to be empty.

Another test file ->
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
ERROR: The nucleotide sequence file '/Users/cholt/Developer/maker/trunk/data/test2' appears to contain protein sequence or unrecognized characters. Note the following nucleotides may be valid but are unsupported [RYKMSWBDHV]
Please check/fix the file before continuing, or set -fix_nucleotides on the command line to fix this automatically.
Invalid Character: 'M'

You seem to have found just the right formula of improper input to get past the filters on your run :-)

Thanks,
Carson
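Carson points out that MAKER already screens FASTA inputs, yet Malte's meta-info file still got through. As an extra pre-flight step, one could check nucleotide evidence files (e.g. ESTs) before a run. The Python sketch below is a hypothetical stand-alone check, not MAKER's validator. It flags text before the first `>` header (the likely symptom of a non-FASTA file), files with no records or no sequence, and characters outside ACGTN, reporting the IUPAC codes [RYKMSWBDHV] separately because Carson's example error message calls them valid but unsupported. Lowercase (soft-masked) bases are accepted.

```python
import sys

PLAIN = set("ACGTN")
# IUPAC ambiguity codes that Carson's example message lists as valid but unsupported.
AMBIGUOUS = set("RYKMSWBDHV")


def check_nucleotide_fasta(lines):
    """Return a list of problems found in nucleotide FASTA text (empty list = looks OK)."""
    problems = []
    records = residues = 0
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            records += 1
            continue
        if records == 0:
            # Data before any header: most likely not a FASTA file at all.
            return [f"line {number}: data before the first '>' header"]
        residues += len(line)
        for char in line.upper():  # upper() so soft-masked bases are accepted
            if char in AMBIGUOUS:
                problems.append(f"line {number}: ambiguous nucleotide {char!r}")
                break
            if char not in PLAIN:
                problems.append(f"line {number}: invalid character {char!r}")
                break
    if records == 0:
        problems.append("no '>' header found: file is empty or not FASTA")
    elif residues == 0:
        problems.append("headers found but no sequence data")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    failed = False
    for path in sys.argv[1:]:
        with open(path) as handle:
            for problem in check_nucleotide_fasta(handle):
                failed = True
                print(f"{path}: {problem}")
    sys.exit(1 if failed else 0)
```

Running it over every file listed as nucleotide evidence in the control file before starting MAKER would have stopped Malte's run at the first line of the meta-info file rather than deep inside the BLAST step. Protein evidence files need a different character set and are out of scope for this sketch.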
From dence at genetics.utah.edu Fri Feb 14 12:09:08 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 14 Feb 2014 19:09:08 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!

Hi Hossein,

So, this is what is going on. The problem is with the GFF3 file: the exon features in that GFF3 should have the mRNA as their parent instead of the gene. When you deleted the "-mRNA-1", the Name of the mRNA became the same as the Name of the gene, which restored the proper relationship between the features. The same problem exists for the CDS features.

The solution is to make the exon and CDS parents "point" to the mRNA and not the gene. Since MAKER has very regular rules for making names, this should be pretty straightforward. You should be OK with just adding "-mRNA-1" to the end of the Parent value on all the exon and CDS lines. This will work unless there are some mRNAs with alternative splice forms, because then the mRNAs will end with something like "-mRNA-2".

I've attached a script that should do this for you. Run it with this command "perl fix_gff3_script.pl > "

And then run MAKER with the fixed gff3 file in place of the old gff3 file.

Let me know if that works,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Thursday, February 13, 2014 3:27 PM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Dear Daniel

I downloaded maker 2.31 and ran the same scaffold. Again it gave an error on the gff file. I then removed the word mRNA-1 from my gff file and ran it again.
It seems to have worked this time. Attached are the std error files for the first try, named std-err (the one that failed), and the 2nd one, named std-err-wo-mRNA (that apparently worked). Since the gff file is used as evidence only, I thought it should not matter to remove the mRNA-1 naming from the gff file.

Cheers

HB
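Daniel's diagnosis is that the exon and CDS lines carry the gene's ID in their Parent attribute, while GFF3 expects them to point at the mRNA. His attached Perl script (not reproduced in the archive) appends "-mRNA-1", which he notes breaks for genes with alternative splice forms. The Python sketch below takes a different route, swapped in for illustration: it reads each mRNA's own Parent to learn which mRNA belongs to which gene, then re-points exon/CDS Parents only when the gene has exactly one mRNA, leaving ambiguous cases untouched. It assumes well-formed nine-column, tab-delimited GFF3 and does not decode percent-escaped attribute values; the function names are hypothetical.

```python
import sys
from collections import defaultdict


def parse_attributes(field):
    """Split a GFF3 column-9 string such as 'ID=a;Parent=b' into a dict (order kept)."""
    attrs = {}
    for part in field.strip().split(";"):
        if part:
            key, _, value = part.partition("=")
            attrs[key] = value
    return attrs


def format_attributes(attrs):
    """Rebuild a column-9 string from the dict produced by parse_attributes."""
    return ";".join(f"{key}={value}" for key, value in attrs.items())


def fix_parents(lines):
    """Re-point exon/CDS Parent values from a gene ID to that gene's only mRNA.

    Parents that are already mRNA IDs, or genes with several mRNAs
    (alternative splice forms), are left unchanged.
    """
    rows = [line.rstrip("\n").split("\t") for line in lines]

    # First pass: record which mRNA IDs hang off each gene ID.
    mrnas_of_gene = defaultdict(list)
    for cols in rows:
        if len(cols) == 9 and cols[2] == "mRNA":
            attrs = parse_attributes(cols[8])
            for gene_id in filter(None, attrs.get("Parent", "").split(",")):
                mrnas_of_gene[gene_id].append(attrs.get("ID", ""))

    # Second pass: rewrite exon/CDS Parent attributes where unambiguous.
    fixed = []
    for cols in rows:
        if len(cols) == 9 and cols[2] in ("exon", "CDS"):
            attrs = parse_attributes(cols[8])
            if "Parent" in attrs:
                attrs["Parent"] = ",".join(
                    mrnas_of_gene[p][0] if len(mrnas_of_gene.get(p, [])) == 1 else p
                    for p in attrs["Parent"].split(",")
                )
                cols = cols[:8] + [format_attributes(attrs)]
        fixed.append("\t".join(cols))
    return fixed


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for out_line in fix_parents(handle):
            print(out_line)
```

Run it as `python fix_gff3_parents.py old.gff > fixed.gff` (script and file names hypothetical), then give MAKER the fixed file in place of the old one. Lines it leaves unchanged because a gene has several mRNAs still need to be resolved by hand or from the original MAKER output.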
> > >Thanks for helping with this > > > >Regards > > >HB > > > > > > > > > > > > >On 14-02-11 4:59 PM, "Daniel Ence" wrote: > >>Hi Hossen, >> >>I think that what would be the most help right now is if you ran MAKER on >>only one of those contigs that are failing and send me the entire error >>output along with the maker control files that you are using. It looks >>like the error is coming from the gff3 files that you are using as input. >> >>Thanks, >>Daniel >> >> >> >>Daniel Ence >>Graduate Student >>Eccles Institute of Human Genetics >>University of Utah >>15 North 2030 East, Room 2100 >>Salt Lake City, UT 84112-5330 >>________________________________________ >>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >>Sent: Tuesday, February 11, 2014 3:51 PM >>To: Daniel Ence >>Subject: ERROR: Failed while processing the chunk divide!! >> >>Dear Daniel >> >>I re-started maker and it is still running. But in error our file that >>has >>been generated so far it seems that smaller conitgs are affected. There >>are contigs of 2-4 kb with this error but also I noticed a contig of 30kb >>length having this error >> >>I was wondering if I need to change the setting in the maker_opt file >> >>#-----MAKER Behavior Options >>max_dna_len=100000 #length for dividing up contigs into chunks >>(increases/decreases memory usage) >>min_contig=1 #skip genome contigs below this length (under 10kb are often >>useless) >> >> >>If I understand correctly max_dna_len divide conitgs of over 100kb to >>smaller chucks. However it is not clear to me that for the min_contig >>option if the default contig length is 10kb or less, then why I have >>error >>message for 30kb long contigs. 
>>Should I change this to 0?
>>
>>Here is an example of the error message for one of the contigs:
>>
>>#--------- command -------------#
>>Widget::exonerate::est2genome:
>>/usr/local/exonerate-2.2.0-x86_64/bin/exonerate -q /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta -t /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta -Q dna -T dna --model est2genome --minintron 20 --showcigar --percent 20 > /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate
>>#-------------------------------#
>>cleaning blastn...
>>cleaning tblastx...
>>cleaning blastx...
>>ERROR: Failed on PbPT3Sc00001_S_0.8_1-mRNA-1
>>Check your input GFF3 file for errors!
>>(from GFFDB)
>>
>>FATAL ERROR
>>ERROR: Failed while processing the chunk divide!!
>>
>>ERROR: Chunk failed at level 17
>>!!
>>FAILED CONTIG:PbPT3Sc00001
>>
>>--Next Contig--
>>
>>Regards
>>
>>HB
>>
>>On 14-02-11 12:37 PM, "Daniel Ence" wrote:
>>
>>>Hossein,
>>>
>>>Ok. So since this error came up on a local install, I'm going to need
>>>some more information to understand what went wrong. Is it the same
>>>contig that always causes this error? If it is, then is this the only
>>>error or warning that MAKER encounters while running on this contig? Or,
>>>if multiple contigs fail, then is it always the same error?
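One quick way to answer Daniel's question about whether the same contigs keep failing is to count the `FAILED CONTIG:` lines in the captured error output. This Python sketch assumes MAKER's stderr was saved to a text file and that failures are reported on their own line in the `FAILED CONTIG:<name>` form shown in the log above; `tally_failed_contigs` is a hypothetical helper, not part of MAKER.

```python
import re
from collections import Counter

# Matches lines such as "FAILED CONTIG:PbPT3Sc00001" in MAKER's error output.
FAILED_RE = re.compile(r"^FAILED CONTIG:\s*(\S+)")


def tally_failed_contigs(lines):
    """Count how many times each contig is reported as failed."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts
```

With the saved log open as `fh`, `tally_failed_contigs(fh).most_common()` lists the most frequently failing contigs first; a contig that fails on every run is a good candidate for the single-contig test Daniel asked for.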
>>>
>>>If you can narrow it down to the smallest possible dataset that
>>>consistently gives the same error, then we can begin to understand
>>>what's wrong.
>>>
>>>Thanks,
>>>Daniel
>>>
>>>Daniel Ence
>>>Graduate Student
>>>Eccles Institute of Human Genetics
>>>University of Utah
>>>15 North 2030 East, Room 2100
>>>Salt Lake City, UT 84112-5330
>>>________________________________________
>>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>>>Sent: Tuesday, February 11, 2014 11:20 AM
>>>To: Daniel Ence
>>>Subject: Re: [maker-devel] Falied to create new account
>>>
>>>Hi Daniel
>>>
>>>I am running it through the local server at my work.
>>>
>>>M. Hossein Borhan, Ph.D.
>>>Research Scientist/ Chercheur Scientifique
>>>Saskatoon Research Centre/Centre de Recherches de Saskatoon
>>>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
>>>107 Science Place, Saskatoon, SK.,S7N 0X2
>>>Telephone/Téléphone: (306) 385-9441
>>>Facsimile/Télécopieur: (306) 385-9482
>>>Hossein.borhan at agr.gc.ca
>>>
>>>On 14-02-11 12:16 PM, "Daniel Ence" wrote:
>>>
>>>>Hi Hossein,
>>>>
>>>>Did you encounter this error while you were running MAKER on your local
>>>>machine or through the MAKER web annotation service?
>>>>
>>>>Thanks,
>>>>Daniel
>>>>
>>>>Daniel Ence
>>>>Graduate Student
>>>>Eccles Institute of Human Genetics
>>>>University of Utah
>>>>15 North 2030 East, Room 2100
>>>>Salt Lake City, UT 84112-5330
>>>>________________________________________
>>>>From: Carson Holt [carsonhh at gmail.com]
>>>>Sent: Tuesday, February 11, 2014 10:18 AM
>>>>To: Daniel Ence
>>>>Cc: Mark Yandell
>>>>Subject: FW: [maker-devel] Falied to create new account
>>>>
>>>>Hey Daniel, could you download his dataset and see if you can replicate
>>>>the error? Also check if this was an MWAS job or a local maker run (his
>>>>dataset will already be there for MWAS; you just need the job ID).
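To build the smallest failing dataset Daniel describes, the failing contig and the GFF3 lines that sit on it can be copied into a pair of small files. A minimal Python sketch, assuming the contig name is the first word of its FASTA header and column 1 of the GFF3; `extract_contig` and the output file names are hypothetical, and only the genome sequence and the GFF3 passthrough are subset here (EST or protein FASTA files would be passed to MAKER unchanged).

```python
def extract_contig(fasta_path, gff_path, contig, out_prefix):
    """Copy one contig's FASTA record and its GFF3 feature lines to new files.

    Writes <out_prefix>.fasta and <out_prefix>.gff and returns
    (FASTA lines written, GFF3 feature lines written).
    """
    fasta_lines = []
    keep = False
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                # A new record starts; keep it only if its ID matches.
                words = line[1:].split()
                keep = bool(words) and words[0] == contig
            if keep:
                fasta_lines.append(line)

    gff_lines = []
    with open(gff_path) as fh:
        for line in fh:
            if line.startswith("##FASTA"):
                break  # ignore any embedded sequence section
            if line.startswith("#"):
                continue
            if line.split("\t", 1)[0] == contig:
                gff_lines.append(line)

    with open(out_prefix + ".fasta", "w") as out:
        out.writelines(fasta_lines)
    with open(out_prefix + ".gff", "w") as out:
        out.write("##gff-version 3\n")
        out.writelines(gff_lines)
    return len(fasta_lines), len(gff_lines)
```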
>>>>
>>>>Thanks,
>>>>Carson
>>>>
>>>>On 2/11/14, 10:16 AM, "Borhan, Hossein" wrote:
>>>>
>>>>>Hi Carson
>>>>>
>>>>>I encountered this error while running maker:
>>>>>
>>>>>FATAL ERROR
>>>>>ERROR: Failed while processing the chunk divide!!
>>>>>
>>>>>ERROR: Chunk failed at level 17
>>>>>!!
>>>>>FAILED CONTIG:PbPT3Sc00006
>>>>>
>>>>>HB
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix_gff3_script.pl
Type: application/octet-stream
Size: 349 bytes
Desc: fix_gff3_script.pl
URL:

From claudio.valero at wur.nl Mon Feb 17 02:23:21 2014
From: claudio.valero at wur.nl (Valero Jimenez, Claudio)
Date: Mon, 17 Feb 2014 09:23:21 +0000
Subject: [maker-devel] Maker not predicting many genes
Message-ID:

Dear list,

I'm trying to annotate a fungal genome, and I'm surprised that Maker does not predict many genes (3697). I have trained SNAP and followed all the tutorials available. Ab initio predictors are able to predict between 8000 and 10000 genes. Is it something in my configuration file that is wrong? I attach the opts file and the SOBA summary of the annotation.

Regards,

Claudio
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.log
Type: application/octet-stream
Size: 4776 bytes
Desc: maker_opts.log
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOBA.pdf
Type: application/pdf
Size: 210262 bytes
Desc: SOBA.pdf
URL:

From carson.holt at genetics.utah.edu Mon Feb 17 12:22:13 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 17 Feb 2014 19:22:13 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To:
References:
Message-ID:

You also need to look at the contigs in a browser like Apollo.
That will allow you to see both the predictions and the evidence in context. You can then see whether genes are being dropped because they are only supported by single-exon evidence, because they have no evidence support whatsoever, or because they are being excluded due to UTR overlap. That last one is a common problem for fungi when using assembled mRNA-seq reads. Fungal genes are so close together that they often overlap in the UTR. As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts. The result is really long UTRs on some of your gene models that force other models to be excluded. If this is the case, rerun something like Trinity with the Jaccard clip option set to avoid transcript fusion. Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off.

If it is a lack of evidence overlap, make sure you provided at least one proteome from a related species to the protein= option. At least two proteomes are recommended, though (these are not proteins from the same species but rather complete proteomes from related species). Comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but they can supplement the other proteome data. Also, are you providing EST data? Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient, because both the quality and the completeness of EST/mRNA-seq datasets vary so widely; they may capture as little as 30% of the genes.

Another factor is single-exon evidence. In anything but fungi, single-exon evidence is mostly caused by spurious alignments. But fungi have so many single-exon genes that this is not the case for them. Make sure single_exon=1 is set to allow that evidence to be kept, and set the minimum length of single-exon evidence to keep to something like 250 bp.
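Carson's suggestions map onto a few lines of maker_opts.ctl. The Python sketch below edits those `key=value` lines in the control-file text while keeping each trailing `#` comment. It assumes the option names single_exon, single_length, correct_est_fusion and protein as they appear in a MAKER 2.x control file generated with `maker -CTL`; it refuses to add keys that are not already in the file, so a misspelled or version-specific name fails loudly instead of being silently ignored. `set_ctl_options` is hypothetical.

```python
import re

# A control-file option line looks like: key=value #optional comment
OPT_RE = re.compile(r"^(?P<key>[A-Za-z_]\w*)=(?P<value>[^#]*?)(?P<comment>\s*#.*)?$")


def set_ctl_options(text, updates):
    """Return control-file `text` with the options in `updates` replaced.

    Raises KeyError if any key in `updates` is not already present, since a
    missing key usually means the name is wrong for this MAKER version.
    """
    out = []
    seen = set()
    for line in text.splitlines():
        match = OPT_RE.match(line)
        if match and match.group("key") in updates:
            key = match.group("key")
            comment = match.group("comment") or ""
            line = f"{key}={updates[key]}{comment}"
            seen.add(key)
        out.append(line)
    missing = set(updates) - seen
    if missing:
        raise KeyError(f"options not found in control file: {sorted(missing)}")
    return "\n".join(out) + "\n"
```

For example, `set_ctl_options(text, {"single_exon": 1, "single_length": 250, "correct_est_fusion": 1, "protein": "proteomeA.fasta,proteomeB.fasta"})`, where `text` is the contents of maker_opts.ctl and the two proteome files are placeholders for complete proteomes of related species (MAKER generally accepts comma-separated file lists here, but check your version).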
Thanks,
Carson

From carsonhh at gmail.com Mon Feb 17 12:26:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 17 Feb 2014 12:26:05 -0700
Subject: [maker-devel] Maker not predicting many genes
Message-ID:

From your control file, it looks like not setting single_exon=1, and only using UniProt rather than supplying complete proteomes of a related species, are your primary shortcomings. I'd set correct_est_fusion=1 as well.

-Carson
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From claudio.valero at wur.nl Wed Feb 19 01:20:04 2014
From: claudio.valero at wur.nl (Valero Jimenez, Claudio)
Date: Wed, 19 Feb 2014 08:20:04 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To:
References:
Message-ID:

Hi Carson,

Thank you for your suggestions. I ran Maker again and it was able to predict many more genes. However, I have a different problem now. When I try to run gff3_merge, I get the following error:

Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67.

A similar thing happens when I try fasta_merge:

Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52.

I never had this problem before with these commands.

Regards,

Claudio

From carsonhh at gmail.com Wed Feb 19 08:34:33 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 19 Feb 2014 08:34:33 -0700
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To:
References:
Message-ID:

You provided a directory rather than a file to the -d option ('d' stands for datastore log). You must provide the location of the datastore index log file and not the datastore directory. Example:

./dpp_contig.maker.output/dpp_contig_master_datastore_index.log

Thanks,
Carson

From dence at genetics.utah.edu Wed Feb 19 09:04:08 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 19 Feb 2014 16:04:08 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To:
References: ,
Message-ID:

Hi Claudio,

What was the command line you used for gff3_merge?
Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From claudio.valero at wur.nl Wed Feb 19 09:33:36 2014
From: claudio.valero at wur.nl (Valero Jimenez, Claudio)
Date: Wed, 19 Feb 2014 16:33:36 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To:
References: ,
Message-ID:

Hi,

Thanks, I had a mistake in the command line!!!

Regards,

Claudio

From barry.utah at gmail.com Wed Feb 19 11:03:47 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Wed, 19 Feb 2014 11:03:47 -0700
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To:
References: ,
Message-ID: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>

Hi Daniel,

Could you add an error message to those two scripts that detects when a filename is missing or a directory was given instead, and gives the user a suggested solution?
Thanks, B On Feb 19, 2014, at 9:04 AM, Daniel Ence wrote: > Hi Claudio, > > What was the command line you used for gff3_merge? > > Thanks, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Valero Jimenez, Claudio [claudio.valero at wur.nl] > Sent: Wednesday, February 19, 2014 1:20 AM > To: 'Carson Holt'; Carson Holt; 'maker-devel at yandell-lab.org' > Subject: Re: [maker-devel] Maker not predicting many genes > > Hi Carson, > > Thank you for your suggestions. I ran again Maker and it was able to predict many more genes. Although I have a different problem now. I try to run gff3_merge and get the following error: > > Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67. > > Similar thing happens when I try fasta_merge: > > Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52. > > I never had this problem before with these commands. > > > Regards, > > Claudio > > From: Carson Holt [mailto:carsonhh at gmail.com] > Sent: maandag 17 februari 2014 20:26 > To: Carson Holt; Valero Jimenez, Claudio; 'maker-devel at yandell-lab.org' > Subject: Re: [maker-devel] Maker not predicting many genes > > From your control file, it looks like not setting single_exon=1, and only using UniProt rather than supplying complete proteomes of a related species are your primary shortcomings. I?d set correct_est_fusion=1 as well. > > ?Carson > > > From: Carson Holt > Date: Monday, February 17, 2014 at 12:22 PM > To: "Valero Jimenez, Claudio" , "'maker-devel at yandell-lab.org'" > Subject: Re: [maker-devel] Maker not predicting many genes > > You also need to look at the contigs in a browser like apollo. That will allow you to see both the predictions and the evidence in context. 
You can then see if genes are being dropped because they are only supported by single exon evidence, because they have no evidence support whatsoever, or because they are being excluded due to UTR overlap. That last one is a common problem for fungi when using assembled mRNA-seq reads. Fungal genes are so closely spaced that they often overlap in the UTR. As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts. The result is really long UTRs on some of your gene models that force other models to be excluded. If this is the case, rerun something like Trinity with the jaccard clip option set to avoid transcript fusion. Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off. > > If it is a lack of evidence overlap, make sure you provided at least 1 proteome from a related species to the protein= option. At least 2 proteomes are recommended, though (these are not proteins from the same species but rather complete proteomes from related species). Also, comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but can supplement the other proteome data. Also, are you providing EST data? Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient (because the quality and comprehensiveness of EST/mRNA-seq databases vary so widely, and they may capture as little as 30% of the genes). > > Another thing that comes into play is single exon evidence. In anything but fungi, single exon evidence is mostly caused by spurious alignments. But fungi have so many single exon genes that this is not the case for them. Make sure single_exon=1 is set to allow that evidence to be kept, and set the length of single exon evidence to keep to something like 250 bp.
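As a rough way to audit a control file against the advice above, the Python sketch below parses `key=value #comment` lines and flags settings that differ from it. It assumes the option names single_exon, single_length, correct_est_fusion and protein as they appear in a MAKER 2.x maker_opts.ctl, and that protein= takes a comma-separated list of FASTA files; the helper names are hypothetical and not part of MAKER.

```python
def parse_ctl(text):
    """Parse MAKER-style 'key=value #comment' lines into a dict."""
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if "=" not in line:
            continue  # blank or comment-only line
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def check_fungal_opts(text):
    """Return warnings where maker_opts.ctl text departs from the advice above.

    Option names are assumed from MAKER 2.x control files.
    """
    opts = parse_ctl(text)
    warnings = []
    for key in ("single_exon", "correct_est_fusion"):
        if opts.get(key) != "1":
            warnings.append(f"{key} is {opts.get(key) or 'unset'}; suggested 1")
    # single_length is assumed to be the option holding the ~250 bp cutoff
    if not opts.get("single_length", "").isdigit():
        warnings.append("single_length is unset; something like 250 is suggested")
    # protein= is assumed to accept a comma-separated list of FASTA files
    proteomes = [p for p in opts.get("protein", "").split(",") if p.strip()]
    if len(proteomes) < 2:
        warnings.append(f"protein= lists {len(proteomes)} file(s); 2+ related proteomes suggested")
    return warnings


if __name__ == "__main__":
    sample = "single_exon=0 #single exon ESTs\ncorrect_est_fusion=1\nprotein=uniprot_sprot.fasta\n"
    for w in check_fungal_opts(sample):
        print(w)
```

The check only counts files; it cannot tell whether a listed file is a complete proteome from a related species or just UniProt/Swiss-Prot, which Carson notes is not sufficient on its own.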
> > Thanks, > Carson > > > > > > > From: "Valero Jimenez, Claudio" > Date: Monday, February 17, 2014 at 2:23 AM > To: "'maker-devel at yandell-lab.org'" > Subject: Maker not predicting many genes > > Dear list, > > I'm trying to annotate a fungal genome, and I'm surprised that Maker does not predict many genes (3697). I have trained SNAP and followed all the tutorials available. Ab initio predictors are able to predict between 8000 and 10000 genes. Is there something wrong in my configuration file? I attach the opts file and the SOBA summary of the annotation. > > Regards, > > Claudio > > > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 From carson.holt at genetics.utah.edu Wed Feb 19 11:06:52 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Wed, 19 Feb 2014 18:06:52 +0000 Subject: [maker-devel] Maker not predicting many genes In-Reply-To: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu> References: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu> Message-ID: You only need to swap a single character in the script. Just change the -e (exists) test to a -f (is file) test.
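For illustration, here is the same distinction expressed in Python. This is a hypothetical helper, not code from gff3_merge or fasta_merge: an existence check (like Perl's -e) also accepts a directory, whereas an is-file check (like Perl's -f) rejects it, and the rejection leaves room for the kind of suggested-fix message Barry asks for. The *_master_datastore_index.log name in the messages is an assumption about what the scripts expect as input.

```python
import os


def check_merge_input(path):
    """Return None if `path` is a regular file, else a message suggesting a fix.

    os.path.exists() (like Perl's -e) is also true for a directory;
    os.path.isfile() (like Perl's -f) is not, so a directory is caught here.
    """
    if not path:
        return "No input file given: pass the *_master_datastore_index.log file."
    if os.path.isdir(path):
        return f"{path} is a directory: pass the *_master_datastore_index.log file inside it."
    if not os.path.isfile(path):
        return f"{path} does not exist or is not a regular file."
    return None
```

A caller would print the returned message and exit with a non-zero status instead of continuing with an undefined output name, which is what produces the "uninitialized value $outfile" warning quoted in this thread.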
Thanks, Carson From: Barry Moore > Date: Wednesday, February 19, 2014 at 11:03 AM To: Daniel Ence > Cc: "Valero Jimenez, Claudio" >, Carson Holt >, Carson Holt >, "'maker-devel at yandell-lab.org'" > Subject: Re: [maker-devel] Maker not predicting many genes Hi Daniel, Could you add an error message to those two scripts that detects when a filename is missing or a directory was given instead, and gives the user a suggested solution. Thanks, B On Feb 19, 2014, at 9:04 AM, Daniel Ence wrote: Hi Claudio, What was the command line you used for gff3_merge? Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Valero Jimenez, Claudio [claudio.valero at wur.nl] Sent: Wednesday, February 19, 2014 1:20 AM To: 'Carson Holt'; Carson Holt; 'maker-devel at yandell-lab.org' Subject: Re: [maker-devel] Maker not predicting many genes Hi Carson, Thank you for your suggestions. I ran Maker again and it was able to predict many more genes. However, I have a different problem now. When I try to run gff3_merge I get the following error: Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67. A similar thing happens when I try fasta_merge: Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52. I never had this problem before with these commands. Regards, Claudio From: Carson Holt [mailto:carsonhh at gmail.com] Sent: Monday 17 February 2014 20:26 To: Carson Holt; Valero Jimenez, Claudio; 'maker-devel at yandell-lab.org' Subject: Re: [maker-devel] Maker not predicting many genes From your control file, it looks like not setting single_exon=1, and only using UniProt rather than supplying complete proteomes of a related species are your primary shortcomings. I'd set correct_est_fusion=1 as well.
-Carson From: Carson Holt > Date: Monday, February 17, 2014 at 12:22 PM To: "Valero Jimenez, Claudio" >, "'maker-devel at yandell-lab.org'" > Subject: Re: [maker-devel] Maker not predicting many genes You also need to look at the contigs in a browser like Apollo. That will allow you to see both the predictions and the evidence in context. You can then see if genes are being dropped because they are only supported by single exon evidence, because they have no evidence support whatsoever, or because they are being excluded due to UTR overlap. That last one is a common problem for fungi when using assembled mRNA-seq reads. Fungal genes are so closely spaced that they often overlap in the UTR. As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts. The result is really long UTRs on some of your gene models that force other models to be excluded. If this is the case, rerun something like Trinity with the jaccard clip option set to avoid transcript fusion. Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off. If it is a lack of evidence overlap, make sure you provided at least 1 proteome from a related species to the protein= option. At least 2 proteomes are recommended, though (these are not proteins from the same species but rather complete proteomes from related species). Also, comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but can supplement the other proteome data. Also, are you providing EST data? Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient (because the quality and comprehensiveness of EST/mRNA-seq databases vary so widely, and they may capture as little as 30% of the genes). Another thing that comes into play is single exon evidence. In anything but fungi, single exon evidence is mostly caused by spurious alignments. But fungi have so many single exon genes that this is not the case for them.
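One way to shortlist models with the long, falsely fused UTRs described in this advice, before inspecting them in Apollo, is to total UTR length per mRNA in the MAKER GFF3 output. The Python sketch below does that; the 1000 bp cutoff is an arbitrary illustrative threshold, not a MAKER default, and it assumes five_prime_UTR/three_prime_UTR features carry a Parent= attribute naming their mRNA.

```python
from collections import defaultdict

UTR_TYPES = {"five_prime_UTR", "three_prime_UTR"}


def long_utr_models(gff_lines, max_utr=1000):
    """Return {(mRNA ID, UTR type): total bp} for UTRs longer than max_utr.

    max_utr is an illustrative threshold, not a MAKER setting.
    """
    utr_len = defaultdict(int)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in UTR_TYPES:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        length = int(cols[4]) - int(cols[3]) + 1  # GFF3 coordinates are 1-based, inclusive
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                utr_len[(parent, cols[2])] += length
    return {key: total for key, total in utr_len.items() if total > max_utr}
```

UTRs split across several exons are summed per side, so a model is reported when either its 5' or its 3' UTR as a whole exceeds the cutoff.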
Make sure single_exon=1 is set to allow that evidence to be kept, and set the length of single exon evidence to keep to something like 250 bp. Thanks, Carson From: "Valero Jimenez, Claudio" > Date: Monday, February 17, 2014 at 2:23 AM To: "'maker-devel at yandell-lab.org'" > Subject: Maker not predicting many genes Dear list, I'm trying to annotate a fungal genome, and I'm surprised that Maker does not predict many genes (3697). I have trained SNAP and followed all the tutorials available. Ab initio predictors are able to predict between 8000 and 10000 genes. Is there something wrong in my configuration file? I attach the opts file and the SOBA summary of the annotation. Regards, Claudio _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 From gtaylor at bcgsc.ca Fri Feb 21 11:48:42 2014 From: gtaylor at bcgsc.ca (Greg Taylor) Date: Fri, 21 Feb 2014 10:48:42 -0800 Subject: [maker-devel] Maker jobs hanging Message-ID: Hello, I'm having a problem with Maker_2.28 jobs hanging. I am annotating a 3Gb genome with predictors SNAP and Genemark, and using ABySS assembled RNA-seq data. To do this I am using 480 processors on our local cluster. Once a run begins, 479 contigs are started, as noted in the *_master_datastore_index.log file, the standard error log for the whole job looks normal, as do the run.log and run.log.child.0 for the daughter processes.
This seems to be sequence dependent, as re-running contigs that hang doesn't help, the same contigs will always hang. I'm still looking into this myself, but it seems most if not all the jobs are stuck at the Blastx stage. If you have any suggestions, your help would be greatly appreciated. sincerely, Greg Taylor From dence at genetics.utah.edu Fri Feb 21 11:54:17 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 21 Feb 2014 18:54:17 +0000 Subject: [maker-devel] Maker jobs hanging In-Reply-To: References: Message-ID: Hi Greg, Since this is probably going to be a more complicated situation, would you upload your data and control file at this URL so that we can try to replicate the error on our machines? http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=166 Also, which version of MPI are you using? And you might want to try updating MAKER. I think version 2.31 was just updated a few weeks ago. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Greg Taylor [gtaylor at bcgsc.ca] Sent: Friday, February 21, 2014 11:48 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] Maker jobs hanging Hello, I'm having a problem with Maker_2.28 jobs hanging. I am annotating a 3Gb genome with predictors SNAP and Genemark, and using ABySS assembled RNA-seq data. To do this I am using 480 processors on our local cluster. Once a run begins, 479 contigs are started, as noted in the *_master_datastore_index.log file, the standard error log for the whole job looks normal, as do the run.log and run.log.child.0 for the daughter processes. This seems to be sequence dependent, as re-running contigs that hang doesn't help, the same contigs will always hang. 
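To list exactly which contigs are stuck, one can check which entries in the datastore index log were STARTED but never reached another status. This Python sketch assumes each log line is tab-separated as contig ID, datastore path, status, with later lines superseding earlier ones; that layout is an assumption, so compare it with your own *_master_datastore_index.log first.

```python
def unfinished_contigs(log_lines):
    """Return contig IDs whose most recent logged status is STARTED.

    Assumes 'contig<TAB>datastore_path<TAB>STATUS' lines; a contig that was
    STARTED and later FINISHED (or FAILED) is not reported.
    """
    last_status = {}
    for line in log_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue  # skip malformed or blank lines
        last_status[cols[0]] = cols[2]
    return sorted(c for c, status in last_status.items() if status == "STARTED")
```

While a job is still running, this also lists contigs that are simply in progress, so it is most telling for contigs that have stayed STARTED for far longer than their neighbors.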
I'm still looking into this myself, but it seems most if not all the jobs are stuck at the Blastx stage. If you have any suggestions, your help would be greatly appreciated. sincerely, Greg Taylor _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Fri Feb 21 11:56:50 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Feb 2014 11:56:50 -0700 Subject: [maker-devel] Maker jobs hanging Message-ID: Use 2.31. It has been tested to work without issue on several thousand cpus. Also use OpenMPI for any jobs greater than 100 cpus. In addition, OpenMPI can freeze on some systems without the following flag when using perl based MPI programs --> -mca btl ^openib Example --> mpiexec -mca btl ^openib -n 200 maker Finally, never use MVAPICH2. It doesn't play well with perl, and freezes whenever perl based MPI jobs extend across nodes (they run fine within a single node though). -Carson On 2/21/14, 11:48 AM, "Greg Taylor" wrote: >Hello, > I'm having a problem with Maker_2.28 jobs hanging. I am annotating a 3Gb >genome with predictors SNAP and Genemark, and using ABySS assembled >RNA-seq data. To do this I am using 480 processors on our local cluster. >Once a run begins, 479 contigs are started, as noted in the >*_master_datastore_index.log file, the standard error log for the whole >job looks normal, as do the run.log and run.log.child.0 for the daughter >processes. This seems to be sequence dependent, as re-running contigs >that hang doesn't help, the same contigs will always hang. I'm still >looking into this myself, but it seems most if not all the jobs are stuck >at the Blastx stage. If you have any suggestions, your help would be >greatly appreciated.
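If the launch is scripted, the command Carson gives can be assembled as an argument list, as in the small Python sketch below. The helper name is hypothetical and 200 processes is only his example; passing the list to subprocess.run avoids any shell interpretation of ^openib, and shlex.join (Python 3.8+) prints a quoted form that is safe to paste into a shell.

```python
import shlex


def maker_mpi_command(n_procs, disable_openib=True):
    """Build the OpenMPI launch command for MAKER as an argument list."""
    cmd = ["mpiexec"]
    if disable_openib:
        # Carson's workaround for OpenMPI freezes with perl-based MPI programs
        cmd += ["-mca", "btl", "^openib"]
    cmd += ["-n", str(n_procs), "maker"]
    return cmd


if __name__ == "__main__":
    cmd = maker_mpi_command(200)
    # Shell-quoted form for copy/paste; to launch directly, pass cmd to subprocess.run
    print(shlex.join(cmd))  # mpiexec -mca btl '^openib' -n 200 maker
```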
> >sincerely, >Greg Taylor >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dence at genetics.utah.edu Fri Feb 21 15:04:34 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 21 Feb 2014 22:04:34 +0000 Subject: [maker-devel] FW: Maker jobs hanging In-Reply-To: References: Message-ID: Hi Greg, You should be able to have the new MAKER work on the old datastore. Note the following advice from the main MAKER developer, Carson Holt. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com] Sent: Friday, February 21, 2014 11:56 AM To: Greg Taylor; maker-devel at yandell-lab.org Subject: Re: [maker-devel] Maker jobs hanging Use 2.31. It has been tested to work without issue on several thousand cpus. Also use OpenMPI for any jobs greater than 100 cpus. In addition, OpenMPI can freeze on some systems without the following flag when using perl based MPI programs --> -mca btl ^openib Example --> mpiexec -mca btl ^openib -n 200 maker Finally, never use MVAPICH2. It doesn't play well with perl, and freezes whenever perl based MPI jobs extend across nodes (they run fine within a single node though). -Carson On 2/21/14, 11:48 AM, "Greg Taylor" wrote: >Hello, > I'm having a problem with Maker_2.28 jobs hanging. I am annotating a 3Gb >genome with predictors SNAP and Genemark, and using ABySS assembled >RNA-seq data. To do this I am using 480 processors on our local cluster. >Once a run begins, 479 contigs are started, as noted in the >*_master_datastore_index.log file, the standard error log for the whole >job looks normal, as do the run.log and run.log.child.0 for the daughter >processes.
This seems to be sequence dependent, as re-running contigs >that hang doesn't help, the same contigs will always hang. I'm still >looking into this myself, but it seems most if not all the jobs are stuck >at the Blastx stage. If you have any suggestions, your help would be >greatly appreciated. > >sincerely, >Greg Taylor >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dence at genetics.utah.edu Fri Feb 21 19:38:59 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Sat, 22 Feb 2014 02:38:59 +0000 Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2 In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu> References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>, <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu> Message-ID: Hi Joe, Will you upload your control files and data at this URL? http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169 Also, what version of MAKER and blast are you using? And which file are you using for the known arabidopsis gene? I've copied this email to the maker-development list, which is a really good resource for trouble-shooting MAKER issues. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: Mark Yandell Sent: Friday, February 21, 2014 7:32 PM To: Daniel Ence Subject: FW: I am a PhD candidate at NMSU and have a question about maker2 Mark Yandell Professor of Human Genetics H.A. 
& Edna Benning Presidential Endowed Chair Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:801-587-7707 ________________________________________ From: Joseph Said [joesaid at nmsu.edu] Sent: Friday, February 21, 2014 5:18 PM To: Mark Yandell Subject: I am a PhD candidate at NMSU and have a question about maker2 Dear Dr. Yandell, I am a molecular biologist at NMSU. I am trying to use maker2 with the cotton genome, and search an Arabidopsis gene against it. I think there is a problem with the blast component because zero results are returned. I tried troubleshooting by searching a known gene and still returned zero results. Is this a common problem maybe with the pipeline? I would appreciate any ideas you might have to help me. Thank you, Joe Sent from my iPad From dence at genetics.utah.edu Fri Feb 21 21:27:10 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Sat, 22 Feb 2014 04:27:10 +0000 Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2 In-Reply-To: References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>, <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>, , Message-ID: Hi Joe, MAKER runs blast from your local system (or your server where MAKER is installed), and it blasts evidence that the user supplies in the "est" and "protein" settings. The est and protein settings are set in the maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file and the specific blast settings are in the "maker_bopts.ctl" file. Will you attach those files to your reply, so we can make sure that the settings are set up correctly?
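Since MAKER runs BLAST from the local installation rather than NCBI's web service, a quick sanity check is whether the BLAST paths in maker_exe.ctl point at executable files. The Python sketch below assumes the key names makeblastdb, blastn, blastx and tblastx and the same `key=value #comment` layout as the other control files; both are assumptions to adjust for your MAKER version, and the helper names are hypothetical.

```python
import os

# Assumed maker_exe.ctl keys for the NCBI BLAST+ programs MAKER calls.
BLAST_KEYS = ("makeblastdb", "blastn", "blastx", "tblastx")


def parse_ctl(text):
    """Parse MAKER-style 'key=value #comment' lines into a dict."""
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if "=" in line:
            key, value = line.split("=", 1)
            opts[key.strip()] = value.strip()
    return opts


def bad_blast_paths(exe_ctl_text):
    """Return the BLAST keys whose path is empty or not an executable file."""
    opts = parse_ctl(exe_ctl_text)
    bad = []
    for key in BLAST_KEYS:
        path = opts.get(key, "")
        if not (path and os.path.isfile(path) and os.access(path, os.X_OK)):
            bad.append(key)
    return bad
```

An empty result only shows the executables exist; zero hits can still come from the evidence files given to est= and protein= in maker_opts.ctl or from thresholds in maker_bopts.ctl, which is why Daniel asks for all three files.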
Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: Joseph Said [joesaid at nmsu.edu] Sent: Friday, February 21, 2014 7:44 PM To: Daniel Ence Subject: RE: I am a PhD candidate at NMSU and have a question about maker2 Hi Daniel, Thank you for getting back to me so quickly. I am using the cotton Gossypium raimondii D genome from NCBI, and the arabidopsis gene is the GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I believe maker2 just calls BLAST from NCBI's page. So when I search the cotton genome it returns zero hits. But then I used a known cotton gene as a test, and that search also returned zero hits. I am not sure what the problem is, but it seems like the protocol that should be returning the results of NCBI's BLAST is returning 0 to Maker2, which is reporting 0 hits. I ran a standalone BLAST and got hits for both my gene of interest and the control test gene. Thanks, Joe ________________________________________ From: Daniel Ence Sent: Friday, February 21, 2014 7:38 PM To: Joseph Said Cc: maker-devel at yandell-lab.org Subject: RE: I am a PhD candidate at NMSU and have a question about maker2 Hi Joe, Will you upload your control files and data at this URL? http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169 Also, what version of MAKER and blast are you using? And which file are you using for the known arabidopsis gene? I've copied this email to the maker-development list, which is a really good resource for trouble-shooting MAKER issues.
Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: Mark Yandell Sent: Friday, February 21, 2014 7:32 PM To: Daniel Ence Subject: FW: I am a PhD candidate at NMSU and have a question about maker2 Mark Yandell Professor of Human Genetics H.A. & Edna Benning Presidential Endowed Chair Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:801-587-7707 ________________________________________ From: Joseph Said [joesaid at nmsu.edu] Sent: Friday, February 21, 2014 5:18 PM To: Mark Yandell Subject: I am a PhD candidate at NMSU and have a question about maker2 Dear Dr. Yandell, I am a molecular biologist at NMSU. I am trying to use maker2 with the cotton genome, and search an Arabidopsis gene against it. I think there is a problem with the blast component because zero results are returned. I tried troubleshooting by searching a known gene and still returned zero results. Is this a common problem maybe with the pipeline? I would appreciate any ideas you might have to help me. Thank you, Joe Sent from my iPad From dence at genetics.utah.edu Sat Feb 22 15:51:48 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Sat, 22 Feb 2014 22:51:48 +0000 Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2 In-Reply-To: References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu> <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu> <6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu>, Message-ID: Hi, Will you send me the long file that you were trying to blast against? 
Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: Hua Zhong [zh9118 at gmail.com] Sent: Saturday, February 22, 2014 10:46 AM To: Daniel Ence Cc: Joe Song; Joseph Said Subject: Re: I am a PhD candidate at NMSU and have a question about maker2 Hi all, Attached are the three configuration files and two input files, which are used to predict something between the genome and the protein. For a simple test, we used one short sequence of about 60 bp and its translated protein sequence as inputs, but got nothing returned. What's more, we also tested a long genome sequence as one input, but still got nothing. I am not sure what is causing this result. Thanks a lot for your help. Hua On Fri, Feb 21, 2014 at 9:31 PM, Joseph Said > wrote: Hi Daniel, I do not have the exact files with me right now, but my coauthors on the paper I am working on have been copied on this email. Hua can send you those files. Thank you for being very helpful especially on a Friday night. Thanks, Joe Sent from my iPad > On Feb 21, 2014, at 9:27 PM, "Daniel Ence" > wrote: > > Hi Joe, > > MAKER runs blast from your local system (or your server where MAKER is installed), and it blasts evidence that the user supplies in the "est" and "protein" settings. The est and protein settings are set in the maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file and the specific blast settings are in the "maker_bopts.ctl" file. > > Will you attach those files to your reply, so we can make sure that the settings are set up correctly?
> > Thanks, > Daniel > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ________________________________________ > From: Joseph Said [joesaid at nmsu.edu] > Sent: Friday, February 21, 2014 7:44 PM > To: Daniel Ence > Subject: RE: I am a PhD candidate at NMSU and have a question about maker2 > > Hi Daniel, > > Thank you for getting back to me so quickly. I am using the cotton Gossypium raimondii D genome from NCBI, and the arabidopsis gene is the GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I believe maker2 just calls BLAST from NCBI's page. So when I search the cotton genome it returns zero hits. But then I used a known cotton gene as a test, and that search also returned zero hits. I am not sure what the problem is, but it seems like the protocol that should be returning the results of NCBI's BLAST is returning 0 to Maker2, which is reporting 0 hits. I ran a standalone BLAST and got hits for both my gene of interest and the control test gene. > > Thanks, > Joe > ________________________________________ > From: Daniel Ence > > Sent: Friday, February 21, 2014 7:38 PM > To: Joseph Said > Cc: maker-devel at yandell-lab.org > Subject: RE: I am a PhD candidate at NMSU and have a question about maker2 > > Hi Joe, > > Will you upload your control files and data at this URL? > http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169 > > Also, what version of MAKER and blast are you using? And which file are you using for the known arabidopsis gene? > > I've copied this email to the maker-development list, which is a really good resource for trouble-shooting MAKER issues.
> > Thanks, > Daniel > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ________________________________________ > From: Mark Yandell > Sent: Friday, February 21, 2014 7:32 PM > To: Daniel Ence > Subject: FW: I am a PhD candidate at NMSU and have a question about maker2 > > Mark Yandell > Professor of Human Genetics > H.A. & Edna Benning Presidential Endowed Chair > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ph:801-587-7707 > > ________________________________________ > From: Joseph Said [joesaid at nmsu.edu] > Sent: Friday, February 21, 2014 5:18 PM > To: Mark Yandell > Subject: I am a PhD candidate at NMSU and have a question about maker2 > > Dear Dr. Yandell, > > I am a molecular biologist at NMSU. I am trying to use maker2 with the cotton genome, and search an Arabidopsis gene against it. I think there is a problem with the blast component because zero results are returned. I tried troubleshooting by searching a known gene and still returned zero results. Is this a common problem maybe with the pipeline? I would appreciate any ideas you might have to help me. > > Thank you, > Joe > > Sent from my iPad From dence at genetics.utah.edu Sat Feb 22 16:21:51 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Sat, 22 Feb 2014 23:21:51 +0000 Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2 In-Reply-To: References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu> <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu> <6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu> , Message-ID: Hi Hua, will you upload the genome file to this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=170 I am more concerned that MAKER didn't find the gene in the whole genome than in the 60bp substring. I think that MAKER needs more sequence than that to annotate a gene model. Will you also upload the MAKER output and datastore from the MAKER run? Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: Hua Zhong [zh9118 at gmail.com] Sent: Saturday, February 22, 2014 4:00 PM To: Daniel Ence Cc: maker-devel at yandell-lab.org; Joseph Said; Joe Song Subject: RE: I am a PhD candidate at NMSU and have a question about maker2 The long file we used is a whole genome. It is quite a large file, so I am not able to send it. Sorry. But in the simple test I told you about, the nucleotide sequence I sent you is treated as the genome file, and the protein sequence is the other input. These two are what we want to BLAST against each other to see if Maker2 works well. Thanks. On Feb 22, 2014 3:51 PM, "Daniel Ence" > wrote: Hi, Will you send me the long file that you were trying to blast against? Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: Hua Zhong [zh9118 at gmail.com] Sent: Saturday, February 22, 2014 10:46 AM To: Daniel Ence Cc: Joe Song; Joseph Said Subject: Re: I am a PhD candidate at NMSU and have a question about maker2 Hi all, Attached are the three configuration files and two input files, which are used to predict something between the genome and the protein. For a simple test, we used one short sequence of about 60 bp and its translated protein sequence as inputs, but got nothing returned. What's more, we also tested a long genome sequence as one input, but still got nothing. I am not sure what is causing this result.
Thanks a lot for your help. Hua On Fri, Feb 21, 2014 at 9:31 PM, Joseph Said > wrote: Hi Daniel, I do not have the exact files with me right now, but my coauthors on the paper I am working on have been copied on this email. Hua can send you those files. Thank you for being very helpful especially on a Friday night. Thanks, Joe Sent from my iPad > On Feb 21, 2014, at 9:27 PM, "Daniel Ence" > wrote: > > Hi Joe, > > MAKER runs blast from your local system (or your server where MAKER is installed), and it blasts evidence that the user supplies in the "est" and "protein" settings. The est and protein settings are set in the maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file and the specific blast settings are in the "maker_bopts.ctl" file. > > Will you attach those files to your reply, so we can make sure that the settings are set up correctly? > > Thanks, > Daniel > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ________________________________________ > From: Joseph Said [joesaid at nmsu.edu] > Sent: Friday, February 21, 2014 7:44 PM > To: Daniel Ence > Subject: RE: I am a PhD candidate at NMSU and have a question about maker2 > > Hi Daniel, > > Thank you for getting back to me so quickly. I am using the cotton Gossypium raimondii D genome from NCBI, and the arabidopsis gene is the GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I believe maker2 just calls BLAST from NCBI's page. So when I search the cotton genome it returns zero hits. But then I used a known cotton gene as a test, and that search also returned zero hits. I am not sure what the problem is, but it seems like the protocol that should be returning the results of NCBI's BLAST is returning 0 to Maker2, which is reporting 0 hits. I ran a standalone BLAST and got hits for both my gene of interest and the control test gene.
> > Thanks, > Joe > ________________________________________ > From: Daniel Ence > > Sent: Friday, February 21, 2014 7:38 PM > To: Joseph Said > Cc: maker-devel at yandell-lab.org > Subject: RE: I am a PhD candidate at NMSU and have a question about maker2 > > Hi Joe, > > Will you upload your control files and data at this URL? > http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169 > > Also, what version of MAKER and blast are you using? And which file are you using for the known arabidopsis gene? > > I've copied this email to the maker-development list, which is a really good resource for trouble-shooting MAKER issues. > > Thanks, > Daniel > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ________________________________________ > From: Mark Yandell > Sent: Friday, February 21, 2014 7:32 PM > To: Daniel Ence > Subject: FW: I am a PhD candidate at NMSU and have a question about maker2 > > Mark Yandell > Professor of Human Genetics > H.A. & Edna Benning Presidential Endowed Chair > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ph:801-587-7707 > > ________________________________________ > From: Joseph Said [joesaid at nmsu.edu] > Sent: Friday, February 21, 2014 5:18 PM > To: Mark Yandell > Subject: I am a PhD candidate at NMSU and have a question about maker2 > > Dear Dr. Yandell, > > I am a molecular biologist at NMSU. I am trying to use maker2 with the cotton genome, and search an Arabidopsis gene against it. I think there is a problem with the blast component because zero results are returned. I tried troubleshooting by searching a known gene and still returned zero results. Is this a common problem maybe with the pipeline? I would appreciate any ideas you might have to help me. 
>
> Thank you,
> Joe
>
> Sent from my iPad
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mikael.durling at slu.se Sun Feb 23 09:57:09 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Sun, 23 Feb 2014 16:57:09 +0000
Subject: [maker-devel] Maker predicting fusion genes?
Message-ID: <4CFD158A-DE75-4756-AD05-4CBF99BAF72D@slu.se>

Dear list and maker developers,

I was browsing the results of a recent maker run, focusing on differences between this run, made with a recent maker (svn r1067), and a previous run with svn revision 1022 (I recall). One of the differences I found was a gene lost in the new prediction set, but replaced by an extended version of a previous neighbor (see http://figshare.com/articles/Maker_prediction_comparison/942300). As you can see, there is no support for the join in the evidence. Do you have any clue as to what might cause this?

Best regards,
Mikael Durling

From carsonhh at gmail.com Sun Feb 23 13:00:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 23 Feb 2014 13:00:50 -0700
Subject: [maker-devel] Maker predicting fusion genes?
Message-ID:

The image doesn't show all evidence sources, but the short answer is that one of your evidence sources (est2genome, protein2genome, or blastx) bridges the two regions, and when given that bridging hint, one of the gene predictors decides it makes sense to create a single model instead. My guess is that it's blastx evidence.

-Carson

On 2/23/14, 9:57 AM, "Mikael Brandström Durling" wrote:

>Dear list and maker developers,
>
>I was browsing the results of a recent maker run, focusing on differences between this run, made with a recent maker (svn r1067), and a previous run with svn revision 1022 (I recall). One of the differences I found was a gene lost in the new prediction set, but replaced by an extended version of a previous neighbor (see http://figshare.com/articles/Maker_prediction_comparison/942300).
>As you can see, there is no support for the join in the evidence. Do you have any clue as to what might cause this?
>
>Best regards,
>Mikael Durling
>
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From mikael.durling at slu.se Sun Feb 23 14:14:00 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Sun, 23 Feb 2014 21:14:00 +0000
Subject: [maker-devel] Maker predicting fusion genes?
In-Reply-To:
References:
Message-ID: <7CCC5270-93B9-4E5A-9687-26A1BF0EB1F8@slu.se>

OK, do you mean by that that the predictions that end up in the gff3 output from the ab initio predictors (snap_masked, augustus_masked, and genemark) are not the final hinted predictions? Otherwise, I'm sorry, but I can't follow your reasoning. I checked my gff file, and as far as I can tell there is no evidence there to support the bridge (see the attached gff of the region, or http://figshare.com/articles/Maker_prediction/942301, where all evidence is plotted).

Mikael

On 23 Feb 2014, at 21:00, Carson Holt wrote:

> The image doesn't show all evidence sources, but the short answer is that one of your evidence sources (est2genome, protein2genome, or blastx) bridges the two regions, and when given that bridging hint, one of the gene predictors decides it makes sense to create a single model instead. My guess is that it's blastx evidence.
>
> -Carson
>
>
> On 2/23/14, 9:57 AM, "Mikael Brandström Durling" wrote:
>
>> Dear list and maker developers,
>>
>> I was browsing the results of a recent maker run, focusing on differences between this run, made with a recent maker (svn r1067), and a previous run with svn revision 1022 (I recall).
>> One of the differences I found was a gene lost in the new prediction set, but replaced by an extended version of a previous neighbor (see http://figshare.com/articles/Maker_prediction_comparison/942300). As you can see, there is no support for the join in the evidence. Do you have any clue as to what might cause this?
>>
>> Best regards,
>> Mikael Durling
>>
>>
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: region.gff3
Type: application/octet-stream
Size: 19612 bytes
Desc: region.gff3
URL:

From hedgyx at yahoo.com Mon Feb 24 00:02:41 2014
From: hedgyx at yahoo.com (Megan)
Date: Sun, 23 Feb 2014 23:02:41 -0800 (PST)
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
Message-ID: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>

Maker folks,
I am re-annotating a single contig and I am having a few problems.

First, I am having trouble passing through a Maker derived gff (from Maker 2.09, with some modifications to gene names and functional information added). The gff file passes the modencode validator but Maker always fails on the first gene in the file, regardless of which gene comes first. So it appears to be a systematic error across the entire file. The Maker error is "Check your input GFF3 file for errors! (from GFFDB)". I have tried Maker 2.10 and 2.31, using both genome_gff with model_pass=1 and pred_gff. Attached is a gff with the first 2 genes.

Second, when I updated to Maker 2.31, Maker now complains that my EST fasta file has nucleotides that are not supported [RYKMSWBDHV]. It suggests "set -fix_nucleotides on the command line to fix this automatically". Is the -fix_nucleotides a Maker flag?
What exactly does it do? Does it remove the entire sequence or replace ambiguous bases with a randomly selected one? Half of my 20k ESTs contain these characters, so I don't want to throw them out entirely.

Also, just curious, has Maker never supported these characters but just never complained? I used this EST data set with Maker 2.09. I did note poor EST coverage, but thought it was an issue with the EST data itself.

I appreciate any suggestions.
Thanks,
Megan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: part_passthru.gff
Type: application/octet-stream
Size: 4363 bytes
Desc: not available
URL:

From zh9118 at gmail.com Sat Feb 22 16:00:28 2014
From: zh9118 at gmail.com (Hua Zhong)
Date: Sat, 22 Feb 2014 16:00:28 -0700
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2
In-Reply-To:
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu> <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu> <6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu>
Message-ID:

The long file we used is a whole genome, which is quite a huge file, so I am not able to send it. Sorry. But in the simple test I told you about, the nucleotide sequence I sent you is treated as the genome file, and the protein sequence is the other input. These two are what we want to blast against each other to see whether Maker2 works well. Thanks.

On Feb 22, 2014 3:51 PM, "Daniel Ence" wrote:

> Hi,
>
> Will you send me the long file that you were trying to blast against?
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ------------------------------
> *From:* Hua Zhong [zh9118 at gmail.com]
> *Sent:* Saturday, February 22, 2014 10:46 AM
> *To:* Daniel Ence
> *Cc:* Joe Song; Joseph Said
> *Subject:* Re: I am a PhD candidate at NMSU and have a question about maker2
>
> Hi all,
> Attached are the three configuration files and two input files, which are used to predict something between the genome and protein. For a simple test, we used one short sequence of about 60 bp and its translated protein sequence as inputs, but got nothing returned. We also tested a long genome sequence as one input, but still got nothing. I am not sure what is causing this result.
> Thanks a lot for your help.
>
> Hua
>
>
>
>
> On Fri, Feb 21, 2014 at 9:31 PM, Joseph Said wrote:
>
>> Hi Daniel,
>>
>> I do not have the exact files with me right now, but my coauthors on the paper I am working on have been copied on this email. Hua can send you those files. Thank you for being very helpful, especially on a Friday night.
>>
>> Thanks,
>> Joe
>>
>> Sent from my iPad
>>
>> > On Feb 21, 2014, at 9:27 PM, "Daniel Ence" wrote:
>> >
>> > Hi Joe,
>> >
>> > MAKER runs blast from your local system (or the server where MAKER is installed), and it blasts the evidence that the user supplies in the "est" and "protein" settings. The est and protein settings are set in the maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file, and the specific blast settings are in the "maker_bopts.ctl" file.
>> >
>> > Will you attach those files to your reply, so we can make sure that the settings are set up correctly?
>> >
>> > Thanks,
>> > Daniel
>> >
>> >
>> > Daniel Ence
>> > Graduate Student
>> > Eccles Institute of Human Genetics
>> > University of Utah
>> > 15 North 2030 East, Room 2100
>> > Salt Lake City, UT 84112-5330
>> > ________________________________________
>> > From: Joseph Said [joesaid at nmsu.edu]
>> > Sent: Friday, February 21, 2014 7:44 PM
>> > To: Daniel Ence
>> > Subject: RE: I am a PhD candidate at NMSU and have a question about maker2
>> >
>> > Hi Daniel,
>> >
>> > Thank you for getting back to me so quickly. I am using the cotton Gossypium raimondii D genome from NCBI, and the Arabidopsis gene is the GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I believe maker2 just calls BLAST from NCBI's page. So when I search the cotton genome it returns zero hits. But then I used a known cotton gene as a test, ran a search, and it also returned zero hits. I am not sure what the problem is, but it seems like the protocol that should be returning the results of NCBI's BLAST is returning 0 to Maker2, which is reporting 0 hits. I ran a standalone BLAST and got hits for both my gene of interest and the control test gene.
>> >
>> > Thanks,
>> > Joe
>> > ________________________________________
>> > From: Daniel Ence
>> > Sent: Friday, February 21, 2014 7:38 PM
>> > To: Joseph Said
>> > Cc: maker-devel at yandell-lab.org
>> > Subject: RE: I am a PhD candidate at NMSU and have a question about maker2
>> >
>> > Hi Joe,
>> >
>> > Will you upload your control files and data to this URL?
>> > http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169
>> >
>> > Also, what versions of MAKER and blast are you using? And which file are you using for the known Arabidopsis gene?
>> >
>> > I've copied this email to the maker-development list, which is a really good resource for troubleshooting MAKER issues.
>> >
>> > Thanks,
>> > Daniel
>> >
>> >
>> > Daniel Ence
>> > Graduate Student
>> > Eccles Institute of Human Genetics
>> > University of Utah
>> > 15 North 2030 East, Room 2100
>> > Salt Lake City, UT 84112-5330
>> > ________________________________________
>> > From: Mark Yandell
>> > Sent: Friday, February 21, 2014 7:32 PM
>> > To: Daniel Ence
>> > Subject: FW: I am a PhD candidate at NMSU and have a question about maker2
>> >
>> > Mark Yandell
>> > Professor of Human Genetics
>> > H.A. & Edna Benning Presidential Endowed Chair
>> > Eccles Institute of Human Genetics
>> > University of Utah
>> > 15 North 2030 East, Room 2100
>> > Salt Lake City, UT 84112-5330
>> > ph:801-587-7707
>> >
>> > ________________________________________
>> > From: Joseph Said [joesaid at nmsu.edu]
>> > Sent: Friday, February 21, 2014 5:18 PM
>> > To: Mark Yandell
>> > Subject: I am a PhD candidate at NMSU and have a question about maker2
>> >
>> > Dear Dr. Yandell,
>> >
>> > I am a molecular biologist at NMSU. I am trying to use maker2 with the cotton genome and search an Arabidopsis gene against it. I think there is a problem with the blast component, because zero results are returned. I tried troubleshooting by searching a known gene and still got zero results. Is this perhaps a common problem with the pipeline? I would appreciate any ideas you might have to help me.
>> >
>> > Thank you,
>> > Joe
>> >
>> > Sent from my iPad
>> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Feb 24 11:18:18 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 11:18:18 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
In-Reply-To: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>
Message-ID:

The -fix_nucleotides flag is added to the command line (i.e. maker -fix_nucleotides).
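As a rough illustration of the substitution Carson describes in this reply (ambiguous IUPAC codes from the [RYKMSWBDHV] class rewritten as N in the sequence, with nothing deleted), here is a minimal Python sketch. It is not MAKER's code: the function name fix_nucleotides is hypothetical, and turning lowercase codes into a lowercase n (to keep soft-masking) is an assumption of this sketch, not documented MAKER behavior. It also counts how many records were touched, which is a quick way to gauge how much of an EST set would be affected.

```python
import re

# The ambiguity codes named in MAKER's warning. Matching them case-insensitively
# is an assumption of this sketch (soft-masked lowercase FASTA is common).
AMBIGUOUS = re.compile(r"[RYKMSWBDHV]", re.IGNORECASE)


def _to_n(match):
    # Keep the original case so soft-masked (lowercase) sequence stays lowercase.
    return "n" if match.group(0).islower() else "N"


def fix_nucleotides(fasta_text):
    """Return (fixed_text, records_changed); header lines are never modified."""
    out_lines = []
    records_changed = 0
    record_changed = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # A new record starts: tally the previous one if it was modified.
            records_changed += int(record_changed)
            record_changed = False
            out_lines.append(line)
            continue
        fixed, n_subs = AMBIGUOUS.subn(_to_n, line)
        record_changed = record_changed or n_subs > 0
        out_lines.append(fixed)
    records_changed += int(record_changed)  # tally the final record
    return "\n".join(out_lines) + "\n", records_changed


if __name__ == "__main__":
    ests = ">est1\nACGTRYAC\n>est2\nACGTACGT\n>est3\nacgtnkac\n"
    fixed, n = fix_nucleotides(ests)
    print(fixed, end="")
    print(f"{n} record(s) changed")
```

Each ambiguous base is swapped one-for-one, so sequence lengths and headers stay the same and no EST is dropped; only the ambiguous positions become N.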
It is there so you are aware that there is an issue with your fasta file that will cause things downstream to fail. MAKER can fix the errors for you, but first it gives a warning designed to make you look at the file and validate it. Why would you want to do this? For example, if you accidentally provided protein sequence to the EST option, you wouldn't want MAKER to just proceed; you want a warning so you can check first. If your file is in fact EST data, then set the flag and those characters will be changed to N's in the fixed fasta sequence. Left as they are, those characters would cause errors in downstream tools like exonerate, and even in some downstream GMOD tools, so they can't be allowed to remain as is.

For the GFF3 file, there is almost certainly a logic issue in the file (the modENCODE validator won't check for those). This can come from prior manipulation of the GFF3 file; for example, IDs for a gene that are the same across two contigs (technically valid, but a logic error). The GFF3 error message will normally give the ID of the feature causing the issue.

I could also take a look for you. You can upload the GFF3 file here: http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
Click on 'new guest account', then e-mail me back your guest ID, so I know which files to review.

Thanks,
Carson

On 2/24/14, 12:02 AM, "Megan" wrote:

>Maker folks,
>I am re-annotating a single contig and I am having a few problems.
>
>First, I am having trouble passing through a Maker derived gff (from Maker 2.09, with some modifications to gene names and functional information added). The gff file passes the modencode validator but Maker always fails on the first gene in the file, regardless of which gene comes first. So it appears to be a systematic error across the entire file. The Maker error is "Check your input GFF3 file for errors! (from GFFDB)". I have tried Maker 2.10 and 2.31, using both genome_gff with model_pass=1 and pred_gff.
>Attached is a gff with the first 2 genes.
>
>Second, when I updated to Maker 2.31, Maker now complains that my EST fasta file has nucleotides that are not supported [RYKMSWBDHV]. It suggests "set -fix_nucleotides on the command line to fix this automatically". Is the -fix_nucleotides a Maker flag? What exactly does it do? Does it remove the entire sequence or replace ambiguous bases with a randomly selected one? Half of my 20k ESTs contain these characters, so I don't want to throw them out entirely.
>
>Also, just curious, has Maker never supported these characters but just never complained? I used this EST data set with Maker 2.09. I did note poor EST coverage, but thought it was an issue with the EST data itself.
>
>I appreciate any suggestions.
>Thanks,
>Megan
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From dence at genetics.utah.edu Mon Feb 24 11:31:47 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 24 Feb 2014 18:31:47 +0000
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
In-Reply-To:
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>,
Message-ID:

Hi Megan,

One problem with the GFF3 that you attached is that the IDs for the CDS features are malformed. All of the CDS features for a given mRNA or transcript should have the same ID. The CDS features in your GFF3 have IDs that use the exon name.

You can fix it with this command-line Perl:

cat part_passthru.gff | perl -ane 'if(/\tCDS\t/){ chomp; /Parent=([^;\s]+)/; my $parent=$1; s/ID=([^\;]+)/ID=$parent-cds/; print "$_\n"}else{print $_}' > fixed.gff3

It just fixes the ID attributes in all of the CDS features. Try it on the test gff3 you sent and let me know if it works. I can't test it myself without the fasta file that you are annotating.
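MAKER's "Check your input GFF3 file for errors! (from GFFDB)" message can point at a feature without saying what is wrong with it, so a quick structural check may help before rewriting any IDs. The following Python sketch is a hypothetical helper, not part of MAKER or any GMOD tool. It assumes a plain GFF3 file (nine tab-separated columns, key=value attributes) and reports three things: lines ending in a carriage return (\r, which Perl writes as \cM), lines that do not have nine columns, and Parent= values that match no ID= in the file. Lines are split on \n only, so a stray \r stays attached to the last attribute on its line and shows up in the report.

```python
"""Hypothetical GFF3 sanity check; a sketch, not part of MAKER or GMOD."""


def parse_attributes(column9):
    """Split GFF3 column 9 ("key=value;key=value") into a dict of raw values."""
    attrs = {}
    for field in column9.split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def check_gff3(text):
    """Return human-readable problems: trailing CRs, bad column counts, dangling Parents."""
    problems = []
    ids = set()
    parent_refs = []  # (line number, Parent value), resolved after the whole file is read
    # Split on "\n" only: str.splitlines() would also split on "\r" and hide it.
    for line_no, line in enumerate(text.split("\n"), start=1):
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; they are not feature lines
        if line.endswith("\r"):
            problems.append(f"line {line_no}: ends with a carriage return (\\r)")
        if not line.strip() or line.startswith("#"):
            continue  # blank line, comment, or directive
        columns = line.split("\t")
        if len(columns) != 9:
            problems.append(f"line {line_no}: expected 9 columns, found {len(columns)}")
            continue
        attrs = parse_attributes(columns[8])
        if "ID" in attrs:
            ids.add(attrs["ID"])
        # Parent may list several comma-separated IDs.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                parent_refs.append((line_no, parent))
    for line_no, parent in parent_refs:
        if parent not in ids:
            problems.append(f"line {line_no}: Parent {parent!r} matches no ID")
    return problems


def strip_cr(text):
    """Drop a trailing carriage return from every line."""
    return "\n".join(line.rstrip("\r") for line in text.split("\n"))


if __name__ == "__main__":
    gff = (
        "chr1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1;Name=gene1\r\n"
        "chr1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=g1-RA;Parent=g1\r\n"
    )
    for problem in check_gff3(gff):
        print(problem)
    print(check_gff3(strip_cr(gff)))  # [] once the carriage returns are gone
```

A Parent reported as 'g1\r' rather than 'g1' is the telltale sign of this kind of problem; strip_cr removes the trailing \r from every line, and rerunning check_gff3 on its output should then come back empty for an otherwise consistent file.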
Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com]
Sent: Monday, February 24, 2014 11:18 AM
To: Megan; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] gff pass thru problem and unsupported EST nucleotides

The -fix_nucleotides flag is added to the command line (i.e. maker -fix_nucleotides). It is there so you are aware that there is an issue with your fasta file that will cause things downstream to fail. MAKER can fix the errors for you, but first it gives a warning designed to make you look at the file and validate it. Why would you want to do this? For example, if you accidentally provided protein sequence to the EST option, you wouldn't want MAKER to just proceed; you want a warning so you can check first. If your file is in fact EST data, then set the flag and those characters will be changed to N's in the fixed fasta sequence. Left as they are, those characters would cause errors in downstream tools like exonerate, and even in some downstream GMOD tools, so they can't be allowed to remain as is.

For the GFF3 file, there is almost certainly a logic issue in the file (the modENCODE validator won't check for those). This can come from prior manipulation of the GFF3 file; for example, IDs for a gene that are the same across two contigs (technically valid, but a logic error). The GFF3 error message will normally give the ID of the feature causing the issue.

I could also take a look for you. You can upload the GFF3 file here: http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
Click on 'new guest account', then e-mail me back your guest ID, so I know which files to review.

Thanks,
Carson

On 2/24/14, 12:02 AM, "Megan" wrote:

>Maker folks,
>I am re-annotating a single contig and I am having a few problems.
>
>First, I am having trouble passing through a Maker derived gff (from Maker 2.09, with some modifications to gene names and functional information added). The gff file passes the modencode validator but Maker always fails on the first gene in the file, regardless of which gene comes first. So it appears to be a systematic error across the entire file. The Maker error is "Check your input GFF3 file for errors! (from GFFDB)". I have tried Maker 2.10 and 2.31, using both genome_gff with model_pass=1 and pred_gff. Attached is a gff with the first 2 genes.
>
>Second, when I updated to Maker 2.31, Maker now complains that my EST fasta file has nucleotides that are not supported [RYKMSWBDHV]. It suggests "set -fix_nucleotides on the command line to fix this automatically". Is the -fix_nucleotides a Maker flag? What exactly does it do? Does it remove the entire sequence or replace ambiguous bases with a randomly selected one? Half of my 20k ESTs contain these characters, so I don't want to throw them out entirely.
>
>Also, just curious, has Maker never supported these characters but just never complained? I used this EST data set with Maker 2.09. I did note poor EST coverage, but thought it was an issue with the EST data itself.
>
>I appreciate any suggestions.
>Thanks,
>Megan
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Mon Feb 24 11:34:28 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 11:34:28 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
In-Reply-To:
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>
Message-ID:

Actually, that is not true. CDS IDs can be the same or different; MAKER doesn't care either way. Both are valid in GFF3. Having the same ID just allows them to be put together by some GMOD viewers without having to go through a container feature.

-Carson

On 2/24/14, 11:31 AM, "Daniel Ence" wrote:

>Hi Megan,
>
>One problem with the GFF3 that you attached is that the IDs for the CDS features are malformed. All of the CDS features for a given mRNA or transcript should have the same ID. The CDS features in your GFF3 have IDs that use the exon name.
>
>You can fix it with this command-line Perl:
>cat part_passthru.gff | perl -ane 'if(/\tCDS\t/){ chomp; /Parent=([^;\s]+)/; my $parent=$1; s/ID=([^\;]+)/ID=$parent-cds/; print "$_\n"}else{print $_}' > fixed.gff3
>
>It just fixes the ID attributes in all of the CDS features. Try it on the test gff3 you sent and let me know if it works. I can't test it myself without the fasta file that you are annotating.
>
>Thanks,
>Daniel
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com]
>Sent: Monday, February 24, 2014 11:18 AM
>To: Megan; maker-devel at yandell-lab.org
>Subject: Re: [maker-devel] gff pass thru problem and unsupported EST nucleotides
>
>The -fix_nucleotides flag is added to the command line (i.e. maker -fix_nucleotides). It is there so you are aware that there is an issue with your fasta file that will cause things downstream to fail. MAKER can fix the errors for you, but first it gives a warning designed to make you look at the file and validate it. Why would you want to do this? For example, if you accidentally provided protein sequence to the EST option, you wouldn't want MAKER to just proceed; you want a warning so you can check first. If your file is in fact EST data, then set the flag and those characters will be changed to N's in the fixed fasta sequence. Left as they are, those characters would cause errors in downstream tools like exonerate, and even in some downstream GMOD tools, so they can't be allowed to remain as is.
>
>For the GFF3 file, there is almost certainly a logic issue in the file (the modENCODE validator won't check for those). This can come from prior manipulation of the GFF3 file; for example, IDs for a gene that are the same across two contigs (technically valid, but a logic error). The GFF3 error message will normally give the ID of the feature causing the issue.
>
>I could also take a look for you. You can upload the GFF3 file here: http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>Click on 'new guest account', then e-mail me back your guest ID, so I know which files to review.
>
>Thanks,
>Carson
>
>
>
>On 2/24/14, 12:02 AM, "Megan" wrote:
>
>>Maker folks,
>>I am re-annotating a single contig and I am having a few problems.
>>
>>First, I am having trouble passing through a Maker derived gff (from Maker 2.09, with some modifications to gene names and functional information added). The gff file passes the modencode validator but Maker always fails on the first gene in the file, regardless of which gene comes first. So it appears to be a systematic error across the entire file. The Maker error is "Check your input GFF3 file for errors! (from GFFDB)". I have tried Maker 2.10 and 2.31, using both genome_gff with model_pass=1 and pred_gff. Attached is a gff with the first 2 genes.
>>
>>Second, when I updated to Maker 2.31, Maker now complains that my EST fasta file has nucleotides that are not supported [RYKMSWBDHV]. It suggests "set -fix_nucleotides on the command line to fix this automatically". Is the -fix_nucleotides a Maker flag? What exactly does it do? Does it remove the entire sequence or replace ambiguous bases with a randomly selected one? Half of my 20k ESTs contain these characters, so I don't want to throw them out entirely.
>>
>>Also, just curious, has Maker never supported these characters but just never complained? I used this EST data set with Maker 2.09. I did note poor EST coverage, but thought it was an issue with the EST data itself.
>>
>>I appreciate any suggestions.
>>Thanks,
>>Megan
>>_______________________________________________
>>maker-devel mailing list
>>maker-devel at box290.bluehost.com
>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Mon Feb 24 13:59:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 13:59:12 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
In-Reply-To: <1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
References: <1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
Message-ID:

I found the issue. You have non-printing characters at the end of almost every line. Because they fall within the Parent= tag, they become part of the Parent ID when the file is read.

So instead of "HERA000031-RA" you get "HERA000031-RA\cM" as the Parent ID.

"\cM" is a carriage return (Ctrl-M).

I ran the attached script to remove these characters (perl purify ), and then it works. When you rerun MAKER after fixing the file, make sure to remove the .../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db file to force the GFF3 database to be rebuilt.

Thanks,
Carson

On 2/24/14, 1:32 PM, "Megan" wrote:

>Hi Carson and Daniel,
>
>Thanks for your suggestions. I have looked at the gff file, but I do not see any obvious errors. I have uploaded the files to your website. The reference fasta is there, the full gff, and a single gene gff that also causes an error. If I remove that gene from the full gff, then the error is on the next gene in the file, so it appears to be a systematic problem throughout the gff. The gff was generated by Maker, but I may have messed it up when I modified it to rename genes and add functional information. I checked with cat -te, but don't see any obvious formatting errors.
>
>Thanks!
>Megan
>
>
>--------------------------------------------
>On Mon, 2/24/14, Carson Holt wrote:
>
> Subject: Re: [maker-devel] gff pass thru problem and unsupported EST nucleotides
> To: "Megan", maker-devel at yandell-lab.org
> Date: Monday, February 24, 2014, 10:18 AM
>
> The -fix_nucleotides flag is added to the command line (i.e. maker -fix_nucleotides). It is there so you are aware that there is an issue with your fasta file that will cause things downstream to fail. MAKER can fix the errors for you, but first it gives a warning designed to make you look at the file and validate it. Why would you want to do this? For example, if you accidentally provided protein sequence to the EST option, you wouldn't want MAKER to just proceed; you want a warning so you can check first. If your file is in fact EST data, then set the flag and those characters will be changed to N's in the fixed fasta sequence. Left as they are, those characters would cause errors in downstream tools like exonerate, and even in some downstream GMOD tools, so they can't be allowed to remain as is.
>
> For the GFF3 file, there is almost certainly a logic issue in the file (the modENCODE validator won't check for those). This can come from prior manipulation of the GFF3 file; for example, IDs for a gene that are the same across two contigs (technically valid, but a logic error). The GFF3 error message will normally give the ID of the feature causing the issue.
>
> I could also take a look for you. You can upload the GFF3 file here: http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
> Click on 'new guest account', then e-mail me back your guest ID, so I know which files to review.
>
> Thanks,
> Carson
>
>
>
> On 2/24/14, 12:02 AM, "Megan" wrote:
>
> >Maker folks,
> >I am re-annotating a single contig and I am having a few problems.
> >
> >First, I am having trouble passing through a Maker derived gff (from Maker 2.09, with some modifications to gene names and functional information added). The gff file passes the modencode validator but Maker always fails on the first gene in the file, regardless of which gene comes first. So it appears to be a systematic error across the entire file. The Maker error is "Check your input GFF3 file for errors! (from GFFDB)". I have tried Maker 2.10 and 2.31, using both genome_gff with model_pass=1 and pred_gff. Attached is a gff with the first 2 genes.
> >
> >Second, when I updated to Maker 2.31, Maker now complains that my EST fasta file has nucleotides that are not supported [RYKMSWBDHV]. It suggests "set -fix_nucleotides on the command line to fix this automatically". Is the -fix_nucleotides a Maker flag? What exactly does it do? Does it remove the entire sequence or replace ambiguous bases with a randomly selected one? Half of my 20k ESTs contain these characters, so I don't want to throw them out entirely.
> >
> >Also, just curious, has Maker never supported these characters but just never complained? I used this EST data set with Maker 2.09. I did note poor EST coverage, but thought it was an issue with the EST data itself.
> >
> >I appreciate any suggestions.
> >Thanks,
> >Megan
> >_______________________________________________
> >maker-devel mailing list
> >maker-devel at box290.bluehost.com
> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: purify
Type: application/octet-stream
Size: 1965 bytes
Desc: not available
URL:

From carsonhh at gmail.com Mon Feb 24 14:03:00 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 14:03:00 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
In-Reply-To:
References: <1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
Message-ID:

One more thing: you must give the file to pred_gff or model_gff. It is no longer strictly a MAKER file, as many of the source columns read "." (meaning it has been edited in Apollo or another editor). So it is not guaranteed to be recognized by genome_gff, because many of the source tags have changed.

Thanks,
Carson

On 2/24/14, 1:59 PM, "Carson Holt" wrote:

>I found the issue. You have non-printing characters at the end of almost every line. Because they fall within the Parent= tag, they become part of the Parent ID when the file is read.
>
>So instead of "HERA000031-RA" you get "HERA000031-RA\cM" as the Parent ID.
>
>"\cM" is a carriage return (Ctrl-M).
>
>I ran the attached script to remove these characters (perl purify ), and then it works. When you rerun MAKER after fixing the file, make sure to remove the .../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db file to force the GFF3 database to be rebuilt.
>
>Thanks,
>Carson
>
>
>
>
>On 2/24/14, 1:32 PM, "Megan" wrote:
>
>>Hi Carson and Daniel,
>>
>>Thanks for your suggestions. I have looked at the gff file, but I do not see any obvious errors. I have uploaded the files to your website. The reference fasta is there, the full gff, and a single gene gff that also causes an error. If I remove that gene from the full gff, then the error is on the next gene in the file, so it appears to be a systematic problem throughout the gff. The gff was generated by Maker, but I may have messed it up when I modified it to rename genes and add functional information.
I checked with cat -te, but don't see any obvious >>formatting errors. >> >>Thanks! >>Megan >> >> >>-------------------------------------------- >>On Mon, 2/24/14, Carson Holt wrote: >> >> Subject: Re: [maker-devel] gff pass thru problem and unsupported EST >>nucleotides >> To: "Megan" , maker-devel at yandell-lab.org >> Date: Monday, February 24, 2014, 10:18 AM >> >> The -fix_nucleotides flag is added to >> the command line (I.e. maker >> -fix_nucleotides flag). It is there so you are aware >> that there is an >> issue with your fasta file, that will cause things >> downstream to fail. >> MAKER can fix the errors for you, but first it gives a >> warning designed to >> make you look at the file and validate it. Why would >> you want to do this? >> For example, what if you provided protein sequence to the >> EST option >> accidentally, you wouldn?t want MAKER to just >> proceed. You want a warning >> so you can check first. If your file is in fact EST >> data, then set the >> flag and those characters will be changed to N?s in the >> fixed fasta >> sequence, otherwise those characters will cause errors in >> downstream tools >> like exonerate, and even some downstream GMOD tools, so they >> can?t be >> allowed to remain as is. >> >> For the GFF3 file, there is almost definitely a logic issue >> in the file >> (mod encode validator won?t check for those). This >> can be from prior >> manipulation of the GFF3 file. For example, IDs for a >> gene that are the >> same across two contigs (technically valid but a logic >> error). The GFF3 >> error message will normally give the ID of the feature >> causing the issue. >> >> I could also take a look for you. You can upload the >> GFF3 file here ?> >> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >> Click on 'new guest account' then e-mail me back you guest >> ID, so I know >> which files to review. 
>> >> Thanks, >> Carson >> >> >> >> On 2/24/14, 12:02 AM, "Megan" >> wrote: >> >> >Maker folks, >> >I am re-annotating a single contig and I am having a few >> problems. >> > >> >First, I am having trouble passing through a Maker >> derived gff (from >> >Maker 2.09, with some modifications to gene names and >> functional >> >information added). The gff file passes the >> modencode validator but >> >Maker always fails on the first gene in the file, >> regardless of which >> >gene comes first. So it appears to be a systematic >> error across the >> >entire file. The Maker error is "Check your input >> GFF3 file for errors! >> >(from GFFDB)". I have tried Maker 2.10 >> and 2.31, using both genome_gff >> >with model_pass=1 and pred_gff. Attached is a gff >> with the first 2 >> >genes. >> > >> >Second, when I updated to Maker 2.31, Maker now >> complains that my EST >> >fasta file has nucleotides that are not supported >> [RYKMSWBDHV]. It >> >suggests "set -fix_nucleotides on the command line to >> fix this >> >automatically". Is the -fix_nucleotides a Maker >> flag? What exactly does >> >it do? Does it remove the entire sequence or >> replace ambiguous bases >> >with a randomly selected one? Half of my 20k ESTs >> contain these >> >characters, so I don't want to throw them out entirely. >> > >> >Also, just curious, has Maker never supported these >> characters but just >> >never complained? I used this EST data set with >> Maker 2.09. I did note >> >poor EST coverage, but thought it was an issue with the >> EST data itself. >> > >> >I appreciate any suggestions. 
>> >Thanks, >> >Megan_______________________________________________ >> >maker-devel mailing list >> >maker-devel at box290.bluehost.com >> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > From rbharris at uw.edu Tue Feb 25 14:49:57 2014 From: rbharris at uw.edu (Rebecca Harris) Date: Tue, 25 Feb 2014 13:49:57 -0800 Subject: [maker-devel] error in snap training Message-ID: Hey - I'm trying to train SNAP and am running into errors. I don't have any EST evidence, just protein. My .gff file reports 10865 genes but when I run maker2zff -c0 -e0 I get back empty genome files. When I run maker2zff -n, a ton of overlap_prev_exon errors get written to the screen and then with I get to the forge step I get an "impossible error5". Any help would be greatly appreciated. Thanks! Rebecca -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Feb 25 15:12:14 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 25 Feb 2014 15:12:14 -0700 Subject: [maker-devel] error in snap training In-Reply-To: References: Message-ID: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com> Make sure you are using 2.31, and then try the maker2zff filters individually. If the protein models are not working well, use CEGMA to generate models. It's from the same group as SNAP. Use cegma2zff for the conversion. --Carson Sent from my iPhone > On Feb 25, 2014, at 2:49 PM, Rebecca Harris wrote: > > Hey - > > I'm trying to train SNAP and am running into errors. I don't have any EST evidence, just protein. My .gff file reports 10865 genes but when I run maker2zff -c0 -e0 I get back empty genome files. When I run maker2zff -n, a ton of overlap_prev_exon errors get written to the screen and then with I get to the forge step I get an "impossible error5". Any help would be greatly appreciated. > > Thanks! 
> Rebecca

From sjackman at gmail.com Tue Feb 25 17:06:03 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Tue, 25 Feb 2014 16:06:03 -0800
Subject: [maker-devel] Mapping gene names
Message-ID:

Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation?

I see the *map_forward* option, which applies to the *model_gff* parameter. Is there a similar option for *est* and *protein*?

*maker_opts.ctl*
est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

From hedgyx at yahoo.com Tue Feb 25 17:26:11 2014
From: hedgyx at yahoo.com (Megan)
Date: Tue, 25 Feb 2014 16:26:11 -0800 (PST)
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
In-Reply-To:
Message-ID: <1393374371.45210.YahooMailBasic@web162201.mail.bf1.yahoo.com>

Carson,

Everything ran through smoothly after removing the ^Ms. Thanks for the help.

Megan

From carsonhh at gmail.com Tue Feb 25 17:58:08 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 17:58:08 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References:
Message-ID:

There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there, so you'll have to type it in.

There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure that different isoforms sharing a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to be mapped only against chr1 within the range of 1-10000 bp, while maker_coor=chr1 alone will force it to be mapped only against chr1. This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
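Tagging many FASTA headers by hand is tedious, so here is a minimal Python sketch that appends the gene_id= and maker_coor= tags Carson describes. It assumes (unconfirmed, since the option is undocumented) that MAKER reads the tags as whitespace-separated key=value tokens on the header line after the sequence ID; `tag_header`, `tag_fasta`, and the example IDs are hypothetical.

```python
"""Add est_forward guide tags to FASTA headers (hypothetical helper).

Assumption: MAKER reads gene_id= and maker_coor= as whitespace-separated
key=value tokens appended to the header line, e.g.
    >tx1 gene_id=GENE1 maker_coor=chr1:1-10000
The option is undocumented, so confirm the accepted syntax first.
"""
from typing import Dict, Iterable, Iterator, Optional


def tag_header(header: str, gene_id: Optional[str] = None,
               coor: Optional[str] = None) -> str:
    """Append gene_id=/maker_coor= tokens to one FASTA header line."""
    tokens = [header.rstrip("\r\n")]
    if gene_id is not None:
        tokens.append(f"gene_id={gene_id}")  # isoforms sharing this cluster
    if coor is not None:
        tokens.append(f"maker_coor={coor}")  # "chr1" or "chr1:1-10000"
    return " ".join(tokens)


def tag_fasta(lines: Iterable[str], gene_of: Dict[str, str],
              coor_of: Dict[str, str]) -> Iterator[str]:
    """Yield FASTA lines, tagging each header whose ID appears in a lookup."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            yield tag_header(line, gene_of.get(seq_id), coor_of.get(seq_id))
        else:
            yield line.rstrip("\r\n")


if __name__ == "__main__":
    fasta = [">tx1 isoform A", "ATGC", ">tx2", "ATGA"]
    genes = {"tx1": "GENE1", "tx2": "GENE1"}  # two isoforms of one gene
    coords = {"tx1": "chr1:1-10000", "tx2": "chr1"}
    print("\n".join(tag_fasta(fasta, genes, coords)))
```

If MAKER parses the tags as assumed, both transcripts would be clustered under GENE1, with tx1 restricted to the first 10 kb of chr1 and tx2 to chr1 as a whole.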
--Carson

From carsonhh at gmail.com Tue Feb 25 18:04:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 18:04:48 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References:
Message-ID:

One more note. When using this option, the score column of mRNA features will represent how completely the gene matches the source EST/protein (fraction coverage multiplied by % identity), so a value of 100 means a perfect match. This way, if the same transcript maps to multiple locations, you can identify which location is the closest match (this also works for identifying likely orthologs vs. paralogs).

--Carson

From weckalba at asu.edu Tue Feb 25 18:36:21 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Tue, 25 Feb 2014 17:36:21 -0800
Subject: [maker-devel] invalid gff3 format issues
Message-ID:

Hi all,

I am trying to update maker annotations with PASA and encountered errors stemming from file format issues in the gff3 file.

I put a few lines from the gff3 to highlight the issue below. Basically, the problem is that there are non-unique IDs for a number of the annotations.
Is there anything that can be done to right this problem?

Thanks,

Walter

Lines from GFF3 file, repeated IDs are highlighted:

chr1 maker gene 9377440 9432028 . - . ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16
chr1 maker mRNA 9377440 9432028 . - . ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234
chr1 maker exon 9431899 9432028 . - . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1
chr1 maker exon 9431698 9431808 . - . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1

chr1 maker gene 8894975 9021577 . + . ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53
chr1 maker mRNA 8894975 9021577 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007
chr1 maker exon 8894975 8895153 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
chr1 maker exon 8942215 8942531 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
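In the excerpt above, the same mRNA ID is attached to two different genes. Before handing a file to PASA, one way to list every such collision is the Python sketch below. GFF3 only lets lines share an ID when they are parts of one discontinuous feature (a CDS split across exons, for example), so the script reports IDs whose lines disagree on seqid, type, or Parent. That test is our own heuristic, not a complete GFF3 validator, and Parent is compared as a raw string.

```python
"""Flag GFF3 ID values shared by features that cannot be one feature.

Heuristic (our assumption, not a full GFF3 validator): lines that share an
ID are treated as parts of one discontinuous feature only if they agree on
seqid, feature type and Parent; any disagreement is reported.
"""
import sys
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Key = Tuple[str, str, str]  # (seqid, type, Parent)


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 column-9 string like 'ID=a;Parent=b' into a dict."""
    attrs: Dict[str, str] = {}
    for field in column9.strip().split(";"):
        field = field.strip()
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def find_conflicting_ids(lines: Iterable[str]) -> Dict[str, List[Key]]:
    """Return {ID: sorted distinct (seqid, type, Parent)} for conflicting IDs."""
    seen: Dict[str, Set[Key]] = defaultdict(set)
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        # rstrip also drops a stray trailing carriage return (^M)
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) != 9:
            continue  # not a tab-delimited feature line
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            seen[attrs["ID"]].add((cols[0], cols[2], attrs.get("Parent", "")))
    return {fid: sorted(keys) for fid, keys in seen.items() if len(keys) > 1}


if __name__ == "__main__":
    # usage: python find_conflicting_ids.py annotations.gff3
    with open(sys.argv[1]) as handle:
        for fid, keys in find_conflicting_ids(handle).items():
            print(fid, keys)
```

Run on Walter's excerpt, it would report maker-chr1-snap-gene-4.53-mRNA-1 with two different Parent genes, while a CDS whose pieces share one ID and one Parent would pass.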
From dence at genetics.utah.edu Tue Feb 25 19:02:04 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 26 Feb 2014 02:02:04 +0000
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To:
References:
Message-ID:

Hi Walter,

Will you upload the full GFF3 and the control files that you used to this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189
Also, what version of MAKER are you running this with?

Thanks,
Daniel

From weckalba at asu.edu Tue Feb 25 19:11:12 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Tue, 25 Feb 2014 18:11:12 -0800
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To:
References:
Message-ID:

Hi Daniel, those have been uploaded, and I'm using version 2.28.

Walter

From carsonhh at gmail.com Tue Feb 25 21:10:27 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 21:10:27 -0700
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To:
References:
Message-ID:

Could you try version 2.31 (the current version)? I believe this is happening because you are passing in MAKER genes as pred_gff, so the transcripts ended up with the same Names and IDs as the genes being generated by the MAKER run via SNAP, etc. This shouldn't happen with model_gff, and shouldn't happen in 2.31 (IDs and names are generated slightly differently in 2.30+).

Thanks,
Carson

From marc.hoeppner at imbim.uu.se Wed Feb 26 01:26:35 2014
From: marc.hoeppner at imbim.uu.se (Marc Höppner)
Date: Wed, 26 Feb 2014 08:26:35 +0000
Subject: [maker-devel] Functional annotation options
Message-ID: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>

Dear List,

I have finished a gene build now, and I would like to move on to functional annotation. I understand that maker includes a few scripts to facilitate such analyses. However, I have a few questions about this:

1) iprscan
It seems maker includes an MPI wrapper for InterProScan, but requests 'iprscan' to be in $PATH.
The latest versions of Interproscan I have worked with are java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else. But what?

2) maker_functional_gff
This script seems to be very useful, but the description suggests that it requires WuBlast tabular output '2', which I think looks quite different from the ncbi blast tabular output. Since Wublast is not really available anymore (except this very old, frozen binary bundle), I was wondering how to address this issue.

3) maker_functional
This just throws an error about a missing Job ID, so no clue what this would be used for.

I guess what I am after is some suggestion as to how to use the scripts included with Maker to achieve a reasonable functional annotation.

With kind regards,

Marc Hoeppner

Marc P. Hoeppner, PhD
Team Leader
BILS Genome Annotation Platform
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at imbim.uu.se

From mikael.durling at slu.se Wed Feb 26 02:43:43 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 09:43:43 +0000
Subject: [maker-devel] Functional annotation options
In-Reply-To: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
Message-ID: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>

26 feb 2014 kl. 09:26 skrev Marc Höppner:

> 1) iprscan
> It seems maker includes an MPI wrapper for InterProscan, but requests "iprscan" to be in $PATH. The latest versions of Interproscan I have worked with are java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else. But what?

I don't believe it works with interproscan5. What I usually do is to split the maker protein file into chunks, and then run these chunks as separate jobs on our cluster, then finally merge the results. The TSV file from iprscan5 can be input into the maker tool ipr_update_gff. I have not tried the iprscan2gff3, as I haven't figured out how to get an iprscan4 raw file from iprscan5.

> 2) maker_functional_gff
> This script seems to be very useful, but the description suggests that it requires WuBlast tabular output '2', which I think looks quite different from the ncbi blast tabular output. Since Wublast is not really available anymore (except this very old, frozen binary bundle), I was wondering how to address this issue.

It works fine with ncbiblast+ and the blastp command with -outfmt 6.

cheers,
Mikael

Ps. You're welcome to visit me at SLU if you would like to discuss experiences of genome annotations.

From mikael.durling at slu.se Wed Feb 26 02:55:56 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 09:55:56 +0000
Subject: [maker-devel] Functional annotation options
In-Reply-To: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se> <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
Message-ID: <29357689-D616-465F-BCC4-66AF5B1D5D2E@slu.se>

26 feb 2014 kl. 10:43 skrev Mikael Brandström Durling:

> I don't believe it works with interproscan5. What I usually do is to split the maker protein file into chunks, and then run these chunks as separate jobs on our cluster, then finally merge the results. The TSV file from iprscan5 can be input into the maker tool ipr_update_gff. I have not tried the iprscan2gff3, as I haven't figured out how to get an iprscan4 raw file from iprscan5.

I should clarify this and say that mpi_iprscan doesn't seem to work with iprscan5. ipr_update_gff3 does, however.

Mikael
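The chunk-and-merge workflow Mikael describes (split the MAKER protein FASTA, run InterProScan 5 on each chunk as a separate cluster job, concatenate the TSV outputs) can be sketched in Python. This is a minimal illustration, not a MAKER tool: the function names, the round-robin assignment of records and the `chunk_000.fasta` naming scheme are assumptions, and the per-chunk InterProScan runs are left to your scheduler.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, seq_lines = None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq_lines)
                header, seq_lines = line[1:], []
            elif line:
                seq_lines.append(line.strip())
    if header is not None:
        yield header, "".join(seq_lines)


def split_fasta(path: Path, n_chunks: int, out_dir: Path) -> List[Path]:
    """Distribute records round-robin over n_chunks files (hypothetical naming)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"chunk_{i:03d}.fasta" for i in range(n_chunks)]
    handles = [open(p, "w") for p in paths]
    try:
        for i, (header, seq) in enumerate(read_fasta(path)):
            fh = handles[i % n_chunks]
            fh.write(f">{header}\n")
            # wrap the sequence at 60 residues per line
            for start in range(0, len(seq), 60):
                fh.write(seq[start:start + 60] + "\n")
    finally:
        for fh in handles:
            fh.close()
    return paths


def merge_tsv(paths: List[Path], out_path: Path) -> int:
    """Concatenate per-chunk TSV files, skipping blank lines; return rows written."""
    rows = 0
    with open(out_path, "w") as out:
        for p in paths:
            with open(p) as fh:
                for line in fh:
                    if line.strip():
                        out.write(line if line.endswith("\n") else line + "\n")
                        rows += 1
    return rows
```

Round-robin keeps chunk sizes balanced by record count without loading the whole file; splitting by total sequence length instead would balance run times better if protein lengths vary a lot. Per Mikael, the merged TSV is what he feeds to ipr_update_gff.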
From mikael.durling at slu.se Wed Feb 26 05:30:44 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 12:30:44 +0000
Subject: [maker-devel] Mapping gene names

Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?

Thanks,
Mikael

26 feb 2014 kl. 01:58 skrev Carson Holt:

> There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there so you'll have to type it in.
>
> There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1.
>
> This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
>
> --Carson
>
> From: Shaun Jackman
> Reply-To: Shaun Jackman
> Date: Tuesday, February 25, 2014 at 5:06 PM
> To:
> Subject: [maker-devel] Mapping gene names
>
> Hi,
>
> I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?
>
> maker_opts.ctl
> est=NC_123456.frn
> protein=NC_123456.faa
> est2genome=1
> protein2genome=1
>
> Thanks,
> Shaun

From carsonhh at gmail.com Wed Feb 26 06:22:34 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 06:22:34 -0700
Subject: [maker-devel] Mapping gene names
Message-ID: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>

Yes. That should work as well as an accidental feature.

--Carson

Sent from my iPhone

> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling wrote:
>
> Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?
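The FASTA header tags Carson describes (gene_id= to cluster isoforms, maker_coor=chr1 or maker_coor=chr1:1-10000 to restrict where a sequence is mapped) can be illustrated with a small parser. This is a hypothetical sketch, not MAKER's own parsing code (which lives in GI.pm and is not reproduced in the thread). It assumes tags are whitespace-separated key=value tokens after the sequence ID, and the example value geneA is invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed maker_coor value format: "chr1" or "chr1:1-10000" (seqid[:start-end])
COOR_RE = re.compile(r"^(?P<seqid>[^:\s]+)(?::(?P<start>\d+)-(?P<end>\d+))?$")


@dataclass
class MakerCoor:
    seqid: str
    start: Optional[int] = None  # None: anywhere on seqid
    end: Optional[int] = None


def parse_header_tags(header: str) -> Dict[str, str]:
    """Collect key=value tokens that follow the sequence ID in a FASTA header."""
    tokens = header.lstrip(">").split()
    tags: Dict[str, str] = {}
    for tok in tokens[1:]:  # tokens[0] is the sequence ID itself
        key, sep, value = tok.partition("=")
        if sep and key:
            tags[key] = value
    return tags


def parse_maker_coor(value: str) -> MakerCoor:
    """Turn 'chr1' or 'chr1:1-10000' into a MakerCoor."""
    m = COOR_RE.match(value)
    if m is None:
        raise ValueError(f"unrecognised maker_coor value: {value!r}")
    if m.group("start") is None:
        return MakerCoor(m.group("seqid"))
    start, end = int(m.group("start")), int(m.group("end"))
    if start > end:
        raise ValueError(f"start > end in maker_coor: {value!r}")
    return MakerCoor(m.group("seqid"), start, end)


def group_by_gene_id(headers: List[str]) -> Dict[str, List[str]]:
    """Cluster sequence IDs by gene_id; untagged sequences form their own group."""
    groups: Dict[str, List[str]] = {}
    for header in headers:
        seq_id = header.lstrip(">").split()[0]
        gene_id = parse_header_tags(header).get("gene_id", seq_id)
        groups.setdefault(gene_id, []).append(seq_id)
    return groups
```

For a header such as `>tx1 gene_id=geneA maker_coor=chr1:1-10000`, the parser yields the two tags, and the coordinate becomes `MakerCoor("chr1", 1, 10000)`. Treating untagged sequences as their own group is an assumption made here, not documented MAKER behavior.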
From mikael.durling at slu.se Wed Feb 26 06:37:29 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 13:37:29 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
Message-ID: <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward, for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

26 feb 2014 kl. 14:22 skrev Carson Holt:

> Yes. That should work as well as an accidental feature.

From nextgen.usfs at gmail.com Wed Feb 26 09:21:33 2014
From: nextgen.usfs at gmail.com (USFS Ion PGM)
Date: Wed, 26 Feb 2014 10:21:33 -0600
Subject: [maker-devel] change program locations in maker_exe

Hello,
I was wondering if there is a way to make permanent changes to the maker_exe.ctl file, as it seems on the install that maker didn't find the gene mark or pro build locations correctly, which means that I have to manually edit the maker_exe.ctl file every time and add that information. Where can I modify this permanently so that the maker -CTL command creates the appropriate maker_exe file? Thank you.
- Jon

From carsonhh at gmail.com Wed Feb 26 08:38:47 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 08:38:47 -0700
Subject: [maker-devel] Functional annotation options
In-Reply-To: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se> <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>

maker_functional is a script that gets called by another script; it is not meant to be called directly by the user, so ignore that.

Just run iprscan directly; it already works pretty well. The mpi_iprscan and iprscan_wrap scripts just add some logging functionality by wrapping the iprscan call. In most cases there is no advantage over just running iprscan directly.

--Carson

From carsonhh at gmail.com Wed Feb 26 09:09:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 09:09:14 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>

It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline.
Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but it can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. it allows single-base-pair introns, perhaps caused by assembly errors). The logic here is that, since I already told MAKER with some degree of confidence that I expect sequence A to map to location X, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also, match parameters for exonerate will not be relaxed as they were with est_forward.

As you can see, the behavior is slightly different (because it's an accidental feature).

Thanks,
Carson

From carson.holt at genetics.utah.edu Wed Feb 26 09:38:37 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 26 Feb 2014 16:38:37 +0000
Subject: [maker-devel] change program locations in maker_exe

MAKER first looks inside of .../maker/exe/ for any executables. Then it uses the system's "which" command to identify executables in your PATH environment variable. If MAKER is not finding the one you want, then you can either put the program in the .../maker/exe/ folder (i.e. create .../maker/exe/bin/ and then put soft links to the executables you want to be used first), or you can rearrange the order of parameters in your PATH environment variable so that the "which" command returns the location you want. If MAKER is always leaving the locations to those programs empty, it is because you need to add them to your PATH environment variable.

Thanks,
Carson

On 2/26/14, 9:21 AM, "USFS Ion PGM" wrote:

>Hello,
>I was wondering if there is a way to make permanent changes to the
>maker_exe.ctl file, as it seems on the install that maker didn't find the
>gene mark or pro build locations correctly, which means that I have to
>manually edit the maker_exe.ctl file every time and add that information.
>Where can I modify this permanently so that the maker -CTL command
>creates the appropriate maker_exe file? Thank you.
>
>- Jon

From nextgen.usfs at gmail.com Wed Feb 26 09:58:11 2014
From: nextgen.usfs at gmail.com (USFS Ion PGM)
Date: Wed, 26 Feb 2014 10:58:11 -0600
Subject: [maker-devel] change program locations in maker_exe
Message-ID: <2FA61AAE-0548-4030-9F4A-6964A631703C@gmail.com>

Hi Carson,

Thank you - that did it, I didn't have them in the PATH. All working now.

Cheers,
Jon

From weckalba at asu.edu Wed Feb 26 13:05:05 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Wed, 26 Feb 2014 12:05:05 -0800
Subject: [maker-devel] invalid gff3 format issues

Hi Carson,

Thanks, that seems to have mostly resolved the issue. Oddly enough, though, PASA still complains about the GFF3 file directly from gff3_merge, but if I first transform it with maker2eval_gtf and then use PASA's gtf_to_gff3_format.pl script, everything seems to run fine.

On 25 February 2014 20:10, Carson Holt wrote:

> Could you try version 2.31 (the current version)? I believe this is happening because you are passing in MAKER genes as pred_gff; the transcripts thus ended up with the same Names and IDs as the genes being generated by the MAKER run via SNAP etc. This shouldn't happen with model_gff, and shouldn't happen in 2.31 (IDs and names are generated slightly differently in 2.30+).
>
> Thanks,
> Carson
>
> From: Walter Eckalbar
> Date: Tuesday, February 25, 2014 at 7:11 PM
> To: Daniel Ence
> Cc: ""
> Subject: Re: [maker-devel] invalid gff3 format issues
>
> Hi Daniel, those have been uploaded and I'm using version 2.28.
>
> Walter
>
> On 25 February 2014 18:02, Daniel Ence wrote:
>
>> Hi Walter,
>>
>> Will you upload the full GFF3 and the control files that you used to this URL?
>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189
>> Also, what version of MAKER are you running this with?
>>
>> Thanks,
>> Daniel
>>
>> On Feb 25, 2014, at 6:36 PM, Walter Eckalbar wrote:
>>
>> Hi all,
>>
>> I am trying to update maker annotations with PASA and encountered errors stemming from file format issues in the gff3 file.
>>
>> I put a few lines from the gff3 to highlight the issue below. Basically, the problem is that there are non-unique IDs for a number of the annotations.
>>
>> Is there anything that can be done to right this problem?
>>
>> Thanks,
>>
>> Walter
>>
>> Lines from GFF3 file, repeated IDs are highlighted:
>>
>> chr1 maker gene 9377440 9432028 . - . ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16
>> chr1 maker mRNA 9377440 9432028 . - . ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234
>> chr1 maker exon 9431899 9432028 . - . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1
>> chr1 maker exon 9431698 9431808 . - . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1
>>
>> chr1 maker gene 8894975 9021577 . + . ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53
>> chr1 maker mRNA 8894975 9021577 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007
>> chr1 maker exon 8894975 8895153 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
>> chr1 maker exon 8942215 8942531 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11

From carsonhh at gmail.com Wed Feb 26 14:12:23 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 14:12:23 -0700
Subject: [maker-devel] invalid gff3 format issues

Could you put the file in this GFF3 validator to see if anything comes up?

http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online

Maybe it's just PASA, but I'd like to know there's no issue being caused by something else.

Thanks,
Carson
URL: From mikael.durling at slu.se Wed Feb 26 15:04:37 2014 From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=) Date: Wed, 26 Feb 2014 22:04:37 +0000 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I?m after the behavior you describe below where exonerate is made to try really hard within a limited region to align an est, but I would not like maker to produce est2genome predictions. In general, I think this maker_coor and est_forward is a feature set that is worthy to be promoted into a documented feature. THanks, Mikael 26 feb 2014 kl. 17:09 skrev Carson Holt >: It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome. If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very very slow search, but can help pick up one or two stubborn genes that don?t remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). 
The logic here is that, since I have already told MAKER that I expect sequence A to map to location X with some degree of confidence, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded the sequence to the region (that BLAST result carries the information in its description parameter). MAKER will then ignore seeds that fall completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will have the search space for alignment polishing adjusted to match maker_coor exactly. Also, the match parameters for exonerate will not be relaxed as they are with est_forward.

As you can see, the behavior is slightly different (because it's an accidental feature).

Thanks,
Carson

From: Mikael Brandström Durling
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor (but not gene_id), as well as the configuration option est_forward, for this to work. Every occurrence of maker_coor in GI.pm seems to be conditioned on est_forward=1, right?

Mikael

On 26 Feb 2014, at 14:22, Carson Holt wrote:

Yes. That should work as well, as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling wrote:

Can maker_coor be used only to hint at the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have an EST database where I have a priori knowledge of their rough placement, can this placement be given to MAKER without providing est_forward=1?

Thanks,
Mikael

On 26 Feb 2014, at 01:58, Carson Holt wrote:

There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.
The option won't already be there, so you'll have to type it in.

There is also a feature designed to work with this option. If you add tags to your FASTA headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure that different isoforms sharing a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the FASTA header will force a particular sequence to be mapped only against chr1 within the range 1-10000 bp, while just using maker_coor=chr1 will force it to be mapped only against chr1.

This is an undocumented way to remap genes onto new assemblies using BLAST alignments of earlier transcript or protein annotations as a guide.

--Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Tuesday, February 25, 2014 at 5:06 PM
To:
Subject: [maker-devel] Mapping gene names

Hi,

I'm annotating a genome using a closely related genome from GenBank, using the .frn (RNA) and .faa (protein) files from GenBank as evidence to annotate my genome. I've run MAKER, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species onto my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?

maker_opts.ctl:
est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun
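Carson's description of est_forward says that gene_id= and maker_coor= tags in FASTA headers guide mapping and naming, and that without est_forward a maker_coor value only filters and resizes BLAST seeds. A minimal Python sketch of both ideas follows. Two assumptions are not confirmed in the thread: that MAKER reads the tags as whitespace-separated key=value pairs anywhere in the header description, and that you can prepare the sequence-ID-to-tag mapping yourself. restrict_seed is a toy model of the seed behavior Carson describes, not the logic in GI.pm, and all function names are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, Optional, Tuple

# (seqid, start, end); start/end are None when the whole sequence is meant.
Coor = Tuple[str, Optional[int], Optional[int]]

# maker_coor values as in Carson's examples: "chr1" or "chr1:1-10000".
COOR_RE = re.compile(r"^(?P<seqid>[^:\s]+)(?::(?P<start>\d+)-(?P<end>\d+))?$")


def format_coor(seqid: str, start: Optional[int] = None, end: Optional[int] = None) -> str:
    """Build a maker_coor value for a whole sequence or an inclusive range."""
    if (start is None) != (end is None):
        raise ValueError("give both start and end, or neither")
    if start is None:
        return seqid
    if not 1 <= start <= end:
        raise ValueError("expected 1 <= start <= end")
    return f"{seqid}:{start}-{end}"


def parse_coor(value: str) -> Coor:
    """Inverse of format_coor."""
    m = COOR_RE.match(value)
    if m is None:
        raise ValueError(f"not a maker_coor value: {value!r}")
    start, end = m.group("start"), m.group("end")
    if start is None:
        return m.group("seqid"), None, None
    return m.group("seqid"), int(start), int(end)


def tag_fasta(lines: Iterable[str], tags: Dict[str, Dict[str, str]]) -> Iterator[str]:
    """Append key=value tags to FASTA headers whose first word is in `tags`.

    Sequence lines and headers without an entry pass through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            words = line[1:].split()
            extra = tags.get(words[0], {}) if words else {}
            line = " ".join([line, *(f"{k}={v}" for k, v in extra.items())])
        yield line


def restrict_seed(seed: Tuple[str, int, int], coor: str) -> Optional[Coor]:
    """Toy model of maker_coor without est_forward, as described in the thread:
    a seed entirely outside maker_coor is ignored (None); an overlapping seed
    gets the maker_coor region itself as its polishing search space."""
    seed_seqid, seed_start, seed_end = seed
    seqid, start, end = parse_coor(coor)
    if seed_seqid != seqid:
        return None
    if start is not None and (seed_end < start or seed_start > end):
        return None
    return seqid, start, end
```

As Carson notes, est_forward=1 is not present in the generated maker_opts.ctl and has to be typed in by hand before tagged headers of this kind have any effect beyond BLAST seed filtering.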
From carsonhh at gmail.com  Wed Feb 26 15:50:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 15:50:30 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: 
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID: 

What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff, and use the map_forward option to then filter the results based on mRNA score; that would copy names onto the new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP, I can start to get some very weird behaviors.

Thanks,
Carson

From carsonhh at gmail.com  Wed Feb 26 16:45:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 16:45:30 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: 
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID: 

Sorry, I meant to say: prefilter on the score in the mRNA column before passing the GFF3 to model_gff.

--Carson

Sent from my iPhone

From bioinformatics.umd at gmail.com  Thu Feb 27 09:46:44 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Thu, 27 Feb 2014 11:46:44 -0500
Subject: [maker-devel] Problem with OpenFabrics and infiniband
Message-ID: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>

Hello,

I've had my IT folks install MAKER on our cluster at UMD. I'm getting a SEGFAULT error when running MAKER on InfiniBand nodes, but not on gigE nodes. According to the logs this appears to be an issue with forks, but I'm not sure how to fix it. I would simply use the gigE nodes, but we are in the process of updating everything to InfiniBand, so I'll need to address this issue at some point. I've attached the error log from the MPI run as well as commentary from my HPCC team.

IT suggestions

If you look at the top of the error log for the problematic job, it clearly warns of an issue with doing forks within the Open MPI/OpenFabrics framework.

In particular, the use of the fork system call is only partially supported in the OpenFabrics software (the drivers, etc. for the InfiniBand connections). See e.g. http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork for more information.
In particular, see the paragraphs starting with the red-highlighted sentence "it does not mean that your fork()-calling application is safe". (The kernel, Open MPI version, and OFED version are sufficiently recent to mean that there is _some_ fork support.)

The fact that the job runs over gigE but not over IB, in conjunction with the warning from Open MPI, strongly suggests that this is the issue you are encountering. I suspect that MAKER touches registered memory before the fork, which would result in a segfault (matching what was observed).

You can try adding the arguments
--mca mpi_warn_on_fork 0
to the mpirun command, just in case the crash was somehow caused by Open MPI's warning, but I would not hold out much hope for that.

###UPDATE### This does not fix the problem.

Basically, it looks like MAKER uses some system calls like fork in a manner that is incompatible with the current OpenFabrics software, and thus will not work with InfiniBand. This situation is likely to remain until either MAKER changes to be compatible with OFED, or OFED's support for the fork system call is broadened.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: maker_error_openfabrics.txt
URL: 

From carson.holt at genetics.utah.edu  Thu Feb 27 11:09:21 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 27 Feb 2014 18:09:21 +0000
Subject: [maker-devel] Problem with OpenFabrics and infiniband
In-Reply-To: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
References: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
Message-ID: 

It's a little more complicated than that. MAKER is written in Perl, and Perl doesn't give me the low-level access that a language like C would for controlling memory access (I don't control that).
All I get is Perl's standard implementation of forks. So it's not really a matter of MAKER changing; it would be a matter of changing Perl itself (which I have no power over, and which I don't think will be changing anytime soon).

For now you just have to add this flag to OpenMPI when running MAKER with mpiexec:
-mca btl ^openib

Example:
mpiexec -mca btl ^openib -n 20 maker

Thanks,
Carson

From bioinformatics.umd at gmail.com  Thu Feb 27 11:55:34 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Thu, 27 Feb 2014 13:55:34 -0500
Subject: [maker-devel] Problem with OpenFabrics and infiniband
In-Reply-To: 
References: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
Message-ID: <2840BC1C-70CC-4A0D-AB44-AEFD718C7B8C@gmail.com>

Hi Carson,

Thanks, that fixed the issue.

Cheers,
Ian
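Carson's fix is an Open MPI setting rather than a MAKER option: btl ^openib tells Open MPI to use every byte-transfer layer except openib, so MPI traffic avoids the OpenFabrics transport whose partial fork() support the UMD team identified, and Open MPI falls back to another available transport such as TCP. A small Python sketch that builds the same command, plus the equivalent environment variable, which relies on Open MPI's convention of reading MCA parameters from OMPI_MCA_-prefixed variables. The helper names are hypothetical, and the sketch only prints the command unless dry_run is turned off.

```python
import os
import shlex
import subprocess
from typing import Dict, List, Mapping, Optional, Sequence

# "^" means "all BTLs except the listed ones", so openib (InfiniBand verbs) is skipped.
BTL_EXCLUDE_OPENIB = "^openib"


def maker_mpi_command(nprocs: int, maker: str = "maker",
                      maker_args: Sequence[str] = ()) -> List[str]:
    """argv equivalent to: mpiexec -mca btl ^openib -n <nprocs> maker [args...]"""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    return ["mpiexec", "-mca", "btl", BTL_EXCLUDE_OPENIB, "-n", str(nprocs),
            maker, *maker_args]


def maker_mpi_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Alternative to the flag: export the same MCA parameter as OMPI_MCA_btl."""
    env = dict(os.environ if base is None else base)
    env["OMPI_MCA_btl"] = BTL_EXCLUDE_OPENIB
    return env


def run_maker(nprocs: int, dry_run: bool = True) -> str:
    """Return the shell-quoted command; only execute it when dry_run is False."""
    cmd = maker_mpi_command(nprocs)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    print(run_maker(20))  # mpiexec -mca btl '^openib' -n 20 maker
```

Passing argv as a list avoids shell parsing entirely. shlex.join quotes ^openib only for display, which is harmless and keeps shells such as zsh (where ^ can act as a glob operator) from interpreting it.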
> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sjackman at gmail.com Thu Feb 27 16:17:22 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Thu, 27 Feb 2014 15:17:22 -0800 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Is there a corresponding?protein_forward=1 option to map forward protein names from protein2genome? Cheers, Shaun On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote: Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff. --Carson? Sent from my iPhone On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. ?Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score and that would copy names onto new gene under the standard MAKER pipeline. ?Eventually it?s really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). ?I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.? Thanks, Carson From: Mikael Brandstr?m Durling Date: Wednesday, February 26, 2014 at 3:04 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? 
I?m after the behavior you describe below where exonerate is made to try really hard within a limited region to align an est, but I would not like maker to produce est2genome predictions. In general, I think this maker_coor and est_forward is a feature set that is worthy to be promoted into a documented feature. THanks, Mikael 26 feb 2014 kl. 17:09 skrev Carson Holt : It will still work without est_forward. ?It just works a little differently. ?Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome. If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. ?Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). ?So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very very slow search, but can help pick up one or two stubborn genes that don?t remap well). ?To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). ?The logic here is that given the fact that I already told MAKER that with some degree of confidence I expect sequence A to map to to location X, it will try its hardest to make it match.? Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). ?MAKER will then ignore seeds completely outside of maker_coor. In addition any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. ?Also match parameters for exonerate will not be relaxed as they were with est_forward. 
As you can see the behavior, is slightly different (because it?s an accidental feature). Thanks, Carson From: Mikael Brandstr?m Durling Date: Wednesday, February 26, 2014 at 6:37 AM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names That might be a useful and time saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seems to be conditioned on set_forward=1 right? Mikael 26 feb 2014 kl. 14:22 skrev Carson Holt : Yes. ?That should work as well as an accidental feature. --Carson? Sent from my iPhone On Feb 26, 2014, at 5:30 AM, Mikael Brandstr?m Durling wrote: Can this use of maker_coor be used only to hint about the placement of the ests, without affecting the naming of the final genes? Ie if I have a database of EST where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1? Thanks, Mikael 26 feb 2014 kl. 01:58 skrev Carson Holt : There is a way. ?It?s not a standard option and it?s undocumented, but if you add?est_forward=1 to the maker_opts.ctl file, then it will do just that. ?The option won?t already be there so you?ll have to type it in. There is also a feature designed to work with this option. ?If you add tags to your fasta headers, those can be used to guide the mapping and naming. ?For example, gene_id= ?will ensure different isoforms that share a common gene_id get clustered into the same gene, and?maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp ?and just using maker_coor=chr1 will force it to only be mapped against chr1. This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide. 
--Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Tuesday, February 25, 2014 at 5:06 PM To: Subject: [maker-devel] Mapping gene names Hi, I'm annotating a genome using a closely related genome from GenBank, using the .frn (RNA) and .faa (protein) files from GenBank as evidence. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species onto my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein? maker_opts.ctl:
est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1
Thanks, Shaun _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
From sjackman at gmail.com Thu Feb 27 17:27:30 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Thu, 27 Feb 2014 16:27:30 -0800 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you! The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5, rrn5, and tRNA genes didn't make it into the all.gff file. They are in the blastn output and in evidence_0.gff.
rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and a sufficient E-value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?
organism_type=prokaryotic
est2genome=1
protein2genome=1
est_forward=1
Cheers, Shaun On 27 February 2014 15:17, Shaun Jackman wrote: > Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome? > > Cheers, > Shaun > > On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com) wrote: > > Sorry, I meant to say prefilter on the score in the mRNA column before passing the GFF3 to model_gff. > > --Carson > > Sent from my iPhone > > On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: > > What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff, and use the map_forward option to filter the results based on mRNA score. That would copy names onto new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP, I can start to get some very weird behaviors. > > Thanks, > Carson
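Carson's correction quoted above (prefilter on the score in the mRNA column before passing the GFF3 to model_gff) can be sketched as a small filter. This is a minimal illustration, not a MAKER tool, and prefilter_by_mrna_score is a hypothetical name. It rests on several assumptions:

- The score sits in GFF3 column 6 of each mRNA line.
- Higher scores are better.
- You choose min_score yourself.
- mRNAs whose score is non-numeric (such as ".") are dropped.

The filter keeps each passing mRNA, its parent gene, and every feature whose Parent is that mRNA. All other features are dropped. Comment and non-feature lines pass through unchanged.

```python
from typing import Dict, Iterable, List, Set


def parse_attrs(col9: str) -> Dict[str, str]:
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs: Dict[str, str] = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def prefilter_by_mrna_score(lines: Iterable[str], min_score: float) -> List[str]:
    """Keep gene models whose mRNA score (column 6) is >= min_score."""
    rows = [line.rstrip("\n") for line in lines]
    keep_mrna: Set[str] = set()
    keep_gene: Set[str] = set()

    # Pass 1: decide which mRNAs (and therefore which genes) survive.
    for row in rows:
        cols = row.split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        try:
            score = float(cols[5])
        except ValueError:  # "." or other non-numeric score: treat as failing
            continue
        if score >= min_score:
            attrs = parse_attrs(cols[8])
            keep_mrna.add(attrs.get("ID", ""))
            keep_gene.update(attrs.get("Parent", "").split(","))

    # Pass 2: emit the surviving features in their original order.
    out: List[str] = []
    for row in rows:
        cols = row.split("\t")
        if len(cols) != 9:
            out.append(row)  # comments, directives and FASTA lines pass through
            continue
        attrs = parse_attrs(cols[8])
        if cols[2] == "gene":
            keep = attrs.get("ID") in keep_gene
        elif cols[2] == "mRNA":
            keep = attrs.get("ID") in keep_mrna
        else:  # exons, CDS, UTRs: keep if any parent mRNA survived
            keep = any(p in keep_mrna for p in attrs.get("Parent", "").split(","))
        if keep:
            out.append(row)
    return out
```

The output would then be written to a file and passed as model_gff, together with map_forward, as Carson describes. Whether the score column carries a meaningful value depends on how the first run was configured.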
From carsonhh at gmail.com Thu Feb 27 18:13:06 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 27 Feb 2014 18:13:06 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Set single_exon=1, and set the minimum size to a smaller value. I think it's set to 250 right now. Also, est2genome is looking for an ORF, so if there is none (as with tRNAs), those genes probably won't get picked up. --Carson Sent from my iPhone
From mikael.durling at slu.se Fri Feb 28 03:40:30 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Fri, 28 Feb 2014 10:40:30 +0000 Subject: [maker-devel] maker_coor behaviour Message-ID: <8CA99854-CF5B-4533-B625-0EDD5DFFCE8B@slu.se> Hi, in a previous thread, the maker_coor feature for ESTs was mentioned. I have been trying it out, without using it for mapping gene names. I have placed these ESTs by other means, and thought the maker_coor feature would be a good use of this a priori knowledge. The main problem I am trying to solve is that some ESTs whose correct position I know are not recruited to that position by MAKER's blastn->exonerate method (I find them on other scaffolds).
So I thought maker_coor with the est_forward behavior (as described) would be a good way to force my evidence onto the correct position, instead of having it end up supporting or breaking other models. However, as soon as I run with maker_coor-tagged EST sequences, no est2genome evidence appears in the final GFF3 file. The blastn evidence is there when est_forward is disabled, but, as expected, there is no blastn evidence when est_forward is turned on. It seems as though the evidence is used, since the QI lines indicate EST support for both splice sites and exon alignments, but I have no way to visualize or evaluate the congruence of evidence and models. Would it be possible to tweak MAKER into outputting the est2genome alignments when est_forward/maker_coor is used? I couldn't figure out myself where in the code this is handled. I could of course do my own exonerate alignments of these ESTs and feed them into MAKER as est_gff, but if MAKER already has the machinery to do this, I thought it would be a good idea to use it. Thanks, Mikael
From carsonhh at gmail.com Fri Feb 28 07:09:09 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 28 Feb 2014 07:09:09 -0700 Subject: [maker-devel] maker_coor behaviour Message-ID: I wouldn't use those options for standard de novo annotation. There are other, more appropriate things that should be used instead. Both maker_coor and est_forward are destined to be part of a separate tool that will secretly just be calling MAKER, but that will let me control which other parameters MAKER sees. This avoids certain logic incompatibilities: logic that makes sense when mapping entire genes onto a new assembly, but not really for de novo annotation using ESTs.
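To answer Shaun's earlier question about which filter removes a hit, one option is to check a hit against each blastn cutoff in turn. The sketch below is hypothetical, not MAKER code. The option names mirror the maker_bopts.ctl entries Carson lists next. Three things are assumptions:

- pcov_blastn and pid_blastn are treated as fractions between 0 and 1.
- A value exactly at a cutoff passes.
- Only these four cutoffs are modelled; depth_blastn and en_score_limit are not.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlastnHit:
    coverage: float  # fraction of the query EST covered by the hit (0-1)
    identity: float  # fraction identity of the alignment (0-1)
    evalue: float
    bits: float


@dataclass
class BlastnThresholds:
    # Names mirror maker_bopts.ctl options; the 0-1 scale for pcov/pid
    # and the inclusive comparisons below are assumptions.
    pcov_blastn: float
    pid_blastn: float
    eval_blastn: float
    bit_blastn: float


def failed_thresholds(hit: BlastnHit, t: BlastnThresholds) -> List[str]:
    """Return the names of the cutoffs this hit misses (empty list = passes)."""
    failed = []
    if hit.coverage < t.pcov_blastn:
        failed.append("pcov_blastn")
    if hit.identity < t.pid_blastn:
        failed.append("pid_blastn")
    if hit.evalue > t.eval_blastn:  # smaller E-values are better
        failed.append("eval_blastn")
    if hit.bits < t.bit_blastn:
        failed.append("bit_blastn")
    return failed


# bit/eval cutoffs are Shaun's values; pcov/pid values are illustrative only.
thresholds = BlastnThresholds(pcov_blastn=0.8, pid_blastn=0.85,
                              eval_blastn=1e-10, bit_blastn=40)
# Full coverage is assumed for rrn5; only identity, bits and E-value are reported.
rrn5 = BlastnHit(coverage=1.0, identity=1.0, evalue=2e-66, bits=242)
print(failed_thresholds(rrn5, thresholds))  # -> []
```

An empty result means the hit clears all four blastn cutoffs. That fits Carson's earlier reply that est2genome needs an ORF, so the short rRNA and tRNA hits are more plausibly dropped later in the pipeline.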
You should instead try modifying these options in the maker_bopts.ctl file:
pcov_blastn= #Blastn Percent Coverage Threshold EST-Genome Alignments
pid_blastn= #Blastn Percent Identity Threshold EST-Genome Alignments
eval_blastn= #Blastn eval cutoff
bit_blastn= #Blastn bit cutoff
depth_blastn= #Blastn depth cutoff (0 to disable cutoff). For trimming high evidence overlap regions
en_score_limit= #Exonerate nucleotide percent of maximal score threshold
If either blastn or est2genome results disappear, it is because they don't meet one of these thresholds. Borderline blastn results that miss the thresholds are kept if the exonerate alignment does meet them, but if exonerate misses a threshold they are thrown out. That is why the EST in question gets thrown out, and it's why the blastn result disappears when you try to anchor it with maker_coor. You can visualize everything with a browser when you're done. I still recommend the old version of Apollo for this (it's just easier). You can try to install it using the './Build apollo' option from the .../maker/src/ directory, and it will be installed in .../maker/exe/apollo. This requires Apache Ant to be installed. Otherwise, just download it from the GMOD SourceForge page and install it manually. Thanks, Carson
From rbharris at uw.edu Fri Feb 28 13:14:55 2014 From: rbharris at uw.edu (Rebecca Harris) Date: Fri, 28 Feb 2014 12:14:55 -0800 Subject: [maker-devel] error in snap training In-Reply-To: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com> References: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com> Message-ID: Hi - I tried this and ran cegma --genome on my original FASTA file. I then used cegma2zff to convert, followed by fathom and forge. However, when I try to generate new parameters with forge, I get the same error that I got when trying to train SNAP without CEGMA: "ZOE ERROR (from forge): impossible error5 KOG1342.20". Any suggestions would be great, thanks!
Cheers, Rebecca On Tue, Feb 25, 2014 at 2:12 PM, Carson Holt wrote: > Make sure you are using 2.31, and then try the maker2zff filters individually. If the protein models are not working well, use CEGMA to generate models. It's from the same group as SNAP. Use cegma2zff for the conversion. > > --Carson > > Sent from my iPhone > > > On Feb 25, 2014, at 2:49 PM, Rebecca Harris wrote: > > > > Hey - > > > > I'm trying to train SNAP and am running into errors. I don't have any EST evidence, just protein. My .gff file reports 10865 genes, but when I run maker2zff -c0 -e0 I get back empty genome files. When I run maker2zff -n, a ton of overlap_prev_exon errors get written to the screen, and then when I get to the forge step I get an "impossible error5". Any help would be greatly appreciated. > > > > Thanks! > > Rebecca
From carsonhh at gmail.com Fri Feb 28 13:22:12 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 28 Feb 2014 13:22:12 -0700 Subject: [maker-devel] error in snap training In-Reply-To: References: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com> Message-ID: If it's failing both ways, I'm thinking this may be SNAP itself. Try these two different versions of SNAP.
http://korflab.ucdavis.edu/Software/snap-2013-02-16.tar.gz and http://korflab.ucdavis.edu/Software/snap-2013-11-29.tar.gz If they both fail, then contact the SNAP development group: korflab AT ucdavis DOT edu Thanks, Carson
From rebzi87 at gmail.com Tue Feb 4 15:29:41 2014 From: rebzi87 at gmail.com (Rebecca Harris) Date: Tue, 4 Feb 2014 14:29:41 -0800 Subject: [maker-devel] maker output Message-ID: Hi, I'm running MAKER on a cluster and am having some problems with the run ending prematurely. I would like to know if there is a straightforward way to figure out whether MAKER has completed.
I've tried: 1) counting the number of run.log files in the datastore directly, and 2) counting the instances of "FINISHED" in the master_datastore_index.log. These numbers are inconsistent. I have 200,000 contigs in my fasta file - do I expect 200,000 run.log files? I've had to restart maker a few times - it appears that maker is appending to the master_datastore_index.log, as I find multiple instances of the same contig being finished.

Thanks!

Cheers,
Rebecca

From darasappan at gmail.com Tue Feb 4 15:43:19 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 4 Feb 2014 16:43:19 -0600
Subject: [maker-devel] Fwd: maker annotation with cufflinks output
References: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
Message-ID:

Resending this since it didn't make it to the mailing list before.
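The per-type numbers posted in this thread (gene, exon, mRNA, match_part and so on) are tallies of the type column of MAKER's GFF3 output. Below is a minimal Python sketch of how such a tally could be produced. It assumes standard 9-column, tab-separated GFF3, possibly followed by a ##FASTA section, which it skips. It also groups counts by the source column so you can check whether any features came from SNAP, assuming the predictor name appears in the source field (for example, something containing "snap"). The script and file names are illustrative only.

```python
"""Tally feature types in a MAKER GFF3 file (sketch; names are examples)."""
import sys
from collections import Counter


def count_features(lines):
    """Return (by_type, by_source_type) Counters for GFF3 feature lines.

    Assumes 9 tab-separated columns; stops at a ##FASTA section if present.
    """
    by_type = Counter()
    by_source_type = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; there are no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, pragmas and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        source, ftype = cols[1], cols[2]
        by_type[ftype] += 1
        by_source_type[(source, ftype)] += 1
    return by_type, by_source_type


def main():
    if len(sys.argv) < 2:
        print("usage: count_gff_features.py annotations.gff")
        return
    with open(sys.argv[1]) as handle:
        by_type, by_source_type = count_features(handle)
    # Print "count<TAB>type", most frequent first.
    for ftype, n in by_type.most_common():
        print(f"{n}\t{ftype}")
    # Assumption: SNAP-derived features carry "snap" in their source column.
    snap = {key: n for key, n in by_source_type.items() if "snap" in key[0].lower()}
    print("SNAP-derived features:", snap if snap else "none found")


if __name__ == "__main__":
    main()
```

If the gene and mRNA counts are near zero while match_part is large, evidence is aligning but is not being promoted to gene models. An empty SNAP line then points at the predictor rather than the evidence.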
From dence at genetics.utah.edu Tue Feb 4 15:42:52 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 4 Feb 2014 22:42:52 +0000
Subject: [maker-devel] maker output
In-Reply-To:
References:
Message-ID:

Hi Rebecca,

If you're looking at the master_datastore_index.log, then you're looking for lines with the "FINISHED" status. If you count those (with "grep -c", for example), that will tell you how many contigs have finished.

If you have 200,000 contigs that you're trying to annotate, you might also consider setting the "min_contig" parameter in the maker_opts.ctl file. This parameter sets the minimum length a contig must have before MAKER tries to annotate it. Usually 5000 bp or larger is what you want. That will save you some time in the long run.

Thanks,
Daniel
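As a rough cross-check of the advice above, the Python sketch below counts each contig at most once when it has a FINISHED entry in master_datastore_index.log, so restarts that append duplicate lines do not inflate the total. It then compares that count with the number of FASTA records at least min_contig bp long. It assumes each index line is tab-separated with the contig ID first and the status last, and that the ID matches the first word of the FASTA header. Neither assumption comes from MAKER's documentation, and the script and file names are hypothetical.

```python
"""Check MAKER progress: unique FINISHED contigs vs. contigs long enough to run.

Sketch only. Assumes master_datastore_index.log lines look like
"contig_id<TAB>datastore_path<TAB>STATUS" and that contig IDs match the
first word of each FASTA header.
"""
import sys


def finished_contigs(lines):
    """Return the set of contig IDs with at least one FINISHED entry.

    Using a set means entries duplicated by restarts are counted once.
    """
    done = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3 and cols[-1].strip() == "FINISHED":
            done.add(cols[0])
    return done


def contig_lengths(lines):
    """Yield (contig_id, length) for each record in a FASTA file."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            header = line[1:].split()
            name, length = (header[0] if header else ""), 0
        elif line:
            length += len(line)
    if name is not None:
        yield name, length


def main():
    if len(sys.argv) < 3:
        print("usage: check_progress.py master_datastore_index.log genome.fasta [min_contig]")
        return
    min_contig = int(sys.argv[3]) if len(sys.argv) > 3 else 1
    with open(sys.argv[1]) as handle:
        done = finished_contigs(handle)
    with open(sys.argv[2]) as handle:
        expected = {name for name, n in contig_lengths(handle) if n >= min_contig}
    print(f"unique FINISHED contigs: {len(done)}")
    print(f"contigs >= {min_contig} bp: {len(expected)}")
    print(f"expected but not FINISHED: {len(expected - done)}")


if __name__ == "__main__":
    main()
```

Counting unique IDs gives the same answer as grep -c on a freshly rebuilt, deduplicated index, and it does not require rebuilding the index first.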
From mikael.durling at slu.se Tue Feb 4 15:49:46 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Tue, 4 Feb 2014 22:49:46 +0000
Subject: [maker-devel] maker output
In-Reply-To:
References:
Message-ID:

> On 4 Feb 2014, at 23:32, "Rebecca Harris" wrote:
>
> Hi,
>
> I'm running maker on a cluster and am having some problems with the run ending prematurely. I would like to know if there is a straightforward way to figure out whether maker has completed. I've tried: 1) counting the number of run.log files in the datastore directly, and 2) counting the instances of "FINISHED" in the master_datastore_index.log.

This is usually what I do to check if maker has finished all scaffolds. There should be one FINISHED statement for each entry in the fasta file (or possibly one for every scaffold longer than the given minimum length).

> These numbers are inconsistent. I have 200,000 contigs in my fasta file - do I expect 200,000 run.log files? I've had to restart maker a few times - it appears that maker is appending to the master_datastore_index.log, as I find multiple instances of the same contig being finished.

Run maker -dsindex to rebuild the file if you like. The number of FINISHED entries should not change, though.

Mikael

From carsonhh at gmail.com Tue Feb 4 15:50:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 04 Feb 2014 15:50:10 -0700
Subject: [maker-devel] maker output
In-Reply-To:
References:
Message-ID:

Clusters are notoriously flaky, so maker is restartable (hence the need for the log file). Also, since multiple nodes may write to the log simultaneously, they can munge its contents.
You can rerun maker with the -dsindex flag to regenerate the master_datastore_index.log without processing anything else. You can even delete it before rebuilding it if you want to ensure all entries are unique (run on a single CPU when you do this). Then count the number of FINISHED entries in the log.

Thanks,
Carson

From carsonhh at gmail.com Wed Feb 5 11:38:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Feb 2014 11:38:50 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
Message-ID:

Do you have any features of type snap in your results from step 3? We've had a couple of recent posts where, after training, snap was giving no results, and as a result maker couldn't give any genes. One cause of something like that may be your step 2.
Make sure the ZFF you used to train with wasn't empty. The maker2zff script uses filters to put only the best genes in the ZFF file, and if all your genes fail the filtering, then you are training with an empty ZFF.

Also, you should use proteins from a related species as your protein file. I see that your protein matches are varying wildly from run to run, and so is your contig count. Were the subset of contigs you have results for long enough to contain genes?

--Carson
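Following Carson's suggestion, one quick way to confirm that the ZFF training file is not empty is to count its sequence headers, exon features and distinct gene models. This Python sketch assumes SNAP's ZFF layout: ">" lines name a sequence, and each feature line starts with an exon label (Einit, Exon, Eterm or Esngl) and ends with the gene (group) name. The file name genome.ann is an assumption about what maker2zff writes, and the script name is hypothetical.

```python
"""Sanity-check a ZFF training file before training SNAP (sketch only).

Assumes ZFF lines are either ">sequence_name" headers or feature lines
whose first field is an exon label and whose last field is the gene
(group) name.
"""
import sys

EXON_LABELS = {"Einit", "Exon", "Eterm", "Esngl"}


def zff_summary(lines):
    """Return (n_sequences, n_exon_features, n_gene_models)."""
    n_seqs, n_exons = 0, 0
    models = set()
    current_seq = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith(">"):
            n_seqs += 1
            current_seq = fields[0][1:]
        elif fields[0] in EXON_LABELS:
            n_exons += 1
            # Key on (sequence, group) in case group names repeat across sequences.
            models.add((current_seq, fields[-1]))
    return n_seqs, n_exons, len(models)


def main():
    if len(sys.argv) < 2:
        print("usage: check_zff.py genome.ann")
        return
    with open(sys.argv[1]) as handle:
        n_seqs, n_exons, n_models = zff_summary(handle)
    print(f"sequences: {n_seqs}  exon features: {n_exons}  gene models: {n_models}")
    if n_models == 0:
        print("warning: no gene models found; SNAP would be trained on an empty set")


if __name__ == "__main__":
    main()
```

A file that has sequence headers but zero gene models is the failure mode Carson describes, where every candidate gene was filtered out before training.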
From dence at genetics.utah.edu Wed Feb 5 12:28:48 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 5 Feb 2014 19:28:48 +0000
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To:
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>,
Message-ID:

Hi Dhivya,

Are the protein matches in your results coming from your annotations of the transcriptome? You should really use amino-acid sequences from related organisms and some kind of omnibus source like SwissProt.

Thanks,
Daniel
From darasappan at gmail.com Wed Feb 5 13:13:57 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Wed, 5 Feb 2014 14:13:57 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To:
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>,
Message-ID: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>

Hello Daniel and Carson,

Thanks for your replies. Yes, I used the protein sequences resulting from annotation of the trinity assembly (using trinotate). I'll try using protein sequences from related species (though there aren't sequences from closely related organisms). Could you tell me a little about why protein data from annotating my rnaseq data would not work best here?

Thanks
Dhivya
From dence at genetics.utah.edu Wed Feb 5 13:36:26 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 5 Feb 2014 20:36:26 +0000
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>, , <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
Message-ID:

Hi Dhivya,

In genome annotation, you often want to use as many sources of evidence as is reasonable, but those sources should be distinct. It will confuse downstream annotation efforts if your protein evidence is actually based on the RNA-seq data.

Using the trinotate results as protein evidence restricts you, first, to the proteins coded by the transcripts in the RNA-seq data, which may be incomplete, and second, to the proteins that trinotate could annotate from among those transcripts.

The problem that Carson mentioned with the SNAP HMM file is also a real possibility.

Thanks,
Daniel
You should really use amino-acid sequences from related organisms and some kind of omnibus source like SwissProt. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: Carson Holt [carsonhh at gmail.com] Sent: Wednesday, February 05, 2014 11:38 AM To: dhivya arasappan; Daniel Ence Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] maker annotation with cufflinks output Do you have any features of type snap in your results from step 3? We?ve had a couple of recent posts where after training snap was giving no results, and as a result maker couldn?t give any genes. One cause of something like that may be your step 2. Make sure the ZFF wasn?t empty you used to train with. The maker2zff script uses filters to only put the best genes in the off file, and if all your genes fail the filtering then you are training with an empty ZFF. Also you should use proteins from a related species as your protein file. I see that you protein marches are varying wildly from run to run? So is your contig count? Were the subset of contigs you have results for long enough to contain genes? ?Carson From: dhivya arasappan > Date: Monday, February 3, 2014 at 9:31 AM To: Daniel Ence > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] maker annotation with cufflinks output Hi Daniel, I was able to check on some of those questions. 1. From trinity assembly: I started with 102000 contigs. I used trinotate to annotate proteins in this. I ran maker on this data with est2genome set to 1. The output looks like this (most important parts on top): 6653 gene 46675 exon 280534 protein_match 59934 CDS 969 contig 105388 expressed_sequence_match 12584 five_prime_UTR 78565 match 1401369 match_part 10180 mRNA 11545 three_prime_UTR 2. From cufflinks assembly: I started with 133380 entries (out of which there are 29,000 transcripts). 
I used the protein sequences from trinity assembly. I ran maker on this data with est2genome set to 1. The output looks like this: 29 gene 75 exon 573659 protein_match 67 CDS 1099 contig 269298 expressed_sequence_match 23 five_prime_UTR 173844 match 2221846 match_part 29 mRNA 23 three_prime_UTR The genes annotated using the trinity assembly is lower than expected, so I went the cufflinks route. I dont understand why when using the cufflinks transcripts, even less genes are being found. 3. Training SNAP: I used the results of maker from 1 to train SNAP. I then used that training set to rerun maker: snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm est2genome=0 And again I got results with no entries for gene, exon, CDS etc. 957 contig 46555 expressed_sequence_match 43651 match 553633 match_part 113738 protein_match As I mentioned in another email, cegma results indicated that the genome was more than 90% complete. Any suggestions would be helpful. Thank you Dhivya On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: Hi Dhivya, I think there a few numbers that could be helpful to understand what's happening here. How many transcripts did Trinity assembly the RNA-seq data into? Also, you had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it the cufflinks data. How many transcripts did MAKER identify with the cufflinks data? Did you still get more than the 10,000 transcripts that you found with just the Trinity data? A key part of MAKER's approach to genome annotation that might be affecting it's performance is that it only annotates a gene where there is both evidence (like your RNA-seq data) and an ab-initio prediction. If a prediction is unsupported by the evidence, then MAKER won't annotate a gene and if evidence aligns where there's no prediction, MAKER won't annotate a gene either. What ab-initio predictors are you using and have they been trained specific genome? 
You can force MAKER to automatically promote evidence alignments to a gene model by setting the est2genome option to 1, but that will usually give you many false positives. Try rerunning it with either the Trinity data or the Cufflinks data and with est2genome set to 1, and let us know how that affects the MAKER results. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya arasappan [darasappan at gmail.com] Sent: Thursday, January 30, 2014 11:18 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] maker annotation with cufflinks output Hello, I am trying to annotate a 200 mb plant genome for which I have a very good assembly. I tried to denovo assemble RNA-seq data using trinity and ran maker using my genome assembly and the trinity results. I did not get as many transcripts as expected, around 10,000 transcripts. So, I decided to try a different approach. I did a genome assisted assembly of the RNA-seq data using tophat/cufflinks. This pipeline generated 21,000 genes, 29,000 transcripts. I then ran maker using my genome assembly and the cufflinks result. I get much less number of transcripts as a result. If cufflinks found 29000 transcripts by mapping to the genome, I'm confused as to why maker is not finding the same. Any suggestions would be appreciated. Thanks Dhivya _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
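One way to check the SNAP possibility Daniel mentions is to see whether the rerun produced any SNAP-derived features at all. Below is a minimal sketch that tallies features by source (GFF3 column 2) and type (column 3). It assumes SNAP predictions in MAKER's GFF3 carry "snap" in their source field (for example `snap_masked`). The function names `tally_gff3`, `has_source` and `report` are hypothetical, not part of MAKER.

```python
from collections import Counter


def tally_gff3(lines):
    """Count GFF3 features by (source, type), i.e. columns 2 and 3.

    Stops at a ##FASTA directive, because merged GFF3 files may carry
    sequences after the feature lines.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed 9-column feature line
        counts[(cols[1], cols[2])] += 1
    return counts


def has_source(counts, keyword):
    """True if any counted feature has `keyword` in its source column."""
    return any(keyword in source for source, _ftype in counts)


def report(path):
    """Print the (source, type) tally for one GFF3 file and a SNAP check."""
    with open(path) as fh:
        counts = tally_gff3(fh)
    for (source, ftype), n in sorted(counts.items()):
        print(f"{n}\t{source}\t{ftype}")
    print("SNAP-derived features present:", has_source(counts, "snap"))
```

For example, `report("genome.all.gff")` would print the tally for that file. The file name is only a placeholder for whatever merged GFF3 the run produced. If the check prints `False` after retraining, the HMM (or the ZFF it was trained from) is the first thing to inspect.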
From carsonhh at gmail.com Wed Feb 5 13:38:44 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Feb 2014 13:38:44 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>

Protein data doesn't have to come from a very closely related species, because genes maintain homology at the amino-acid level across even very large evolutionary distances. A more closely related species just ensures that the genome contents are similar (fewer gene losses/gains relative to each other). Do use the entire proteome of at least one related species, though; a database like Swiss-Prot alone is not sufficient.

Using translated mRNA-seq data will not give you any information that was not already available from the untranslated sequence. Worse, it will introduce the artifacts that mRNA-seq generates (gene merging, incorrect assembly, and false calls caused by background transcription) into the protein part of the pipeline. A big gotcha with mRNA-seq is that the whole genome gets transcribed at a low level, not just the genes, so you will always have contamination that does not represent real gene models. Also, in the end you can really only expect to capture about 50% of the genes with mRNA-seq (maybe 70% if you are fortunate, and most of those will be partial). So using proteins from another species is important for improving sensitivity and for fixing many of the issues that arise from the noisy nature of mRNA-seq.

In fact, if you were forced to use only one kind of evidence (protein or mRNA-seq), you would get better annotations from the protein evidence in most cases. You get better annotations when you use both, but if you use only one, the proteins from another species are the better choice, and noisy mRNA-seq will be the primary source of annotation error.

Thanks,
Carson
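As a rough illustration, the evidence advice above might translate into a maker_opts.ctl excerpt like the one below. This is a sketch, not a tested configuration. Every file name is a placeholder, and the option names (`est`, `protein`, `snaphmm`, `est2genome`, `protein2genome`) should be checked against the template that `maker -CTL` generates for your MAKER version.

```
# maker_opts.ctl excerpt (sketch; all paths are placeholders)
est=transcripts.fasta            # transcript evidence: Trinity or Cufflinks-derived sequences
protein=related_proteome.fasta   # entire proteome of a related species, not Trinotate translations of the RNA-seq
snaphmm=RHA.hmm                  # SNAP HMM trained on this genome
est2genome=0                     # do not promote transcript alignments directly to gene models
protein2genome=0                 # same for protein alignments
```

The design point is that the `est` and `protein` files are independent lines of evidence. Pointing `protein` at translations of the same transcripts would only repeat the RNA-seq signal.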
From darasappan at gmail.com Wed Feb 5 22:16:43 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Wed, 5 Feb 2014 23:16:43 -0600
Subject: [maker-devel] maker annotation with cufflinks output
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
Message-ID: <1188173E-53C1-4FFE-B790-B710C3A55B86@gmail.com>

Thank you both for those explanations. I'll get back to you after I try rerunning MAKER.

Dhivya
From mikael.durling at slu.se Thu Feb 6 04:02:37 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Thu, 6 Feb 2014 11:02:37 +0000
Subject: [maker-devel] ncRNA support in maker

Hi Carson,

It's nice to see all these new features in MAKER.

I gave the trnascan option a try by enabling it in the config file for one of my fungal genomes. It failed, though, with this error message:

ERROR: You found a tRNA with an intron! This should not happen
--> rank=12, hostname=my-mgrid6
ERROR: Failed while gathering ab-init output files
ERROR: Chunk failed at level:1, tier_type:2
FAILED CONTIG:scf_013
ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scf_013

I checked the trnascan output (scf_013.abinit_nomask.0.eukaryotic.trnascan) in theVoid for that contig, and the output seems valid to me:

scf_013  1  189339  189410  Thr  AGT       0       0  82.83
scf_013  2  510381  510462  Ser  AGA       0       0  67.09
scf_013  3  586886  587000  Leu  CAA  586924  586956  57.97
scf_013  4  942166  942069  Leu  AAG  942128  942113  57.48
scf_013  5  169102  168993  Leu  TAA  169065  169037  56.49

I hope this is of some help while debugging. I'll leave trnascan off for now.

Thanks,
Mikael

On 10 Jan 2014, at 22:03, Carson Holt wrote:

> Hi Mikael,
>
> The options are part of the new MAKER-P integration
> (http://www.plantphysiol.org/content/early/2013/12/06/pp.113.230144.abstract).
> Additional documentation/tutorials will be forthcoming - probably in
> a nice wiki page as part of the upcoming GMOD Malaysia courses in February
> or alternatively with the annual GMOD summer school. The tRNA option is
> easy enough to turn on (just set trna=1 in the maker_opts.ctl file).
>
> Thanks,
> Carson
>
> On 1/10/14, 2:48 AM, "Mikael Brandström Durling" wrote:
>
>> Hi Carson and other maker developers,
>>
>> I was reading the source code of the latest maker release and noted
>> several references to ncRNAs, snoscan and trnascan. Can these be
>> incorporated into the normal annotation workflow? If so, are there any
>> instructions available for that?
>>
>> best regards,
>> Mikael Durling

From darasappan at gmail.com Thu Feb 6 07:52:12 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 6 Feb 2014 08:52:12 -0600
Subject: [maker-devel] maker annotation with cufflinks output
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
Message-ID: <73AFCD9F-3B60-4C9C-9E03-35BC682E14ED@gmail.com>

Hello,

It does appear that the genome.ann file from the maker2zff script has data in it. However, the SNAP steps after that have created empty files. The following are all empty:

alt.dna err.dna export.dna genome.dna olp.dna uni.dna wrn.dna
alt.ann err.ann export.ann genome.ann olp.ann uni.ann wrn.ann

When I try to get gene stats for or validate genome.ann, I get errors like this for all of them:

fathom genome.ann genome.dna -gene-stats | more

MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds
MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds exon-1:out_of_bounds
MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds exon-21:out_of_bounds

I'm not sure why the annotations I'm seeing in genome.ann are all showing up as errors. I realize this may be an issue with SNAP, but are you familiar with anything like this? A snippet of my genome.ann file is attached for reference (the full file is too big for the list).

Thanks
Dhivya

-------------- next part --------------
A non-text attachment was scrubbed...
Name: head.genome.ann
Type: application/octet-stream
Size: 15761 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: head.genome.dna
Type: application/octet-stream
Size: 3075 bytes
Desc: not available

From carsonhh at gmail.com Thu Feb 6 09:01:04 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Feb 2014 09:01:04 -0700
Subject: [maker-devel] ncRNA support in maker

I'm making a new release this weekend, but if you have access to the development version, you can test it now. All changes have been committed to the Subversion repository.

Thanks,
Carson

On 2/6/14, 4:02 AM, "Mikael Brandström Durling" wrote:

>Hi Carson,
>
>it's nice to see all these new features in maker.
>
>I gave the trnascan option a try by enabling it in the config file for
>one of my fungal genomes. It failed though, with this error message:
>
>ERROR: You found a tRNA with an intron!
>This should not happen
>--> rank=12, hostname=my-mgrid6
>ERROR: Failed while gathering ab-init output files
>ERROR: Chunk failed at level:1, tier_type:2
>FAILED CONTIG:scf_013
>
>ERROR: Chunk failed at level:4, tier_type:0
>FAILED CONTIG:scf_013
>
>I checked the trnascan output
>(scf_013.abinit_nomask.0.eukaryotic.trnascan) in theVoid for that contig,
>and the output seems valid to me:
>
>scf_013  1  189339  189410  Thr  AGT  0       0       82.83
>scf_013  2  510381  510462  Ser  AGA  0       0       67.09
>scf_013  3  586886  587000  Leu  CAA  586924  586956  57.97
>scf_013  4  942166  942069  Leu  AAG  942128  942113  57.48
>scf_013  5  169102  168993  Leu  TAA  169065  169037  56.49
>
>Hope this can be of some help while debugging. I'll leave trnascan off
>for now.
>
>thanks,
>
>Mikael
>
>
>On 10 Jan 2014, at 22:03, Carson Holt wrote:
>
>> Hi Mikael,
>>
>> The options are part of the new MAKER-P integration
>> (http://www.plantphysiol.org/content/early/2013/12/06/pp.113.230144.abstract).
>> Additional documentation/tutorials will be forthcoming - probably in
>> a nice wiki page as part of the upcoming GMOD Malaysia courses in February
>> or alternatively with the annual GMOD summer school. The tRNA option is
>> easy enough to turn on (just set trna=1 in the maker_opts.ctl file).
>>
>> Thanks,
>> Carson
>>
>>
>>
>> On 1/10/14, 2:48 AM, "Mikael Brandström Durling"
>> wrote:
>>
>>> Hi Carson and other maker developers,
>>>
>>> I was reading the source code of the latest maker release and noted
>>> several references to ncRNAs, snoscan and trnascan. Can these be
>>> incorporated into the normal annotation workflow? If so, are there any
>>> instructions available for that?
>>>
>>> best regards,
>>> Mikael Durling

From carsonhh at gmail.com Thu Feb 6 09:05:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Feb 2014 09:05:05 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To:
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
Message-ID:

Your genome.dna file has no sequence? Did you by any chance strip the fasta sequence from the GFF3 you are using as input to maker2zff? There should be fasta sequence at the end of that file. Also, can I see the GFF3 file you are using as input to maker2zff?

Thanks,
Carson
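Carson's question above turns on whether the GFF3 handed to maker2zff still carries its trailing FASTA section. A quick way to check this is sketched below in Python. It is a hypothetical helper, not part of MAKER or SNAP, and it assumes the file follows the GFF3 convention of a ##FASTA directive followed by FASTA records. It reports when that section is missing or when a sequence ID used in column 1 of a feature line has no sequence.

```python
"""Sketch: check that a GFF3 file still ends with its ##FASTA section.

Hypothetical helper written for this thread; it is not part of MAKER or SNAP.
"""
import sys


def scan_gff3(lines):
    """Return (seqids used by features, {seqid: sequence length} from ##FASTA)."""
    feature_seqids = set()
    fasta_lengths = {}
    in_fasta = False
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not in_fasta:
            if line.startswith("##FASTA"):
                in_fasta = True  # everything after this directive is FASTA
            elif line and not line.startswith("#"):
                fields = line.split("\t")
                if len(fields) == 9:  # a GFF3 feature line has 9 columns
                    feature_seqids.add(fields[0])
        elif line.startswith(">"):
            current = line[1:].split()[0]
            fasta_lengths[current] = 0
        elif current is not None:
            fasta_lengths[current] += len(line.strip())
    return feature_seqids, fasta_lengths


def find_problems(feature_seqids, fasta_lengths):
    """List human-readable problems; an empty list means the file looks usable."""
    if not fasta_lengths:
        return ["no ##FASTA section, or it contains no sequences"]
    return [
        f"no sequence for {seqid}"
        for seqid in sorted(feature_seqids)
        if fasta_lengths.get(seqid, 0) == 0
    ]


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        problems = find_problems(*scan_gff3(handle))
    print("\n".join(problems) if problems else "OK")
```

Run it as, for example, python check_gff3_fasta.py input.gff (both names are placeholders). A file reduced to "just the non-sequence entries" would be reported as having no ##FASTA section.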
From carsonhh at gmail.com Thu Feb 6 10:04:25 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Feb 2014 10:04:25 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
Message-ID:

Could you give me the file without using 'head' to trim it? It's cutting it before it reaches the part I'm interested in.

-Carson

From: dhivya arasappan
Date: Thursday, February 6, 2014 at 10:01 AM
To: Carson Holt
Cc: Daniel Ence , "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] maker annotation with cufflinks output

Oh yes, I did. I took just the non-sequence entries in the gff file and used that as my input. I will rerun snap with the gff file containing the sequences as well. I'm attaching a snippet of the gff file that I used as input to maker2zff.

Thanks for your help
Dhivya
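Dhivya's fathom run earlier in this thread flagged every exon as out_of_bounds while genome.dna was empty, which fits Carson's diagnosis that the FASTA had been stripped from the maker2zff input. The sketch below reproduces that kind of check in Python. It assumes (this is my reading, not confirmed by the thread) that out_of_bounds means an exon coordinate lies past the end of its sequence in the paired .dna file. It also assumes ZFF feature lines start with label, begin and end and end with the group name, which holds for both the short and long ZFF layouts. It is a hypothetical helper; fathom remains the authoritative validator.

```python
"""Sketch: flag ZFF features whose coordinates fall outside their sequence.

Hypothetical helper, not a SNAP tool. Assumes genome.ann is ZFF (a '>name'
header followed by feature lines whose first three fields are label, begin,
end and whose last field is the group) and genome.dna is FASTA.
"""
import sys


def read_fasta_lengths(lines):
    """Return {sequence name: length} for FASTA lines."""
    lengths = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def out_of_bounds(ann_lines, lengths):
    """Yield (seqid, group, label, begin, end) for features outside 1..length."""
    seqid = None
    for raw in ann_lines:
        fields = raw.split()
        if not fields:
            continue
        if fields[0].startswith(">"):
            seqid = fields[0][1:]
            continue
        label, begin, end, group = fields[0], int(fields[1]), int(fields[2]), fields[-1]
        size = lengths.get(seqid, 0)  # 0 when the .dna file lacks this sequence
        # begin > end just means the minus strand, so compare min/max.
        if min(begin, end) < 1 or max(begin, end) > size:
            yield seqid, group, label, begin, end


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[2]) as dna:
        seq_lengths = read_fasta_lengths(dna)
    with open(sys.argv[1]) as ann:
        for hit in out_of_bounds(ann, seq_lengths):
            print("\t".join(map(str, hit)))
```

Invoked as python zff_bounds.py genome.ann genome.dna (the script name is a placeholder), an empty genome.dna makes every feature show up, much like the fathom output above. Once the FASTA is restored, only genuinely misplaced exons should remain.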
From darasappan at gmail.com Thu Feb 6 10:01:44 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 6 Feb 2014 11:01:44 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To:
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
Message-ID: <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>

Oh yes, I did. I took just the non-sequence entries in the gff file and used that as my input. I will rerun snap with the gff file containing the sequences as well. I'm attaching a snippet of the gff file that I used as input to maker2zff.

Thanks for your help
Dhivya

On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:

> Your genome.dna file has no sequence? Did you by any chance strip
> the fasta sequence from the GFF3 you are using as input to
> maker2zff? There should be fasta sequence at the end of that file.
> Also, can I see the GFF3 file you are using as input to maker2zff?
>
> Thanks,
> Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: head.cat.formatted.gff
Type: application/octet-stream
Size: 19905 bytes
Desc: not available

From sjackman at gmail.com Thu Feb 6 17:22:57 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 6 Feb 2014 16:22:57 -0800
Subject: [maker-devel] Adding MAKER to Homebrew for ease of installation
Message-ID:

Hi MAKER developers,

I'd like to add MAKER to Homebrew to make the installation of MAKER and its dependencies as straightforward as "brew install maker". Homebrew is a system for installing software, originally developed for Mac OS, and now also for Linux through Linuxbrew. Homebrew/science is a collection of scientific software, which includes a lot of bioinformatics software. I've created a prototype for the MAKER installation script (called a formula, in Homebrew parlance).

Is there a static URL for the source code of MAKER? The current formula won't work out of the box, because part of the URL depends on the user's unique ID: http://yandell.topaz.genetics.utah.edu/maker_downloads/$key/maker-2.28.tgz.

Would you be interested in adding MAKER to Homebrew?

I know MAKER must be licensed for commercial use. It is possible for Homebrew to display a notice of the MAKER license when it's installed:

MAKER is not available for commercial use without a license. Those wishing to license MAKER for commercial use should contact Beth Drees at the University of Utah TCO to discuss your needs.

Cheers,
Shaun

From bioinformatics.umd at gmail.com Fri Feb 7 06:29:27 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Fri, 7 Feb 2014 08:29:27 -0500
Subject: [maker-devel] NCBI feature table
Message-ID: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>

Hello Maker Developers,

I have used this software with great success and I continue to look to it going forward.
However, as I'm getting ready to submit my annotations to NCBI with the genomes, I haven't found a straightforward method of turning the MAKER-produced GFF files into an NCBI feature table. What is the process for creating this table? It seems that the format NCBI is looking for is unique, and I haven't uncovered any scripts or tools to assist in creating this table from my annotation files. If anyone has any insight on this issue, it would be greatly appreciated.

Cheers
Ian

From mike.thon at gmail.com Fri Feb 7 07:14:06 2014
From: mike.thon at gmail.com (Michael Thon)
Date: Fri, 7 Feb 2014 15:14:06 +0100
Subject: [maker-devel] NCBI feature table
In-Reply-To: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>
References: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com>
Message-ID: <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com>

Hi Ian -

We've been struggling with this too, and I started developing a script to convert the maker gff into NCBI's .tbl format. However, we found that some of the gene models required manual editing, so what we do is import the gff into a commercial application called Geneious, where we do the edits. From there we export the data in GenBank format and then convert it to .tbl format with a script. Our submission just passed the automated checks and we're waiting for the manual review. Probably none of my code will help you, and in any case it's kind of a mess. The only advice I can offer is that you'll probably need some manual editing in your workflow, if not in Apollo, then in some other app. In that case you'll need to convert the output of that app into .tbl format.

From cexzurjimenezjr at gmail.com Thu Feb 6 22:27:13 2014
From: cexzurjimenezjr at gmail.com (Cexzur Jimenez Jr.)
Date: Fri, 7 Feb 2014 13:27:13 +0800
Subject: [maker-devel] Testing MAKER After Installation

Hello,

I have finished installing MAKER, which reported "PERL Dependencies: INSTALLED, External Programs: INSTALLED, MPI SUPPORT: NOT CONFIGURED, MAKER: INSTALLED", so it seems everything's fine. I'm using MAKER 2.10 and I have followed the installation instructions both in its "README" and "INSTALL" files and in the 2012 GMOD MAKER Tutorial. After editing the three configuration files and running "maker", I saw the following error in my terminal. I have searched Google and tried the solutions offered there, but the error is still showing. Below is the error I got:

Can't locate package GDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package NDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package SDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287.
A data structure will be created for you at:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore

To access files for individual sequences use the datastore index:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_master_datastore_index.log

--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------

running repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb -species all -dir /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500 -pa 1
#-------------------------------#
Building general libraries in: /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed
FATAL ERROR
ERROR: Failed while doing repeat masking!!
ERROR: Chunk failed at level 2
!!
FAILED CONTIG:contig-dpp-500-500

--Next Contig--

Processing run.log file...
MAKER WARNING: The file dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out did not finish on the last run and must be erased

Maker is now finished!!!

Can you tell me what the error is and which part of the installation I got wrong? Your help will be very much appreciated. Thank you.

Attached herein are the configuration files I used for MAKER.

Sincerely,
CJ
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl
Type: application/octet-stream
Size: 1502 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl
Type: application/octet-stream
Size: 1320 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4541 bytes
Desc: not available

From carson.holt at genetics.utah.edu Fri Feb 7 11:11:44 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Fri, 7 Feb 2014 18:11:44 +0000
Subject: [maker-devel] Maker installation

Hi Tracy,

The older Apollo is pretty much deprecated. There are still people who like to use it, though (myself among them). You can download and install it manually from here --> http://sourceforge.net/projects/gmod/files/Apollo/. If you want to let MAKER install it for you, you can edit the URL in the .../maker/src/locations file to be this --> http://weatherby.genetics.utah.edu/apollo/apollo.tar.gz

You can also use Web-Apollo for your data if you want, and that is what I would recommend.

On a side note, if you are trying to install the old Apollo as part of the optional web-based GUI, I'd recommend not doing that. The GUI is really only for demonstration purposes or very small datasets. It is not for production (that is why it is off by default).

Thanks,
Carson

From carson.holt at genetics.utah.edu Fri Feb 7 11:28:29 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Fri, 7 Feb 2014 18:28:29 +0000
Subject: [maker-devel] NCBI feature table
In-Reply-To: <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com>
References: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com> <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com>

Yes. The non-web version of Apollo can open GFF3 and then save to table format --> http://sourceforge.net/projects/gmod/files/Apollo/

I've also attached a script made by a lab member that can convert MAKER-derived GFF3 gene entries into raw table format, and I've CC'd the script's author (Michael Campbell) in case you have any questions.

Thanks,
Carson
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gff32table
Type: application/octet-stream
Size: 7511 bytes
Desc: gff32table

From carson.holt at genetics.utah.edu Fri Feb 7 11:31:17 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Fri, 7 Feb 2014 18:31:17 +0000
Subject: [maker-devel] Testing MAKER After Installation

That can happen on some systems with that very old version of MAKER. Use MAKER 2.28 or 2.30 instead --> http://www.yandell-lab.org/software/maker.html

Thanks,
Carson

From bhall7 at hawaii.edu Fri Feb 7 17:31:36 2014
From: bhall7 at hawaii.edu (Brian Hall)
Date: Fri, 07 Feb 2014 14:31:36 -1000
Subject: [maker-devel] NCBI feature table
Message-ID: <52F57AE8.5090002@hawaii.edu>

Hi Ian,

My colleagues are also working on preparing a genome for submission to the NCBI. The software we are developing for this task is still a work in progress, but you are welcome to give it a try:

https://github.com/tedsta/GAG

It's a console-based application and it requires Python 2.6. Its strength is in filtering and modifying large segments of the genome at once -- where Apollo is good for removing a few erroneous exons, we are dealing with lists of dozens or more. This program seeks to make such changes as painless as possible.

My advice is to try the simplest gff3-to-tbl script you can find and then run tbl2asn. If it works out okay, great! If you get a massive error report, get in touch and we'll help you out if we can :)

--Brian

From tmsmith23 at wisc.edu Fri Feb 7 10:48:13 2014
From: tmsmith23 at wisc.edu (Tracy Smith)
Date: Fri, 7 Feb 2014 11:48:13 -0600
Subject: [maker-devel] Maker installation

Hi,

I am trying to install Maker and am running into the same problem noted on this page, namely that I cannot install Apollo.

https://groups.google.com/forum/#!msg/maker-devel/vrVa2mEsKbg/0e_25LvOvdEJ

I tried using the new URL you provided, "Here is a new location for the source --> http://sourceforge.net/code-snapshots/svn/g/gm/gmod/svn/gmod-svn-25291-apollo-trunk.zip", but that URL now points nowhere. Is it possible to use WebApollo instead? Or do you know of another location where a copy of Apollo could be downloaded?

Thank you so much.

Best regards,
Tracy

--
Tracy Smith
University of Wisconsin- Madison
Pepperell Lab

From carsonhh at gmail.com Mon Feb 10 08:34:58 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 08:34:58 -0700
Subject: [maker-devel] MAKER presentation at PAG

* maker_map_ids - Build shorter IDs/Names for MAKER genes and transcripts following the NCBI suggested naming format.
* map_fasta_ids - Maps short IDs/Names generated by maker_map_ids to MAKER fasta files.
* map_gff_ids - Maps short IDs/Names generated by maker_map_ids to MAKER GFF3 files; old IDs/Names are mapped to the Alias attribute.
* maker_functional_fasta - Maps putative functions identified from BLASTP against UniProt/SwissProt to the MAKER-produced transcript and protein fasta files.
* maker_functional_gff - Maps putative functions identified from BLASTP against UniProt/SwissProt to the MAKER-produced GFF3 files in the Note attribute.
* ipr_update_gff - Takes InterProScan (iprscan) output and maps domain IDs and GO terms to the Dbxref and Ontology_term attributes in the GFF3 file. This is metadata that shows up when you click on an annotation in JBrowse/GBrowse.
* iprscan2gff3 - Takes InterProScan (iprscan) output and generates GFF3 features representing domains. Interesting tier for GBrowse. These are visible feature tracks that can be seen in JBrowse/GBrowse.

Thanks,
Carson

From: Kevin Dorn
Date: Sunday, February 9, 2014 at 9:23 PM
Subject: MAKER presentation at PAG

Hi Carson,

I saw your MAKER presentation at PAG this year and have a quick question. I've used MAKER to annotate the plant genome we're working on, and am mostly done. I had to step out for a second during your talk, and when I came back, you were talking about how you can transfer meaningful annotations (getting rid of the 'ugly MAKER names' for genes). Is there an accessory script to do this?

Thanks,
Kevin Dorn

From amitha at ccmb.res.in Mon Feb 10 00:04:37 2014
From: amitha at ccmb.res.in (AMITHA SAMPATH KUMAR)
Date: Mon, 10 Feb 2014 12:34:37 +0530 (IST)
Subject: [maker-devel] Falied to create new account
Message-ID: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>

Hi,

I am interested in using the MAKER online version, for which I tried to create a profile using the email ID 'amitha at ccmb.res.in', but unfortunately I could not log in successfully. I am also pasting a link to the error here: http://weatherby.genetics.utah.edu/cgi-bin/mwas/maker.cgi.

The error mentioned is:
Error executing run mode 'forgot_login': Can't call method "MailMsg" without a package or object reference at /var/www/cgi-bin/mwas/lib/MWAS_util.pm line 529.
 at /var/www/cgi-bin/mwas/maker.cgi line 21.
Kindly help me through the registration asap.

Thanks
Amitha.

From listona at science.oregonstate.edu Sat Feb 8 19:08:42 2014
From: listona at science.oregonstate.edu (Aaron Liston)
Date: Sat, 08 Feb 2014 18:08:42 -0800
Subject: [maker-devel] Re-using repeat masking in SNAP training
Message-ID: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>

I am following the tutorial for training SNAP, and it works fine. However, the tutorial instructions have MAKER redo the repeat masking. To avoid this, I concatenated my gff files from the first round of annotation and used maker_gff=round1.gff and rm_pass=1, but at the end of the process the repeat annotations were not there. Any suggestions?

Thanks,
Aaron

From caigh02 at gmail.com Sun Feb 9 20:26:57 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Sun, 9 Feb 2014 21:26:57 -0600
Subject: [maker-devel] Fwd: Functional annotation of MAKER gene models

I sent the following message to Carson but forgot to send it to the maker-devel list.

Hi Carson,

Again I need your help! With your guidance, I have the gene models for my genomes. Now I am trying to assign functions to the gene models. I noticed that I can use maker_functional_gff/fasta or InterProScan. I dug up some old messages in the maker-devel Google group, but I still have a few questions:

1. Will maker_functional_gff/fasta take NCBI blastp results, or only WU-BLAST results? I do not have WU-BLAST.
2. Do I have to use the UniProt/Swiss-Prot database, or can I use something else? For example, may I add a few high-quality genome annotations of related species to the Swiss-Prot database? Or may I use UniRef90 or the nr database instead of Swiss-Prot?
3. Do you have a script to integrate Blast2GO results into the MAKER gff/fasta?

Thanks.
Guohong
Rutgers University

From carsonhh at gmail.com Mon Feb 10 10:25:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 10:25:06 -0700
Subject: [maker-devel] Falied to create new account
In-Reply-To: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>
References: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>

The SMTP server that sends e-mails out is just down, so when you said you forgot your login, it couldn't e-mail you. I switched to a different server for the time being.

--Carson

From carsonhh at gmail.com Mon Feb 10 10:26:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 10 Feb 2014 10:26:06 -0700
Subject: [maker-devel] Fwd: Functional annotation of MAKER gene models

1. Yes. It should take NCBI BLAST+ results.
2. It has to be UniProt/Swiss-Prot, or you can modify the comments of another database to look like UniProt/Swiss-Prot.
3. ipr_update_gff can also take BLAST2GO results as an undocumented feature (or at least it could the last time I tested it, which was quite a long time ago).
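Carson's second point (making another protein database look like UniProt/Swiss-Prot) can be sketched in Python as a FASTA header rewrite. This is a minimal illustration under stated assumptions: the sp|accession|entry-name layout followed by a description and OS= mirrors the usual Swiss-Prot header, but which of these fields the maker_functional_* scripts actually parse is not confirmed in this thread. The function names, the "Uncharacterized protein" fallback and the example organism are hypothetical; compare the output against a real Swiss-Prot header and the scripts themselves before relying on it.

```python
import re
from typing import Iterable, Iterator


def to_uniprot_style(header: str, organism: str, db_tag: str = "sp") -> str:
    """Rewrite '>ID free-text description' as a Swiss-Prot-like header.

    Assumption: target layout is '>sp|ACCESSION|ENTRY_NAME Description OS=Organism'.
    Which fields MAKER's maker_functional_* scripts read is NOT verified here.
    """
    text = header[1:] if header.startswith(">") else header
    seq_id, _, description = text.strip().partition(" ")
    # Hypothetical fallback when the source header carries no description.
    description = description.strip() or "Uncharacterized protein"
    accession = re.sub(r"\|", "_", seq_id)  # '|' is the field separator
    entry_name = re.sub(r"[^A-Za-z0-9_]", "_", seq_id)
    return f">{db_tag}|{accession}|{entry_name} {description} OS={organism}"


def rewrite_fasta(lines: Iterable[str], organism: str) -> Iterator[str]:
    """Rewrite every header line; pass sequence lines through unchanged."""
    for line in lines:
        if line.startswith(">"):
            yield to_uniprot_style(line, organism)
        else:
            yield line.rstrip("\n")


if __name__ == "__main__":
    records = [">Gene0042 putative ABC transporter", "MKTAYIAKQR",
               ">Gene0043", "MSSHE"]
    for out in rewrite_fasta(records, "Hypothetical species"):
        print(out)
```

Only header lines change; sequence lines pass through untouched, so the rewritten file can be used wherever the original protein FASTA was.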

Thanks,
Carson

From barry.utah at gmail.com Mon Feb 10 12:21:31 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Mon, 10 Feb 2014 12:21:31 -0700
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu>
Message-ID: <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>

Hi Aaron,

If you re-run maker and don't change the details about the repeat library (i.e. you only update the SNAP HMM file), then MAKER shouldn't redo any of the repeat masking; it should reuse the work it has already done. Is this not what you are seeing?
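Aaron's workaround was to concatenate the round-one GFF3 files and pass the result back as maker_gff. One pitfall worth ruling out, offered as a hypothesis rather than a diagnosis: MAKER's GFF3 files end with a ##FASTA section (Carson's advice in this archive is to join them with gff3_merge without stripping that sequence), and in GFF3 everything after ##FASTA is read as sequence, so a plain cat of several files strands the later files' features inside the first file's FASTA block. The Python sketch below only illustrates the ordering such a merge has to preserve; it is not gff3_merge's implementation, and merge_gff3 is a hypothetical name.

```python
from typing import Iterable, List


def merge_gff3(sources: Iterable[Iterable[str]]) -> List[str]:
    """Merge several GFF3 texts (each an iterable of lines) into one.

    All feature and comment lines come first, under a single
    '##gff-version 3' header; every file's sequence data is collected
    and emitted once, after a single '##FASTA' directive.
    """
    features: List[str] = ["##gff-version 3"]
    fasta: List[str] = []
    for source in sources:
        in_fasta = False  # reset per file: each file has its own FASTA tail
        for raw in source:
            line = raw.rstrip("\n")
            if not line:
                continue  # drop blank lines
            if in_fasta:
                fasta.append(line)
            elif line.startswith("##FASTA"):
                in_fasta = True  # the rest of this file is sequence
            elif line.startswith(">"):
                # Defensive: treat a bare FASTA header as the start of sequence.
                in_fasta = True
                fasta.append(line)
            elif line.startswith("##gff-version"):
                continue  # only one version header in the merged output
            else:
                features.append(line)
    if fasta:
        features.append("##FASTA")
        features.extend(fasta)
    return features


if __name__ == "__main__":
    a = ["##gff-version 3", "ctg1\tmaker\tgene\t1\t90\t.\t+\t.\tID=g1",
         "##FASTA", ">ctg1", "ACGT"]
    b = ["##gff-version 3", "ctg2\trepeatmasker\tmatch_part\t5\t40\t.\t+\t.\tID=r1",
         "##FASTA", ">ctg2", "TTGA"]
    print("\n".join(merge_gff3([a, b])))
```

Collecting sequence separately and writing it once at the end is the whole point: every annotation line, including repeat match_part lines, stays in the annotation section where a GFF3 reader will see it.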

Barry

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From listona at science.oregonstate.edu Mon Feb 10 12:46:06 2014
From: listona at science.oregonstate.edu (Aaron Liston)
Date: Mon, 10 Feb 2014 11:46:06 -0800
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu> <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu>
Message-ID: <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>

Hi Barry: I changed the name of the genome file so that I could see the results at each step. However, it sounds like if I had kept the same name, MAKER would use the info from the previous run. Is that correct?

Aaron

From barry.utah at gmail.com Mon Feb 10 12:56:26 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Mon, 10 Feb 2014 12:56:26 -0700
Subject: [maker-devel] Re-using repeat masking in SNAP training
In-Reply-To: <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>
References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu> <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu> <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu>
Message-ID: <19FC4633-46F6-4B32-820A-A68C242A1E77@gmail.com>

Yep. If you want to keep the results from each step, just copy the GFF3 file from your first run to a new name and then redo your run.

B

From dence at genetics.utah.edu Tue Feb 11 11:37:36 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 11 Feb 2014 18:37:36 +0000
Subject: [maker-devel] Falied to create new account
References: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1>

Hossein,

Ok. Since this error came up on a local install, I'm going to need some more information to understand what went wrong.
Is it the same contig that always causes this error? If it is, then is that the only error or warning that MAKER encounters while running on this contig? Or, if multiple contigs fail, then is it always the same error? If you can narrow it down to the smallest possible dataset that consistently gives the same error, then we can begin to understand what's wrong.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Tuesday, February 11, 2014 11:20 AM
To: Daniel Ence
Subject: Re: [maker-devel] Falied to create new account

Hi Daniel

I am running it through the local server at my work.

M. Hossein Borhan, Ph.D.
Research Scientist/ Chercheur Scientifique
Saskatoon Research Centre/Centre de Recherches de Saskatoon
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
107 Science Place, Saskatoon, SK., S7N 0X2
Telephone/Téléphone: (306) 385-9441
Facsimile/Télécopieur: (306) 385-9482
Hossein.borhan at agr.gc.ca

On 14-02-11 12:16 PM, "Daniel Ence" wrote:

>Hi Hossein,
>
>Did you encounter this error while you were running MAKER on your local
>machine or through the MAKER web annotation service?
>
>Thanks,
>Daniel
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Carson Holt [carsonhh at gmail.com]
>Sent: Tuesday, February 11, 2014 10:18 AM
>To: Daniel Ence
>Cc: Mark Yandell
>Subject: FW: [maker-devel] Falied to create new account
>
>Hey Daniel, could you download his dataset and see if you can replicate
>the error? Also check if this was an MWAS job or a local maker run (his
>dataset will already be there for MWAS, you just need the job ID).
>
>Thanks,
>Carson
>
>On 2/11/14, 10:16 AM, "Borhan, Hossein" wrote:
>
>>Hi Carson
>>
>>I encountered this error while running maker
>>
>>FATAL ERROR
>>ERROR: Failed while processing the chunk divide!!
>>
>>ERROR: Chunk failed at level 17
>>!!
>>FAILED CONTIG:PbPT3Sc00006
>>
>>HB

From darasappan at gmail.com Tue Feb 11 11:48:23 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Tue, 11 Feb 2014 12:48:23 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To:
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com>
Message-ID: <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>

With your suggested changes (using a protein file not derived from the RNA-seq data and fixing the gff file for training SNAP), I was able to increase the number of genes from 6000+ to 18116.

I'm now trying to evaluate the quality of the annotation. I have a question about the usage of mpi_evaluator. In the maker tutorial, the usage is given as:

mpi_evaluator [options]

What files are being referred to in the input parameters: eval_opts, eval_bopts and eval_exe?

Thanks
Dhivya

On Feb 6, 2014, at 11:47 AM, Carson Holt wrote:

> Ok. Content looks good. Just make sure to use gff3_merge to join
> the GFF3's without stripping out the fasta sequence at the end when
> training SNAP.
>
> Thanks,
> Carson
>
> From: dhivya arasappan
> Date: Thursday, February 6, 2014 at 10:29 AM
> To: Carson Holt
> Cc: Daniel Ence
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Sorry, I was just trying to make it small enough to be approved by
> the mailing list.
>
> Here is the whole file:
>
> cat.formatted.gff.tgz
>
> On Thu, Feb 6, 2014 at 11:04 AM, Carson Holt
> wrote:
>> Could you give me the file without using 'head' to trim it? It's
>> cutting it off before it reaches the part I'm interested in.
>>
>> -Carson
>>
>> From: dhivya arasappan
>> Date: Thursday, February 6, 2014 at 10:01 AM
>> To: Carson Holt
>> Cc: Daniel Ence, "maker-devel at yandell-lab.org"
>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>
>> Oh yes, I did. I took just the non-sequence entries in the gff file
>> and used that as my input. I will rerun snap with the gff file
>> containing the sequences as well.
>>
>> I'm attaching a snippet of the gff file that I used as input to
>> maker2zff.
>>
>> Thanks for your help
>> Dhivya
>>
>> On Feb 6, 2014, at 10:05 AM, Carson Holt wrote:
>>
>>> Your genome.dna file has no sequence? Did you by any chance strip
>>> the fasta sequence from the GFF3 you are using as input to
>>> maker2zff? There should be fasta sequence at the end of that
>>> file. Also, can I see the GFF3 file you are using as input to
>>> maker2zff?
>>>
>>> Thanks,
>>> Carson
>>>
>>> From: dhivya arasappan
>>> Date: Thursday, February 6, 2014 at 7:47 AM
>>> To: Carson Holt
>>> Cc: Daniel Ence, "maker-devel at yandell-lab.org"
>>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>>
>>> Hello,
>>>
>>> It does appear that my genome.ann file from the maker2zff script has
>>> data in it. However, the SNAP steps after that have created empty
>>> files.
>>> The following are all empty:
>>>
>>> alt.dna err.dna export.dna genome.dna olp.dna uni.dna wrn.dna
>>> alt.ann err.ann export.ann genome.ann olp.ann uni.ann wrn.ann
>>>
>>> When I try to get gene stats or validate genome.ann, I get
>>> errors like this for all of them:
>>>
>>> fathom genome.ann genome.dna -gene-stats |more
>>> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds
>>> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds exon-1:out_of_bounds
>>> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
>>> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds exon-21:out_of_bounds
>>>
>>> I'm not sure why the annotations I'm seeing in genome.ann are all
>>> showing up as errors. I realize this may be an issue with snap,
>>> but are you familiar with anything like this? My genome.ann file
>>> is attached for reference.
>>>
>>> Thanks
>>> Dhivya
>>>
>>> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote:
>>>
>>>> Do you have any features of type snap in your results from step
>>>> 3? We've had a couple of recent posts where, after training, snap
>>>> was giving no results, and as a result maker couldn't give any
>>>> genes. One cause of something like that may be your step 2.
>>>> Make sure the ZFF you used to train with wasn't empty.
>>>> The maker2zff script uses filters to only put the best genes in the
>>>> ZFF file, and if all your genes fail the filtering then you are
>>>> training with an empty ZFF.
>>>>
>>>> Also, you should use proteins from a related species as your
>>>> protein file. I see that your protein matches are varying wildly
>>>> from run to run, and so is your contig count. Were the subset of
>>>> contigs you have results for long enough to contain genes?
>>>>
>>>> -Carson

From carsonhh at gmail.com Tue Feb 11 11:55:38 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 11 Feb 2014 11:55:38 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com> <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com>
Message-ID:

I wouldn't use mpi_evaluator. It is buggy and has virtually no documentation. The AED values are the best way to identify which genes are higher and lower quality. You can also run InterProScan to identify protein domain content as an independent evaluation.

Look at this paper here: http://www.biomedcentral.com/1471-2105/12/491

Figure 4 has a nice example of how AED, domain content, and gene orthology correlate to show the quality of different subsets of genes in seven ant genomes.

If you choose to try mpi_evaluator, it uses the -CTL option to generate empty files that you then fill in.

Thanks,
Carson

From carson.holt at genetics.utah.edu Tue Feb 11 13:52:05 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Tue, 11 Feb 2014 20:52:05 +0000
Subject: [maker-devel] New MAKER release
Message-ID:

Hello all,

MAKER has been updated to 2.31. There are no major new features over 2.30. It is primarily just bug fixes and updates to the features that were added from MAKER-P, like tRNAscan support. I also was able to remove the segfaults that sometimes happened on exit under OpenMPI.

Thanks,
Carson

From carson.holt at genetics.utah.edu Tue Feb 11 14:19:17 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Tue, 11 Feb 2014 21:19:17 +0000
Subject: [maker-devel] New MAKER release
In-Reply-To:
References:
Message-ID:

URLs can be manually edited in the .../maker/src/locations file. I've also updated that file in the latest MAKER download to point to the new RepBase URL.

Thanks,
Carson

From: Joanna Kelley
Date: Tuesday, February 11, 2014 at 2:00 PM
To: Carson Holt
Subject: Re: [maker-devel] New MAKER release

Hi Carson,

The RepBase step is failing; it seems to be looking for the incorrect version. Where do I change the code to solve that?
Thanks,
Joanna

Downloading RepBase...
--2014-02-11 12:59:38--  http://www.girinst.org/server/RepBase/protected/repeatmaskerlibraries/repeatmaskerlibraries-20130422.tar.gz
Resolving www.girinst.org... 66.201.49.247
Connecting to www.girinst.org|66.201.49.247|:80... connected.
HTTP request sent, awaiting response... 401 Authorization Required
Connecting to www.girinst.org|66.201.49.247|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2014-02-11 12:59:38 ERROR 404: Not Found.

ERROR: Failed installing RepBase, now cleaning installation path...
You may need to install RepBase manually.

--
Please update your address book, my new email address is joanna.l.kelley at wsu.edu

From dence at genetics.utah.edu Tue Feb 11 15:59:57 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 11 Feb 2014 22:59:57 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To:
References:
Message-ID:

Hi Hossein,

I think what would help most right now is if you ran MAKER on only one of the contigs that are failing and sent me the entire error output along with the maker control files that you are using. It looks like the error is coming from the gff3 files that you are using as input.
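Daniel's request amounts to building a one-contig test case. The following is a minimal sketch, not a MAKER tool. It assumes the genome is a plain multi-FASTA file whose record IDs are the first whitespace-separated token after `>`, as with PbPT3Sc00001 in Hossein's log. Under that assumption, a failing contig can be pulled out with standard-library Python. `extract_contig` is a hypothetical helper name.

```python
from typing import Iterable, List


def extract_contig(fasta_lines: Iterable[str], contig_id: str) -> List[str]:
    """Return the header and sequence lines of the FASTA record named contig_id.

    Assumes the record ID is the first whitespace-separated token after '>'.
    Returns an empty list when no record matches.
    """
    record: List[str] = []
    in_record = False
    for raw in fasta_lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            if in_record:
                break  # the next record has started, so the wanted one is complete
            tokens = line[1:].split()
            in_record = bool(tokens) and tokens[0] == contig_id
        if in_record:
            record.append(line)
    return record


if __name__ == "__main__":
    # Hypothetical usage: python extract_contig.py genome.fasta PbPT3Sc00001 > one_contig.fasta
    import sys

    if len(sys.argv) == 3:
        with open(sys.argv[1]) as handle:
            for out_line in extract_contig(handle, sys.argv[2]):
                print(out_line)
```

To get a single-contig run, write the returned lines to a new file and point the genome= option in maker_opts.ctl at that file. The complete error output from that run should then be small enough to send along with the control files.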
Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Tuesday, February 11, 2014 3:51 PM
To: Daniel Ence
Subject: ERROR: Failed while processing the chunk divide!!

Dear Daniel

I re-started maker and it is still running. But in the error output file that has been generated so far, it seems that smaller contigs are affected. There are contigs of 2-4 kb with this error, but I also noticed a contig of 30 kb length having this error. I was wondering if I need to change the settings in the maker_opts file:

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

If I understand correctly, max_dna_len divides contigs of over 100 kb into smaller chunks. However, for the min_contig option, it is not clear to me why I get an error message for 30 kb contigs if the cutoff concerns contigs of 10 kb or less. Should I change this to 0?

Here is an example of the error message for one of the contigs:

#--------- command -------------#
Widget::exonerate::est2genome:
/usr/local/exonerate-2.2.0-x86_64/bin/exonerate -q /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta -t /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta -Q dna -T dna --model est2genome --minintron 20 --showcigar --percent 20 > /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate
#-------------------------------#
cleaning blastn...
cleaning tblastx...
cleaning blastx...
ERROR: Failed on PbPT3Sc00001_S_0.8_1-mRNA-1
Check your input GFF3 file for errors! (from GFFDB)
FATAL ERROR
ERROR: Failed while processing the chunk divide!!
ERROR: Chunk failed at level 17
!!
FAILED CONTIG:PbPT3Sc00001
--Next Contig--

Regards
HB

From marc.hoeppner at imbim.uu.se Wed Feb 12 01:34:12 2014
From: marc.hoeppner at imbim.uu.se (Marc P. Hoeppner)
Date: Wed, 12 Feb 2014 09:34:12 +0100
Subject: [maker-devel] Annotations from protein alignments
Message-ID: <52FB3204.60606@imbim.uu.se>

Dear list,

I have an annotation project with both protein data (it's a bird, so I've been using both vertebrates in general and chicken in particular) and huge amounts of somewhat dodgy (as in lots of pre-mRNA) RNA-seq data. The chicken augustus model seems to do a decent job of seeding gene loci, but it's not quite perfect. I want to use protein alignments to create a high-confidence set of exons and subsequently a set of gene loci (to train e.g. snap), but when testing with protein2genome=1 I never get any annotations. This is also true for the test data set that is delivered together with Maker (hsap_). Is there anything I should know about using proteins to generate annotations? I left all settings in the config file at their defaults (except protein2genome=1). I've tried this with both Maker 2.30 and 2.31.

All the best,
Marc

--
-----------
Marc P. Hoeppner, PhD
Group leader
BILS Genome annotation platform
Department of Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoepner at imbim.uu.se

From carsonhh at gmail.com Wed Feb 12 08:42:36 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 12 Feb 2014 08:42:36 -0700
Subject: [maker-devel] Annotations from protein alignments
In-Reply-To: <52FB3204.60606@imbim.uu.se>
References: <52FB3204.60606@imbim.uu.se>
Message-ID:

I updated the 2.31 tarball. Go ahead and download it again. protein2genome had been turned off for eukaryotes and was only working for prokaryotic genomes.

-Carson

On 2/12/14, 1:34 AM, "Marc P. Hoeppner" wrote:

>Dear list,
>
>I have an annotation project with both protein data (it's a bird, so
>I've been using both vertebrates in general and chicken in particular),
>and huge amounts of somewhat dodgy (as in lots of pre-mRNA) RNA-seq
>data. The chicken augustus model seems to do a decent job in seeding
>gene loci, but it's not quite perfect.
I want to use protein alignments >to create a high-confidence set of exons and subsequently a set of gene >loci to train e.g. snap), but when testing to set protein2genome=1 I >never get any annotations. This is also true for the test data set that >is delivered together with Maker (hsap_). Anything I should know about >the use of proteins to generate annotations? I left all settings in the >config file at their defaults (except protein2genome=1). I've tried this >with both Maker 2.30 and 2.31. > >All the best, > >Marc > >-- >----------- >Marc P. Hoeppner, PhD >Group leader >BILS Genome annotation platform > >Department of Medical Biochemistry and Microbiology >Uppsala University, Sweden >marc.hoepner at imbim.uu.se > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dence at genetics.utah.edu Wed Feb 12 11:59:11 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 12 Feb 2014 18:59:11 +0000 Subject: [maker-devel] ERROR: Failed while processing the chunk divide!! In-Reply-To: References: , Message-ID: Hi Hossein, So, after looking at the gff3 and your control files, I had an idea. There's the part of the control file called "Re-annotation Using MAKER Derived GFF3", but you can also passthrough features from a gff3 using the "est_gff", "protein_gff", "rm_gff", "pred_gff", "model_gff" lines. Sometimes we encounter problems with the MAKER passthrough. Could you try dividing the gff3 file into the different feature sources and passing it through the "est_gff" etc options and not with the MAKER passthrough? That will tell us if the problem is with the gff3 file or with how MAKER is processing it. Another also to check is to make sure that the contig names in the gff3 file match the contig names in the fasta file that you're annotating. 
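A minimal sketch of the split suggested above, in Python, assuming the passthrough file is a standard tab-separated, 9-column GFF3 and that each value in column 2 (the source, e.g. est2genome or protein2genome in MAKER output) can be routed to one of the est_gff, protein_gff, rm_gff, pred_gff or model_gff lines. Which source goes to which option is left to you; the script only separates the features by source and, for the second check, reports any seqid in column 1 that has no matching FASTA header. The script name split_gff3.py and the per-source output file names are hypothetical.

```python
"""Split a GFF3 file by source (column 2) and check its seqids against a FASTA file."""
import sys
from collections import defaultdict


def split_gff3_by_source(gff_lines):
    """Group 9-column GFF3 feature lines by their source column (column 2).

    Comment/directive lines are skipped, and parsing stops at a ##FASTA
    directive because anything after it is sequence, not features.
    """
    by_source = defaultdict(list)
    for raw in gff_lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # skip anything that is not a 9-column feature line
        by_source[fields[1]].append(line)
    return dict(by_source)


def fasta_ids(fasta_lines):
    """Return the set of sequence IDs (first word after '>' in each header)."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.add(words[0])
    return ids


def missing_seqids(gff_lines, fasta_lines):
    """Return GFF3 seqids (column 1) that have no matching FASTA header."""
    seqids = {
        line.split("\t")[0]
        for lines in split_gff3_by_source(gff_lines).values()
        for line in lines
    }
    return seqids - fasta_ids(fasta_lines)


def main(argv):
    if len(argv) != 3:
        print("usage: split_gff3.py annotations.gff genome.fasta")
        return 1
    with open(argv[1]) as fh:
        gff_lines = fh.readlines()
    with open(argv[2]) as fh:
        fasta_lines = fh.readlines()
    # Write one file per source value, e.g. est2genome.gff, protein2genome.gff
    for source, lines in split_gff3_by_source(gff_lines).items():
        with open(f"{source}.gff", "w") as out:
            out.write("##gff-version 3\n")
            out.write("\n".join(lines) + "\n")
    missing = missing_seqids(gff_lines, fasta_lines)
    if missing:
        print("seqids with no FASTA header:", ", ".join(sorted(missing)))
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Keeping each source in its own file means a parse failure can be traced to one feature type: if MAKER accepts the split files on the est_gff/protein_gff/... lines but not the combined MAKER passthrough, the problem is more likely in how MAKER processes the file than in the file itself.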
Thanks,
Daniel

Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Wednesday, February 12, 2014 8:49 AM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Dear Daniel

I have generated the files that you requested. I chose Sc00009 from my genome, which is 30 kb and was one of the scaffolds coming up with the error. In addition to the ctl files and the error output file, I also attached the part of the gff file related to Sc00009 that is indicated in the error message.

Thanks for helping with this

Regards

HB

On 14-02-11 4:59 PM, "Daniel Ence" wrote:

>Hi Hossein,
>
>I think that what would help most right now is if you ran MAKER on
>only one of those contigs that are failing and sent me the entire error
>output along with the maker control files that you are using. It looks
>like the error is coming from the gff3 files that you are using as input.
>
>Thanks,
>Daniel
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Tuesday, February 11, 2014 3:51 PM
>To: Daniel Ence
>Subject: ERROR: Failed while processing the chunk divide!!
>
>Dear Daniel
>
>I re-started maker and it is still running. But in the error output file
>that has been generated so far, it seems that smaller contigs are affected.
>There are contigs of 2-4 kb with this error, but I also noticed a contig of
>30 kb length having this error.
>
>I was wondering if I need to change the setting in the maker_opts file:
>
>#-----MAKER Behavior Options
>max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
>min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
>
>If I understand correctly, max_dna_len divides contigs of over 100 kb into
>smaller chunks. However, it is not clear to me, regarding the min_contig
>option, if the default contig length is 10 kb or less, why I get the error
>message for 30 kb long contigs. Should I change this to 0?
>
>Here is an example of the error message for one of the contigs:
>
>#--------- command -------------#
>Widget::exonerate::est2genome:
>/usr/local/exonerate-2.2.0-x86_64/bin/exonerate -q /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta -t /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta -Q dna -T dna --model est2genome --minintron 20 --showcigar --percent 20 > /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate
>#-------------------------------#
>cleaning blastn...
>cleaning tblastx...
>cleaning blastx...
>ERROR: Failed on PbPT3Sc00001_S_0.8_1-mRNA-1
>Check your input GFF3 file for errors! (from GFFDB)
>
>FATAL ERROR
>ERROR: Failed while processing the chunk divide!!
>
>ERROR: Chunk failed at level 17
>!!
>FAILED CONTIG:PbPT3Sc00001
>
>--Next Contig--
>
>Regards
>
>HB

From dence at genetics.utah.edu Wed Feb 12 12:15:59 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 12 Feb 2014 19:15:59 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To:
References: , ,
Message-ID:

Hi Hossein,

One more question. How did you make the gff3 that you're passing through here?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From dence at genetics.utah.edu Wed Feb 12 13:42:03 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 12 Feb 2014 20:42:03 +0000
Subject: [maker-devel] ERROR: Failed while processing the chunk divide!!
In-Reply-To:
References: ,
Message-ID:

Hi Hossein,

So, those problems with passing through MAKER-derived gff3 have been addressed in newer versions of MAKER. The current version is 2.31 and is available for download now on our website. Try installing that version and running it with the same control files you started out using, and let me know if that fixes the problems.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
Sent: Wednesday, February 12, 2014 12:55 PM
To: Daniel Ence
Subject: Re: ERROR: Failed while processing the chunk divide!!

Hi Daniel

I am using maker 2.10

I also checked the naming of the scaffold in the genome file and the gff file for the failed example. The naming is the same.

Thanks

Hossein

M. Hossein Borhan, Ph.D.
Research Scientist/ Chercheur Scientifique
Saskatoon Research Centre/Centre de Recherches de Saskatoon
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
107 Science Place, Saskatoon, SK.,S7N 0X2
Telephone/Téléphone: (306) 385-9441
Facsimile/Télécopieur: (306) 385-9482
Hossein.borhan at agr.gc.ca

On 14-02-12 1:30 PM, "Daniel Ence" wrote:

>Hi Hossein,
>
>And which version of MAKER are you using?
>
>Thanks,
>Daniel
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA]
>Sent: Wednesday, February 12, 2014 12:25 PM
>To: Daniel Ence
>Subject: Re: ERROR: Failed while processing the chunk divide!!
>
>Hi Daniel
>
>The gff file was generated by the 1st run of maker
>
>HB

From masa at bioinfo.hr Thu Feb 13 03:17:11 2014
From: masa at bioinfo.hr (Masa Roller)
Date: Thu, 13 Feb 2014 11:17:11 +0100
Subject: [maker-devel] SNAP scores and AED scores
Message-ID: <52FC9BA7.6060505@bioinfo.hr>

Dear all,

I ran snap2-based gene prediction through MAKER.

In the resulting gff file, under the source "snap_masked", I can find a score in the score column of every SNAP prediction that did not get promoted to a MAKER gene. Is this the score of how well the prediction matches the HMM?

It seems to me that the SNAP models that are given gene status no longer appear under the snap_masked source, but only under the source "maker". MAKER then removes the score column, giving AED and eAED scores instead (which are more about how well the model corresponds to the evidence). When viewing the MAKER transcripts and SNAP predictions in a browser, they do not match (mostly, the MAKER predictions are longer).

I am interested in the scores of the individual gene predictions that underlie the MAKER gene models. Where could I find that information?

Many thanks!

From carsonhh at gmail.com Thu Feb 13 13:11:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Feb 2014 13:11:22 -0700
Subject: [maker-devel] SNAP scores and AED scores
In-Reply-To: <52FC9BA7.6060505@bioinfo.hr>
References: <52FC9BA7.6060505@bioinfo.hr>
Message-ID:

No. SNAP genes do not disappear. All SNAP ab initio calls are always kept as reference features marked snap_masked (for the repeat-masked genome) and snap (for the unmasked genome). MAKER then runs SNAP another time, feeding it hints based on the EST and protein alignment evidence.
These hint-based models can then compete against the ab initio SNAP models to be promoted to genes if their AED scores are better. Final models can also have UTRs added based on EST evidence. That is why you can get models from MAKER that do not match the original SNAP ab initio calls.

So, in summary, all SNAP ab initio models will be in snap_masked. The MAKER models will consist of the hint-based SNAP reruns plus SNAP ab initio models processed to add UTRs.

Thanks,
Carson

From carsonhh at gmail.com Thu Feb 13 13:23:07 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 13 Feb 2014 13:23:07 -0700
Subject: [maker-devel] SNAP scores and AED scores
In-Reply-To:
References: <52FC9BA7.6060505@bioinfo.hr>
Message-ID:

On a side note.
Because the MAKER models involve either modifying the ab initio SNAP model or manipulating the underlying scoring scheme using hints, the SNAP score on those is virtually meaningless. However, Ian Korf has developed a tool that can take any gene structure and reverse-generate a score (i.e., what the score of this gene would have been if SNAP had called it that way in the first place). I believe the tool is called fathom and is part of the SNAP package. It is not well documented, so you might have to contact Ian Korf directly about it. You can use the maker2zff tool to generate the input to fathom. Thanks, Carson On 2/13/14, 1:11 PM, "Carson Holt" wrote: >No. SNAP genes do not disappear. All SNAP ab initio calls will always be >kept as reference features marked snap_masked (for the repeat-masked genome) >and snap (for the unmasked genome). MAKER then runs SNAP another time, where >it feeds hints to SNAP based on EST and protein alignment evidence. These >hint-based models can then compete against the ab initio SNAP models to be >promoted to genes if their AED scores are better. Final models can also >get UTR added based on EST evidence. That is why you can get models from >MAKER that do not match the original SNAP ab initio calls. > >So in summary, all SNAP ab initio models will be in snap_masked. The >MAKER models will consist of hint-based SNAP reruns plus SNAP ab initio >models processed to add UTR. > >Thanks, >Carson > >On 2/13/14, 3:17 AM, "Masa Roller" wrote: > >>Dear all, >> >>I ran snap2 based gene prediction through maker. >> >>In the resulting gff file, in the source "snap_masked" I can find the >>score in the score column of every snap prediction that did not get >>promoted to a maker gene. This would be the score of how well the >>prediction matches the HMM? >> >>It seems to me that those snap models that are given gene status no >>longer appear as snap_masked source but only as source "maker".
Maker >>then removes the score column, instead giving AED and eAED scores (which >>are more about how the model corresponds to the evidence). When viewing >>the maker transcripts and SNAP predictions in a browser, they do not >>match (mostly, maker predictions are longer). >> >>I am interested in the score of individual gene predictions that >>underlie maker gene models. Where could I find that information? >> >>Many thanks! >> >>_______________________________________________ >>maker-devel mailing list >>maker-devel at box290.bluehost.com >>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From barry.utah at gmail.com Thu Feb 13 13:27:17 2014 From: barry.utah at gmail.com (Barry Moore) Date: Thu, 13 Feb 2014 13:27:17 -0700 Subject: [maker-devel] SNAP scores and AED scores In-Reply-To: References: <52FC9BA7.6060505@bioinfo.hr> Message-ID: <39AA5089-3E89-4067-A8DF-60B6716C98DF@genetics.utah.edu> Hi Masa, Also, if you want additional SNAP output that hasn't been passed forward in MAKER, you can always access the original SNAP output files in the MAKER datastore. This is a directory structure created by MAKER to store contig-specific data. There is a datastore directory (and a corresponding index file) in the MAKER output directory. The index file provides the path to each individual contig, and in that contig-specific directory there is a directory called theVoid. This contains the output of every program that MAKER runs. B On Feb 13, 2014, at 1:11 PM, Carson Holt wrote: > No. SNAP genes do not disappear. All SNAP ab initio calls will always be > kept as reference features marked snap_masked (for the repeat-masked genome) > and snap (for the unmasked genome). MAKER then runs SNAP another time, where > it feeds hints to SNAP based on EST and protein alignment evidence. These > hint-based models can then compete against the ab initio SNAP models to be > promoted to genes if their AED scores are better.
Final models can also > get UTR added based on EST evidence. That is why you can get models from > MAKER that do not match the original SNAP ab initio calls. > > So in summary, all SNAP ab initio models will be in snap_masked. The > MAKER models will consist of hint-based SNAP reruns plus SNAP ab initio > models processed to add UTR. > > Thanks, > Carson > > On 2/13/14, 3:17 AM, "Masa Roller" wrote: > >> Dear all, >> >> I ran snap2 based gene prediction through maker. >> >> In the resulting gff file, in the source "snap_masked" I can find the >> score in the score column of every snap prediction that did not get >> promoted to a maker gene. This would be the score of how well the >> prediction matches the HMM? >> >> It seems to me that those snap models that are given gene status no >> longer appear as snap_masked source but only as source "maker". Maker >> then removes the score column, instead giving AED and eAED scores (which >> are more about how the model corresponds to the evidence). When viewing >> the maker transcripts and SNAP predictions in a browser, they do not >> match (mostly, maker predictions are longer). >> >> I am interested in the score of individual gene predictions that >> underlie maker gene models. Where could I find that information? >> >> Many thanks! >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543
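Barry's description of the datastore can be sketched in Python for readers who want to locate the raw SNAP output programmatically. This is a minimal sketch under stated assumptions, not part of MAKER: it assumes the master datastore index log is tab-separated with three columns (contig ID, datastore path relative to the MAKER output directory, and a status such as STARTED, FINISHED or FAILED), and that each contig directory holds a theVoid.<contig> subdirectory, as in paths like .../35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/ that appear in this archive. The function names are hypothetical; check the log layout against your own MAKER version.

```python
from pathlib import Path


def parse_datastore_index(lines):
    """Map each contig ID to (relative datastore path, latest status).

    Assumes tab-separated lines of the form: contig_id, path, status.
    A contig may be listed more than once (e.g. STARTED, then FINISHED
    or FAILED); the last entry seen wins.
    """
    index = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        contig, path, status = fields[0], fields[1], fields[2]
        index[contig] = (path, status)
    return index


def void_dir(output_dir, index, contig):
    """Build the theVoid.<contig> directory path for one contig.

    Assumes the logged path is relative to the MAKER output directory.
    """
    rel_path, _status = index[contig]
    return Path(output_dir) / rel_path / f"theVoid.{contig}"


def failed_contigs(index):
    """List contigs whose most recent status is FAILED."""
    return sorted(c for c, (_path, status) in index.items() if status == "FAILED")
```

In use, an open handle on the index log can be passed straight to parse_datastore_index, and listing void_dir(...) for a contig should show the raw SNAP (and other program) output files Barry mentions.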
From mptrsen at uni-bonn.de Thu Feb 13 20:00:24 2014 From: mptrsen at uni-bonn.de (Malte Petersen) Date: Fri, 14 Feb 2014 04:00:24 +0100 Subject: [maker-devel] BLAST options error / should Maker check for file format? Message-ID: <52FD86C8.6040007@uni-bonn.de> Dear MAKER devs, I was running Maker version 2.30p-beta on an insect genome, and it didn't produce any output. I got these error messages: Widget::formater: /path/to/makeblastdb -dbtype nucl -in /tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0 #-------------------------------# BLAST options error: File /tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0 is empty ERROR: /path/to/makeblastdb failed in Widget::formater --> rank=NA, hostname=Jeanne-GBR ERROR: Failed while doing blastn of ESTs ERROR: Chunk failed at level:0, tier_type:3 FAILED CONTIG:scf7180005143343 ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:scf7180005143343 I figured out that this error is due to a non-FASTA file being fed to Maker as extrinsic evidence (I gave it a meta-info file). While I got the pipeline running now with the correct file, I think that it should be complaining (a lot earlier) if any of the input files are of the wrong format. More people might run into this problem and have no idea where to look for a solution. What do you think? Best, Malte From carsonhh at gmail.com Thu Feb 13 20:11:22 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Feb 2014 20:11:22 -0700 Subject: [maker-devel] BLAST options error / should Maker check for file format? In-Reply-To: <52FD86C8.6040007@uni-bonn.de> References: <52FD86C8.6040007@uni-bonn.de> Message-ID: Hi Malte, Actually, there already is such a check. I'm very surprised your file made it that far. Normally it fails right away. Example -> STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files...
ERROR: The fasta file /Users/cholt/Developer/maker/trunk/data/test1 appears to be empty. Another test file -> STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files... ERROR: The nucleotide sequence file '/Users/cholt/Developer/maker/trunk/data/test2' appears to contain protein sequence or unrecognized characters. Note the following nucleotides may be valid but are unsupported [RYKMSWBDHV] Please check/fix the file before continuing, or set -fix_nucleotides on the command line to fix this automatically. Invalid Character: 'M' You seem to have found just the right formula of improper input to get past the filters on your run :-) Thanks, Carson On 2/13/14, 8:00 PM, "Malte Petersen" wrote: >Dear MAKER devs, > >I was running Maker version 2.30p-beta on an insect genome, and it >didn't produce any output. I got these error messages: > >Widget::formater: >/path/to/makeblastdb -dbtype nucl -in >/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0 >#-------------------------------# >BLAST options error: File >/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0 >is empty >ERROR: /path/to/makeblastdb failed in Widget::formater >--> rank=NA, hostname=Jeanne-GBR >ERROR: Failed while doing blastn of ESTs >ERROR: Chunk failed at level:0, tier_type:3 >FAILED CONTIG:scf7180005143343 > >ERROR: Chunk failed at level:4, tier_type:0 >FAILED CONTIG:scf7180005143343 > >I figured out that this error is due to a non-FASTA file being >fed to Maker as extrinsic evidence (I gave it a meta-info file). While >I got the pipeline running now with the correct file, I think that it >should be complaining (a lot earlier) if any of the input files are of >the wrong format. More people might run into this problem and have no >idea where to look for a solution. > >What do you think?
>Best, >Malte >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dence at genetics.utah.edu Fri Feb 14 12:09:08 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 14 Feb 2014 19:09:08 +0000 Subject: [maker-devel] ERROR: Failed while processing the chunk divide!! In-Reply-To: References: , Message-ID: Hi Hossein, So, this is what is going on. The problem is with the GFF3 file: the exon features in that GFF3 should have the mRNA as their parent instead of the gene. When you deleted the "-mRNA-1", the Name of the mRNA became the same as the Name of the gene, which restored the proper relationship between the features. The same problem exists for the CDS features. The solution is to make the Parent attributes of the exon and CDS features "point" to the mRNA and not the gene. Since MAKER has very regular rules for making names, this should be pretty straightforward. You should be OK just adding "-mRNA-1" to the end of the Parent ID on all the exon and CDS lines. This will work unless there are some mRNAs with alternative splice forms, because then the mRNA IDs will end with something like "-mRNA-2". I've attached a script that should do this for you. Run it with this command "perl fix_gff3_script.pl > " and then run MAKER with the fixed gff3 file in place of the old gff3 file. Let me know if that works, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] Sent: Thursday, February 13, 2014 3:27 PM To: Daniel Ence Subject: Re: ERROR: Failed while processing the chunk divide!! Dear Daniel I downloaded maker 2.31 and ran the same scaffold. Again it gave an error on the gff file. I then removed the word mRNA-1 from my gff file and ran it again.
It seems to have worked this time. Attached are the std-error files for the first try, std-err (the one that failed), and the second one, named std-err-wo-mRNA (which apparently worked). Since the gff file is used as evidence only, I thought it should not matter to remove the mRNA-1 naming from the gff file. Cheers HB On 14-02-12 12:59 PM, "Daniel Ence" wrote: >Hi Hossein, > >So, after looking at the gff3 and your control files, I had an idea. >There's the part of the control file called "Re-annotation Using MAKER >Derived GFF3", but you can also pass through features from a gff3 using >the "est_gff", "protein_gff", "rm_gff", "pred_gff", "model_gff" lines. > >Sometimes we encounter problems with the MAKER passthrough. Could you try >dividing the gff3 file into the different feature sources and passing them >through the "est_gff" etc. options and not with the MAKER passthrough? >That will tell us if the problem is with the gff3 file or with how MAKER >is processing it. > >Another thing to check is that the contig names in the gff3 >file match the contig names in the fasta file that you're annotating. > >Thanks, >Daniel > >Graduate Student >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >________________________________________ >From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >Sent: Wednesday, February 12, 2014 8:49 AM >To: Daniel Ence >Subject: Re: ERROR: Failed while processing the chunk divide!! > >Dear Daniel > >I have generated the files that you requested. I chose Sc00009 from my >genome, which is 30 kb and was one of the scaffolds coming up with an error. >In addition to the ctl files and the error output file, I also attached a part of >the gff file related to SC00009 that is indicated in the error message.
>Thanks for helping with this > >Regards > >HB > >On 14-02-11 4:59 PM, "Daniel Ence" wrote: > >>Hi Hossein, >> >>I think what would help most right now is if you ran MAKER on >>only one of those contigs that are failing and sent me the entire error >>output along with the maker control files that you are using. It looks >>like the error is coming from the gff3 files that you are using as input. >> >>Thanks, >>Daniel >> >>Daniel Ence >>Graduate Student >>Eccles Institute of Human Genetics >>University of Utah >>15 North 2030 East, Room 2100 >>Salt Lake City, UT 84112-5330 >>________________________________________ >>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >>Sent: Tuesday, February 11, 2014 3:51 PM >>To: Daniel Ence >>Subject: ERROR: Failed while processing the chunk divide!! >> >>Dear Daniel >> >>I re-started maker and it is still running. But in the error output file that >>has been generated so far, it seems that smaller contigs are affected. There >>are contigs of 2-4 kb with this error, but I also noticed a contig of 30kb >>length having this error. >> >>I was wondering if I need to change the settings in the maker_opts file >> >>#-----MAKER Behavior Options >>max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage) >>min_contig=1 #skip genome contigs below this length (under 10kb are often useless) >> >>If I understand correctly, max_dna_len divides contigs of over 100kb into >>smaller chunks. However, it is not clear to me, for the min_contig >>option: if the default contig length is 10kb or less, then why do I get >>error >>messages for 30kb-long contigs?
Should I change this to 0? >> >>Here is an example of the error message for one of the contigs: >> >>#--------- command -------------# >>Widget::exonerate::est2genome: >>/usr/local/exonerate-2.2.0-x86_64/bin/exonerate -q /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta -t /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta -Q dna -T dna --model est2genome --minintron 20 --showcigar --percent 20 > /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate >>#-------------------------------# >>cleaning blastn... >>cleaning tblastx... >>cleaning blastx... >>ERROR: Failed on PbPT3Sc00001_S_0.8_1-mRNA-1 >>Check your input GFF3 file for errors! >>(from GFFDB) >> >>FATAL ERROR >>ERROR: Failed while processing the chunk divide!! >> >>ERROR: Chunk failed at level 17 >>!! >>FAILED CONTIG:PbPT3Sc00001 >> >>--Next Contig-- >> >>Regards >> >>HB >> >>On 14-02-11 12:37 PM, "Daniel Ence" wrote: >> >>>Hossein, >>> >>>OK. So since this error came up on a local install, I'm going to need >>>some more information to understand what went wrong. Is it the same >>>contig that always causes this error? If it is, then is that the only >>>error or warning that MAKER encounters while running on this contig? Or, >>>if multiple contigs fail, then is it always the same error?
>>> >>>If you can narrow it down to the smallest possible dataset that >>>consistently gives the same error, then we can begin to understand >>>what's wrong. >>> >>>Thanks, >>>Daniel >>> >>>Daniel Ence >>>Graduate Student >>>Eccles Institute of Human Genetics >>>University of Utah >>>15 North 2030 East, Room 2100 >>>Salt Lake City, UT 84112-5330 >>>________________________________________ >>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >>>Sent: Tuesday, February 11, 2014 11:20 AM >>>To: Daniel Ence >>>Subject: Re: [maker-devel] Falied to create new account >>> >>>Hi Daniel >>> >>>I am running it through the local server at my work >>> >>>M. Hossein Borhan, Ph.D. >>>Research Scientist/ Chercheur Scientifique >>>Saskatoon Research Centre/Centre de Recherches de Saskatoon >>>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada >>>107 Science Place, Saskatoon, SK.,S7N 0X2 >>>Telephone/Téléphone: (306) 385-9441 >>>Facsimile/Télécopieur: (306) 385-9482 >>>Hossein.borhan at agr.gc.ca >>> >>>On 14-02-11 12:16 PM, "Daniel Ence" wrote: >>> >>>>Hi Hossein, >>>> >>>>Did you encounter this error while you were running MAKER on your local >>>>machine or through the MAKER web annotation service? >>>> >>>>Thanks, >>>>Daniel >>>> >>>>Daniel Ence >>>>Graduate Student >>>>Eccles Institute of Human Genetics >>>>University of Utah >>>>15 North 2030 East, Room 2100 >>>>Salt Lake City, UT 84112-5330 >>>>________________________________________ >>>>From: Carson Holt [carsonhh at gmail.com] >>>>Sent: Tuesday, February 11, 2014 10:18 AM >>>>To: Daniel Ence >>>>Cc: Mark Yandell >>>>Subject: FW: [maker-devel] Falied to create new account >>>> >>>>Hey Daniel, could you download his dataset and see if you can replicate >>>>the error? Also check if this was an MWAS job or a local maker run >>>>(his dataset will already be there for MWAS, you just need the job ID).
>>>> >>>>Thanks, >>>>Carson >>>> >>>>On 2/11/14, 10:16 AM, "Borhan, Hossein" >>>>wrote: >>>> >>>>>Hi Carson >>>>> >>>>>I encountered this error while running maker >>>>> >>>>>FATAL ERROR >>>>>ERROR: Failed while processing the chunk divide!! >>>>> >>>>>ERROR: Chunk failed at level 17 >>>>>!! >>>>>FAILED CONTIG:PbPT3Sc00006 >>>>> >>>>>HB -------------- next part -------------- A non-text attachment was scrubbed... Name: fix_gff3_script.pl Type: application/octet-stream Size: 349 bytes Desc: fix_gff3_script.pl From claudio.valero at wur.nl Mon Feb 17 02:23:21 2014 From: claudio.valero at wur.nl (Valero Jimenez, Claudio) Date: Mon, 17 Feb 2014 09:23:21 +0000 Subject: [maker-devel] Maker not predicting many genes Message-ID: Dear list, I'm trying to annotate a fungal genome, and I'm surprised that MAKER predicts so few genes (3697). I have trained SNAP and followed all the available tutorials. Ab initio predictors are able to predict between 8000 and 10000 genes. Is something in my configuration file wrong? I attach the opts file and the SOBA summary of the annotation. Regards, Claudio -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.log Type: application/octet-stream Size: 4776 bytes Desc: maker_opts.log -------------- next part -------------- A non-text attachment was scrubbed... Name: SOBA.pdf Type: application/pdf Size: 210262 bytes Desc: SOBA.pdf From carson.holt at genetics.utah.edu Mon Feb 17 12:22:13 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Mon, 17 Feb 2014 19:22:13 +0000 Subject: Re: [maker-devel] Maker not predicting many genes In-Reply-To: References: Message-ID: You also need to look at the contigs in a browser like Apollo.
That will allow you to see both the predictions and the evidence in context. You can then see whether genes are being dropped because they are supported only by single-exon evidence, because they have no evidence support whatsoever, or because they are being excluded due to UTR overlap. That last one is a common problem for fungi when using assembled mRNA-seq reads. Fungal genes are so close together that they often overlap in the UTR. As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts. The result is really long UTRs on some of your gene models that force other models to be excluded. If this is the case, rerun something like Trinity with the Jaccard clip option (--jaccard_clip) set to avoid transcript fusion. Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off. If it is a lack of evidence overlap, make sure you provided at least 1 proteome from a related species to the protein= option. At least 2 proteomes are recommended, though (these are not proteins from the same species but rather complete proteomes from related species). Also, comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but can supplement the other proteome data. Also, are you providing EST data? Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient (because both the quality and the completeness of EST/mRNA-seq databases vary so widely, and they may capture as little as 30% of the genes). Another thing that comes into play is single-exon evidence. In anything but fungi, single-exon evidence is mostly caused by spurious alignments. But fungi have so many single-exon genes that this is not the case for them. Make sure single_exon=1 is set to allow that evidence to be kept, and set the length of single-exon evidence to keep to something like 250 bp.
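The control-file advice above can be turned into a quick pre-flight check. This minimal Python sketch assumes maker_opts.ctl uses key=value lines with # comments (as in the option lines quoted in this thread) and that protein= accepts a comma-separated list of files. It flags single_exon and correct_est_fusion when they are not 1, and fewer than two protein files. It also assumes that single_length is the option that sets the minimum single-exon evidence length; confirm that name in your own control file. The helper names and the 250 bp threshold check are illustrative heuristics, not MAKER behavior.

```python
def parse_ctl(text):
    """Parse key=value lines from a MAKER-style control file, dropping # comments."""
    opts = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue  # blank line, pure comment, or section header
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def check_fungal_settings(opts, max_single_length=250):
    """Return warnings for the settings recommended above for fungal genomes."""
    warnings = []
    if opts.get("single_exon") != "1":
        warnings.append("single_exon is not 1: single-exon evidence is ignored")
    if opts.get("correct_est_fusion") != "1":
        warnings.append("correct_est_fusion is not 1: fused transcripts may add long false UTRs")
    # Assumes protein= holds a comma-separated list of FASTA files.
    proteins = [p for p in opts.get("protein", "").split(",") if p.strip()]
    if len(proteins) < 2:
        warnings.append(f"protein= lists {len(proteins)} file(s): at least 2 related-species proteomes are recommended")
    # single_length is an assumed option name for the single-exon length cutoff.
    length = opts.get("single_length", "")
    if opts.get("single_exon") == "1" and length.isdigit() and int(length) > max_single_length:
        warnings.append(f"single_length={length} is above the suggested ~{max_single_length} bp")
    return warnings


if __name__ == "__main__":
    example = "protein=uniprot_sprot.fasta #protein evidence\nsingle_exon=0\n"
    for warning in check_fungal_settings(parse_ctl(example)):
        print(warning)
```

The check only inspects values. It cannot tell whether a protein file is a full related-species proteome or just UniProt, so the two-file rule is a rough proxy for that advice.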
Thanks, Carson From: "Valero Jimenez, Claudio" Date: Monday, February 17, 2014 at 2:23 AM To: "'maker-devel at yandell-lab.org'" Subject: Maker not predicting many genes Dear list, I'm trying to annotate a fungal genome, and I'm surprised that MAKER predicts so few genes (3697). I have trained SNAP and followed all the available tutorials. Ab initio predictors are able to predict between 8000 and 10000 genes. Is something in my configuration file wrong? I attach the opts file and the SOBA summary of the annotation. Regards, Claudio From carsonhh at gmail.com Mon Feb 17 12:26:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 17 Feb 2014 12:26:05 -0700 Subject: [maker-devel] Maker not predicting many genes Message-ID: From your control file, it looks like your primary shortcomings are not setting single_exon=1 and using only UniProt rather than supplying complete proteomes of related species. I'd set correct_est_fusion=1 as well. -Carson From: Carson Holt Date: Monday, February 17, 2014 at 12:22 PM To: "Valero Jimenez, Claudio", "'maker-devel at yandell-lab.org'" Subject: Re: [maker-devel] Maker not predicting many genes You also need to look at the contigs in a browser like Apollo. That will allow you to see both the predictions and the evidence in context. You can then see whether genes are being dropped because they are supported only by single-exon evidence, because they have no evidence support whatsoever, or because they are being excluded due to UTR overlap. That last one is a common problem for fungi when using assembled mRNA-seq reads. Fungal genes are so close together that they often overlap in the UTR. As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts. The result is really long UTRs on some of your gene models that force other models to be excluded.
If this is the case, rerun something like Trinity with the Jaccard clip option (--jaccard_clip) set to avoid transcript fusion. Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off. If it is a lack of evidence overlap, make sure you provided at least 1 proteome from a related species to the protein= option. At least 2 proteomes are recommended, though (these are not proteins from the same species but rather complete proteomes from related species). Also, comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but can supplement the other proteome data. Also, are you providing EST data? Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient (because both the quality and the completeness of EST/mRNA-seq databases vary so widely, and they may capture as little as 30% of the genes). Another thing that comes into play is single-exon evidence. In anything but fungi, single-exon evidence is mostly caused by spurious alignments. But fungi have so many single-exon genes that this is not the case for them. Make sure single_exon=1 is set to allow that evidence to be kept, and set the length of single-exon evidence to keep to something like 250 bp. Thanks, Carson From: "Valero Jimenez, Claudio" Date: Monday, February 17, 2014 at 2:23 AM To: "'maker-devel at yandell-lab.org'" Subject: Maker not predicting many genes Dear list, I'm trying to annotate a fungal genome, and I'm surprised that MAKER predicts so few genes (3697). I have trained SNAP and followed all the available tutorials. Ab initio predictors are able to predict between 8000 and 10000 genes. Is something in my configuration file wrong? I attach the opts file and the SOBA summary of the annotation.
Regards, Claudio _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From claudio.valero at wur.nl Wed Feb 19 01:20:04 2014 From: claudio.valero at wur.nl (Valero Jimenez, Claudio) Date: Wed, 19 Feb 2014 08:20:04 +0000 Subject: Re: [maker-devel] Maker not predicting many genes In-Reply-To: References: Message-ID: Hi Carson, Thank you for your suggestions. I ran MAKER again and it was able to predict many more genes. However, I have a different problem now. When I try to run gff3_merge, I get the following error: Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67. A similar thing happens when I try fasta_merge: Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52. I never had this problem before with these commands. Regards, Claudio From: Carson Holt [mailto:carsonhh at gmail.com] Sent: Monday, 17 February 2014 20:26 To: Carson Holt; Valero Jimenez, Claudio; 'maker-devel at yandell-lab.org' Subject: Re: [maker-devel] Maker not predicting many genes From your control file, it looks like your primary shortcomings are not setting single_exon=1 and using only UniProt rather than supplying complete proteomes of related species. I'd set correct_est_fusion=1 as well. -Carson From: Carson Holt Date: Monday, February 17, 2014 at 12:22 PM To: "Valero Jimenez, Claudio", "'maker-devel at yandell-lab.org'" Subject: Re: [maker-devel] Maker not predicting many genes You also need to look at the contigs in a browser like Apollo. That will allow you to see both the predictions and the evidence in context. You can then see whether genes are being dropped because they are supported only by single-exon evidence, because they have no evidence support whatsoever, or because they are being excluded due to UTR overlap.
That last one is a common problem for fungi when using assembled mRNA-seq reads. Fungal genes are so close together that they often overlap in the UTR. As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts. The result is really long UTRs on some of your gene models that force other models to be excluded. If this is the case, rerun something like Trinity with the Jaccard clip option (--jaccard_clip) set to avoid transcript fusion. Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off. If it is a lack of evidence overlap, make sure you provided at least 1 proteome from a related species to the protein= option. At least 2 proteomes are recommended, though (these are not proteins from the same species but rather complete proteomes from related species). Also, comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but can supplement the other proteome data. Also, are you providing EST data? Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient (because both the quality and the completeness of EST/mRNA-seq databases vary so widely, and they may capture as little as 30% of the genes). Another thing that comes into play is single-exon evidence. In anything but fungi, single-exon evidence is mostly caused by spurious alignments. But fungi have so many single-exon genes that this is not the case for them. Make sure single_exon=1 is set to allow that evidence to be kept, and set the length of single-exon evidence to keep to something like 250 bp. Thanks, Carson From: "Valero Jimenez, Claudio" Date: Monday, February 17, 2014 at 2:23 AM To: "'maker-devel at yandell-lab.org'" Subject: Maker not predicting many genes Dear list, I'm trying to annotate a fungal genome, and I'm surprised that MAKER predicts so few genes (3697). I have trained SNAP and followed all the available tutorials.
Ab initio predictors are able to predict between 8000 and 10000 genes. Is something in my configuration file wrong? I attach the opts file and the SOBA summary of the annotation. Regards, Claudio _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Feb 19 08:34:33 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 19 Feb 2014 08:34:33 -0700 Subject: Re: [maker-devel] Maker not predicting many genes In-Reply-To: References: Message-ID: You provided a directory rather than a file to the -d option ('d' stands for datastore log). You must provide the location of the datastore index log file, not the datastore directory. Example -> ./dpp_contig.maker.output/dpp_contig_master_datastore_index.log Thanks, Carson From: "Valero Jimenez, Claudio" Date: Wednesday, February 19, 2014 at 1:20 AM To: Carson Holt, Carson Holt, "'maker-devel at yandell-lab.org'" Subject: RE: [maker-devel] Maker not predicting many genes Hi Carson, Thank you for your suggestions. I ran MAKER again and it was able to predict many more genes. However, I have a different problem now. When I try to run gff3_merge, I get the following error: Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67. A similar thing happens when I try fasta_merge: Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52. I never had this problem before with these commands.
Regards, Claudio From: Carson Holt [mailto:carsonhh at gmail.com] Sent: maandag 17 februari 2014 20:26 To: Carson Holt; Valero Jimenez, Claudio; 'maker-devel at yandell-lab.org' Subject: Re: [maker-devel] Maker not predicting many genes >From your control file, it looks like not setting single_exon=1, and only using UniProt rather than supplying complete proteomes of a related species are your primary shortcomings. I?d set correct_est_fusion=1 as well. ?Carson From: Carson Holt Date: Monday, February 17, 2014 at 12:22 PM To: "Valero Jimenez, Claudio" , "'maker-devel at yandell-lab.org'" Subject: Re: [maker-devel] Maker not predicting many genes You also need to look at the contigs in a browser like apollo. That will allow you to see both the predictions and the evidence in context. You can then see if genes are being dropped because they are only being supported by single exon evidence, they have no evidence support whatsoever, or if they are being excluded because of UTR overlap. That last one is a common problem for fungi when using assembled mRNA-seq reads. Fungi genes are so close that they often overlap in the UTR. As a result, mRNA-seq assemblers falsely asseble neighboring genes into single transcripts. The result is really long UTR on some of your gene models that force other models to be excluded. If this is the case, rerun something like trinity with the jacquard clip option set to avoid transcript fusion. Then set correct_est_fusion=1 in the MAKER control files to get those long false UTR?s clipped off. If it is a lack of evidence overlap, make sure you provided minimum 1 proteome from a related species to the protein= option. At least 2 proteomes are recommended though (these are not proteins from the same species but rather complete proteomes from related species). Also comprehensive databases like UniProt/Swiss-prot are not sufficient on their own, but can supplement the other proteome data. Also are you providing EST data? 
Note that EST/mRNA-seq data without a proteome from a related species is also not siufficient (because both quality and how comprehensive EST/mRNA-seq databsases are can vary so widely, and may only capture as little as 30% of the genes). Another thing that comes into play are single exon evidence. In anything but fungi, single exon evidence is mostly caused by spurious alignments. But fungi have so many single exon genes, that this is not the case for them. Make sure single_exon=1 is set to allow that evidence to be kept, and set the length of single exon evidence to keep to something like 250 bp. Thanks, Carson From: "Valero Jimenez, Claudio" Date: Monday, February 17, 2014 at 2:23 AM To: "'maker-devel at yandell-lab.org'" Subject: Maker not predicting many genes Dear list, I?m trying to annotate a fungal genome, and I?m surprised that Maker does not predict many genes (3697). I have trained SNAP and followed all the tutorials available. Ab initio predictors are able to predict between 8000-10000 genes. It is something that I have in the configuration file that is wrong?? I attach the ops file and the SOBA summary of the annotation. Regards, Claudio _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Wed Feb 19 09:04:08 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 19 Feb 2014 16:04:08 +0000 Subject: [maker-devel] Maker not predicting many genes In-Reply-To: References: , Message-ID: Hi Claudio, What was the command line you used for gff3_merge? 
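Carson's point about -d (it wants the master datastore index log, not the output directory) can be sketched as a small Python helper. This is an illustrative sketch, not part of MAKER: the function names `resolve_index_log` and `merge_command` are hypothetical, and the `*_master_datastore_index.log` pattern is assumed from the example path in the thread.

```python
import glob
import os
from typing import List


def resolve_index_log(path: str) -> str:
    """Return the master datastore index log to hand to a merge script's -d option.

    Accepts the log file itself, or (as a hypothetical convenience) a
    *.maker.output directory that contains exactly one such log.
    """
    if os.path.isfile(path):
        return path
    if os.path.isdir(path):
        # Naming pattern assumed from the example path in the thread.
        pattern = os.path.join(glob.escape(path), "*_master_datastore_index.log")
        matches = glob.glob(pattern)
        if len(matches) == 1:
            return matches[0]
        raise FileNotFoundError(
            f"expected one *_master_datastore_index.log in {path}, found {len(matches)}"
        )
    raise FileNotFoundError(f"no such file or directory: {path}")


def merge_command(tool: str, path: str) -> List[str]:
    """Build an argument list such as ['gff3_merge', '-d', '<index log>']."""
    return [tool, "-d", resolve_index_log(path)]
```

Passing the result to `subprocess.run(..., check=True)` would then invoke gff3_merge or fasta_merge with the log file rather than the directory.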
Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From claudio.valero at wur.nl Wed Feb 19 09:33:36 2014
From: claudio.valero at wur.nl (Valero Jimenez, Claudio)
Date: Wed, 19 Feb 2014 16:33:36 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: References: , Message-ID:

Hi,

Thanks, I had a mistake in the command line!

Regards,
Claudio

From barry.utah at gmail.com Wed Feb 19 11:03:47 2014
From: barry.utah at gmail.com (Barry Moore)
Date: Wed, 19 Feb 2014 11:03:47 -0700
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: References: , Message-ID: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>

Hi Daniel,

Could you add an error message to those two scripts that detects when a filename is missing or a directory was given instead, and gives the user a suggested solution?
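A minimal sketch of the check Barry describes, written in Python for illustration only (gff3_merge and fasta_merge are Perl scripts, and this is not their code). It distinguishes a missing argument, a directory, and a nonexistent path, and in the directory case suggests the index log found inside it. The function name and message wording are hypothetical.

```python
import glob
import os
from typing import Optional


def datastore_arg_error(arg: Optional[str]) -> Optional[str]:
    """Return an error message (with a suggested fix) for a bad -d value, or None if usable."""
    if not arg:
        return ("ERROR: no datastore index log was given. "
                "Usage: -d <base>_master_datastore_index.log")
    if os.path.isdir(arg):
        # Look for the index log the user probably meant (naming pattern assumed).
        logs = glob.glob(os.path.join(glob.escape(arg), "*_master_datastore_index.log"))
        suggestion = (f"Try: -d {logs[0]}" if logs
                      else "Pass the *_master_datastore_index.log file inside it instead.")
        return f"ERROR: {arg} is a directory, not a file. {suggestion}"
    if not os.path.isfile(arg):
        return f"ERROR: {arg} does not exist or is not a regular file."
    return None
```

A script would print the returned message and exit with a non-zero status whenever the function returns a string.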
Thanks,
B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From carson.holt at genetics.utah.edu Wed Feb 19 11:06:52 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 19 Feb 2014 18:06:52 +0000
Subject: [maker-devel] Maker not predicting many genes
In-Reply-To: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>
References: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu>
Message-ID:

You only need to swap a single character in the script. Just change the -e (exists) test to a -f (is file) test.
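Here is why the one-character change matters: Perl's -e test is true for any existing path, including a directory, while -f is true only for a plain file. The same distinction is sketched below with Python's `os.path.exists` and `os.path.isfile`, used as stand-ins for the Perl tests (this is not the script's own code; `passes_checks` is a hypothetical name).

```python
import os
import tempfile


def passes_checks(path: str) -> tuple:
    """Return (exists_check, is_file_check), mirroring Perl's -e and -f tests."""
    return os.path.exists(path), os.path.isfile(path)


with tempfile.TemporaryDirectory() as out_dir:
    index_log = os.path.join(out_dir, "demo_master_datastore_index.log")
    open(index_log, "w").close()
    # A directory slips past an existence test but fails the is-file test.
    print(passes_checks(out_dir))    # (True, False)
    print(passes_checks(index_log))  # (True, True)
```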
Thanks,
Carson

From gtaylor at bcgsc.ca Fri Feb 21 11:48:42 2014
From: gtaylor at bcgsc.ca (Greg Taylor)
Date: Fri, 21 Feb 2014 10:48:42 -0800
Subject: [maker-devel] Maker jobs hanging
Message-ID:

Hello,

I'm having a problem with Maker_2.28 jobs hanging. I am annotating a 3 Gb genome with the predictors SNAP and GeneMark, and using ABySS-assembled RNA-seq data. To do this I am using 480 processors on our local cluster. Once a run begins, 479 contigs are started (as noted in the *_master_datastore_index.log file), and the standard error log for the whole job looks normal, as do run.log and run.log.child.0 for the daughter processes.
This seems to be sequence dependent, as re-running contigs that hang doesn't help, the same contigs will always hang. I'm still looking into this myself, but it seems most if not all the jobs are stuck at the Blastx stage. If you have any suggestions, your help would be greatly appreciated. sincerely, Greg Taylor From dence at genetics.utah.edu Fri Feb 21 11:54:17 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 21 Feb 2014 18:54:17 +0000 Subject: [maker-devel] Maker jobs hanging In-Reply-To: References: Message-ID: Hi Greg, Since this is probably going to be a more complicated situation, would you upload your data and control file at this URL so that we can try to replicate the error on our machines? http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=166 Also, which version of MPI are you using? And you might want to try updating MAKER. I think version 2.31 was just updated a few weeks ago. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Greg Taylor [gtaylor at bcgsc.ca] Sent: Friday, February 21, 2014 11:48 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] Maker jobs hanging Hello, I'm having a problem with Maker_2.28 jobs hanging. I am annotating a 3Gb genome with predictors SNAP and Genemark, and using ABySS assembled RNA-seq data. To do this I am using 480 processors on our local cluster. Once a run begins, 479 contigs are started, as noted in the *_master_datastore_index.log file, the standard error log for the whole job looks normal, as do the run.log and run.log.child.0 for the daughter processes. This seems to be sequence dependent, as re-running contigs that hang doesn't help, the same contigs will always hang. 
I'm still looking into this myself, but it seems most if not all the jobs are stuck at the Blastx stage. If you have any suggestions, your help would be greatly appreciated. sincerely, Greg Taylor _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Fri Feb 21 11:56:50 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Feb 2014 11:56:50 -0700 Subject: [maker-devel] Maker jobs hanging Message-ID: Use 2.31. It has been tested to work without issue on several thousand cpus. Also use OpenMPI for any jobs greater than 100 cpus. In addition, OpenMPI can freeze on some systems without the following flag when using perl based MPI programs --> -mca btl ^openib Example --> mpiexec -mca btl ^openib -n 200 maker Finally, never use MVAPICH2. It doesn't play well with perl, and freezes whenever perl based MPI jobs extend across nodes (they run fine within a single node though). ?Carson On 2/21/14, 11:48 AM, "Greg Taylor" wrote: >Hello, > I'm having a problem with Maker_2.28 jobs hanging. I am annotating a 3Gb >genome with predictors SNAP and Genemark, and using ABySS assembled >RNA-seq data. To do this I am using 480 processors on our local cluster. >Once a run begins, 479 contigs are started, as noted in the >*_master_datastore_index.log file, the standard error log for the whole >job looks normal, as do the run.log and run.log.child.0 for the daughter >processes. This seems to be sequence dependent, as re-running contigs >that hang doesn't help, the same contigs will always hang. I'm still >looking into this myself, but it seems most if not all the jobs are stuck >at the Blastx stage. If you have any suggestions, your help would be >greatly appreciated. 
> >sincerely, >Greg Taylor >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dence at genetics.utah.edu Fri Feb 21 15:04:34 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 21 Feb 2014 22:04:34 +0000 Subject: [maker-devel] FW: Maker jobs hanging In-Reply-To: References: Message-ID: Hi Greg, You should be able to have the new MAKER work on the old datastore. Note the following advice from the main MAKER developer, Carson Holt. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com] Sent: Friday, February 21, 2014 11:56 AM To: Greg Taylor; maker-devel at yandell-lab.org Subject: Re: [maker-devel] Maker jobs hanging Use 2.31. It has been tested to work without issue on several thousand cpus. Also use OpenMPI for any jobs greater than 100 cpus. In addition, OpenMPI can freeze on some systems without the following flag when using perl based MPI programs --> -mca btl ^openib Example --> mpiexec -mca btl ^openib -n 200 maker Finally, never use MVAPICH2. It doesn't play well with perl, and freezes whenever perl based MPI jobs extend across nodes (they run fine within a single node though). ?Carson On 2/21/14, 11:48 AM, "Greg Taylor" wrote: >Hello, > I'm having a problem with Maker_2.28 jobs hanging. I am annotating a 3Gb >genome with predictors SNAP and Genemark, and using ABySS assembled >RNA-seq data. To do this I am using 480 processors on our local cluster. >Once a run begins, 479 contigs are started, as noted in the >*_master_datastore_index.log file, the standard error log for the whole >job looks normal, as do the run.log and run.log.child.0 for the daughter >processes. 
>This seems to be sequence dependent, as re-running contigs
>that hang doesn't help, the same contigs will always hang. I'm still
>looking into this myself, but it seems most if not all the jobs are stuck
>at the Blastx stage. If you have any suggestions, your help would be
>greatly appreciated.
>
>sincerely,
>Greg Taylor

From dence at genetics.utah.edu Fri Feb 21 19:38:59 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 02:38:59 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2
In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>, <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>

Hi Joe,

Will you upload your control files and data at this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169

Also, what versions of MAKER and BLAST are you using? And which file are you using for the known Arabidopsis gene?

I've copied this email to the maker-development list, which is a really good resource for troubleshooting MAKER issues.

Thanks,
Daniel
________________________________________
From: Mark Yandell
Sent: Friday, February 21, 2014 7:32 PM
To: Daniel Ence
Subject: FW: I am a PhD candidate at NMSU and have a question about maker2

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707
________________________________________
From: Joseph Said [joesaid at nmsu.edu]
Sent: Friday, February 21, 2014 5:18 PM
To: Mark Yandell
Subject: I am a PhD candidate at NMSU and have a question about maker2

Dear Dr. Yandell,

I am a molecular biologist at NMSU. I am trying to use maker2 with the cotton genome, and search an Arabidopsis gene against it. I think there is a problem with the BLAST component, because zero results are returned. I tried troubleshooting by searching a known gene, and it still returned zero results. Is this perhaps a common problem with the pipeline? I would appreciate any ideas you might have to help me.

Thank you,
Joe

Sent from my iPad

From dence at genetics.utah.edu Fri Feb 21 21:27:10 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 04:27:10 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>, <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu>

Hi Joe,

MAKER runs BLAST on your local system (or on the server where MAKER is installed), and it searches the evidence that the user supplies in the "est" and "protein" settings. The est and protein settings are in the maker_opts.ctl file. The path to BLAST is set in the "maker_exe.ctl" file, and the specific BLAST settings are in the "maker_bopts.ctl" file.

Will you attach those files to your reply, so we can make sure that the settings are set up correctly?
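One quick way to act on this before attaching the files is to print the relevant settings. The sketch below is a hypothetical POSIX shell helper (ctl_value and check_maker_ctl are illustrative names, not part of MAKER). It assumes the control files use the key=value #comment layout, and that the keys of interest are genome, est and protein in maker_opts.ctl, the BLAST+ executables blastn, blastx, tblastx and makeblastdb in maker_exe.ctl, and blast_type in maker_bopts.ctl; adjust the key names if your files differ. An empty protein= value would mean there is no protein evidence for MAKER to search, and a BLAST path that is not executable would point to a maker_exe.ctl problem.

```shell
#!/bin/sh
# Hypothetical helpers (not part of MAKER) for sanity-checking MAKER control files.
# Assumes each setting is written as "key=value #optional comment".

# ctl_value FILE KEY: print the value of KEY without the comment or surrounding blanks.
ctl_value() {
  sed -n "s/^[[:space:]]*$2=\([^#]*\).*/\1/p" "$1" |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
    head -n 1
}

# check_maker_ctl [DIR]: report evidence inputs and verify the BLAST paths.
# Returns non-zero if any assumed BLAST executable is unset or not executable.
check_maker_ctl() {
  dir=${1:-.}
  status=0
  # Evidence files that MAKER will align (empty values are skipped).
  for key in genome est protein; do
    val=$(ctl_value "$dir/maker_opts.ctl" "$key")
    if [ -n "$val" ]; then
      printf '%s=%s\n' "$key" "$val"
    fi
  done
  # BLAST+ executables (key names assumed; adjust to your maker_exe.ctl).
  for key in blastn blastx tblastx makeblastdb; do
    val=$(ctl_value "$dir/maker_exe.ctl" "$key")
    if [ -z "$val" ] || [ ! -x "$val" ]; then
      printf 'WARNING: %s in maker_exe.ctl is not an executable path: "%s"\n' "$key" "$val" >&2
      status=1
    fi
  done
  printf 'blast_type=%s\n' "$(ctl_value "$dir/maker_bopts.ctl" blast_type)"
  return "$status"
}
```

Running check_maker_ctl in the directory that holds the three files prints the evidence settings and warns about any BLAST path that cannot be executed.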
Thanks,
Daniel
________________________________________
From: Joseph Said [joesaid at nmsu.edu]
Sent: Friday, February 21, 2014 7:44 PM
To: Daniel Ence
Subject: RE: I am a PhD candidate at NMSU and have a question about maker2

Hi Daniel,

Thank you for getting back to me so quickly. I am using the cotton Gossypium raimondii D genome from NCBI, and the Arabidopsis gene is the GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I believe maker2 just calls BLAST from NCBI's page. When I search the cotton genome, it returns zero hits. I then ran a search with a known cotton gene as a test, and it also returned zero hits. I am not sure what the problem is, but it seems that the step that should be returning NCBI's BLAST results is returning 0 to Maker2, which then reports 0 hits. I ran a standalone BLAST and got hits for both my gene of interest and the control test gene.

Thanks,
Joe

From dence at genetics.utah.edu Sat Feb 22 15:51:48 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 22:51:48 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu> <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu> <6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu>

Hi,

Will you send me the long file that you were trying to BLAST against?
Thanks,
Daniel
________________________________
From: Hua Zhong [zh9118 at gmail.com]
Sent: Saturday, February 22, 2014 10:46 AM
To: Daniel Ence
Cc: Joe Song; Joseph Said
Subject: Re: I am a PhD candidate at NMSU and have a question about maker2

hi all,

Attached are the three configuration files and the two input files, which we used to compare the genome and the protein. For a simple test, we used one short sequence of about 60 bp and its translated protein sequence as inputs, but got nothing back. We also tested a long genome sequence as one input, but still got nothing. I am not sure what is causing this result.

Thanks a lot for the help.

Hua

On Fri, Feb 21, 2014 at 9:31 PM, Joseph Said wrote:

Hi Daniel,

I do not have the exact files with me right now, but my coauthors on the paper I am working on have been copied on this email. Hua can send you those files. Thank you for being very helpful, especially on a Friday night.

Thanks,
Joe

Sent from my iPad

From dence at genetics.utah.edu Sat Feb 22 16:21:51 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Sat, 22 Feb 2014 23:21:51 +0000
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu> <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu> <6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu>

Hi Hua,

Will you upload the genome file to this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=170

I am more concerned that MAKER didn't find the gene in the whole genome than that it missed it in the 60 bp substring. I think MAKER needs more sequence than that to annotate a gene model.

Will you also upload the output and datastore from the MAKER run?

Thanks,
Daniel

From mikael.durling at slu.se Sun Feb 23 09:57:09 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Sun, 23 Feb 2014 16:57:09 +0000
Subject: [maker-devel] Maker predicting fusion genes?
Message-ID: <4CFD158A-DE75-4756-AD05-4CBF99BAF72D@slu.se>

Dear list and maker developers,

I was browsing the results of a recent maker run, focusing on differences between this run with a recent maker (svn r1067) and a previous run with svn revision 1022 (I recall). One of the differences I found was a gene lost in the new prediction set, but replaced by an extended version of a previous neighbor (see http://figshare.com/articles/Maker_prediction_comparison/942300). As you can see, there is no support for the join in the evidence. Do you have any clue what might cause this?

Best regards,
Mikael Durling

From carsonhh at gmail.com Sun Feb 23 13:00:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 23 Feb 2014 13:00:50 -0700
Subject: [maker-devel] Maker predicting fusion genes?

The image doesn't show all evidence sources, but the short answer is that one of your evidence sources (est2genome, protein2genome, or blastx) bridges the two regions, and when given the bridging hint, one of the gene predictors decides it makes sense to create a single model instead. My guess is that it's the blastx evidence.

--Carson

From mikael.durling at slu.se Sun Feb 23 14:14:00 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Sun, 23 Feb 2014 21:14:00 +0000
Subject: [maker-devel] Maker predicting fusion genes?
Message-ID: <7CCC5270-93B9-4E5A-9687-26A1BF0EB1F8@slu.se>

Ok, do you mean by that that the ab initio predictions (snap_masked, augustus_masked, and genemark) that end up in the gff3 output are not the final hinted predictions? Otherwise, I'm sorry, but I can't follow your reasoning. I checked my gff file, and as far as I can tell there is no evidence there to support the bridge (see the attached gff of the region, or http://figshare.com/articles/Maker_prediction/942301 where all evidence is plotted).

Mikael

-------------- next part --------------
A non-text attachment was scrubbed...
Name: region.gff3
Type: application/octet-stream
Size: 19612 bytes
Desc: region.gff3

From hedgyx at yahoo.com Mon Feb 24 00:02:41 2014
From: hedgyx at yahoo.com (Megan)
Date: Sun, 23 Feb 2014 23:02:41 -0800 (PST)
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
Message-ID: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>

Maker folks,
I am re-annotating a single contig and I am having a few problems.

First, I am having trouble passing through a Maker-derived gff (from Maker 2.09, with some modifications to gene names and functional information added). The gff file passes the modENCODE validator, but Maker always fails on the first gene in the file, regardless of which gene comes first. So it appears to be a systematic error across the entire file. The Maker error is "Check your input GFF3 file for errors! (from GFFDB)". I have tried Maker 2.10 and 2.31, using both genome_gff with model_pass=1 and pred_gff. Attached is a gff with the first 2 genes.

Second, when I updated to Maker 2.31, Maker now complains that my EST fasta file has nucleotides that are not supported [RYKMSWBDHV]. It suggests "set -fix_nucleotides on the command line to fix this automatically". Is the -fix_nucleotides a Maker flag?
What exactly does it do? Does it remove the entire sequence or replace ambiguous bases with a randomly selected one? Half of my 20k ESTs contain these characters, so I don't want to throw them out entirely.

Also, just curious: did Maker never support these characters and simply never complain? I used this EST data set with Maker 2.09. I did note poor EST coverage, but thought it was an issue with the EST data itself.

I appreciate any suggestions.
Thanks,
Megan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: part_passthru.gff
Type: application/octet-stream
Size: 4363 bytes
Desc: not available

From zh9118 at gmail.com Sat Feb 22 16:00:28 2014
From: zh9118 at gmail.com (Hua Zhong)
Date: Sat, 22 Feb 2014 16:00:28 -0700
Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2
References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu> <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu> <6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu>

The long file we used is a whole genome. It is quite a huge file, so I am not able to send it. Sorry. But in the simple test I told you about, the nucleotide sequence I sent you is treated as the genome file, and the protein sequence is the other input. These two are what we want to BLAST against each other to see if Maker2 works well.

Thanks.

From carsonhh at gmail.com Mon Feb 24 11:18:18 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 11:18:18 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
In-Reply-To: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>

The -fix_nucleotides flag is added to the command line (i.e., maker -fix_nucleotides).
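Because the point of the warning is to look at the file first, one way to do that is to count how many records contain the flagged codes and to preview the substitution described in this reply (each of R, Y, K, M, S, W, B, D, H and V becoming N). The POSIX shell/awk sketch below only illustrates that behaviour and is not MAKER's implementation. The function names are hypothetical, and it assumes uppercase sequence lines, matching the character class in MAKER's message.

```shell
#!/bin/sh
# Hypothetical pre-check (not MAKER code) for the IUPAC ambiguity codes that
# MAKER 2.31 reports as unsupported: R, Y, K, M, S, W, B, D, H, V.

# count_ambiguous FASTA: print how many records contain at least one such code.
count_ambiguous() {
  awk '
    /^>/ { if (hit) n++; hit = 0; next }   # new record: tally the previous one
    /[RYKMSWBDHV]/ { hit = 1 }              # sequence line with an ambiguity code
    END { if (hit) n++; print n + 0 }
  ' "$1"
}

# mask_ambiguous FASTA: write a copy to stdout with each code replaced by N,
# leaving header lines untouched.
mask_ambiguous() {
  awk '/^>/ { print; next } { gsub(/[RYKMSWBDHV]/, "N"); print }' "$1"
}
```

For example, with a hypothetical file name, count_ambiguous ests.fasta reports how many records would be touched, and mask_ambiguous ests.fasta > ests.masked.fasta writes a copy that can be inspected before rerunning MAKER with the flag.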
It is there so you are aware that there is an issue with your fasta file that will cause things downstream to fail. MAKER can fix the errors for you, but first it gives a warning designed to make you look at the file and validate it. Why would you want this? For example, if you accidentally provided protein sequence to the EST option, you wouldn't want MAKER to just proceed; you would want a warning so you can check first. If your file is in fact EST data, set the flag and those characters will be changed to N's in the fixed fasta sequence. Otherwise, those characters will cause errors in downstream tools like exonerate, and even in some downstream GMOD tools, so they can't be allowed to remain as is.

For the GFF3 file, there is almost certainly a logic issue in the file (the modENCODE validator won't check for those). This can come from prior manipulation of the GFF3 file; for example, a gene ID that is reused on two contigs (technically valid, but a logic error). The GFF3 error message will normally give the ID of the feature causing the issue.

I could also take a look for you. You can upload the GFF3 file here --> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
Click on 'new guest account', then e-mail me back your guest ID, so I know which files to review.

Thanks,
Carson

From dence at genetics.utah.edu Mon Feb 24 11:31:47 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 24 Feb 2014 18:31:47 +0000
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>

Hi Megan,

One problem with the GFF3 that you attached is that the IDs for the CDS features are constructed incorrectly. All of the CDS features for a given mRNA or transcript should share the same ID, but the CDS features in your GFF3 have IDs that use the exon name. You can fix it with this command-line Perl:

perl -ne 'if(/\tCDS\t/){ chomp; /Parent=([^;\s]+)/; my $parent=$1; s/ID=([^;]+)/ID=$parent-cds/; print "$_\n"}else{print $_}' part_passthru.gff > fixed.gff3

It just rewrites the ID attribute of every CDS feature to the parent ID followed by "-cds". Try it on the test gff3 you sent and let me know if it works. I can't test it myself without the fasta file that you are annotating.
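To confirm that a rewrite like this leaves each transcript with a single CDS ID (the rule stated above), a small check can list any Parent whose CDS rows still carry different IDs. This is a hypothetical POSIX shell/awk sketch, not a MAKER tool. It assumes tab-separated GFF3 with the attributes in column 9 separated by ";" and no spaces around them. Empty output on fixed.gff3 would suggest the CDS IDs are consistent, although it will not catch other logic errors such as IDs reused across contigs.

```shell
#!/bin/sh
# check_cds_ids GFF3: hypothetical checker (not part of MAKER).
# Prints one line per Parent whose CDS features do not all share one ID,
# and returns 1 if any such Parent is found.
check_cds_ids() {
  awk -F '\t' '
    /^#/ || NF < 9 || $3 != "CDS" { next }       # only CDS feature rows
    {
      id = ""; parent = ""
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        if (attrs[i] ~ /^ID=/)     id = substr(attrs[i], 4)
        if (attrs[i] ~ /^Parent=/) parent = substr(attrs[i], 8)
      }
      # Remember the first CDS ID seen per Parent; flag any later mismatch.
      if (!(parent in first_id)) first_id[parent] = id
      else if (first_id[parent] != id) bad[parent] = 1
    }
    END {
      found = 0
      for (p in bad) { print "CDS IDs differ under Parent=" p; found = 1 }
      exit found
    }
  ' "$1"
}
```

Running check_cds_ids part_passthru.gff before the fix and check_cds_ids fixed.gff3 after it should show whether the CDS IDs were the problem being corrected.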
Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Mon Feb 24 11:34:28 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 11:34:28 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
In-Reply-To:
References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>
Message-ID:

Actually that is not true. CDS IDs can be the same or different. MAKER doesn't care either way. Both are valid in GFF3. Having the same ID just allows them to be put together by some GMOD viewers without having to go through a container feature.

--Carson

From carsonhh at gmail.com Mon Feb 24 13:59:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 13:59:12 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
In-Reply-To: <1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
References: <1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
Message-ID:

I found the issue. You have non-printing characters (carriage returns) at the end of almost every line. Because they are happening within the Parent= tag, they then become part of the Parent ID when the file is read.

So instead of "HERA000031-RA" you get "HERA000031-RA\cM" as the Parent ID.

"\cM" is a carriage return (Control-M, displayed as ^M).

I ran the attached script to remove these characters (perl purify ), and then it works. Make sure to remove the .../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db file to force the GFF3 database to be rebuilt after fixing the file when you rerun MAKER.

Thanks,
Carson

On 2/24/14, 1:32 PM, "Megan" wrote:

>Hi Carson and Daniel,
>
>Thanks for your suggestions. I have looked at the gff file, but I do not
>see any obvious errors. I have uploaded the files to your website. The
>reference fasta is there, the full gff, and a single gene gff that also
>causes an error. If I remove that gene from the full gff, then the error
>is on the next gene in the file, so it appears to be a systematic problem
>throughout the gff. The gff was generated by Maker, but I may have
>messed it up when I modified it to rename genes and add functional
>information. I checked with cat -te, but don't see any obvious
>formatting errors.
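With GNU cat, "cat -te" should display a carriage return as ^M just before the $ end-of-line marker, but it is easy to overlook in a long file. As a rough sketch (this is not the purify script Carson attached, whose contents are not shown in the archive, and the function names are hypothetical), the Python below reports which lines carry a "\r" and writes a copy without them.

```python
r"""Sketch: report and strip carriage returns ("\r", shown as ^M) in a text file.

With CRLF line endings the "\r" sticks to the last attribute value, so a
Parent of "HERA000031-RA\r" no longer matches the ID "HERA000031-RA".
"""


def find_cr_lines(lines):
    """Return the 1-based numbers of lines that contain a carriage return."""
    return [n for n, line in enumerate(lines, start=1) if "\r" in line]


def strip_cr(lines):
    """Return a copy of the lines with every carriage return removed."""
    return [line.replace("\r", "") for line in lines]


def purify_file(src, dst, encoding="utf-8"):
    """Copy src to dst without carriage returns; return the affected line numbers."""
    # newline="\n" splits only on "\n" and leaves "\r" in place when reading,
    # and writes "\n" untranslated, so the output has plain LF line endings.
    with open(src, encoding=encoding, newline="\n") as fh:
        lines = fh.readlines()
    with open(dst, "w", encoding=encoding, newline="\n") as out:
        out.writelines(strip_cr(lines))
    return find_cr_lines(lines)
```

For example, purify_file("input.gff3", "fixed.gff3") returns the numbers of the lines that had a carriage return; from the shell, tr -d '\r' < input.gff3 > fixed.gff3 does the same cleanup. As Carson notes, the stale MAKER .db file still has to be deleted before rerunning.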
>
>Thanks!
>Megan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: purify
Type: application/octet-stream
Size: 1966 bytes
Desc: not available

From carsonhh at gmail.com Mon Feb 24 14:03:00 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 24 Feb 2014 14:03:00 -0700
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
In-Reply-To:
References: <1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com>
Message-ID:

One more thing. You must give the file to pred_gff or model_gff. It is no longer strictly a MAKER file, as many of the source columns read "." meaning it has been edited by Apollo or another editor. So it will not be guaranteed to be recognized by genome_gff, because many of the source tags have changed.

Thanks,
Carson

From rbharris at uw.edu Tue Feb 25 14:49:57 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Tue, 25 Feb 2014 13:49:57 -0800
Subject: [maker-devel] error in snap training
Message-ID:

Hey -

I'm trying to train SNAP and am running into errors. I don't have any EST evidence, just protein. My .gff file reports 10865 genes but when I run maker2zff -c0 -e0 I get back empty genome files. When I run maker2zff -n, a ton of overlap_prev_exon errors get written to the screen and then when I get to the forge step I get an "impossible error5". Any help would be greatly appreciated.

Thanks!
Rebecca

From carsonhh at gmail.com Tue Feb 25 15:12:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 15:12:14 -0700
Subject: Re: [maker-devel] error in snap training
In-Reply-To:
References:
Message-ID: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>

Make sure you are using 2.31, and then try the maker2zff filters individually. If the protein models are not working well, use CEGMA to generate models. It's from the same group as SNAP. Use cegma2zff for the conversion.

--Carson

Sent from my iPhone

From sjackman at gmail.com Tue Feb 25 17:06:03 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Tue, 25 Feb 2014 16:06:03 -0800
Subject: [maker-devel] Mapping gene names
Message-ID:

Hi,

I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the *map_forward* option, which applies to the *model_gff* parameter. Is there a similar option for *est* and *protein*?

*maker_opts.ctl*
est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

From hedgyx at yahoo.com Tue Feb 25 17:26:11 2014
From: hedgyx at yahoo.com (Megan)
Date: Tue, 25 Feb 2014 16:26:11 -0800 (PST)
Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides
In-Reply-To:
Message-ID: <1393374371.45210.YahooMailBasic@web162201.mail.bf1.yahoo.com>

Carson,

Everything ran through smoothly after removing the ^Ms. Thanks for the help.

Megan

From carsonhh at gmail.com Tue Feb 25 17:58:08 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 17:58:08 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References:
Message-ID:

There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there, so you'll have to type it in.

There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, while using just maker_coor=chr1 will force it to only be mapped against chr1. This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
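Carson does not spell out the exact header syntax, so the Python sketch below assumes the tags are appended to each FASTA header as space-separated key=value pairs after the sequence ID; check that against your MAKER version before relying on it. It adds gene_id= and maker_coor= tags from a lookup table and shows how a maker_coor value splits into a sequence name and an optional range. The names tag_header, tag_fasta and parse_maker_coor are hypothetical.

```python
"""Sketch: tag FASTA headers for MAKER's undocumented est_forward option.

ASSUMPTION: tags go after the sequence ID as space-separated key=value
pairs, e.g. ">tx1 gene_id=geneA maker_coor=chr1:1-10000".
"""
import re

# Either "chr1" alone or "chr1:START-END".
COOR_RE = re.compile(r"^(?P<seqid>[^:\s]+)(?::(?P<start>\d+)-(?P<end>\d+))?$")


def parse_maker_coor(value):
    """Split "chr1:1-10000" into ("chr1", 1, 10000); "chr1" gives ("chr1", None, None)."""
    match = COOR_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized maker_coor value: {value!r}")
    if match.group("start") is None:
        return match.group("seqid"), None, None
    start, end = int(match.group("start")), int(match.group("end"))
    if start > end:
        raise ValueError(f"start is after end in {value!r}")
    return match.group("seqid"), start, end


def tag_header(header, gene_id=None, maker_coor=None):
    """Return the header line with any supplied tags appended."""
    parts = [header.rstrip()]
    if gene_id:
        parts.append(f"gene_id={gene_id}")
    if maker_coor:
        parse_maker_coor(maker_coor)  # raises ValueError if malformed
        parts.append(f"maker_coor={maker_coor}")
    return " ".join(parts)


def tag_fasta(lines, tags):
    """Yield FASTA lines; headers whose ID is a key in 'tags' get those tags."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            line = tag_header(line, **tags.get(seq_id, {}))
        yield line
```

The tagged file would then presumably be given as the est= (or protein=) evidence alongside est_forward=1 in maker_opts.ctl.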
?Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Tuesday, February 25, 2014 at 5:06 PM To: Subject: [maker-devel] Mapping gene names Hi, I?m annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I?ve run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein? maker_opts.ctl est=NC_123456.frn protein=NC_123456.faa est2genome=1 protein2genome=1 Thanks, Shaun _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Feb 25 18:04:48 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 25 Feb 2014 18:04:48 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: Message-ID: One more note. When using this option, the score column of mRNA features will represent how completely this gene matches the source EST/protein (fraction coverage multiplied by % identity). So a value of 100 means there is perfect match. This way if the same transcript maps to multiple locations, then you can identify which locations is the closest match (also works for identifying likly orthologs vs. paralogs). ?Carson From: Carson Holt Date: Tuesday, February 25, 2014 at 5:58 PM To: Shaun Jackman , Subject: Re: [maker-devel] Mapping gene names There is a way. It?s not a standard option and it?s undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won?t already be there so you?ll have to type it in. There is also a feature designed to work with this option. 
If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp and just using maker_coor=chr1 will force it to only be mapped against chr1. This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide. ?Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Tuesday, February 25, 2014 at 5:06 PM To: Subject: [maker-devel] Mapping gene names Hi, I?m annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I?ve run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein? maker_opts.ctl est=NC_123456.frn protein=NC_123456.faa est2genome=1 protein2genome=1 Thanks, Shaun _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/m aker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From weckalba at asu.edu Tue Feb 25 18:36:21 2014 From: weckalba at asu.edu (Walter Eckalbar) Date: Tue, 25 Feb 2014 17:36:21 -0800 Subject: [maker-devel] invalid gff3 format issues Message-ID: Hi all, I am trying to update maker annotations with PASA and encountered errors stemming from file format issues in the gff3 file. I put a few lines from the gff3 to highlight the issue below. Basically, the problem is that there are non-unique IDs for a number of the annotations. 
Is there anything that can be done to right this problem?

Thanks,
Walter

Lines from GFF3 file, repeated IDs are highlighted:

chr1 maker gene 9377440 9432028 . - . ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16
chr1 maker mRNA 9377440 9432028 . - . ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234
chr1 maker exon 9431899 9432028 . - . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1
chr1 maker exon 9431698 9431808 . - . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1

chr1 maker gene 8894975 9021577 . + . ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53
chr1 maker mRNA 8894975 9021577 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007
chr1 maker exon 8894975 8895153 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
chr1 maker exon 8942215 8942531 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
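Walter's excerpt shows the same mRNA ID (maker-chr1-snap-gene-4.53-mRNA-1) attached to two different genes. One way to list every such collision before handing the file to PASA is to count ID= values per feature type. The following is a minimal sketch, assuming a tab-delimited GFF3 file; it is not a MAKER tool, and the function name is hypothetical. By default it checks only gene and mRNA features, because GFF3 allows a discontinuous feature such as a CDS to repeat its ID across several lines.

```python
from collections import Counter


def duplicate_ids(gff3_lines, feature_types=("gene", "mRNA")):
    """Return {ID: count} for IDs used by more than one feature of the given types."""
    counts = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank lines, comments and ## directives
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] not in feature_types:
            continue  # not a feature line, or a feature type we are not checking
        for attribute in fields[8].split(";"):
            key, _, value = attribute.strip().partition("=")
            if key == "ID":
                counts[value] += 1
    return {feature_id: n for feature_id, n in counts.items() if n > 1}
```

For example, passing an open handle on the merged GFF3 (the file name is up to you) returns a dict that maps each reused gene or mRNA ID to its count. An empty dict means no such ID is repeated.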
From dence at genetics.utah.edu Tue Feb 25 19:02:04 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 26 Feb 2014 02:02:04 +0000 Subject: [maker-devel] invalid gff3 format issues In-Reply-To: References: Message-ID:

Hi Walter,

Will you upload the full GFF3 and the control files that you used to this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189
Also, what version of MAKER are you running this with?

Thanks,
Daniel
From weckalba at asu.edu Tue Feb 25 19:11:12 2014 From: weckalba at asu.edu (Walter Eckalbar) Date: Tue, 25 Feb 2014 18:11:12 -0800 Subject: [maker-devel] invalid gff3 format issues In-Reply-To: References: Message-ID:

Hi Daniel, those have been uploaded, and I'm using version 2.28.

Walter
From carsonhh at gmail.com Tue Feb 25 21:10:27 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 25 Feb 2014 21:10:27 -0700 Subject: [maker-devel] invalid gff3 format issues In-Reply-To: References: Message-ID:

Could you try version 2.31 (the current version)? I believe this is happening because you are passing in MAKER genes as pred_gff, so the transcripts ended up with the same names and IDs as the genes being generated by the MAKER run via SNAP, etc. This shouldn't happen with model_gff, and it shouldn't happen in 2.31 (IDs and names are generated slightly differently in 2.30+).

Thanks,
Carson
From marc.hoeppner at imbim.uu.se Wed Feb 26 01:26:35 2014 From: marc.hoeppner at imbim.uu.se (Marc Höppner) Date: Wed, 26 Feb 2014 08:26:35 +0000 Subject: [maker-devel] Functional annotation options Message-ID: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>

Dear List,

I have finished a gene build now, and I would like to move on to functional annotation. I understand that MAKER includes a few scripts to facilitate such analyses. However, I have a few questions about this:

1) iprscan
It seems MAKER includes an MPI wrapper for InterProScan, but it requires 'iprscan' to be in $PATH.
The latest versions of InterProScan I have worked with are Java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else. But what?

2) maker_functional_gff
This script seems to be very useful, but the description suggests that it requires WuBlast tabular output '2', which I think looks quite different from the NCBI BLAST tabular output. Since WuBlast is not really available anymore (except for this very old, frozen binary bundle), I was wondering how to address this issue.

3) maker_functional
This just throws an error about a missing Job ID, so I have no idea what it would be used for.

I guess what I am after is some suggestion as to how to use the scripts included with MAKER to achieve a reasonable functional annotation.

With kind regards,

Marc Hoeppner

Marc P. Hoeppner, PhD
Team Leader
BILS Genome Annotation Platform
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at imbim.uu.se

From mikael.durling at slu.se Wed Feb 26 02:43:43 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Wed, 26 Feb 2014 09:43:43 +0000 Subject: [maker-devel] Functional annotation options In-Reply-To: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se> References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se> Message-ID: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>

On 26 Feb 2014, at 09:26, Marc Höppner wrote:

> 1) iprscan
> It seems MAKER includes an MPI wrapper for InterProScan, but it requires 'iprscan' to be in $PATH. The latest versions of InterProScan I have worked with are Java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else. But what?

I don't believe it works with InterProScan 5.
What I usually do is split the MAKER protein file into chunks, run these chunks as separate jobs on our cluster, and then merge the results. The TSV file from iprscan5 can be input into the MAKER tool ipr_update_gff. I have not tried iprscan2gff3, as I haven't figured out how to get an iprscan4 raw file from iprscan5.

> 2) maker_functional_gff
> This script seems to be very useful, but the description suggests that it requires WuBlast tabular output '2', which I think looks quite different from the NCBI BLAST tabular output. Since WuBlast is not really available anymore (except for this very old, frozen binary bundle), I was wondering how to address this issue.

It works fine with NCBI BLAST+ and the blastp command with -outfmt 6.

Cheers,
Mikael

P.S. You're welcome to visit me at SLU if you would like to discuss experiences of genome annotation.
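For reference, BLAST+ tabular output (-outfmt 6) has twelve tab-separated columns by default: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The sketch below parses such lines and keeps the highest-bitscore hit per query, a common first step before assigning putative functions to proteins. It only illustrates the format under that default-column assumption. It does not reproduce what maker_functional_gff does internally, and best_hits is a hypothetical name. The input could come from a command such as blastp -query <proteins> -db <database> -outfmt 6 -out <file>, where the angle-bracket names are placeholders.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    qseqid: str
    sseqid: str
    pident: float
    evalue: float
    bitscore: float


def best_hits(outfmt6_lines: Iterable[str]) -> Dict[str, Hit]:
    """Keep the highest-bitscore hit per query from default BLAST+ -outfmt 6 lines."""
    best: Dict[str, Hit] = {}
    for line in outfmt6_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected 12 tab-separated columns, got {len(fields)}")
        # Default column order: qseqid sseqid pident length mismatch gapopen
        #                       qstart qend sstart send evalue bitscore
        hit = Hit(fields[0], fields[1], float(fields[2]),
                  float(fields[10]), float(fields[11]))
        current = best.get(hit.qseqid)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.qseqid] = hit
    return best
```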
From mikael.durling at slu.se Wed Feb 26 02:55:56 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Wed, 26 Feb 2014 09:55:56 +0000 Subject: [maker-devel] Functional annotation options In-Reply-To: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se> References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se> <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se> Message-ID: <29357689-D616-465F-BCC4-66AF5B1D5D2E@slu.se>

I should clarify this and say that mpi_iprscan doesn't seem to work with iprscan5. ipr_update_gff3 does, however.

Mikael
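Mikael's workaround for InterProScan 5 is to split the MAKER protein FASTA into chunks, run each chunk as a separate cluster job, and merge the results. Here is a minimal Python sketch of the splitting step only, assuming a plain FASTA file. The function names are hypothetical, and the sketch does not handle job submission. Records are dealt out round-robin so that the chunks end up with similar numbers of proteins.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (header_line, sequence_lines) for each FASTA record."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, seq


def split_fasta(lines: Iterable[str], n_chunks: int) -> List[List[str]]:
    """Deal records round-robin into n_chunks lists of FASTA lines."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[str]] = [[] for _ in range(n_chunks)]
    for i, (header, seq) in enumerate(read_fasta(lines)):
        chunks[i % n_chunks].append(header)
        chunks[i % n_chunks].extend(seq)
    return chunks
```

Each chunk can then be written to its own file and run through InterProScan 5 as a separate job. As Mikael describes, the per-chunk TSV results are merged before being passed to ipr_update_gff.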
From mikael.durling at slu.se Wed Feb 26 05:30:44 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Wed, 26 Feb 2014 12:30:44 +0000 Subject: [maker-devel] Mapping gene names In-Reply-To: References: Message-ID:

Can maker_coor be used only as a hint about the placement of the ESTs, without affecting the naming of the final genes? That is, if I have a database of ESTs whose rough placement I know a priori, can this placement be given to MAKER without providing est_forward=1?

Thanks,
Mikael
From carsonhh at gmail.com Wed Feb 26 06:22:34 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 26 Feb 2014 06:22:34 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: Message-ID: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>

Yes. That should work as well, as an accidental feature.

--Carson

Sent from my iPhone
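Carson's note earlier in the thread says that with est_forward=1 the score column of each mRNA is the fraction of the source EST/protein covered multiplied by the percent identity, so 100 means a perfect full-length match. Below is a small Python sketch of that arithmetic and of choosing the closest of several mapped locations. The (source ID, locus, score) tuple layout and the function names are assumptions made for illustration. Extracting those values from MAKER's GFF3 is left out.

```python
from typing import Dict, Iterable, Tuple


def forward_score(coverage_fraction: float, percent_identity: float) -> float:
    """Score as described in the thread: fraction coverage x % identity (0-100)."""
    if not 0.0 <= coverage_fraction <= 1.0:
        raise ValueError("coverage_fraction must be between 0 and 1")
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent_identity must be between 0 and 100")
    return coverage_fraction * percent_identity


def best_locus_per_source(
    mappings: Iterable[Tuple[str, str, float]]
) -> Dict[str, Tuple[str, float]]:
    """For (source_id, locus, score) tuples, keep the highest-scoring locus per source."""
    best: Dict[str, Tuple[str, float]] = {}
    for source_id, locus, score in mappings:
        if source_id not in best or score > best[source_id][1]:
            best[source_id] = (locus, score)
    return best
```

For example, a transcript that maps over half its length at 90% identity scores 45. If the same transcript scores 98 at another locus, the 98 locus is the better candidate for the ortholog, and the 45 hit is a possible paralog.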
From mikael.durling at slu.se Wed Feb 26 06:37:29 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Wed, 26 Feb 2014 13:37:29 +0000 Subject: [maker-devel] Mapping gene names In-Reply-To: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> Message-ID: <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>

That might be a useful and time-saving accidental feature. But from reading the code, it seems that I need to supply maker_coor (but not gene_id), as well as the configuration option est_forward, for this to work. Every occurrence of maker_coor in GI.pm seems to be conditioned on set_forward=1, right?

Mikael
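To try the gene_id= and maker_coor= tags, they have to be added to the evidence FASTA headers. The sketch below appends them to matching headers. It rests on an assumption: the thread does not spell out the exact header syntax, so this sketch simply appends space-separated key=value tokens after the existing header text. Check the result against your MAKER version before relying on it. add_tags and the example IDs are hypothetical.

```python
from typing import Dict, Iterable, List


def add_tags(fasta_lines: Iterable[str],
             tags_by_id: Dict[str, Dict[str, str]]) -> List[str]:
    """Append key=value tags to the header of each record whose ID is in tags_by_id.

    The record ID is taken to be the first whitespace-separated word after '>'.
    Sequence lines and headers without tags are returned unchanged (minus newlines).
    """
    out: List[str] = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            words = line[1:].split()
            seq_id = words[0] if words else ""
            tags = tags_by_id.get(seq_id, {})
            if tags:
                extra = " ".join(f"{key}={value}" for key, value in tags.items())
                line = f"{line.rstrip()} {extra}"
        out.append(line)
    return out
```

For example, tagging record tx1 with {"gene_id": "geneA", "maker_coor": "chr1:1-10000"} turns ">tx1" into ">tx1 gene_id=geneA maker_coor=chr1:1-10000".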
From nextgen.usfs at gmail.com Wed Feb 26 09:21:33 2014 From: nextgen.usfs at gmail.com (USFS Ion PGM) Date: Wed, 26 Feb 2014 10:21:33 -0600 Subject: [maker-devel] change program locations in maker_exe Message-ID:

Hello,

I was wondering if there is a way to make permanent changes to the maker_exe.ctl file. It seems that during installation MAKER didn't find the GeneMark or probuild locations correctly, which means that I have to manually edit the maker_exe.ctl file every time and add that information. Where can I modify this permanently so that the maker -CTL command creates the appropriate maker_exe.ctl file?

Thank you.
- Jon

From carsonhh at gmail.com Wed Feb 26 08:38:47 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 26 Feb 2014 08:38:47 -0700 Subject: [maker-devel] Functional annotation options In-Reply-To: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se> References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se> <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se> Message-ID:

maker_functional is a script that gets called by another script; it is not meant to be called directly by the user, so you can ignore it.

Just run iprscan directly; it already works pretty well. The mpi_iprscan and iprscan_wrap scripts just add some logging functionality by wrapping the iprscan call. In most cases there is no advantage over running iprscan directly.

--Carson
From carsonhh at gmail.com Wed Feb 26 09:09:14 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 26 Feb 2014 09:09:14 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID:

It will still work without est_forward; it just works a little differently. Keep in mind that this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline.
Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate, even when BLAST finds nothing (this is a very, very slow search, but it can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. it allows for single-base-pair introns, perhaps caused by assembly errors). The logic here is that since I have already told MAKER that, with some degree of confidence, I expect sequence A to map to location X, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result carries the information in its description parameter). MAKER will then ignore seeds that fall completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also, the match parameters for exonerate will not be relaxed as they are with est_forward.

As you can see, the behavior is slightly different (because it's an accidental feature).

Thanks,
Carson
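The following is a toy Python model of the behaviour Carson describes for runs without est_forward. Seeds that lie entirely outside the maker_coor region are dropped, and any seed overlapping the region has its polishing window replaced by the maker_coor range. It illustrates the description above and is not MAKER's GI.pm code. The Region type and the function names are hypothetical, and coordinates are treated as 1-based inclusive. For a chromosome-only tag such as maker_coor=chr1, the sketch keeps each seed's own window, which is also an assumption.

```python
import re
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    seqid: str
    start: Optional[int] = None  # 1-based inclusive; None means the whole sequence
    end: Optional[int] = None


_COOR = re.compile(r"^([^:\s]+)(?::(\d+)-(\d+))?$")


def parse_maker_coor(value: str) -> Region:
    """Parse 'chr1' or 'chr1:1-10000' into a Region."""
    match = _COOR.match(value)
    if match is None:
        raise ValueError(f"unrecognised maker_coor value: {value!r}")
    seqid, start, end = match.groups()
    if start is None:
        return Region(seqid)
    return Region(seqid, int(start), int(end))


def filter_seeds(seeds: List[Region], coor: Region) -> List[Region]:
    """Return polishing windows for BLAST seeds, constrained by a maker_coor region."""
    windows: List[Region] = []
    for seed in seeds:
        if seed.seqid != coor.seqid:
            continue  # on another sequence: completely outside maker_coor
        if coor.start is None:
            windows.append(seed)  # chromosome-only tag: keep the seed's own window
            continue
        if seed.end < coor.start or seed.start > coor.end:
            continue  # no overlap with the maker_coor range
        windows.append(coor)  # overlapping seed: search space becomes maker_coor exactly
    return windows
```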
> > --Carson > > Sent from my iPhone > > On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling > wrote: > >> Can this use of maker_coor be used only to hint about the placement of the >> ESTs, without affecting the naming of the final genes? I.e. if I have a >> database of ESTs where I have a priori knowledge of their rough placement, can >> this placement be given to maker without providing est_forward=1? >> >> Thanks, >> Mikael >> >> On 26 Feb 2014, at 01:58, Carson Holt wrote: >> >>> There is a way. It's not a standard option and it's undocumented, but if >>> you add est_forward=1 to the maker_opts.ctl file, then it will do just that. >>> The option won't already be there so you'll have to type it in. >>> >>> There is also a feature designed to work with this option. If you add tags >>> to your fasta headers, those can be used to guide the mapping and naming. >>> For example, gene_id= will ensure different isoforms that share >>> a common gene_id get clustered into the same gene, and >>> maker_coor=chr1:1-10000 in the fasta header will force a particular sequence >>> to only be mapped against chr1 within the range of 1-10000 bp, and just >>> using maker_coor=chr1 will force it to only be mapped against chr1. >>> >>> This is an undocumented way to remap genes onto new assemblies using blast >>> alignments of earlier transcript or protein annotations as a guide. >>> >>> --Carson >>> >>> From: Shaun Jackman >>> Reply-To: Shaun Jackman >>> Date: Tuesday, February 25, 2014 at 5:06 PM >>> To: >>> Subject: [maker-devel] Mapping gene names >>> >>> Hi, >>> >>> I'm annotating a genome using a closely related genome from Genbank, using >>> the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate >>> my genome. I've run Maker, and the annotation seems to have worked well. Is >>> it possible to map the names of the genes from the related species to my >>> annotation? I see the map_forward option, which applies to the model_gff >>> parameter.
Is there a similar option for est and protein? >>> >>> maker_opts.ctl >>> est=NC_123456.frn >>> protein=NC_123456.faa >>> est2genome=1 >>> protein2genome=1 >>> Thanks, >>> Shaun From carson.holt at genetics.utah.edu Wed Feb 26 09:38:37 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Wed, 26 Feb 2014 16:38:37 +0000 Subject: [maker-devel] change program locations in maker_exe In-Reply-To: References: Message-ID: MAKER first looks inside of .../maker/exe/ for any executables. Then it uses the system's 'which' command to identify executables in your PATH environment variable. If MAKER is not finding the one you want, then you can either put the program in the .../maker/exe/ folder (i.e. create .../maker/exe/bin/ and then put soft links to the executables you want to be used first), or you can rearrange the order of directories in your PATH environment variable so that 'which ' returns the location you want. If MAKER is always leaving the locations to those programs empty, it is because you need to add them to your PATH environment variable. Thanks, Carson On 2/26/14, 9:21 AM, "USFS Ion PGM" wrote: >Hello, >I was wondering if there is a way to make permanent changes to the >maker_exe.ctl file, as it seems on the install that maker didn't find the >gene mark or pro build locations correctly, which means that I have to >manually edit the maker_exe.ctl file every time and add that information.
> Where can I modify this permanently so that the maker -CTL command >creates the appropriate maker_exe file? Thank you. > >- Jon > > From nextgen.usfs at gmail.com Wed Feb 26 09:58:11 2014 From: nextgen.usfs at gmail.com (USFS Ion PGM) Date: Wed, 26 Feb 2014 10:58:11 -0600 Subject: [maker-devel] change program locations in maker_exe In-Reply-To: References: Message-ID: <2FA61AAE-0548-4030-9F4A-6964A631703C@gmail.com> Hi Carson, Thank you - that did it; I didn't have them in the PATH. All working now. Cheers, Jon On Feb 26, 2014, at 10:38 AM, Carson Holt wrote: > MAKER first looks inside of .../maker/exe/ for any executables. Then it > uses the system's 'which' command to identify executables in your PATH > environment variable. If MAKER is not finding the one you want, then > you can either put the program in the .../maker/exe/ folder (i.e. create > .../maker/exe/bin/ and then put soft links to the executables you want to > be used first), or you can rearrange the order of directories in your PATH > environment variable so that 'which ' returns the location > you want. If MAKER is always leaving the locations to those programs > empty, it is because you need to add them to your PATH environment > variable. > > Thanks, > Carson > > On 2/26/14, 9:21 AM, "USFS Ion PGM" wrote: > >> Hello, >> I was wondering if there is a way to make permanent changes to the >> maker_exe.ctl file, as it seems on the install that maker didn't find the >> gene mark or pro build locations correctly, which means that I have to >> manually edit the maker_exe.ctl file every time and add that information. >> Where can I modify this permanently so that the maker -CTL command >> creates the appropriate maker_exe file? Thank you.
>> >> - Jon >> >> > From weckalba at asu.edu Wed Feb 26 13:05:05 2014 From: weckalba at asu.edu (Walter Eckalbar) Date: Wed, 26 Feb 2014 12:05:05 -0800 Subject: [maker-devel] invalid gff3 format issues In-Reply-To: References: Message-ID: Hi Carson, Thanks, that seems to have mostly resolved the issue. Oddly enough, though, PASA still complains about the GFF3 file directly from gff3_merge, but if I first transform it with maker2eval_gtf, then use PASA's gtf_to_gff3_format.pl script, everything seems to run fine. On 25 February 2014 20:10, Carson Holt wrote: > Could you try version 2.31 (the current version)? I believe this is > happening because you are passing in MAKER genes as pred_gff; the > transcripts thus ended up with the same Names and IDs as the genes being > generated by the MAKER run via SNAP etc. This shouldn't happen with > model_gff, and shouldn't happen in 2.31 (IDs and names are generated > slightly differently in 2.30+). > > Thanks, > Carson > > From: Walter Eckalbar > Date: Tuesday, February 25, 2014 at 7:11 PM > To: Daniel Ence > Cc: "" > Subject: Re: [maker-devel] invalid gff3 format issues > > Hi Daniel, those have been uploaded and I'm using version 2.28. > > Walter > > > On 25 February 2014 18:02, Daniel Ence wrote: > >> Hi Walter, >> >> Will you upload the full GFF3 and the control files that you used to this >> URL? >> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189 >> Also, what version of MAKER are you running this with? >> >> Thanks, >> Daniel >> >> >> >> On Feb 25, 2014, at 6:36 PM, Walter Eckalbar >> wrote: >> >> Hi all, >> >> I am trying to update maker annotations with PASA and encountered errors >> stemming from file format issues in the gff3 file. >> >> I put a few lines from the gff3 to highlight the issue below. Basically, >> the problem is that there are non-unique IDs for a number of the >> annotations. >> >> Is there anything that can be done to right this problem?
>> >> Thanks, >> >> Walter >> >> Lines from GFF3 file, repeated IDs are highlighted: >> >> >> chr1 maker gene 9377440 9432028 . - . >> ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16 >> chr1 maker mRNA 9377440 9432028 . - . >> ID=maker-chr1-snap-gene-4.53-mRNA-1; >> Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234 >> chr1 maker exon 9431899 9432028 . - . >> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1 >> chr1 maker exon 9431698 9431808 . - . >> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1 >> >> chr1 maker gene 8894975 9021577 . + . >> ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53 >> chr1 maker mRNA 8894975 9021577 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1; >> Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007 >> chr1 maker exon 8894975 8895153 . + . >> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11 >> chr1 maker exon 8942215 8942531 . + . 
>> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11 From carsonhh at gmail.com Wed Feb 26 14:12:23 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 26 Feb 2014 14:12:23 -0700 Subject: [maker-devel] invalid gff3 format issues In-Reply-To: References: Message-ID: Could you put the file in this GFF3 validator to see if anything comes up? http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online Maybe it's just PASA. But I'd like to know there's no issue being caused by something else. Thanks, Carson From: Walter Eckalbar Date: Wednesday, February 26, 2014 at 1:05 PM To: Carson Holt Cc: Daniel Ence , "" Subject: Re: [maker-devel] invalid gff3 format issues Hi Carson, Thanks, that seems to have mostly resolved the issue. Oddly enough, though, PASA still complains about the GFF3 file directly from gff3_merge, but if I first transform it with maker2eval_gtf, then use PASA's gtf_to_gff3_format.pl script, everything seems to run fine. On 25 February 2014 20:10, Carson Holt wrote: > Could you try version 2.31 (the current version)?
I believe this is happening > because you are passing in MAKER genes as pred_gff; the transcripts thus ended > up with the same Names and IDs as the genes being generated by the MAKER run > via SNAP etc. This shouldn't happen with model_gff, and shouldn't happen in > 2.31 (IDs and names are generated slightly differently in 2.30+). > > Thanks, > Carson > > From: Walter Eckalbar > Date: Tuesday, February 25, 2014 at 7:11 PM > To: Daniel Ence > Cc: "" > Subject: Re: [maker-devel] invalid gff3 format issues > > Hi Daniel, those have been uploaded and I'm using version 2.28. > > Walter > > > On 25 February 2014 18:02, Daniel Ence wrote: >> Hi Walter, >> >> Will you upload the full GFF3 and the control files that you used to this >> URL? >> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189 >> Also, what version of MAKER are you running this with? >> >> Thanks, >> Daniel >> >> >> >> On Feb 25, 2014, at 6:36 PM, Walter Eckalbar >> wrote: >> >>> Hi all, >>> >>> I am trying to update maker annotations with PASA and encountered errors >>> stemming from file format issues in the gff3 file. >>> >>> I put a few lines from the gff3 to highlight the issue below. Basically, >>> the problem is that there are non-unique IDs for a number of the >>> annotations. >>> >>> Is there anything that can be done to right this problem? >>> >>> Thanks, >>> >>> Walter >>> >>> Lines from GFF3 file, repeated IDs are highlighted: >>> >>> >>> chr1 maker gene 9377440 9432028 . - . >>> ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16 >>> chr1 maker mRNA 9377440 9432028 . - . >>> ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234 >>> chr1 maker exon 9431899 9432028 . - . >>> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1 >>> chr1 maker exon 9431698 9431808 . - .
>>> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1 >>> >>> chr1 maker gene 8894975 9021577 . + . >>> ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53 >>> chr1 maker mRNA 8894975 9021577 . + . >>> ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007 >>> chr1 maker exon 8894975 8895153 . + . >>> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11 >>> chr1 maker exon 8942215 8942531 . + . >>> ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
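Walter's excerpt shows the same mRNA ID (maker-chr1-snap-gene-4.53-mRNA-1) attached to two different genes, which is the kind of non-unique ID he reported. Alongside the online validator Carson suggests, a quick local scan can show whether a merged GFF3 still reuses IDs. The Python below is a minimal sketch, not a MAKER or PASA tool. It assumes tab-separated, nine-column GFF3, and it assumes that only CDS features may legitimately share one ID across several lines (GFF3 permits shared IDs for discontinuous features). Every other repeated ID is reported. The function names are hypothetical.

```python
"""Report GFF3 IDs that are reused by more than one feature (minimal sketch).

Assumption: only CDS features may legitimately span several lines that share
one ID (GFF3 "discontinuous" features); any other repeated ID is reported.
This is not a full GFF3 validator.
"""
import sys
from collections import defaultdict

MULTILINE_TYPES = {"CDS"}  # assumed set of types allowed to share an ID


def parse_attributes(column9):
    """Split 'key=value;key=value' into a dict (values stay URL-encoded)."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def find_duplicate_ids(lines):
    """Return {ID: [(line_no, type, start, end), ...]} for conflicting IDs."""
    seen = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines instead of guessing
        feature_id = parse_attributes(cols[8]).get("ID")
        if feature_id is not None:
            seen[feature_id].append((line_no, cols[2], cols[3], cols[4]))

    conflicts = {}
    for feature_id, hits in seen.items():
        types = {hit[1] for hit in hits}
        shared_ok = len(types) == 1 and types <= MULTILINE_TYPES
        if len(hits) > 1 and not shared_ok:
            conflicts[feature_id] = hits
    return conflicts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for fid, hits in find_duplicate_ids(handle).items():
            where = ", ".join(f"line {n} ({t} {s}-{e})" for n, t, s, e in hits)
            print(f"{fid}: {where}")
```

Saved under a placeholder name such as find_dup_ids.py, it would be run as `python find_dup_ids.py merged.gff`. Each reported ID is listed with the line numbers, feature types, and coordinates of every line that uses it.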
From mikael.durling at slu.se Wed Feb 26 15:04:37 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Wed, 26 Feb 2014 22:04:37 +0000 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I'm after the behavior you describe below, where exonerate is made to try really hard within a limited region to align an EST, but I would not like maker to produce est2genome predictions. In general, I think this maker_coor and est_forward is a feature set that is worthy of being promoted to a documented feature. Thanks, Mikael On 26 Feb 2014, at 17:09, Carson Holt wrote: It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome. If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors).
The logic here is that, given that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match. Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also, match parameters for exonerate will not be relaxed as they were with est_forward. As you can see, the behavior is slightly different (because it's an accidental feature). Thanks, Carson From: Mikael Brandström Durling Date: Wednesday, February 26, 2014 at 6:37 AM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right? Mikael On 26 Feb 2014, at 14:22, Carson Holt wrote: Yes. That should work as well as an accidental feature. --Carson Sent from my iPhone On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling wrote: Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e. if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1? Thanks, Mikael On 26 Feb 2014, at 01:58, Carson Holt wrote: There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that.
The option won't already be there so you'll have to type it in. There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1. This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide. --Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Tuesday, February 25, 2014 at 5:06 PM To: Subject: [maker-devel] Mapping gene names Hi, I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein? maker_opts.ctl est=NC_123456.frn protein=NC_123456.faa est2genome=1 protein2genome=1 Thanks, Shaun
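The FASTA header tags described in this thread (gene_id=... and maker_coor=chr1:1-10000 or maker_coor=chr1) can be illustrated with a short sketch of the non-est_forward behavior Carson outlines. Seeds that lie completely outside maker_coor are dropped, and an overlapping seed has its polishing window set to the maker_coor range. This is hypothetical Python, not MAKER's Perl implementation in GI.pm. It assumes that tags are whitespace-separated key=value tokens following the sequence ID and that coordinates are 1-based and inclusive. It also assumes that a whole-sequence hint (maker_coor=chr1) simply keeps the seed's own span. All function names are made up for illustration.

```python
"""Illustrate maker_coor-style placement hints from FASTA headers (sketch).

Hypothetical helpers, not MAKER code. Assumptions: tags are whitespace-separated
key=value tokens after the sequence ID, and coordinates are 1-based, inclusive.
"""
import re
from typing import NamedTuple, Optional

# "chr1" or "chr1:1-10000"; assumes the sequence ID itself contains no ':'
COOR_RE = re.compile(r"^(?P<seqid>[^:\s]+)(?::(?P<start>\d+)-(?P<end>\d+))?$")


class Region(NamedTuple):
    seqid: str
    start: Optional[int]  # None means "anywhere on seqid"
    end: Optional[int]


def parse_header_tags(header):
    """Collect key=value tokens that follow the ID on a FASTA header line."""
    tags = {}
    for token in header.lstrip(">").split()[1:]:
        if "=" in token:
            key, value = token.split("=", 1)
            tags[key] = value
    return tags


def parse_maker_coor(value):
    """Turn 'chr1' or 'chr1:1-10000' into a Region."""
    match = COOR_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized maker_coor value: {value!r}")
    if match.group("start") is None:
        return Region(match.group("seqid"), None, None)
    return Region(match.group("seqid"), int(match.group("start")), int(match.group("end")))


def polishing_window(region, seqid, seed_start, seed_end):
    """Return the (start, end) window to polish a BLAST seed in, or None to drop it.

    Mirrors the described non-est_forward behavior: seeds completely outside
    the region are ignored; overlapping seeds are polished over the region itself.
    """
    if seqid != region.seqid:
        return None
    if region.start is None:
        return (seed_start, seed_end)  # assumption: whole-sequence hint keeps the seed span
    if seed_end < region.start or seed_start > region.end:
        return None
    return (region.start, region.end)
```

For example, the made-up header `>tx1 gene_id=geneA maker_coor=chr1:1-10000` yields the region chr1:1-10000. A seed at chr1:5000-5600 would then be polished over 1-10000, while a seed at chr1:20000-20500 would be discarded.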
From carsonhh at gmail.com Wed Feb 26 15:50:30 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 26 Feb 2014 15:50:30 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff, and use the map_forward option to then filter the results based on mRNA score, and that would copy names onto new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors. Thanks, Carson From: Mikael Brandström Durling Date: Wednesday, February 26, 2014 at 3:04 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I'm after the behavior you describe below, where exonerate is made to try really hard within a limited region to align an EST, but I would not like maker to produce est2genome predictions. In general, I think this maker_coor and est_forward is a feature set that is worthy of being promoted to a documented feature. Thanks, Mikael On 26 Feb 2014, at 17:09, Carson Holt wrote: > It will still work without est_forward. It just works a little differently. > Keep in mind this was a hidden feature I used to find stubborn or hard-to-find > missing genes after reassembly of a genome.
> > If est_forward is provided, MAKER will parse the database to look for the > maker_coor tags early in the pipeline. Then it will create a list of > locations to search, and it will search them even if there are no BLAST > results to seed the search (normally MAKER gets a BLAST result first and then > polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for > a match using all of chr1 as the input to exonerate even when BLAST finds > nothing (this is a very, very slow search, but can help pick up one or two > stubborn genes that don't remap well). To allow this, MAKER gives exonerate > looser matching parameters (i.e. allows for single base pair introns perhaps > caused by assembly errors). The logic here is that, given that I > already told MAKER that with some degree of confidence I expect sequence A to > map to location X, it will try its hardest to make it match. > > Without est_forward set, the maker_coor= flag still gets read in GI.pm at line > 1563, but only after a BLAST alignment has already seeded it to the region > (that BLAST result has the information in its description parameter). MAKER > will then ignore seeds completely outside of maker_coor. In addition, any BLAST > seeds that overlap maker_coor will get the search space for alignment > polishing adjusted to match maker_coor exactly. Also, match parameters for > exonerate will not be relaxed as they were with est_forward. > > As you can see, the behavior is slightly different (because it's an accidental > feature). > > Thanks, > Carson > > From: Mikael Brandström Durling > Date: Wednesday, February 26, 2014 at 6:37 AM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Mapping gene names > > That might be a useful and time-saving accidental feature. But, reading the > code, it seems that I need to supply maker_coor but not gene_id, as well as > the configuration option est_forward for this to work.
Any occurrences of > maker_coor in GI.pm seem to be conditioned on est_forward=1, right? > > Mikael > > On 26 Feb 2014, at 14:22, Carson Holt wrote: > >> Yes. That should work as well as an accidental feature. >> >> --Carson >> >> Sent from my iPhone >> >> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling >> wrote: >> >>> Can this use of maker_coor be used only to hint about the placement of the >>> ESTs, without affecting the naming of the final genes? I.e. if I have a >>> database of ESTs where I have a priori knowledge of their rough placement, >>> can this placement be given to maker without providing est_forward=1? >>> >>> Thanks, >>> Mikael >>> >>> On 26 Feb 2014, at 01:58, Carson Holt wrote: >>> >>>> There is a way. It's not a standard option and it's undocumented, but if >>>> you add est_forward=1 to the maker_opts.ctl file, then it will do just >>>> that. The option won't already be there so you'll have to type it in. >>>> >>>> There is also a feature designed to work with this option. If you add tags >>>> to your fasta headers, those can be used to guide the mapping and naming. >>>> For example, gene_id= will ensure different isoforms that share >>>> a common gene_id get clustered into the same gene, and >>>> maker_coor=chr1:1-10000 in the fasta header will force a particular >>>> sequence to only be mapped against chr1 within the range of 1-10000 bp, and >>>> just using maker_coor=chr1 will force it to only be mapped against chr1. >>>> >>>> This is an undocumented way to remap genes onto new assemblies using blast >>>> alignments of earlier transcript or protein annotations as a guide.
>>>> >>>> --Carson >>>> >>>> From: Shaun Jackman >>>> Reply-To: Shaun Jackman >>>> Date: Tuesday, February 25, 2014 at 5:06 PM >>>> To: >>>> Subject: [maker-devel] Mapping gene names >>>> >>>> Hi, >>>> >>>> I'm annotating a genome using a closely related genome from Genbank, using >>>> the .frn (RNA) and .faa (protein) files from Genbank as evidence to >>>> annotate my genome. I've run Maker, and the annotation seems to have worked >>>> well. Is it possible to map the names of the genes from the related species >>>> to my annotation? I see the map_forward option, which applies to the >>>> model_gff parameter. Is there a similar option for est and protein? >>>> >>>> maker_opts.ctl >>>> est=NC_123456.frn >>>> protein=NC_123456.faa >>>> est2genome=1 >>>> protein2genome=1 >>>> Thanks, >>>> Shaun From carsonhh at gmail.com Wed Feb 26 16:45:30 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 26 Feb 2014 16:45:30 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Sorry, I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff. --Carson Sent from my iPhone > On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: > > What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1.
Then take those results, pass them in as model_gff, and use the map_forward option to then filter the results based on mRNA score, and that would copy names onto new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors. > > Thanks, > Carson > > From: Mikael Brandström Durling > Date: Wednesday, February 26, 2014 at 3:04 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Mapping gene names > > It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I'm after the behavior you describe below, where exonerate is made to try really hard within a limited region to align an EST, but I would not like maker to produce est2genome predictions. > > In general, I think this maker_coor and est_forward is a feature set that is worthy of being promoted to a documented feature. > > Thanks, > Mikael > >> On 26 Feb 2014, at 17:09, Carson Holt wrote: >> >> It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome. >> >> If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate).
So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). The logic here is that, given that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match. >> >> Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also, match parameters for exonerate will not be relaxed as they were with est_forward. >> >> As you can see, the behavior is slightly different (because it's an accidental feature). >> >> Thanks, >> Carson >> >> From: Mikael Brandström Durling >> Date: Wednesday, February 26, 2014 at 6:37 AM >> To: Carson Holt >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] Mapping gene names >> >> That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right? >> >> Mikael >> >>> On 26 Feb 2014, at 14:22, Carson Holt wrote: >>> >>> Yes. That should work as well as an accidental feature.
>>> >>> --Carson >>> >>> Sent from my iPhone >>> >>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling wrote: >>>> >>>> Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e. if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1? >>>> >>>> Thanks, >>>> Mikael >>>> >>>>> On 26 Feb 2014, at 01:58, Carson Holt wrote: >>>>> >>>>> There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there so you'll have to type it in. >>>>> >>>>> There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1. >>>>> >>>>> This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide. >>>>> >>>>> --Carson >>>>> >>>>> From: Shaun Jackman >>>>> Reply-To: Shaun Jackman >>>>> Date: Tuesday, February 25, 2014 at 5:06 PM >>>>> To: >>>>> Subject: [maker-devel] Mapping gene names >>>>> >>>>> Hi, >>>>> >>>>> I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter.
Is there a similar option for est and protein? >>>>> >>>>> maker_opts.ctl >>>>> >>>>> est=NC_123456.frn >>>>> protein=NC_123456.faa >>>>> est2genome=1 >>>>> protein2genome=1 >>>>> Thanks, >>>>> Shaun >>>>> >>>>> _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bioinformatics.umd at gmail.com Thu Feb 27 09:46:44 2014 From: bioinformatics.umd at gmail.com (UMD Bioinformatics) Date: Thu, 27 Feb 2014 11:46:44 -0500 Subject: [maker-devel] Problem with OpenFabrics and infiniband Message-ID: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com> Hello, I've had my IT folks install MAKER on our cluster at UMD. I'm getting a SEGFAULT error when running MAKER on InfiniBand nodes, but not on gigE nodes. According to the logs this appears to be an issue with forks, but I'm not sure how to fix it. I would simply use the gigE nodes, but we are in the process of moving everything to InfiniBand, so I'll need to address this issue at some point. I've attached the error log from the MPI run as well as commentary from my HPCC team. IT suggestions: If you look at the top of the error log for the problematic job, it clearly warns of an issue with doing 'fork's within the openmpi/openfabrics framework. In particular, the use of the fork system call is only partially supported in the OpenFabrics software (that is, the drivers etc. for the InfiniBand connections). See e.g. http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork for more information.
In particular, see the paragraphs starting with the sentence highlighted in red, "it does not mean that your fork()-calling application is safe". (The kernel, Open MPI version, and OFED version are sufficiently recent that there is _some_ fork support.) The fact that the job runs over gigE but not IB, in conjunction with the warning from Open MPI, strongly suggests that this is the issue you are encountering. I suspect that MAKER touches registered memory before the fork, which would result in a segfault (matching what was observed). You can try adding the arguments --mca mpi_warn_on_fork 0 to the mpirun command, just in case the crash was somehow caused by Open MPI's warning, but I would not hold out much hope for that. ###UPDATE### This does not fix the problem. Basically, it looks like MAKER uses system calls such as fork in a manner that is incompatible with the current OpenFabrics software, and thus will not work with InfiniBand. This situation is likely to remain until either MAKER is changed to be compatible with OFED, or OFED's support for the fork system call is broadened. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- STATUS: Parsing control files... -------------------------------------------------------------------------- An MPI process has executed an operation involving a call to the "fork()" system call to create a child process. Open MPI is currently operating in a condition that could result in memory corruption or other system errors; your MPI job may hang, crash, or produce silent data corruption. The use of fork() (or system() or other calls that create child processes) is strongly discouraged.
The process that invoked fork was: Local host: compute-g20-7.deepthought.umd.edu (PID 28015) MPI_COMM_WORLD rank: 0 If you are *absolutely sure* that your application will successfully and correctly survive a call to fork(), you may disable this warning by setting the mpi_warn_on_fork MCA parameter to 0. -------------------------------------------------------------------------- [compute-g20-8:09542] *** Process received signal *** [compute-g20-8:09542] Signal: Segmentation fault (11) [compute-g20-8:09542] Signal code: Address not mapped (1) [compute-g20-8:09542] Failing at address: 0xee00350 [compute-g20-8:09543] *** Process received signal *** [compute-g20-8:09543] Signal: Segmentation fault (11) [compute-g20-8:09543] Signal code: Address not mapped (1) [compute-g20-8:09543] Failing at address: 0xf020c90 [compute-g20-8:09544] *** Process received signal *** [compute-g20-8:09544] Signal: Segmentation fault (11) [compute-g20-8:09544] Signal code: Address not mapped (1) [compute-g20-8:09544] Failing at address: 0x1ad68f10 [compute-g20-8:09545] *** Process received signal *** [compute-g20-8:09545] Signal: Segmentation fault (11) [compute-g20-8:09545] Signal code: Address not mapped (1) [compute-g20-8:09545] Failing at address: 0x84a3188 [compute-g20-8:09545] [ 0] /lib64/libpthread.so.0 [0x2b98fac5eca0] [compute-g20-8:09545] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_int_malloc+0x530) [0x2b98f9ea4ec0] [compute-g20-8:09545] [ 2] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_malloc+0x4a) [0x2b98f9ea60ca] [compute-g20-8:09545] [ 3] perl(Perl_safesysmalloc+0x12) [0x481602] [compute-g20-8:09545] [ 4] perl(Perl_savepvn+0x26) [0x4816b6] [compute-g20-8:09545] [ 5] perl(Perl_do_exec3+0x31e) [0x4f715e] [compute-g20-8:09545] [ 6] perl(Perl_my_popen+0x403) [0x484d63] [compute-g20-8:09545] [ 7] perl(Perl_do_openn+0x1696) [0x4f9536] [compute-g20-8:09545] [ 8] perl(Perl_pp_open+0x184) [0x4efc44] [compute-g20-8:09545] 
[ 9] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-8:09545] [10] perl(perl_run+0x243) [0x4340f3] [compute-g20-8:09545] [11] perl(main+0x135) [0x41b485] [compute-g20-8:09545] [12] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b98fae899c4] [compute-g20-8:09545] [13] perl [0x41b299] [compute-g20-8:09545] *** End of error message *** [compute-g20-8:09546] *** Process received signal *** [compute-g20-8:09546] Signal: Segmentation fault (11) [compute-g20-8:09546] Signal code: Address not mapped (1) [compute-g20-8:09546] Failing at address: 0x8240850 [compute-g20-8:09547] *** Process received signal *** [compute-g20-8:09547] Signal: Segmentation fault (11) [compute-g20-8:09547] Signal code: Address not mapped (1) [compute-g20-8:09547] Failing at address: 0xd5c8850 [compute-g20-8:09548] *** Process received signal *** [compute-g20-8:09548] Signal: Segmentation fault (11) [compute-g20-8:09548] Signal code: Address not mapped (1) [compute-g20-8:09548] Failing at address: 0x8c80850 [compute-g20-8:09549] *** Process received signal *** [compute-g20-8:09549] Signal: Segmentation fault (11) [compute-g20-8:09549] Signal code: Address not mapped (1) [compute-g20-8:09549] Failing at address: 0x18d72850 [compute-g20-10:07087] *** Process received signal *** [compute-g20-10:07087] Signal: Segmentation fault (11) [compute-g20-10:07087] Signal code: Address not mapped (1) [compute-g20-10:07087] Failing at address: 0x6659f10 [compute-g20-10:07088] *** Process received signal *** [compute-g20-10:07088] Signal: Segmentation fault (11) [compute-g20-10:07088] Signal code: Address not mapped (1) [compute-g20-10:07088] Failing at address: 0x1fe3b5d0 [compute-g20-10:07089] *** Process received signal *** [compute-g20-10:07089] Signal: Segmentation fault (11) [compute-g20-10:07089] Signal code: Address not mapped (1) [compute-g20-10:07089] Failing at address: 0x9870350 [compute-g20-10:07090] *** Process received signal *** [compute-g20-10:07090] Signal: Segmentation fault (11) 
[compute-g20-10:07090] Signal code: Address not mapped (1) [compute-g20-10:07090] Failing at address: 0x17bad350 STATUS: Processing and indexing input FASTA files... [compute-g20-8:09567] *** Process received signal *** [compute-g20-8:09567] Signal: Segmentation fault (11) [compute-g20-8:09567] Signal code: Address not mapped (1) [compute-g20-8:09567] Failing at address: 0x1ad5aa10 [compute-g20-8:09567] [ 0] /lib64/libpthread.so.0 [0x2b6de3ce1ca0] [compute-g20-8:09567] [ 1] /lib64/libc.so.6(strlen+0x30) [0x2b6de3f67f40] [compute-g20-8:09567] [ 2] perl(Perl_do_exec3+0x3a) [0x4f6e7a] [compute-g20-8:09567] [ 3] perl(Perl_my_popen+0x403) [0x484d63] [compute-g20-8:09567] [ 4] perl(Perl_do_openn+0x1696) [0x4f9536] [compute-g20-8:09567] [ 5] perl(Perl_pp_open+0x184) [0x4efc44] [compute-g20-8:09567] [ 6] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-8:09567] [ 7] perl(perl_run+0x243) [0x4340f3] [compute-g20-8:09567] [ 8] perl(main+0x135) [0x41b485] [compute-g20-8:09567] [ 9] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b6de3f0c9c4] [compute-g20-8:09567] [10] perl [0x41b299] [compute-g20-8:09567] *** End of error message *** [compute-g20-7:28123] *** Process received signal *** [compute-g20-7:28123] Signal: Segmentation fault (11) [compute-g20-7:28123] Signal code: Address not mapped (1) [compute-g20-7:28123] Failing at address: 0x19ad9f10 STATUS: Setting up database for any GFF3 input... A data structure will be created for you at: /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore To access files for individual sequences use the datastore index: /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_master_datastore_index.log STATUS: Now running MAKER... 
examining contents of the fasta file and run log [compute-g20-10:07107] *** Process received signal *** [compute-g20-10:07107] Signal: Segmentation fault (11) [compute-g20-10:07107] Signal code: Address not mapped (1) [compute-g20-10:07107] Failing at address: 0x9870362 [compute-g20-10:07107] [ 0] /lib64/libpthread.so.0 [0x2b50c5c8cca0] [compute-g20-10:07107] [ 1] perl [0x487218] [compute-g20-10:07107] [ 2] perl(Perl_hv_common+0xe67) [0x499dd7] [compute-g20-10:07107] [ 3] perl [0x49d9dc] [compute-g20-10:07107] [ 4] perl(Perl_pp_method_named+0x6e) [0x49dd4e] [compute-g20-10:07107] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-10:07107] [ 6] perl(perl_run+0x243) [0x4340f3] [compute-g20-10:07107] [ 7] perl(main+0x135) [0x41b485] [compute-g20-10:07107] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b50c5eb79c4] [compute-g20-10:07107] [ 9] perl [0x41b299] [compute-g20-10:07107] *** End of error message *** examining contents of the fasta file and run log examining contents of the fasta file and run log [compute-g20-10:07108] *** Process received signal *** [compute-g20-10:07108] Signal: Segmentation fault (11) [compute-g20-10:07108] Signal code: Address not mapped (1) [compute-g20-10:07108] Failing at address: 0x1fe3b5c8 examining contents of the fasta file and run log [compute-g20-10:07108] [ 0] /lib64/libpthread.so.0 [0x2b88f6f8dca0] [compute-g20-10:07108] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_free+0x22) [0x2b88f61d55b2] [compute-g20-10:07108] [ 2] /lib64/libc.so.6(cfree+0xd1) [0x2b88f7210ad1] [compute-g20-10:07108] [ 3] perl(Perl_sv_setsv_flags+0xb49) [0x4ad919] [compute-g20-10:07108] [ 4] perl(Perl_pp_aassign+0x209) [0x4a3a19] [compute-g20-10:07108] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-10:07108] [ 6] perl(perl_run+0x243) [0x4340f3] [compute-g20-10:07108] [ 7] perl(main+0x135) [0x41b485] [compute-g20-10:07108] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b88f71b89c4] 
[compute-g20-10:07108] [ 9] perl [0x41b299] [compute-g20-10:07108] *** End of error message *** examining contents of the fasta file and run log [compute-g20-10:07109] *** Process received signal *** [compute-g20-10:07109] Signal: Segmentation fault (11) [compute-g20-10:07109] Signal code: Address not mapped (1) [compute-g20-10:07109] Failing at address: 0x6664ad0 [compute-g20-10:07109] [ 0] /lib64/libpthread.so.0 [0x2b0809664ca0] [compute-g20-10:07109] [ 1] /lib64/libc.so.6 [0x2b08098edada] [compute-g20-10:07109] [ 2] /lib64/libc.so.6(memmove+0x75) [0x2b08098ec095] [compute-g20-10:07109] [ 3] perl(Perl_sv_setpvn+0x7a) [0x4b775a] [compute-g20-10:07109] [ 4] perl(Perl_pp_concat+0xc9) [0x4a5739] [compute-g20-10:07109] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-10:07109] [ 6] perl(Perl_call_sv+0x160) [0x4333a0] [compute-g20-10:07109] [ 7] perl(Perl_magic_methcall+0x182) [0x488c22] [compute-g20-10:07109] [ 8] perl(Perl_magic_setpack+0x52) [0x489292] [compute-g20-10:07109] [ 9] perl(Perl_mg_set+0x66) [0x48aca6] [compute-g20-10:07109] [10] perl(Perl_pp_sassign+0x19c) [0x4a5c8c] [compute-g20-10:07109] [11] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-10:07109] [12] perl(perl_run+0x243) [0x4340f3] [compute-g20-10:07109] [13] perl(main+0x135) [0x41b485] [compute-g20-10:07109] [14] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b080988f9c4] [compute-g20-10:07109] [15] perl [0x41b299] [compute-g20-10:07109] *** End of error message *** examining contents of the fasta file and run log examining contents of the fasta file and run log examining contents of the fasta file and run log examining contents of the fasta file and run log --Next Contig-- --Next Contig-- --Next Contig-- examining contents of the fasta file and run log --Next Contig-- Processing run.log file... Processing run.log file... examining contents of the fasta file and run log Processing run.log file... Processing run.log file... 
--Next Contig-- --Next Contig-- --Next Contig-- --Next Contig-- --Next Contig-- Processing run.log file... Processing run.log file... --Next Contig-- --Next Contig-- Processing run.log file... #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_2 Length: 2857 #--------------------------------------------------------------------- Processing run.log file... MAKER WARNING: The file UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/D5/5A/Gc_UCSC1_contig_17//theVoid.Gc_UCSC1_contig_17/0/Gc_UCSC1_contig_17.0.all.rb.out did not finish on the last run and must be erased Processing run.log file... setting up GFF3 output and fasta chunks Processing run.log file... #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_7 Length: 972 #--------------------------------------------------------------------- [compute-g20-8:09576] *** Process received signal *** [compute-g20-8:09576] Signal: Segmentation fault (11) [compute-g20-8:09576] Signal code: Address not mapped (1) [compute-g20-8:09576] Failing at address: 0x1ad68f08 examining contents of the fasta file and run log #--------------------------------------------------------------------- Now starting the contig!! 
SeqID: Gc_UCSC1_contig_3 Length: 2316 #--------------------------------------------------------------------- [compute-g20-8:09576] [ 0] /lib64/libpthread.so.0 [0x2b6de3ce1ca0] [compute-g20-8:09576] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_free+0x22) [0x2b6de2f295b2] [compute-g20-8:09576] [ 2] /lib64/libc.so.6(cfree+0xd1) [0x2b6de3f64ad1] [compute-g20-8:09576] [ 3] perl(Perl_sv_setsv_flags+0xb49) [0x4ad919] [compute-g20-8:09576] [ 4] perl(Perl_pp_aassign+0x209) [0x4a3a19] [compute-g20-8:09576] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-8:09576] [ 6] perl(perl_run+0x243) [0x4340f3] [compute-g20-8:09576] [ 7] perl(main+0x135) [0x41b485] [compute-g20-8:09576] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b6de3f0c9c4] [compute-g20-8:09576] [ 9] perl [0x41b299] [compute-g20-8:09576] *** End of error message *** #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_4 Length: 1230 #--------------------------------------------------------------------- examining contents of the fasta file and run log examining contents of the fasta file and run log examining contents of the fasta file and run log examining contents of the fasta file and run log examining contents of the fasta file and run log [compute-g20-8:09578] *** Process received signal *** [compute-g20-8:09578] Signal: Segmentation fault (11) [compute-g20-8:09578] Signal code: Address not mapped (1) [compute-g20-8:09578] Failing at address: 0xee0af18 [compute-g20-8:09578] [ 0] /lib64/libpthread.so.0 [0x2b03d0637ca0] [compute-g20-8:09578] [ 1] perl(Perl_av_fetch+0x5b) [0x49cf8b] [compute-g20-8:09578] [ 2] perl(Perl_pp_aelem+0x26e) [0x49e48e] [compute-g20-8:09578] [ 3] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-8:09578] [ 4] perl(perl_run+0x243) [0x4340f3] [compute-g20-8:09578] [ 5] perl(main+0x135) [0x41b485] [compute-g20-8:09578] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) 
[0x2b03d08629c4] [compute-g20-8:09578] [ 7] perl [0x41b299] [compute-g20-8:09578] *** End of error message *** setting up GFF3 output and fasta chunks Processing run.log file... [compute-g20-8:09583] *** Process received signal *** [compute-g20-8:09583] Signal: Segmentation fault (11) [compute-g20-8:09583] Signal code: Address not mapped (1) [compute-g20-8:09583] Failing at address: 0x822b0e2 [compute-g20-8:09582] *** Process received signal *** [compute-g20-8:09582] Signal: Segmentation fault (11) [compute-g20-8:09582] Signal code: Address not mapped (1) [compute-g20-8:09582] Failing at address: 0x8c6b0e2 [compute-g20-8:09583] [ 0] /lib64/libpthread.so.0 [0x2ab7f114dca0] [compute-g20-8:09583] [ 1] perl [0x487218] [compute-g20-8:09583] [ 2] perl(Perl_hv_common+0xe67) [0x499dd7] [compute-g20-8:09583] [ 3] perl [0x49d9dc] [compute-g20-8:09583] [ 4] perl(Perl_pp_method_named+0x6e) [0x49dd4e] [compute-g20-8:09583] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-8:09583] [ 6] perl(perl_run+0x243) [0x4340f3] [compute-g20-8:09583] [ 7] perl(main+0x135) [0x41b485] [compute-g20-8:09583] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2ab7f13789c4] [compute-g20-8:09583] [ 9] perl [0x41b299] [compute-g20-8:09583] *** End of error message *** [compute-g20-8:09582] [ 0] /lib64/libpthread.so.0 [0x2b4eace23ca0] [compute-g20-8:09582] [ 1] perl [0x487218] [compute-g20-8:09582] [ 2] perl(Perl_hv_common+0xe67) [0x499dd7] [compute-g20-8:09582] [ 3] perl [0x49d9dc] [compute-g20-8:09582] [ 4] perl(Perl_pp_method_named+0x6e) [0x49dd4e] [compute-g20-8:09582] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-8:09582] [ 6] perl(perl_run+0x243) [0x4340f3] [compute-g20-8:09582] [ 7] perl(main+0x135) [0x41b485] [compute-g20-8:09582] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b4ead04e9c4] [compute-g20-8:09582] [ 9] perl [0x41b299] [compute-g20-8:09582] *** End of error message *** examining contents of the fasta file and run log [compute-g20-8:09581] *** Process 
received signal *** [compute-g20-8:09581] Signal: Segmentation fault (11) [compute-g20-8:09581] Signal code: Address not mapped (1) [compute-g20-8:09581] Failing at address: 0x848da08 #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_17 Length: 1413 #--------------------------------------------------------------------- #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_13 Length: 2019 #--------------------------------------------------------------------- [compute-g20-8:09581] [ 0] /lib64/libpthread.so.0 [0x2b98fac5eca0] [compute-g20-8:09581] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_free+0x22) [0x2b98f9ea65b2] [compute-g20-8:09581] [ 2] /lib64/libc.so.6(cfree+0xd1) [0x2b98faee1ad1] [compute-g20-8:09581] [ 3] perl(Perl_sv_setsv_flags+0xb49) [0x4ad919] [compute-g20-8:09581] [ 4] perl(Perl_pp_aassign+0x209) [0x4a3a19] [compute-g20-8:09581] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-8:09581] [ 6] perl(perl_run+0x243) [0x4340f3] [compute-g20-8:09581] [ 7] perl(main+0x135) [0x41b485] [compute-g20-8:09581] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b98fae899c4] [compute-g20-8:09581] [ 9] perl [0x41b299] [compute-g20-8:09577] *** Process received signal *** [compute-g20-8:09581] *** End of error message *** [compute-g20-8:09577] Signal: Segmentation fault (11) [compute-g20-8:09577] Signal code: Address not mapped (1) [compute-g20-8:09577] Failing at address: 0xd5b30e2 [compute-g20-8:09577] [ 0] /lib64/libpthread.so.0 [0x2b79d382aca0] [compute-g20-8:09577] [ 1] perl [0x487218] [compute-g20-8:09577] [ 2] perl(Perl_hv_common+0xe67) [0x499dd7] [compute-g20-8:09577] [ 3] perl [0x49d9dc] [compute-g20-8:09577] [ 4] perl(Perl_pp_method_named+0x6e) [0x49dd4e] [compute-g20-8:09577] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-8:09577] [ 6] perl(perl_run+0x243) 
[0x4340f3] [compute-g20-8:09577] [ 7] perl(main+0x135) [0x41b485] [compute-g20-8:09577] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b79d3a559c4] [compute-g20-8:09577] [ 9] perl [0x41b299] [compute-g20-8:09577] *** End of error message *** #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_1 Length: 1446 #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks [compute-g20-8:09579] *** Process received signal *** [compute-g20-8:09579] Signal: Segmentation fault (11) [compute-g20-8:09579] Signal code: Address not mapped (1) [compute-g20-8:09579] Failing at address: 0x18d64350 examining contents of the fasta file and run log [compute-g20-8:09579] [ 0] /lib64/libpthread.so.0 [0x2b31b670fca0] [compute-g20-8:09579] [ 1] /usr/local/BerkeleyDB/lib/libdb-4.7.so(__ham_get_meta+0x4c) [0x2b31bbd1bccc] [compute-g20-8:09579] [ 2] /usr/local/BerkeleyDB/lib/libdb-4.7.so [0x2b31bbd103fb] [compute-g20-8:09579] [ 3] /usr/local/BerkeleyDB/lib/libdb-4.7.so(__dbc_get+0x1fa) [0x2b31bbd81f3a] [compute-g20-8:09579] [ 4] /usr/local/BerkeleyDB/lib/libdb-4.7.so(__dbc_get_pp+0xb4) [0x2b31bbd8db04] [compute-g20-8:09579] [ 5] /usr/local/BerkeleyDB/lib/libdb-4.7.so [0x2b31bbce4b85] [compute-g20-8:09579] [ 6] /usr/local/perl/5.16.3-threaded/lib/site_perl/5.16.3/x86_64-linux-thread-multi/auto/DB_File/DB_File.so [0x2b31bbabafc9] [compute-g20-8:09579] [ 7] perl(Perl_pp_entersub+0x58f) [0x49ee4f] [compute-g20-8:09579] [ 8] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-8:09579] [ 9] perl(perl_run+0x243) [0x4340f3] [compute-g20-8:09579] [10] perl(main+0x135) [0x41b485] [compute-g20-8:09579] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b31b693a9c4] [compute-g20-8:09579] [12] perl [0x41b299] [compute-g20-8:09579] *** End of error message *** --Next Contig-- setting up GFF3 output and fasta chunks setting up GFF3 output and fasta chunks --Next Contig-- setting up 
GFF3 output and fasta chunks --Next Contig-- --Next Contig-- Processing run.log file... MAKER WARNING: The file UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/3B/F3/Gc_UCSC1_contig_26//theVoid.Gc_UCSC1_contig_26/0/Gc_UCSC1_contig_26.0.all.rb.out did not finish on the last run and must be erased --Next Contig-- --Next Contig-- --Next Contig-- #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_18 Length: 937 #--------------------------------------------------------------------- Processing run.log file... Processing run.log file... Processing run.log file... --Next Contig-- FATAL: Thread terminated, causing all processes to fail --> rank=17, hostname=compute-g20-10.deepthought.umd.edu setting up GFF3 output and fasta chunks Processing run.log file... Processing run.log file... #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_14 Length: 6745 #--------------------------------------------------------------------- #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_9 Length: 554 #--------------------------------------------------------------------- MAKER WARNING: The file UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/FB/E4/Gc_UCSC1_contig_22//theVoid.Gc_UCSC1_contig_22/0/Gc_UCSC1_contig_22.0.all.rb.out did not finish on the last run and must be erased setting up GFF3 output and fasta chunks Processing run.log file... setting up GFF3 output and fasta chunks #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_16 Length: 995 #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks #--------------------------------------------------------------------- Now starting the contig!! 
SeqID: Gc_UCSC1_contig_26 Length: 1895 #--------------------------------------------------------------------- FATAL: Thread terminated, causing all processes to fail --> rank=16, hostname=compute-g20-10.deepthought.umd.edu #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_23 Length: 618 #--------------------------------------------------------------------- #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_31 Length: 506 #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks setting up GFF3 output and fasta chunks #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_28 Length: 5246 #--------------------------------------------------------------------- MAKER WARNING: The file UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/E5/53/Gc_UCSC1_contig_29//theVoid.Gc_UCSC1_contig_29/0/Gc_UCSC1_contig_29.0.all.rb.out did not finish on the last run and must be erased setting up GFF3 output and fasta chunks setting up GFF3 output and fasta chunks setting up GFF3 output and fasta chunks #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_19 Length: 880 #--------------------------------------------------------------------- #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_22 Length: 831 #--------------------------------------------------------------------- #--------------------------------------------------------------------- Now starting the contig!! 
SeqID: Gc_UCSC1_contig_21 Length: 12421 #--------------------------------------------------------------------- doing repeat masking FATAL: Thread terminated, causing all processes to fail --> rank=18, hostname=compute-g20-10.deepthought.umd.edu #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_29 Length: 1161 #--------------------------------------------------------------------- doing repeat masking DBD::SQLite::db do failed: disk I/O error at /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/bin/../lib/GFFDB.pm line 105. DBD::SQLite::db do failed: disk I/O error at /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/bin/../lib/GFFDB.pm line 106. DBD::SQLite::db selectcol_arrayref failed: disk I/O error at /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/bin/../lib/GFFDB.pm line 108. DBD::SQLite::db do failed: disk I/O error at /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/bin/../lib/GFFDB.pm line 110. [compute-g20-7.deepthought.umd.edu:28014] 19 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork [compute-g20-7.deepthought.umd.edu:28014] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages doing repeat masking running repeat masker. #--------- command -------------# Widget::RepeatMasker: cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/0F/67/Gc_UCSC1_contig_9//theVoid.Gc_UCSC1_contig_9/0/Gc_UCSC1_contig_9.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/0F/67/Gc_UCSC1_contig_9//theVoid.Gc_UCSC1_contig_9/0 -pa 1 #-------------------------------# SIGTERM received doing repeat masking running repeat masker. 
#--------- command -------------# Widget::RepeatMasker: cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/D5/5A/Gc_UCSC1_contig_17//theVoid.Gc_UCSC1_contig_17/0/Gc_UCSC1_contig_17.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/D5/5A/Gc_UCSC1_contig_17//theVoid.Gc_UCSC1_contig_17/0 -pa 1 #-------------------------------# SIGTERM received SIGTERM received [compute-g20-7:28161] *** Process received signal *** [compute-g20-7:28161] Signal: Segmentation fault (11) [compute-g20-7:28161] Signal code: Address not mapped (1) [compute-g20-7:28161] Failing at address: 0x19a33ad0 [compute-g20-7:28161] [ 0] /lib64/libpthread.so.0 [0x2b9e1cd6bca0] [compute-g20-7:28161] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_int_malloc+0xb0) [0x2b9e1bfb1a40] [compute-g20-7:28161] [ 2] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_malloc+0x4a) [0x2b9e1bfb30ca] [compute-g20-7:28161] [ 3] perl(Perl_safesysmalloc+0x12) [0x481602] [compute-g20-7:28161] [ 4] perl(Perl_do_exec3+0x46) [0x4f6e86] [compute-g20-7:28161] [ 5] perl(Perl_my_popen+0x403) [0x484d63] [compute-g20-7:28161] [ 6] perl(Perl_pp_backtick+0xc2) [0x4f0752] [compute-g20-7:28161] [ 7] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-7:28161] [ 8] perl(Perl_call_sv+0x4d1) [0x433711] [compute-g20-7:28161] [ 9] perl(Perl_sighandler+0x208) [0x4876c8] [compute-g20-7:28161] [10] /lib64/libpthread.so.0 [0x2b9e1cd6bca0] [compute-g20-7:28161] [11] /usr/local/ofed/1.5.4/lib64/libmthca-rdmav2.so [0x2b9e29187bbc] [compute-g20-7:28161] [12] /cell_root/software/openmpi/1.6/gnu/sys/lib/openmpi/mca_btl_openib.so [0x2b9e2686a8dd] [compute-g20-7:28161] [13] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_progress+0x5b) [0x2b9e1bfc93cb] [compute-g20-7:28161] [14] 
/cell_root/software/openmpi/1.6/gnu/sys/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv+0x205) [0x2b9e25e22005]
[compute-g20-7:28161] [15] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(PMPI_Recv+0x14f) [0x2b9e1bf2927f]
[compute-g20-7:28161] [16] /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/perl/lib/auto/Parallel/Application/MPI/MPI.so(_MPI_Recv+0x59) [0x2b9e23ba8d69]
[compute-g20-7:28161] [17] /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/perl/lib/auto/Parallel/Application/MPI/MPI.so [0x2b9e23ba8f58]
[compute-g20-7:28161] [18] perl(Perl_pp_entersub+0x58f) [0x49ee4f]
[compute-g20-7:28161] [19] perl(Perl_runops_standard+0xe) [0x49d5ce]
[compute-g20-7:28161] [20] perl(perl_run+0x243) [0x4340f3]
[compute-g20-7:28161] [21] perl(main+0x135) [0x41b485]
[compute-g20-7:28161] [22] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b9e1cf969c4]
[compute-g20-7:28161] [23] perl [0x41b299]
[compute-g20-7:28161] *** End of error message ***
running repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/DC/D5/Gc_UCSC1_contig_18//theVoid.Gc_UCSC1_contig_18/0/Gc_UCSC1_contig_18.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/DC/D5/Gc_UCSC1_contig_18//theVoid.Gc_UCSC1_contig_18/0 -pa 1
#-------------------------------#
SIGTERM received
doing repeat masking
running repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/BE/77/Gc_UCSC1_contig_16//theVoid.Gc_UCSC1_contig_16/0/Gc_UCSC1_contig_16.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/BE/77/Gc_UCSC1_contig_16//theVoid.Gc_UCSC1_contig_16/0 -pa 1
#-------------------------------#
SIGTERM received
doing repeat masking
running repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/1C/8A/Gc_UCSC1_contig_14//theVoid.Gc_UCSC1_contig_14/0/Gc_UCSC1_contig_14.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/1C/8A/Gc_UCSC1_contig_14//theVoid.Gc_UCSC1_contig_14/0 -pa 1
#-------------------------------#
SIGTERM received
doing repeat masking
running repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/CB/E5/Gc_UCSC1_contig_13//theVoid.Gc_UCSC1_contig_13/0/Gc_UCSC1_contig_13.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/CB/E5/Gc_UCSC1_contig_13//theVoid.Gc_UCSC1_contig_13/0 -pa 1
#-------------------------------#
SIGTERM received
Perl exited with active threads:
  1 running and unjoined
  0 finished and unjoined
  0 running and detached
doing repeat masking
running repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/AA/A6/Gc_UCSC1_contig_1//theVoid.Gc_UCSC1_contig_1/0/Gc_UCSC1_contig_1.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/AA/A6/Gc_UCSC1_contig_1//theVoid.Gc_UCSC1_contig_1/0 -pa 1
#-------------------------------#
SIGTERM received
--------------------------------------------------------------------------
mpirun has exited due to process rank 17 with PID 7052 on node compute-g20-10.deepthought.umd.edu exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for all processes to call "init". By rule, if one process calls "init", then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize". By rule, all processes that call "init" MUST call "finalize" prior to exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
SIGTERM received
SIGTERM received
Perl exited with active threads:
  0 running and unjoined
  1 finished and unjoined
  0 running and detached
FATAL: Thread terminated, causing all processes to fail
--> rank=14, hostname=compute-g20-8.deepthought.umd.edu
Perl exited with active threads:
  0 running and unjoined
  1 finished and unjoined
  0 running and detached
FATAL: Thread terminated, causing all processes to fail
--> rank=12, hostname=compute-g20-8.deepthought.umd.edu
[compute-g20-8:09470] *** Process received signal ***
[compute-g20-8:09470] Signal: Segmentation fault (11)
[compute-g20-8:09470] Signal code: Address not mapped (1)
[compute-g20-8:09470] Failing at address: 0x4b0
[compute-g20-8:09470] [ 0] /lib64/libpthread.so.0 [0x2b03d0637ca0]
[compute-g20-8:09470] [ 1] perl(Perl_csighandler+0x23) [0x488103]
[compute-g20-8:09470] [ 2] /lib64/libpthread.so.0 [0x2b03d0637ca0]
[compute-g20-8:09470] [ 3] /lib64/libc.so.6(__select+0x62) [0x2b03d0913402]
[compute-g20-8:09470] [ 4] /cell_root/software/openmpi/1.6/gnu/sys/lib/openmpi/mca_btl_openib.so [0x2b03da142ff3]
[compute-g20-8:09470] [ 5] /lib64/libpthread.so.0 [0x2b03d062f83d]
[compute-g20-8:09470] [ 6] /lib64/libc.so.6(clone+0x6d) [0x2b03d091a26d]
[compute-g20-8:09470] *** End of error message ***
Perl exited with active threads:
  0 running and unjoined
  1 finished and unjoined
  0 running and detached
FATAL: Thread terminated, causing all processes to fail
--> rank=11, hostname=compute-g20-8.deepthought.umd.edu
setting up GFF3 output and fasta chunks
FATAL: Thread terminated, causing all processes to fail
--> rank=10, hostname=compute-g20-8.deepthought.umd.edu
setting up GFF3 output and fasta chunks
FATAL: Thread terminated, causing all processes to fail
--> rank=13, hostname=compute-g20-8.deepthought.umd.edu
FATAL: Thread terminated, causing all processes to fail
--> rank=15, hostname=compute-g20-8.deepthought.umd.edu

From carson.holt at genetics.utah.edu Thu Feb 27 11:09:21 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 27 Feb 2014 18:09:21 +0000
Subject: [maker-devel] Problem with OpenFabrics and infiniband
In-Reply-To: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
References: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
Message-ID:

It's a little more complicated than that. MAKER is written in Perl, and Perl doesn't give me the low-level access that a language like C would for controlling memory access (I don't control that). All I get is Perl's standard implementation of forks. So it's not really a matter of MAKER changing; it would be a matter of changing Perl itself (which I have no power over, and which I don't think will be changing anytime soon).

For now you just have to add this flag to OpenMPI when running MAKER with mpiexec:

-mca btl ^openib

Example:

mpiexec -mca btl ^openib -n 20 maker

Thanks,
Carson

From: UMD Bioinformatics
Date: Thursday, February 27, 2014 at 9:46 AM
To:
Subject: Problem with OpenFabrics and infiniband

Hello,

I've had my IT folks install maker on our cluster at UMD. I'm getting a SEGFAULT when running maker on InfiniBand nodes, but not on gigE nodes. According to the logs this appears to be an issue with forks, but I'm not sure how to fix it. I would simply use the gigE nodes, but we are in the process of moving everything to InfiniBand, so I'll need to address this issue at some point. I've attached the error log from the MPI run as well as commentary from my HPCC team.

IT suggestions

If you look at the top of the error log for the problematic job, it clearly warns of an issue with doing 'fork's within the openmpi/openfabrics framework.

In particular, the use of the fork system call is only partially supported in the OpenFabrics software (that is, the drivers etc. for the InfiniBand connections). See e.g.
http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork
for more information, in particular the paragraphs starting with the red-highlighted sentence "it does not mean that your fork()-calling application is safe". (The kernel, OpenMPI version, and OFED version are sufficiently recent to mean that there is _some_ fork support.)

The fact that the job runs over gigE but not IB, in conjunction with the warning from OpenMPI, strongly suggests that this is the issue you are encountering. I suspect that maker touches registered memory before the fork, which would result in a segfault (matching what was observed).

You can try adding the arguments

--mca mpi_warn_on_fork 0

to the mpirun command, just in case the crash was somehow caused by OpenMPI's warning, but I would not hold out much hope for that.

###UPDATE### This does not fix the problem.

Basically, it looks like maker uses some system calls like fork in a manner that is incompatible with the current OpenFabrics software, and thus will not work with InfiniBand. This situation is likely to remain until either maker changes to be compatible with OFED, or OFED's support for the fork system call is broadened.

From bioinformatics.umd at gmail.com Thu Feb 27 11:55:34 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Thu, 27 Feb 2014 13:55:34 -0500
Subject: [maker-devel] Problem with OpenFabrics and infiniband
In-Reply-To:
References: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>
Message-ID: <2840BC1C-70CC-4A0D-AB44-AEFD718C7B8C@gmail.com>

Hi Carson,

Thanks, that fixed the issue.

Cheers,
Ian

From sjackman at gmail.com Thu Feb 27 16:17:22 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 27 Feb 2014 15:17:22 -0800
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?

Cheers,
Shaun

On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com) wrote:

Sorry, I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 3:50 PM, Carson Holt wrote:

What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff, and use the map_forward option to then filter the results based on mRNA score; that would copy names onto new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly mix it with things like SNAP, I can start to get some very weird behaviors.
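The prefilter step described above (keep only models whose mRNA score passes a cutoff, then hand the GFF3 to model_gff with map_forward) can be sketched in Python as below. This is a hypothetical helper, not part of MAKER: it assumes, as described in this thread, that the score to filter on sits in column 6 of the mRNA rows, it keeps only the gene, mRNA and child rows of surviving models, and the cutoff value is yours to choose.

```python
def parse_attributes(col9):
    """Parse a GFF3 column-9 string such as 'ID=m1;Parent=g1' into a dict."""
    attrs = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def mrna_score(cols):
    """Score used for filtering: column 6 of the mRNA row (assumption).

    Returns None when the column holds '.', so such mRNAs are dropped.
    """
    try:
        return float(cols[5])
    except ValueError:
        return None


def filter_gff3_by_mrna_score(lines, min_score):
    """Keep mRNAs scoring >= min_score, their parent genes and their child rows.

    Comment lines are passed through and everything from '##FASTA' on is copied
    unchanged; any other feature rows (e.g. evidence matches) are dropped.
    """
    body, fasta_tail = lines, []
    for i, line in enumerate(lines):
        if line.startswith("##FASTA"):
            body, fasta_tail = lines[:i], lines[i:]
            break

    # First pass: decide which mRNAs (and therefore which genes) survive.
    kept_mrnas, kept_genes = set(), set()
    for line in body:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != "mRNA":
            continue
        score = mrna_score(cols)
        if score is not None and score >= min_score:
            attrs = parse_attributes(cols[8])
            kept_mrnas.add(attrs.get("ID"))
            kept_genes.update(attrs.get("Parent", "").split(","))

    # Second pass: emit surviving rows in their original order.
    out = []
    for line in body:
        if line.startswith("#"):
            out.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        parents = set(attrs.get("Parent", "").split(","))
        if cols[2] == "gene":
            keep = attrs.get("ID") in kept_genes
        elif cols[2] == "mRNA":
            keep = attrs.get("ID") in kept_mrnas
        else:
            keep = bool(parents & kept_mrnas)
        if keep:
            out.append(line)
    return out + fasta_tail
```

If the score you actually want to filter on is stored somewhere else (for example in an attribute), `mrna_score` is the single place to change; the parent/child bookkeeping stays the same.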
Thanks,
Carson

From: Mikael Brandström Durling
Date: Wednesday, February 26, 2014 at 3:04 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I noticed that est_forward implicitly turns on the est2genome predictor. Is this necessary for it to work? I'm after the behavior you describe below, where exonerate is made to try really hard within a limited region to align an EST, but I would not like maker to produce est2genome predictions.

In general, I think this maker_coor and est_forward feature set is worth promoting to a documented feature.

Thanks,
Mikael

On 26 Feb 2014, at 17:09, Carson Holt wrote:

It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. it allows single base pair introns, perhaps caused by assembly errors). The logic here is that since I already told MAKER that, with some degree of confidence, I expect sequence A to map to location X, it will try its hardest to make it match.
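To make the "list of locations to search" concrete, here is a minimal Python sketch of turning a maker_coor tag in a FASTA header into a search region. It is not MAKER's code (the real handling lives in GI.pm); the whitespace-separated key=value header layout, the 1-based inclusive coordinates, the clamping to sequence length, and the helper names are all assumptions. Only the two tag forms, maker_coor=chr1 and maker_coor=chr1:1-10000, come from this thread.

```python
import re

# Hypothetical parser for the two tag forms mentioned in the thread:
#   maker_coor=chr1            -> the whole of chr1
#   maker_coor=chr1:1-10000    -> chr1, bases 1..10000 (1-based, inclusive)
COOR_RE = re.compile(r"^(?P<seqid>[^:]+)(?::(?P<start>\d+)-(?P<end>\d+))?$")


def header_tags(header):
    """Collect whitespace-separated key=value tokens after the FASTA ID."""
    tags = {}
    for token in header.lstrip(">").split()[1:]:
        if "=" in token:
            key, value = token.split("=", 1)
            tags[key] = value
    return tags


def search_region(header, seq_lengths):
    """Return (seqid, start, end) to search for this sequence, or None if untagged."""
    coor = header_tags(header).get("maker_coor")
    if coor is None:
        return None
    m = COOR_RE.match(coor)
    if m is None:
        raise ValueError(f"unparseable maker_coor tag: {coor!r}")
    seqid = m.group("seqid")
    if m.group("start") is None:
        return (seqid, 1, seq_lengths[seqid])  # no range given: whole sequence
    start, end = int(m.group("start")), int(m.group("end"))
    return (seqid, start, min(end, seq_lengths[seqid]))  # clamp to sequence end


if __name__ == "__main__":
    lengths = {"chr1": 50000}
    print(search_region(">est1 gene_id=G1 maker_coor=chr1:1-10000", lengths))
    print(search_region(">est2 maker_coor=chr1", lengths))
```

With est_forward set, each such region would be searched even without a BLAST seed; without it, per the description that follows, the region would only restrict BLAST seeds that are already there.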
Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also, the match parameters for exonerate will not be relaxed as they were with est_forward.

As you can see, the behavior is slightly different (because it's an accidental feature).

Thanks,
Carson

From: Mikael Brandström Durling
Date: Wednesday, February 26, 2014 at 6:37 AM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor (but not gene_id), as well as the configuration option est_forward, for this to work. All occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

On 26 Feb 2014, at 14:22, Carson Holt wrote:

Yes. That should work as well, as an accidental feature.

--Carson

Sent from my iPhone

On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling wrote:

Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?

Thanks,
Mikael

On 26 Feb 2014, at 01:58, Carson Holt wrote:

There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there, so you'll have to type it in.

There is also a feature designed to work with this option.
?If you add tags to your fasta headers, those can be used to guide the mapping and naming. ?For example, gene_id= ?will ensure different isoforms that share a common gene_id get clustered into the same gene, and?maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp ?and just using maker_coor=chr1 will force it to only be mapped against chr1. This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide. ?Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Tuesday, February 25, 2014 at 5:06 PM To: Subject: [maker-devel] Mapping gene names Hi, I?m annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I?ve run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein? maker_opts.ctl est=NC_123456.frn protein=NC_123456.faa est2genome=1 protein2genome=1 Thanks, Shaun _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sjackman at gmail.com Thu Feb 27 17:27:30 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Thu, 27 Feb 2014 16:27:30 -0800 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you! The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn?t make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits? organism_type=prokaryotic est2genome=1 protein2genome=1 est_forward=1 Cheers, Shaun On 27 February 2014 15:17, Shaun Jackman wrote: > Is there a corresponding protein_forward=1 option to map forward protein > names from protein2genome? > > Cheers, > Shaun > > On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) > wrote: > > Sorry I meant to say prefilter on the score in the mRNA column before > passing the gff3 to model_gff. > > --Carson > > Sent from my iPhone > > On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: > > What you can do is run it once with just est_forward=1 and > est2genome/protein2genome set to 1. Then take those results, pass them in > as model_gff and use the map_forward option to then filter the results > based on mRNA score and that would copy names onto new gene under the > standard MAKER pipeline. Eventually it?s really supposed to go into a > separate tool that will map genes onto new assemblies (but under the hood > the tool will just be calling MAKER with certain parameters restricted). I > do this because if people commonly use it mixed with things like SNAP I can > start to get some very weird behaviors. 
> > Thanks, > Carson > > From: Mikael Brandstr?m Durling > Date: Wednesday, February 26, 2014 at 3:04 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Mapping gene names > > It seems that this could be a very useful option in those cases where > you have firm a priori knowledge of the placement of ESTs. However, while > trying it I note that est_forward implies that the est2genome predictor is > turned on, implicitly. Is this necessary for this to work? I?m after the > behavior you describe below where exonerate is made to try really hard > within a limited region to align an est, but I would not like maker to > produce est2genome predictions. > > In general, I think this maker_coor and est_forward is a feature set that > is worthy to be promoted into a documented feature. > > THanks, > Mikael > > 26 feb 2014 kl. 17:09 skrev Carson Holt : > > It will still work without est_forward. It just works a little > differently. Keep in mind this was a hidden feature I used to find > stubborn or hard to find missing genes after reassembly of a genome. > > If est_forward is provided, MAKER will parse the database to look for the > maker_coor tags early in the pipeline. Then it will create a list of > locations to search, and it will search them even if there are no BLAST > results to seed the search (normally MAKER gets a BLAST result first and > then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to > look for a match using all of chr1 as the input to exonerate even when > BLAST finds nothing (this is a very very slow search, but can help pick up > one or two stubborn genes that don?t remap well). To allow this, MAKER > gives exonerate looser matching parameters (i.e. allows for single base > pair introns perhaps caused by assembly errors). 
The logic here is that > given the fact that I already told MAKER that with some degree of > confidence I expect sequence A to map to to location X, it will try its > hardest to make it match. > > Without est_forward set, the maker_coor= flag still gets read in GI.pm at > line 1563, but only after a BLAST alignment has already seeded it to the > region (that BLAST result has the information in its description > parameter). MAKER will then ignore seeds completely outside of maker_coor. > In addition any BLAST seeds that overlap maker_coor will get the search > space for alignment polishing adjusted to match maker_coor exactly. Also > match parameters for exonerate will not be relaxed as they were with > est_forward. > > As you can see the behavior, is slightly different (because it?s an > accidental feature). > > Thanks, > Carson > > > > From: Mikael Brandstr?m Durling > Date: Wednesday, February 26, 2014 at 6:37 AM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Mapping gene names > > That might be a useful and time saving accidental feature. But, reading > the code, it seems that I need to supply maker_coor but not gene_id, as > well as the configuration option est_forward for this to work. Any > occurrences of maker_coor in GI.pm seems to be conditioned on set_forward=1 > right? > > Mikael > > 26 feb 2014 kl. 14:22 skrev Carson Holt : > > Yes. That should work as well as an accidental feature. > > --Carson > > Sent from my iPhone > > On Feb 26, 2014, at 5:30 AM, Mikael Brandstr?m Durling < > mikael.durling at slu.se> wrote: > > Can this use of maker_coor be used only to hint about the placement of the > ests, without affecting the naming of the final genes? Ie if I have a > database of EST where I have a priori knowledge of their rough placement, > can this placement be given to maker without providing est_forward=1? > > Thanks, > Mikael > > 26 feb 2014 kl. 01:58 skrev Carson Holt : > > There is a way. 
It?s not a standard option and it?s undocumented, but > if you add est_forward=1 to the maker_opts.ctl file, then it will do just > that. The option won?t already be there so you?ll have to type it in. > > There is also a feature designed to work with this option. If you add > tags to your fasta headers, those can be used to guide the mapping and > naming. For example, gene_id= will ensure different isoforms > that share a common gene_id get clustered into the same gene, > and maker_coor=chr1:1-10000 in the fasta header will force a particular > sequence to only be mapped against chr1 within the range of 1-10000 bp and > just using maker_coor=chr1 will force it to only be mapped against chr1. > > This is an undocumented way to remap genes onto new assemblies using blast > alignments of earlier transcript or protein annotations as a guide. > > ?Carson > > > > > From: Shaun Jackman > Reply-To: Shaun Jackman > Date: Tuesday, February 25, 2014 at 5:06 PM > To: > Subject: [maker-devel] Mapping gene names > > Hi, > > I?m annotating a genome using a closely related genome from Genbank, using > the .frn (RNA) and .faa (protein) files from Genbank as evidence to > annotate my genome. I?ve run Maker, and the annotation seems to have worked > well. Is it possible to map the names of the genes from the related species > to my annotation? I see the *map_forward* option, which applies to the > *model_gff* parameter. Is there a similar option for *est* and *protein*? 
> > *maker_opts.ctl* > > est=NC_123456.frn > protein=NC_123456.faa > est2genome=1 > protein2genome=1 > > Thanks, > Shaun > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Feb 27 18:13:06 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 27 Feb 2014 18:13:06 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Set single_exon=1, and the minimum size to a smaller value. I think it's set to 250 right now. Also est2genome is looking for ORF, so if there is none (as with tRNAs) they probably won't get picked up. --Carson Sent from my iPhone > On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: > > Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you! > > The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn?t make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits? 
> > organism_type=prokaryotic > est2genome=1 > protein2genome=1 > est_forward=1 > Cheers, > Shaun > > > >> On 27 February 2014 15:17, Shaun Jackman wrote: >> Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome? >> >> Cheers, >> Shaun >> >>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote: >>> >>> Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: >>> >>>> What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score and that would copy names onto new gene under the standard MAKER pipeline. Eventually it?s really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors. >>>> >>>> Thanks, >>>> Carson >>>> >>>> From: Mikael Brandstr?m Durling >>>> Date: Wednesday, February 26, 2014 at 3:04 PM >>>> To: Carson Holt >>>> Cc: "maker-devel at yandell-lab.org" >>>> Subject: Re: [maker-devel] Mapping gene names >>>> >>>> It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I?m after the behavior you describe below where exonerate is made to try really hard within a limited region to align an est, but I would not like maker to produce est2genome predictions. 
>>>> >>>> In general, I think this maker_coor and est_forward is a feature set that is worthy to be promoted into a documented feature. >>>> >>>> THanks, >>>> Mikael >>>> >>>>> 26 feb 2014 kl. 17:09 skrev Carson Holt : >>>>> >>>>> It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome. >>>>> >>>>> If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very very slow search, but can help pick up one or two stubborn genes that don?t remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). The logic here is that given the fact that I already told MAKER that with some degree of confidence I expect sequence A to map to to location X, it will try its hardest to make it match. >>>>> >>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also match parameters for exonerate will not be relaxed as they were with est_forward. >>>>> >>>>> As you can see the behavior, is slightly different (because it?s an accidental feature). 
>>>>> >>>>> Thanks, >>>>> Carson >>>>> >>>>> >>>>> >>>>> From: Mikael Brandstr?m Durling >>>>> Date: Wednesday, February 26, 2014 at 6:37 AM >>>>> To: Carson Holt >>>>> Cc: "maker-devel at yandell-lab.org" >>>>> Subject: Re: [maker-devel] Mapping gene names >>>>> >>>>> That might be a useful and time saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seems to be conditioned on set_forward=1 right? >>>>> >>>>> Mikael >>>>> >>>>>> 26 feb 2014 kl. 14:22 skrev Carson Holt : >>>>>> >>>>>> Yes. That should work as well as an accidental feature. >>>>>> >>>>>> --Carson >>>>>> >>>>>> Sent from my iPhone >>>>>> >>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandstr?m Durling wrote: >>>>>> >>>>>>> Can this use of maker_coor be used only to hint about the placement of the ests, without affecting the naming of the final genes? Ie if I have a database of EST where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1? >>>>>>> >>>>>>> Thanks, >>>>>>> Mikael >>>>>>> >>>>>>>> 26 feb 2014 kl. 01:58 skrev Carson Holt : >>>>>>>> >>>>>>>> There is a way. It?s not a standard option and it?s undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won?t already be there so you?ll have to type it in. >>>>>>>> >>>>>>>> There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp and just using maker_coor=chr1 will force it to only be mapped against chr1. 
>>>>>>>> >>>>>>>> This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide. >>>>>>>> >>>>>>>> ?Carson >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> From: Shaun Jackman >>>>>>>> Reply-To: Shaun Jackman >>>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM >>>>>>>> To: >>>>>>>> Subject: [maker-devel] Mapping gene names >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I?m annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I?ve run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein? >>>>>>>> >>>>>>>> maker_opts.ctl >>>>>>>> >>>>>>>> est=NC_123456.frn >>>>>>>> protein=NC_123456.faa >>>>>>>> est2genome=1 >>>>>>>> protein2genome=1 >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Shaun >>>>>>>> >>>>>>>> _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>>>>>> _______________________________________________ >>>>>>>> maker-devel mailing list >>>>>>>> maker-devel at box290.bluehost.com >>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... 
From mikael.durling at slu.se Fri Feb 28 03:40:30 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Fri, 28 Feb 2014 10:40:30 +0000 Subject: [maker-devel] maker_coor behaviour Message-ID: <8CA99854-CF5B-4533-B625-0EDD5DFFCE8B@slu.se> Hi, in a previous thread, the maker_coor feature for ESTs was mentioned. I have been trying it out, without using it for mapping gene names. I have placed these ESTs by other means, and thought the maker_coor feature would be a good way to use this a priori knowledge. The main problem I am trying to solve is that some ESTs whose correct location I know are not recruited to that position by maker's blastn->exonerate method (I find them on other scaffolds instead). So I thought maker_coor with the est_forward behavior (as described) would be a good way to force my evidence onto the correct position, instead of it ending up supporting or breaking other models. However, as soon as I run with maker_coor-tagged EST sequences, no est2genome evidence appears in the final GFF3 file. The blastn evidence is there when est_forward is disabled, but, as expected, there is no blastn evidence when est_forward is turned on. It seems as though the evidence is used, as the QI lines indicate EST support for both splice sites and exon alignments, but I have no way to visualize or evaluate the congruence of evidence and models. Would it be possible to tweak Maker into outputting the est2genome alignments when est_forward/maker_coor is used? I couldn't figure out myself where in the code this is handled. I could of course do my own exonerate alignments of these ESTs and feed them into maker as est_gff, but if maker already has the machinery to do this, I thought it would be a good idea to use it.
Thanks, Mikael From carsonhh at gmail.com Fri Feb 28 07:09:09 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 28 Feb 2014 07:09:09 -0700 Subject: [maker-devel] maker_coor behaviour Message-ID: I wouldn't use those options for standard de novo annotation. There are other, more appropriate things that should be used instead. Both maker_coor and est_forward are destined to be part of a separate tool that will secretly just call MAKER, but will let me control which other parameters MAKER sees. That avoids certain logic incompatibilities: the options make sense when mapping entire genes onto a new assembly, but not really for de novo annotation using ESTs. You should instead try modifying these options in the maker_bopts.ctl file:
pcov_blastn= #Blastn Percent Coverage Threshold EST-Genome Alignments
pid_blastn= #Blastn Percent Identity Threshold EST-Genome Alignments
eval_blastn= #Blastn eval cutoff
bit_blastn= #Blastn bit cutoff
depth_blastn= #Blastn depth cutoff (0 to disable cutoff). For trimming high evidence overlap regions
en_score_limit= #Exonerate nucleotide percent of maximal score threshold
If either blastn or est2genome results disappear, it is because they don't meet one of these thresholds (blastn results that don't meet the thresholds but are borderline are kept if exonerate does meet the thresholds, but if exonerate misses a threshold they are thrown out). That is why the EST in question gets thrown out, and it's why the blastn result disappears when you try to anchor it with maker_coor. You can visualize everything in a browser when you're done. I still recommend the old version of Apollo for this (it's just easier). You can try to install it using the "./Build apollo" option from the .../maker/src/ directory, and it will be installed in .../maker/exe/apollo. This requires Apache Ant to be installed. Otherwise, just download it from the GMOD SourceForge page and install it manually.
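The keep/discard rule described above can be made concrete with a small Python model. It is a sketch under stated assumptions, not MAKER's implementation. The threshold fields mirror the maker_bopts.ctl options listed in the message. The thread does not say how MAKER decides that a blastn hit is "borderline", so the `slack` parameter is a hypothetical stand-in, and the cutoff values in the usage block are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    pcov_blastn: float     # minimum fraction of the EST covered by the hit
    pid_blastn: float      # minimum fraction identity
    eval_blastn: float     # maximum e-value
    bit_blastn: float      # minimum bit score
    en_score_limit: float  # minimum exonerate score, as a percent of the maximal score


@dataclass
class BlastnHit:
    pcov: float
    pid: float
    evalue: float
    bits: float


def blastn_passes(hit: BlastnHit, t: Thresholds, slack: float = 0.0) -> bool:
    """True if the hit meets every blastn cutoff; `slack` loosens pcov/pid only."""
    return (hit.pcov >= t.pcov_blastn - slack
            and hit.pid >= t.pid_blastn - slack
            and hit.evalue <= t.eval_blastn
            and hit.bits >= t.bit_blastn)


def keep_est_alignment(hit: BlastnHit, exonerate_pct: Optional[float],
                       t: Thresholds, slack: float = 0.1) -> bool:
    """Simplified model of the keep/discard rule (hypothetical, not MAKER's code).

    exonerate_pct is the exonerate score as a percent of its maximum,
    or None if the hit was never polished by exonerate.
    """
    if blastn_passes(hit, t):
        strict_ok = True
    elif blastn_passes(hit, t, slack):
        strict_ok = False          # borderline: only exonerate can rescue it
    else:
        return False               # clearly below the blastn cutoffs
    if exonerate_pct is None:
        return strict_ok
    # Whenever exonerate ran, its threshold decides: a miss discards the EST.
    return exonerate_pct >= t.en_score_limit


if __name__ == "__main__":
    # Illustrative cutoffs only; read the real values from your maker_bopts.ctl.
    t = Thresholds(pcov_blastn=0.8, pid_blastn=0.85, eval_blastn=1e-10,
                   bit_blastn=40, en_score_limit=20)
    borderline = BlastnHit(pcov=0.75, pid=0.9, evalue=1e-30, bits=150)
    print(keep_est_alignment(borderline, exonerate_pct=45.0, t=t))  # rescued
    print(keep_est_alignment(borderline, exonerate_pct=12.0, t=t))  # discarded
```

The point of the model is the asymmetry Carson mentions: exonerate can rescue a borderline blastn hit, but a missed exonerate threshold removes the EST no matter how good the blastn hit was. That is consistent with an anchored EST vanishing entirely from the output.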
Thanks, Carson On 2/28/14, 3:40 AM, "Mikael Brandström Durling" wrote: >Hi, > >in a previous thread, the maker_coor feature for ETSs was mentioned. I >have been trying it out, without using it for mapping gene names. I have >placed these ESTs by other means, an thought the maker_coor feature would >be a good use of this a priori knowledge. My major problem i try to solve >is that I find that some ESTs where I know where they should be aligned, >are not recruited to that position by maker's blastn->exonerate method (I >find them on other scaffolds). So I thought maker_coor with the >est_forward behavior (as described) would be a good option to force my >evidence onto the correct position, instead of ending up supporting or >braking other models. However, as soon as I run with maker_coor tagged >est sequences, no est2genome evidence appears in the final gff3 file. The >blastn evidence is there when est_forward is disabled, but as expected, >there is no blastn evidence when est_forward is turned on. It seems >though as the evidence is used, as the QI lines indicate EST support for >both splice sites as well as exon alignments, but I have no way to >visualize and/or evaluate the congruence of evidence and models. Would it >be possible to tweak Maker into outputting the est2genome alignments when >est_forward/maker_coor is used? I couldn't figure myself where in the >code this was handled. > >I could of course do my own exonerate alignments of these ESTs and feed >them into maker as est_gff, but if maker already has the machinery to to >this, I thought it would be a good idea to use it.
> >Thanks, >Mikael From rbharris at uw.edu Fri Feb 28 13:14:55 2014 From: rbharris at uw.edu (Rebecca Harris) Date: Fri, 28 Feb 2014 12:14:55 -0800 Subject: [maker-devel] error in snap training In-Reply-To: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com> References: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com> Message-ID: Hi - I tried this and ran cegma --genome on my original FASTA file. I then used cegma2zff to convert the output, and ran fathom and forge. However, when I try to generate new parameters with forge, I get the same error that I got when trying to train SNAP without CEGMA: "ZOE ERROR (from forge): impossible error5 KOG1342.20". Any suggestions would be great, thanks! Cheers, Rebecca On Tue, Feb 25, 2014 at 2:12 PM, Carson Holt wrote: > Make sure you are using 2.31, and then try the maker2zff filters > individually. If the protein models are not working well, use CEGMA to > generate models. It's from the same group as SNAP. Use cegma2zff for the > conversion. > > --Carson > > Sent from my iPhone > > > On Feb 25, 2014, at 2:49 PM, Rebecca Harris wrote: > > > > Hey - > > > > I'm trying to train SNAP and am running into errors. I don't have any > EST evidence, just protein. My .gff file reports 10865 genes but when I run > maker2zff -c0 -e0 I get back empty genome files. When I run maker2zff -n, > a ton of overlap_prev_exon errors get written to the screen and then with I > get to the forge step I get an "impossible error5". Any help would be > greatly appreciated. > > > > Thanks!
> > Rebecca From carsonhh at gmail.com Fri Feb 28 13:22:12 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 28 Feb 2014 13:22:12 -0700 Subject: [maker-devel] error in snap training In-Reply-To: References: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com> Message-ID: If it's failing both ways, I'm thinking the problem may be SNAP itself. Try these two different versions of SNAP: http://korflab.ucdavis.edu/Software/snap-2013-02-16.tar.gz and http://korflab.ucdavis.edu/Software/snap-2013-11-29.tar.gz If they both fail, then contact the SNAP development group: korflab AT ucdavis DOT edu Thanks, Carson From: Rebecca Harris Date: Friday, February 28, 2014 at 1:14 PM To: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] error in snap training Hi - I tried this and ran cegma --genome on my original fasta file. I then tried to use cegama2zff to convert, fathom, and forge. However, when I try to generate new parameters with forge, I get the same error that I got when trying to train SNAP without CEGMA: "ZOE ERROR (from forge): impossible error5 KOG1342.20". Any suggestions would be great, thanks! Cheers, Rebecca On Tue, Feb 25, 2014 at 2:12 PM, Carson Holt wrote: > Make sure you are using 2.31, and then try the maker2zff filters > individually. If the protein models are not working well, use CEGMA to > generate models. It's from the same group as SNAP. Use cegma2zff for the > conversion.
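One symptom reported in this thread is an empty genome.ann after maker2zff. That makes it worth confirming the ZFF file actually contains gene models before running fathom and forge. A minimal Python check is sketched below. The field layout it assumes is based on SNAP's ZFF conventions: a ">" line starts each sequence record, feature lines have 4 fields (short format) or 9 fields (long format), and the gene (group) name is the last field. Verify that layout against the SNAP documentation for your version.

```python
import sys
from typing import Iterable, Optional, Set, Tuple


def zff_gene_models(lines: Iterable[str]) -> Set[Tuple[Optional[str], str]]:
    """Collect distinct (sequence, group) pairs from ZFF text such as genome.ann."""
    models = set()
    seq = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue                    # skip blank lines
        if fields[0].startswith(">"):
            seq = fields[0][1:]         # start of a new sequence record
            continue
        # Assumed layout: short ZFF has 4 fields, long ZFF has 9,
        # and in both the gene (group) name is the last field.
        if len(fields) in (4, 9):
            models.add((seq, fields[-1]))
    return models


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        count = len(zff_gene_models(handle))
    print(f"{count} gene models in {sys.argv[1]}")
    if count == 0:
        sys.exit("no gene models: the training set is empty")
```

Run it on the .ann file that maker2zff or cegma2zff produced. A count of zero means the filters rejected every model, so forge has nothing to train on.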
> > --Carson > > Sent from my iPhone > >> > On Feb 25, 2014, at 2:49 PM, Rebecca Harris wrote: >> > >> > Hey - >> > >> > I'm trying to train SNAP and am running into errors. I don't have any EST >> evidence, just protein. My .gff file reports 10865 genes but when I run >> maker2zff -c0 -e0 I get back empty genome files. When I run maker2zff -n, a >> ton of overlap_prev_exon errors get written to the screen and then with I get >> to the forge step I get an "impossible error5". Any help would be greatly >> appreciated. >> > >> > Thanks! >> > Rebecca From rebzi87 at gmail.com Tue Feb 4 15:29:41 2014 From: rebzi87 at gmail.com (Rebecca Harris) Date: Tue, 4 Feb 2014 14:29:41 -0800 Subject: [maker-devel] maker output Message-ID: Hi, I'm running maker on a cluster and am having some problems with the run ending prematurely. I would like to know if there is a straightforward way to figure out whether maker has completed. I've tried: 1) counting the number of run.log files in the datastore directly, and 2) counting the instances of "FINISHED" in the master_datastore_index.log. These numbers are inconsistent. I have 200,000 contigs in my fasta file - do I expect 200,000 run.log files? I've had to restart maker a few times - it appears that maker is appending to the master_datastore_index.log, as I find multiple instances of the same contig being finished. Thanks! Cheers, Rebecca From darasappan at gmail.com Tue Feb 4 15:43:19 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Tue, 4 Feb 2014 16:43:19 -0600 Subject: [maker-devel] Fwd: maker annotation with cufflinks output References: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> Message-ID: Resending this since it didn't make it to the mailing list before. > > I was able to check on some of those questions. > > 1. From trinity assembly: I started with 102000 contigs. I used > trinotate to annotate proteins in this. > > I ran maker on this data with est2genome set to 1.
The output looks > like this (most important parts on top): > > 6653 gene > 46675 exon > 280534 protein_match > 59934 CDS > 969 contig > 105388 expressed_sequence_match > 12584 five_prime_UTR > 78565 match > 1401369 match_part > 10180 mRNA > 11545 three_prime_UTR > > 2. From cufflinks assembly: I started with 133380 entries (out of > which there are 29,000 transcripts). I used the protein sequences > from trinity assembly. > > I ran maker on this data with est2genome set to 1. The output looks > like this: > 29 gene > 75 exon > 573659 protein_match > 67 CDS > 1099 contig > 269298 expressed_sequence_match > 23 five_prime_UTR > 173844 match > 2221846 match_part > 29 mRNA > 23 three_prime_UTR > > The number of genes annotated using the trinity assembly is lower than > expected, so I went the cufflinks route. I don't understand why, when > using the cufflinks transcripts, even fewer genes are being found. > > 3. Training SNAP: I used the results of maker from 1 to train > SNAP. I then used that training set to rerun maker: > snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm > est2genome=0 > > And again I got results with no entries for gene, exon, CDS etc. > 957 contig > 46555 expressed_sequence_match > 43651 match > 553633 match_part > 113738 protein_match > > As I mentioned in another email, cegma results indicated that the > genome was more than 90% complete. Any suggestions would be helpful. > > Thank you > Dhivya > > > > > On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: > >> Hi Dhivya, >> >> I think there a few numbers that could be helpful to understand >> what's happening here. >> >> How many transcripts did Trinity assembly the RNA-seq data into? >> Also, you had 29,000 transcripts from cufflinks, but fewer from >> MAKER when you gave it the cufflinks data. How many transcripts did >> MAKER identify with the cufflinks data? Did you still get more than
>> >> A key part of MAKER's approach to genome annotation that might be >> affecting it's performance is that it only annotates a gene where >> there is both evidence (like your RNA-seq data) and an ab-initio >> prediction. If a prediction is unsupported by the evidence, then >> MAKER won't annotate a gene and if evidence aligns where there's no >> prediction, MAKER won't annotate a gene either. What ab-initio >> predictors are you using and have they been trained specific genome? >> >> You can force MAKER to automatically promote evidence alignments to >> a gene model by setting the est2genome option to 1, but that will >> usually give you many false positives. >> >> Try rerunning it with either the Trinity data or the Cufflinks data >> and with est2genome set to 1, and let us know how that affects the >> MAKER results. >> >> Thanks, >> Daniel >> >> Daniel Ence >> Graduate Student >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ________________________________________ >> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf >> of dhivya arasappan [darasappan at gmail.com] >> Sent: Thursday, January 30, 2014 11:18 AM >> To: maker-devel at yandell-lab.org >> Subject: [maker-devel] maker annotation with cufflinks output >> >> Hello, >> >> I am trying to annotate a 200 mb plant genome for which I have a very >> good assembly. >> >> I tried to denovo assemble RNA-seq data using trinity and ran maker >> using my genome assembly and the trinity results. I did not get as >> many transcripts as expected, around 10,000 transcripts. >> >> So, I decided to try a different approach. I did a genome assisted >> assembly of the RNA-seq data using tophat/cufflinks. This pipeline >> generated 21,000 genes, 29,000 transcripts. I then ran maker using >> my >> genome assembly and the cufflinks result. I get much less number of >> transcripts as a result. 
>> >> If cufflinks found 29000 transcripts by mapping to the genome, I'm >> confused as to why maker is not finding the same. >> >> Any suggestions would be appreciated. >> >> Thanks >> Dhivya From dence at genetics.utah.edu Tue Feb 4 15:42:52 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 4 Feb 2014 22:42:52 +0000 Subject: [maker-devel] maker output In-Reply-To: References: Message-ID: Hi Rebecca, If you're looking at the master_datastore_index.log, then you're looking for lines with the "FINISHED" status. If you count those (with "grep -c", for example), that will tell you how many contigs have finished. If you have 200,000 contigs that you're trying to annotate, you might also consider setting the "min_contig" parameter in the maker_opts.ctl file. This parameter sets the minimum length a contig must have before MAKER tries to annotate it. Usually 5000 bp or larger is what you want. That will save you some time in the long run. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Rebecca Harris [rebzi87 at gmail.com] Sent: Tuesday, February 04, 2014 3:29 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] maker output Hi, I'm running maker on a cluster and am having some problems with the run ending prematurely. I would like to know if there is a straightforward way to figure out whether maker has completed.
I've tried: 1) counting the number of run.log files in the datastore directly, and 2) counting the instances of "FINISHED" in the master_datastore_index.log. These numbers are inconsistent. I have 200,000 contigs in my fasta file - do I expect 200,000 run.log files? I've had to restart maker a few times - it appears that maker is appending to the master_datastore_index.log, as I find multiple instances of the same contig being finished. Thanks! Cheers, Rebecca From mikael.durling at slu.se Tue Feb 4 15:49:46 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Tue, 4 Feb 2014 22:49:46 +0000 Subject: [maker-devel] maker output In-Reply-To: References: Message-ID: > On 4 Feb 2014, at 23:32, "Rebecca Harris" wrote: > > Hi, > > I'm running maker on a cluster and am having some problems with the run ending prematurely. I would like to know if there is a straightforward way to figure out whether maker has completed. I've tried: 1) counting the number of run.log files in the datastore directly, and 2) counting the instances of "FINISHED" in the master_datastore_index.log. This is usually what I do to check whether maker has finished all scaffolds. There should be one FINISHED statement for each entry in the FASTA file. (It might be one for every scaffold longer than the given minimum length.) > These numbers are inconsistent. I have 200,000 contigs in my fasta file - do I expect 200,000 run.log files? I've had to restart maker a few times - it appears that maker is appending to the master_datastore_index.log, as I find multiple instances of the same contig being finished. Run maker -dsindex to rebuild the file if you like. The number of FINISHED entries should not change, though. Mikael > > Thanks!
> > Cheers, > Rebecca From carsonhh at gmail.com Tue Feb 4 15:50:10 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 04 Feb 2014 15:50:10 -0700 Subject: [maker-devel] maker output In-Reply-To: References: Message-ID: Clusters are notoriously flaky, so maker is restartable (hence the need for the log file). Also, since multiple nodes may write to the log simultaneously, they can munge its contents. You can rerun maker with the -dsindex flag to regenerate the master_datastore_index.log without processing anything else. You can even delete the log before rebuilding it if you want to ensure all entries are unique (run on a single CPU when you do this). Then count the number of FINISHED entries in the log. Thanks, Carson From: Rebecca Harris Date: Tuesday, February 4, 2014 at 3:29 PM To: Subject: [maker-devel] maker output Hi, I'm running maker on a cluster and am having some problems with the run ending prematurely. I would like to know if there is a straightforward way to figure out whether maker has completed. I've tried: 1) counting the number of run.log files in the datastore directly, and 2) counting the instances of "FINISHED" in the master_datastore_index.log. These numbers are inconsistent. I have 200,000 contigs in my fasta file - do I expect 200,000 run.log files? I've had to restart maker a few times - it appears that maker is appending to the master_datastore_index.log, as I find multiple instances of the same contig being finished. Thanks! Cheers, Rebecca
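This thread describes a completion check: one FINISHED entry per contig, duplicate entries after restarts, and short contigs skipped via min_contig. That check can be scripted in Python instead of a raw grep -c. The script below is a sketch that rests on several assumptions; adjust them if your files differ. It assumes master_datastore_index.log is tab-separated, with the contig ID in the first column and the status in the last. It treats the first word of each FASTA header as the contig ID. It also assumes, following Mikael's remark, that contigs shorter than min_contig never receive a FINISHED line. Counting unique IDs rather than lines sidesteps the duplicated entries that restarts leave behind.

```python
import sys
from typing import Dict, Iterable, Optional, Set


def finished_ids(index_lines: Iterable[str]) -> Set[str]:
    """Unique contig IDs with at least one FINISHED line in the datastore index."""
    done = set()
    for line in index_lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: contig ID first, status last, tab-separated.
        if len(fields) >= 2 and fields[-1] == "FINISHED":
            done.add(fields[0])
    return done


def contigs_at_least(fasta_lines: Iterable[str], min_contig: int) -> Set[str]:
    """IDs (first word of each header) of FASTA records at least min_contig bp long."""
    lengths: Dict[str, int] = {}
    name: Optional[str] = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return {n for n, size in lengths.items() if size >= min_contig}


def main(index_path: str, fasta_path: str, min_contig: int) -> int:
    with open(index_path) as fh:
        done = finished_ids(fh)
    with open(fasta_path) as fh:
        expected = contigs_at_least(fh, min_contig)
    missing = sorted(expected - done)
    print(f"{len(expected & done)} of {len(expected)} expected contigs FINISHED")
    for name in missing:
        print("not finished:", name)
    return 1 if missing else 0


if __name__ == "__main__" and len(sys.argv) >= 3:
    min_len = int(sys.argv[3]) if len(sys.argv) > 3 else 1
    sys.exit(main(sys.argv[1], sys.argv[2], min_len))
```

Saved under a hypothetical name such as check_finished.py, it would be run as `python check_finished.py master_datastore_index.log genome.fasta 5000`. The third argument should match the min_contig value in maker_opts.ctl. The list of unfinished contigs shows exactly which sequences a restart still has to process.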
From carsonhh at gmail.com Wed Feb 5 11:38:50 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 05 Feb 2014 11:38:50 -0700 Subject: [maker-devel] maker annotation with cufflinks output In-Reply-To: <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> Message-ID: Do you have any features of type snap in your results from step 3? We've had a couple of recent posts where, after training, SNAP was giving no results, and as a result maker couldn't produce any genes. One cause of something like that may be your step 2. Make sure the ZFF you used to train with wasn't empty. The maker2zff script uses filters to put only the best genes in the ZFF file, and if all your genes fail the filtering, then you are training with an empty ZFF. Also, you should use proteins from a related species as your protein file. I see that your protein matches are varying wildly from run to run, and so is your contig count. Were the contigs you have results for long enough to contain genes? --Carson From: dhivya arasappan Date: Monday, February 3, 2014 at 9:31 AM To: Daniel Ence Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] maker annotation with cufflinks output Hi Daniel, I was able to check on some of those questions. 1. From trinity assembly: I started with 102000 contigs. I used trinotate to annotate proteins in this. I ran maker on this data with est2genome set to 1.
The output looks like this: 29 gene 75 exon 573659 protein_match 67 CDS 1099 contig 269298 expressed_sequence_match 23 five_prime_UTR 173844 match 2221846 match_part 29 mRNA 23 three_prime_UTR The genes annotated using the trinity assembly is lower than expected, so I went the cufflinks route. I dont understand why when using the cufflinks transcripts, even less genes are being found. 3. Training SNAP: I used the results of maker from 1 to train SNAP. I then used that training set to rerun maker: snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/sn ap/RHA.hmm est2genome=0 And again I got results with no entries for gene, exon, CDS etc. 957 contig 46555 expressed_sequence_match 43651 match 553633 match_part 113738 protein_match As I mentioned in another email, cegma results indicated that the genome was more than 90% complete. Any suggestions would be helpful. Thank you Dhivya On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: > Hi Dhivya, > > I think there a few numbers that could be helpful to understand what's > happening here. > > How many transcripts did Trinity assembly the RNA-seq data into? Also, you had > 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it the > cufflinks data. How many transcripts did MAKER identify with the cufflinks > data? Did you still get more than the 10,000 transcripts that you found with > just the Trinity data? > > A key part of MAKER's approach to genome annotation that might be affecting > it's performance is that it only annotates a gene where there is both evidence > (like your RNA-seq data) and an ab-initio prediction. If a prediction is > unsupported by the evidence, then MAKER won't annotate a gene and if evidence > aligns where there's no prediction, MAKER won't annotate a gene either. What > ab-initio predictors are you using and have they been trained specific genome? 
> > You can force MAKER to automatically promote evidence alignments to a gene > model by setting the est2genome option to 1, but that will usually give you > many false positives. > > Try rerunning it with either the Trinity data or the Cufflinks data and with > est2genome set to 1, and let us know how that affects the MAKER results. > > Thanks, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ________________________________________ > From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya > arasappan [darasappan at gmail.com] > Sent: Thursday, January 30, 2014 11:18 AM > To: maker-devel at yandell-lab.org > Subject: [maker-devel] maker annotation with cufflinks output > > Hello, > > I am trying to annotate a 200 mb plant genome for which I have a very > good assembly. > > I tried to denovo assemble RNA-seq data using trinity and ran maker > using my genome assembly and the trinity results. I did not get as > many transcripts as expected, around 10,000 transcripts. > > So, I decided to try a different approach. I did a genome assisted > assembly of the RNA-seq data using tophat/cufflinks. This pipeline > generated 21,000 genes, 29,000 transcripts. I then ran maker using my > genome assembly and the cufflinks result. I get much less number of > transcripts as a result. > > If cufflinks found 29000 transcripts by mapping to the genome, I'm > confused as to why maker is not finding the same. > > Any suggestions would be appreciated. 
> > Thanks > Dhivya From dence at genetics.utah.edu Wed Feb 5 12:28:48 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 5 Feb 2014 19:28:48 +0000 Subject: [maker-devel] maker annotation with cufflinks output In-Reply-To: References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>, Message-ID: Hi Dhivya, Are the protein matches in your results coming from your annotations of the transcriptome? You should really use amino-acid sequences from related organisms and an omnibus source like Swiss-Prot. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: Carson Holt [carsonhh at gmail.com] Sent: Wednesday, February 05, 2014 11:38 AM To: dhivya arasappan; Daniel Ence Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] maker annotation with cufflinks output Do you have any features of type snap in your results from step 3? We've had a couple of recent posts where after training snap was giving no results, and as a result maker couldn't give any genes. One cause of something like that may be your step 2. Make sure the ZFF wasn't empty you used to train with. The maker2zff script uses filters to only put the best genes in the off file, and if all your genes fail the filtering then you are training with an empty ZFF. Also you should use proteins from a related species as your protein file.
I see that you protein marches are varying wildly from run to run? So is your contig count? Were the subset of contigs you have results for long enough to contain genes? ?Carson From: dhivya arasappan > Date: Monday, February 3, 2014 at 9:31 AM To: Daniel Ence > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] maker annotation with cufflinks output Hi Daniel, I was able to check on some of those questions. 1. From trinity assembly: I started with 102000 contigs. I used trinotate to annotate proteins in this. I ran maker on this data with est2genome set to 1. The output looks like this (most important parts on top): 6653 gene 46675 exon 280534 protein_match 59934 CDS 969 contig 105388 expressed_sequence_match 12584 five_prime_UTR 78565 match 1401369 match_part 10180 mRNA 11545 three_prime_UTR 2. From cufflinks assembly: I started with 133380 entries (out of which there are 29,000 transcripts). I used the protein sequences from trinity assembly. I ran maker on this data with est2genome set to 1. The output looks like this: 29 gene 75 exon 573659 protein_match 67 CDS 1099 contig 269298 expressed_sequence_match 23 five_prime_UTR 173844 match 2221846 match_part 29 mRNA 23 three_prime_UTR The genes annotated using the trinity assembly is lower than expected, so I went the cufflinks route. I dont understand why when using the cufflinks transcripts, even less genes are being found. 3. Training SNAP: I used the results of maker from 1 to train SNAP. I then used that training set to rerun maker: snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm est2genome=0 And again I got results with no entries for gene, exon, CDS etc. 957 contig 46555 expressed_sequence_match 43651 match 553633 match_part 113738 protein_match As I mentioned in another email, cegma results indicated that the genome was more than 90% complete. Any suggestions would be helpful. 
Thank you Dhivya On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: Hi Dhivya, I think there a few numbers that could be helpful to understand what's happening here. How many transcripts did Trinity assembly the RNA-seq data into? Also, you had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it the cufflinks data. How many transcripts did MAKER identify with the cufflinks data? Did you still get more than the 10,000 transcripts that you found with just the Trinity data? A key part of MAKER's approach to genome annotation that might be affecting it's performance is that it only annotates a gene where there is both evidence (like your RNA-seq data) and an ab-initio prediction. If a prediction is unsupported by the evidence, then MAKER won't annotate a gene and if evidence aligns where there's no prediction, MAKER won't annotate a gene either. What ab-initio predictors are you using and have they been trained specific genome? You can force MAKER to automatically promote evidence alignments to a gene model by setting the est2genome option to 1, but that will usually give you many false positives. Try rerunning it with either the Trinity data or the Cufflinks data and with est2genome set to 1, and let us know how that affects the MAKER results. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya arasappan [darasappan at gmail.com] Sent: Thursday, January 30, 2014 11:18 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] maker annotation with cufflinks output Hello, I am trying to annotate a 200 mb plant genome for which I have a very good assembly. I tried to denovo assemble RNA-seq data using trinity and ran maker using my genome assembly and the trinity results. 
I did not get as many transcripts as expected, around 10,000 transcripts. So, I decided to try a different approach. I did a genome assisted assembly of the RNA-seq data using tophat/cufflinks. This pipeline generated 21,000 genes, 29,000 transcripts. I then ran maker using my genome assembly and the cufflinks result. I get much less number of transcripts as a result. If cufflinks found 29000 transcripts by mapping to the genome, I'm confused as to why maker is not finding the same. Any suggestions would be appreciated. Thanks Dhivya _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From darasappan at gmail.com Wed Feb 5 13:13:57 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Wed, 5 Feb 2014 14:13:57 -0600 Subject: [maker-devel] maker annotation with cufflinks output In-Reply-To: References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>, Message-ID: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com> Hello Daniel and Carson, Thanks for your replies. Yes I used the the protein sequences resulting from annotation of trinity assembly (using trinotate). I'll try using protein sequences from related species (though there arent sequences from closely related orgs). Could you tell me a little about why protein data from annotating my rnaseq data would not work best here? Thanks Dhivya On Feb 5, 2014, at 1:28 PM, Daniel Ence wrote: > Hi Dhivya, Are the protein matches in your results coming from your > annotations of the transcriptome? 
From dence at genetics.utah.edu Wed Feb 5 13:36:26 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 5 Feb 2014 20:36:26 +0000
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>, , <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
Message-ID:

Hi Dhivya,

In genome annotation you often want to use as many sources of evidence as is reasonable, but those sources should be distinct. It will confuse downstream annotation if your protein evidence is actually derived from the RNA-seq data. Using the Trinotate results as protein evidence restricts you, first, to the proteins encoded by the transcripts in the RNA-seq data, which may be incomplete, and second, to the proteins that Trinotate could annotate from among those transcripts.

The problem that Carson mentioned with the SNAP HMM file is also a real possibility.

Thanks,
Daniel

________________________________
From: dhivya arasappan [darasappan at gmail.com]
Sent: Wednesday, February 05, 2014 1:13 PM
To: Daniel Ence
Cc: Carson Holt; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] maker annotation with cufflinks output

Hello Daniel and Carson,

Thanks for your replies.
From carsonhh at gmail.com Wed Feb 5 13:38:44 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 05 Feb 2014 13:38:44 -0700
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To: <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
Message-ID:

Protein data doesn't have to come from such a closely related species. This is because genes maintain homology at the amino-acid level across even very large evolutionary distances. Having a more closely related species just ensures that the genome contents are similar (fewer gene losses and gains relative to each other). Also, use the entire proteome of at least one related species (using only a database like Swiss-Prot is not sufficient).

Using translated mRNA-seq data will not give you any information that is not already available from the untranslated sequence. Plus, it will introduce the complicating artifacts that mRNA-seq generates into the protein part of the pipeline (gene merging, incorrect assembly, and false calls caused by background transcription). A big gotcha with mRNA-seq is that all of your genome gets transcribed at a low level, not just the genes, so you will always have contamination that does not represent real gene models. Also, in the end you can only really expect to capture about 50% of the genes with mRNA-seq (maybe 70% if you are fortunate, and most of those will be partial).
So using the proteins from another species is important to improve sensitivity and to fix many of the issues that arise from the noisy nature of mRNA-seq. In fact, if you were forced to use only one (either protein evidence or mRNA-seq), you would actually get better annotations from the protein evidence in most cases. You get better annotations when you use both, but if you use only one of them, the proteins from another species are better, and noisy mRNA-seq will be the primary source of annotation error.

Thanks,
Carson
From darasappan at gmail.com Wed Feb 5 22:16:43 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Wed, 5 Feb 2014 23:16:43 -0600
Subject: [maker-devel] maker annotation with cufflinks output
In-Reply-To:
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <4726757C-2C1A-451F-8E79-D0C307A78F7D@gmail.com>
Message-ID: <1188173E-53C1-4FFE-B790-B710C3A55B86@gmail.com>

Thank you both for those explanations. I'll get back to you after I try rerunning MAKER.

Dhivya

On Feb 5, 2014, at 2:38 PM, Carson Holt wrote:

> Protein data doesn't have to come from such a closely related
> species. This is because genes maintain homology at the amino-acid
> level across even very large evolutionary distances. Having a more
> closely related species just ensures that the genome contents are
> similar (fewer gene losses and gains relative to each other). Also,
> use the entire proteome of at least one related species (using only
> a database like Swiss-Prot is not sufficient).
>
> Using translated mRNA-seq data will not give you any information
> that is not already available from the untranslated sequence. Plus,
> it will introduce the complicating artifacts that mRNA-seq generates
> into the protein part of the pipeline (gene merging, incorrect
> assembly, and false calls caused by background transcription). A
> big gotcha with mRNA-seq is that all of your genome gets transcribed
> at a low level, not just the genes, so you will always have
> contamination that does not represent real gene models. Also, in the
> end you can only really expect to capture about 50% of the genes with
> mRNA-seq (maybe 70% if you are fortunate, and most of those will be
> partial). So using the proteins from another species is important
> to improve sensitivity and to fix many of the issues that arise from
> the noisy nature of mRNA-seq.
> In fact, if you were forced to use only one (either protein evidence
> or mRNA-seq), you would actually get better annotations from the
> protein evidence in most cases. You get better annotations when you
> use both, but if you use only one of them, the proteins from another
> species are better, and noisy mRNA-seq will be the primary source of
> annotation error.
>
> Thanks,
> Carson
>
>
> From: dhivya arasappan
> Date: Wednesday, February 5, 2014 at 1:13 PM
> To: Daniel Ence
> Cc: Carson Holt, "maker-devel at yandell-lab.org"
> Subject: Re: [maker-devel] maker annotation with cufflinks output
>
> Hello Daniel and Carson,
>
> Thanks for your replies.
>
> Yes, I used the protein sequences resulting from annotation of the
> Trinity assembly (using Trinotate). I'll try using protein
> sequences from related species (though there aren't sequences from
> closely related organisms). Could you tell me a little about why
> protein data from annotating my RNA-seq data would not work best here?
>
> Thanks
> Dhivya
>
> On Feb 5, 2014, at 1:28 PM, Daniel Ence wrote:
>
>> Hi Dhivya, Are the protein matches in your results coming from your
>> annotations of the transcriptome? You should really use amino-acid
>> sequences from related organisms and some kind of omnibus source
>> like Swiss-Prot.
>>
>> Thanks,
>> Daniel
>>
>> Daniel Ence
>> Graduate Student
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> From: Carson Holt [carsonhh at gmail.com]
>> Sent: Wednesday, February 05, 2014 11:38 AM
>> To: dhivya arasappan; Daniel Ence
>> Cc: maker-devel at yandell-lab.org
>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>
>> Do you have any features of type snap in your results from step 3?
>> We've had a couple of recent posts where, after training, SNAP was
>> giving no results, and as a result MAKER couldn't give any genes.
>> One cause of something like that may be your step 2. Make sure the
>> ZFF file you used to train with wasn't empty. The maker2zff script
>> uses filters to put only the best genes in the ZFF file, and if all
>> your genes fail the filtering, then you are training with an empty ZFF.
>>
>> Also, you should use proteins from a related species as your protein
>> file. Your protein matches seem to vary wildly from run to run, and
>> so does your contig count. Were the contigs in the subset you have
>> results for long enough to contain genes?
>>
>> Carson
>>
>> From: dhivya arasappan
>> Date: Monday, February 3, 2014 at 9:31 AM
>> To: Daniel Ence
>> Cc: "maker-devel at yandell-lab.org"
>> Subject: Re: [maker-devel] maker annotation with cufflinks output
>>
>> Hi Daniel,
>>
>> I was able to check on some of those questions.
>>
>> 1. From the Trinity assembly: I started with 102000 contigs. I used
>> Trinotate to annotate proteins in this.
>>
>> I ran MAKER on this data with est2genome set to 1. The output looks
>> like this (most important parts on top):
>>
>> 6653 gene
>> 46675 exon
>> 280534 protein_match
>> 59934 CDS
>> 969 contig
>> 105388 expressed_sequence_match
>> 12584 five_prime_UTR
>> 78565 match
>> 1401369 match_part
>> 10180 mRNA
>> 11545 three_prime_UTR
>>
>> 2. From the Cufflinks assembly: I started with 133380 entries (out of
>> which there are 29,000 transcripts). I used the protein sequences
>> from the Trinity assembly.
>>
>> I ran MAKER on this data with est2genome set to 1. The output looks
>> like this:
>> 29 gene
>> 75 exon
>> 573659 protein_match
>> 67 CDS
>> 1099 contig
>> 269298 expressed_sequence_match
>> 23 five_prime_UTR
>> 173844 match
>> 2221846 match_part
>> 29 mRNA
>> 23 three_prime_UTR
>>
>> The number of genes annotated using the Trinity assembly is lower
>> than expected, so I went the Cufflinks route. I don't understand why
>> even fewer genes are being found when using the Cufflinks transcripts.
>>
>> 3. Training SNAP: I used the MAKER results from step 1 to train
>> SNAP. I then used that training set to rerun MAKER:
>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm
>> est2genome=0
>>
>> And again I got results with no entries for gene, exon, CDS, etc.
>> 957 contig
>> 46555 expressed_sequence_match
>> 43651 match
>> 553633 match_part
>> 113738 protein_match
>>
>> As I mentioned in another email, CEGMA results indicated that the
>> genome was more than 90% complete. Any suggestions would be helpful.
>>
>> Thank you
>> Dhivya
>>
>>
>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote:
>>
>>> Hi Dhivya,
>>>
>>> I think there are a few numbers that could be helpful to understand
>>> what's happening here.
>>>
>>> How many transcripts did Trinity assemble the RNA-seq data into?
>>> Also, you had 29,000 transcripts from Cufflinks, but fewer from
>>> MAKER when you gave it the Cufflinks data. How many transcripts
>>> did MAKER identify with the Cufflinks data? Did you still get more
>>> than the 10,000 transcripts that you found with just the Trinity
>>> data?
>>>
>>> A key part of MAKER's approach to genome annotation that might be
>>> affecting its performance is that it only annotates a gene where
>>> there is both evidence (like your RNA-seq data) and an ab initio
>>> prediction. If a prediction is unsupported by the evidence, then
>>> MAKER won't annotate a gene, and if evidence aligns where there's
>>> no prediction, MAKER won't annotate a gene either. What ab initio
>>> predictors are you using, and have they been trained for your
>>> specific genome?
>>>
>>> You can force MAKER to automatically promote evidence alignments
>>> to a gene model by setting the est2genome option to 1, but that
>>> will usually give you many false positives.
>>>
>>> Try rerunning it with either the Trinity data or the Cufflinks
>>> data and with est2genome set to 1, and let us know how that
>>> affects the MAKER results.
>>> >>> Thanks, >>> Daniel >>> >>> Daniel Ence >>> Graduate Student >>> Eccles Institute of Human Genetics >>> University of Utah >>> 15 North 2030 East, Room 2100 >>> Salt Lake City, UT 84112-5330 >>> ________________________________________ >>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf >>> of dhivya arasappan [darasappan at gmail.com] >>> Sent: Thursday, January 30, 2014 11:18 AM >>> To: maker-devel at yandell-lab.org >>> Subject: [maker-devel] maker annotation with cufflinks output >>> >>> Hello, >>> >>> I am trying to annotate a 200 mb plant genome for which I have a >>> very >>> good assembly. >>> >>> I tried to denovo assemble RNA-seq data using trinity and ran maker >>> using my genome assembly and the trinity results. I did not get as >>> many transcripts as expected, around 10,000 transcripts. >>> >>> So, I decided to try a different approach. I did a genome assisted >>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline >>> generated 21,000 genes, 29,000 transcripts. I then ran maker >>> using my >>> genome assembly and the cufflinks result. I get much less number of >>> transcripts as a result. >>> >>> If cufflinks found 29000 transcripts by mapping to the genome, I'm >>> confused as to why maker is not finding the same. >>> >>> Any suggestions would be appreciated. >>> >>> Thanks >>> Dhivya >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> _______________________________________________ maker-devel mailing >> list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... 
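Carson's first check ("make sure the ZFF you used to train with wasn't empty") can be scripted before any SNAP training step. The sketch below is hypothetical, not part of MAKER or SNAP: it assumes the ZFF layout that SNAP documents (a `>name` line per sequence, then one feature line per exon whose 2nd and 3rd columns are coordinates and whose last column is the gene-model name, e.g. `MODEL5547`). It counts gene models and, given the matching FASTA (maker2zff's `genome.dna`, as named later in this thread), reports exons lying beyond the end of their sequence, which is the condition `fathom` reports as `out_of_bounds`.

```python
import sys
from collections import defaultdict


def read_fasta_lengths(path):
    """Return {sequence name: length} for every record in a FASTA file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # The sequence name is the first word after '>'
                name = (line[1:].split() or [""])[0]
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def check_zff(ann_path, dna_path):
    """Return (gene models, FASTA sequences, {model: out-of-bounds exon count})."""
    lengths = read_fasta_lengths(dna_path)
    models = set()
    out_of_bounds = defaultdict(int)
    seq = None
    with open(ann_path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue
            if fields[0].startswith(">"):
                seq = fields[0][1:]
                continue
            if len(fields) < 4:
                continue  # not a feature line
            start, end = int(fields[1]), int(fields[2])
            group = fields[-1]  # gene-model name
            models.add(group)
            # A sequence missing from (or empty in) the FASTA has length 0,
            # so every exon placed on it counts as out of bounds.
            if max(start, end) > lengths.get(seq, 0):
                out_of_bounds[group] += 1
    return len(models), len(lengths), dict(out_of_bounds)


if __name__ == "__main__" and len(sys.argv) == 3:
    n_models, n_seqs, bad = check_zff(sys.argv[1], sys.argv[2])
    print(f"{n_models} gene models, {n_seqs} FASTA sequences")
    if n_models == 0:
        print("ZFF has no gene models: nothing to train on")
    if bad:
        print(f"{len(bad)} models have out-of-bounds exons")
```

Run it as `python check_zff.py genome.ann genome.dna` (the script name is illustrative). Zero models means the maker2zff filters rejected everything. If every model is out of bounds, the FASTA side is probably empty rather than the annotation being wrong.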
From mikael.durling at slu.se Thu Feb 6 04:02:37 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Thu, 6 Feb 2014 11:02:37 +0000
Subject: [maker-devel] ncRNA support in maker

Hi Carson,

it's nice to see all these new features in maker.

I gave the trnascan option a try by enabling it in the config file for one of my fungal genomes. It failed, though, with this error message:

ERROR: You found a tRNA with an intron! This should not happen
--> rank=12, hostname=my-mgrid6
ERROR: Failed while gathering ab-init output files
ERROR: Chunk failed at level:1, tier_type:2
FAILED CONTIG:scf_013

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scf_013

I checked the trnascan output (scf_013.abinit_nomask.0.eukaryotic.trnascan) in theVoid for that contig, and the output seems valid to me:

scf_013   1   189339   189410   Thr   AGT        0        0   82.83
scf_013   2   510381   510462   Ser   AGA        0        0   67.09
scf_013   3   586886   587000   Leu   CAA   586924   586956   57.97
scf_013   4   942166   942069   Leu   AAG   942128   942113   57.48
scf_013   5   169102   168993   Leu   TAA   169065   169037   56.49

Hope this can be of some help while debugging. I'll leave trnascan off for now.

thanks,

Mikael


On 10 Jan 2014, at 22:03, Carson Holt wrote:

> Hi Mikael,
>
> The options are part of the new MAKER-P integration
> (http://www.plantphysiol.org/content/early/2013/12/06/pp.113.230144.abstract).
> Additional documentation/tutorials will be forthcoming, probably in
> a nice wiki page as part of the upcoming GMOD Malaysia courses in February
> or alternatively with the annual GMOD summer school. The tRNA option is
> easy enough to turn on (just set trna=1 in the maker_opts.ctl file).
>
> Thanks,
> Carson
>
>
>
> On 1/10/14, 2:48 AM, "Mikael Brandström Durling"
> wrote:
>
>> Hi Carson and other maker developers,
>>
>> I was reading the source code of the latest maker release and noted
>> several references to ncRNAs, snoscan and trnascan. Can these be
>> incorporated into the normal annotation workflow? If so, are there any
>> instructions available for that?
>>
>> best regards,
>> Mikael Durling
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From darasappan at gmail.com Thu Feb 6 07:52:12 2014
From: darasappan at gmail.com (dhivya arasappan)
Date: Thu, 6 Feb 2014 08:52:12 -0600
Subject: [maker-devel] maker annotation with cufflinks output
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>
Message-ID: <73AFCD9F-3B60-4C9C-9E03-35BC682E14ED@gmail.com>

Hello,

It does appear that my genome.ann file from the maker2zff script has data in it. However, the SNAP steps after that have created empty files. The following are all empty:

alt.dna err.dna export.dna genome.dna olp.dna uni.dna wrn.dna
alt.ann err.ann export.ann genome.ann olp.ann uni.ann wrn.ann

When I tried to get gene stats or validate genome.ann, I get errors like this for all of them:

fathom genome.ann genome.dna -gene-stats |more
MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
exon-6:out_of_bounds
MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds
exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds
exon-1:out_of_bounds
MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds
exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds
exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds
exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds
exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds
exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds
exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds
exon-21:out_of_bounds

I'm not sure why the annotations I'm seeing in genome.ann are all showing up as errors. I realize this may be an issue with SNAP, but are you familiar with anything like this? A snippet of my genome.ann file is attached (since it's too big for the list) for reference.

Thanks
Dhivya
-------------- next part --------------
A non-text attachment was scrubbed...
Name: head.genome.ann
Type: application/octet-stream
Size: 15761 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: head.genome.dna
Type: application/octet-stream
Size: 3075 bytes
Desc: not available

From carsonhh at gmail.com Thu Feb 6 09:01:04 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Feb 2014 09:01:04 -0700
Subject: [maker-devel] ncRNA support in maker

I'm making a new release this weekend, but if you have access to the devel version, you can test now. All changes have been committed to the subversion repository.

Thanks,
Carson

From carsonhh at gmail.com Thu Feb 6 09:05:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 06 Feb 2014 09:05:05 -0700
Subject: [maker-devel] maker annotation with cufflinks output
References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com>

Your genome.dna file has no sequence? Did you by any chance strip the FASTA sequence from the GFF3 you are using as input to maker2zff? There should be FASTA sequence at the end of that file. Also, can I see the GFF3 file you are using as input to maker2zff?

Thanks,
Carson
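Mikael's tRNAscan-SE table does contain intron-bearing tRNAs. In rows 3 to 5, the seventh and eighth columns (intron begin and end) are non-zero, and that appears to be what triggers MAKER's "You found a tRNA with an intron!" check. The sketch below lists such hits so affected contigs can be found before enabling `trna=1`. It assumes the nine-column tabular layout shown in the thread: sequence, tRNA number, begin, end, amino acid, anticodon, intron begin, intron end, score. It is a hypothetical helper, not part of MAKER.

```python
import sys


def trnas_with_introns(lines):
    """Return (sequence, tRNA number, intron begin, intron end) for intron-bearing hits."""
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) < 9:
            continue
        try:
            number = int(fields[1])
            intron_begin, intron_end = int(fields[6]), int(fields[7])
        except ValueError:
            continue  # column-header or '-----' separator line
        # Rows without an intron carry 0 0 in the intron columns
        if intron_begin or intron_end:
            hits.append((fields[0], number, intron_begin, intron_end))
    return hits


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        for seq, number, ib, ie in trnas_with_introns(handle):
            print(f"{seq} tRNA {number}: intron {ib}-{ie}")
```

Applied to Mikael's five rows, it reports tRNAs 3, 4 and 5 on scf_013. The tRNAs on the minus strand (rows 4 and 5) keep their descending coordinates as written.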
URL: From carsonhh at gmail.com Thu Feb 6 10:04:25 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 06 Feb 2014 10:04:25 -0700 Subject: [maker-devel] maker annotation with cufflinks output In-Reply-To: <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com> References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com> Message-ID: Could you give me the file without using 'head? to trim it, its cutting it before it reaches the part I?m interested in. ?Carson From: dhivya arasappan Date: Thursday, February 6, 2014 at 10:01 AM To: Carson Holt Cc: Daniel Ence , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] maker annotation with cufflinks output Oh yes I did- I took just the non sequence entries in the gff file and used that as my input. I will rerun snap with the gff file containing the sequences as well. I'm attaching a snippet of the gff file that I used as input to maker2zff. Thanks for your help Dhivya On Feb 6, 2014, at 10:05 AM, Carson Holt wrote: > Your genome.dna file has no sequence? Did you by any chance strip the fasta > sequence from the GFF3 you are using as input to maker2zff? There should be > fasta sequence at the end of that file. Also can I see the GFF3 file you are > using as input to maker2zff. > > Thanks, > Carson > > From: dhivya arasappan > Date: Thursday, February 6, 2014 at 7:47 AM > To: Carson Holt > Cc: Daniel Ence , "maker-devel at yandell-lab.org" > > Subject: Re: [maker-devel] maker annotation with cufflinks output > > Hello, > > I does appear than my genome.ann file from maker2zff script has data in it. > However, the SNAP steps after that have created empty files. 
The following > are all empty: > > alt.dna err.dna export.dna genome.dna olp.dna uni.dna wrn.dna > alt.ann err.ann export.ann genome.ann olp.ann uni.ann wrn.ann > > When I tried to get gene stats or validate genome.ann, I get errors like this > for all of them: > > fathom genome.ann genome.dna -gene-stats |more > MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds > exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds > exon-6:out_of_bounds > MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds > exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds > exon-1:out_of_bounds > MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds > exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds > MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds > exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds > exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds > exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds > exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds > exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds > exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds > exon-21:out_of_bounds > > I'm not sure why the annotation I'm seeing in genome.ann are all showing up as > errors. I realize this may be an issue with snap, but are you familiar with > anything like this? My genome.ann file is attached for reference. > > Thanks > Dhivya > > On Feb 5, 2014, at 12:38 PM, Carson Holt wrote: > >> Do you have any features of type snap in your results from step 3? We?ve had >> a couple of recent posts where after training snap was giving no results, and >> as a result maker couldn?t give any genes. One cause of something like that >> may be your step 2. Make sure the ZFF wasn?t empty you used to train with. 
>> The maker2zff script uses filters to only put the best genes in the off file, >> and if all your genes fail the filtering then you are training with an empty >> ZFF. >> >> Also you should use proteins from a related species as your protein file. I >> see that you protein marches are varying wildly from run to run? So is your >> contig count? Were the subset of contigs you have results for long enough to >> contain genes? >> >> ?Carson >> >> From: dhivya arasappan >> Date: Monday, February 3, 2014 at 9:31 AM >> To: Daniel Ence >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] maker annotation with cufflinks output >> >> Hi Daniel, >> >> I was able to check on some of those questions. >> >> 1. From trinity assembly: I started with 102000 contigs. I used trinotate to >> annotate proteins in this. >> >> I ran maker on this data with est2genome set to 1. The output looks like this >> (most important parts on top): >> >> 6653 gene >> 46675 exon >> 280534 protein_match >> 59934 CDS >> 969 contig >> 105388 expressed_sequence_match >> 12584 five_prime_UTR >> 78565 match >> 1401369 match_part >> 10180 mRNA >> 11545 three_prime_UTR >> >> 2. From cufflinks assembly: I started with 133380 entries (out of which there >> are 29,000 transcripts). I used the protein sequences from trinity assembly. >> >> I ran maker on this data with est2genome set to 1. The output looks like >> this: >> 29 gene >> 75 exon >> 573659 protein_match >> 67 CDS >> 1099 contig >> 269298 expressed_sequence_match >> 23 five_prime_UTR >> 173844 match >> 2221846 match_part >> 29 mRNA >> 23 three_prime_UTR >> >> The genes annotated using the trinity assembly is lower than expected, so I >> went the cufflinks route. I dont understand why when using the cufflinks >> transcripts, even less genes are being found. >> >> 3. Training SNAP: I used the results of maker from 1 to train SNAP. 
I then >> used that training set to rerun maker: >> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm >> est2genome=0 >> >> And again I got results with no entries for gene, exon, CDS etc. >> 957 contig >> 46555 expressed_sequence_match >> 43651 match >> 553633 match_part >> 113738 protein_match >> >> As I mentioned in another email, cegma results indicated that the genome was >> more than 90% complete. Any suggestions would be helpful. >> >> Thank you >> Dhivya >> >> >> >> >> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: >> >>> Hi Dhivya, >>> >>> I think there are a few numbers that could be helpful to understand what's >>> happening here. >>> >>> How many transcripts did Trinity assemble the RNA-seq data into? Also, you >>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave it >>> the cufflinks data. How many transcripts did MAKER identify with the >>> cufflinks data? Did you still get more than the 10,000 transcripts that you >>> found with just the Trinity data? >>> >>> A key part of MAKER's approach to genome annotation that might be affecting >>> its performance is that it only annotates a gene where there is both >>> evidence (like your RNA-seq data) and an ab-initio prediction. If a >>> prediction is unsupported by the evidence, then MAKER won't annotate a gene >>> and if evidence aligns where there's no prediction, MAKER won't annotate a >>> gene either. What ab-initio predictors are you using and have they been >>> trained for your specific genome? >>> >>> You can force MAKER to automatically promote evidence alignments to a gene >>> model by setting the est2genome option to 1, but that will usually give you >>> many false positives. >>> >>> Try rerunning it with either the Trinity data or the Cufflinks data and with >>> est2genome set to 1, and let us know how that affects the MAKER results.
>>> >>> Thanks, >>> Daniel >>> >>> Daniel Ence >>> Graduate Student >>> Eccles Institute of Human Genetics >>> University of Utah >>> 15 North 2030 East, Room 2100 >>> Salt Lake City, UT 84112-5330 >>> ________________________________________ >>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of dhivya >>> arasappan [darasappan at gmail.com] >>> Sent: Thursday, January 30, 2014 11:18 AM >>> To: maker-devel at yandell-lab.org >>> Subject: [maker-devel] maker annotation with cufflinks output >>> >>> Hello, >>> >>> I am trying to annotate a 200 Mb plant genome for which I have a very >>> good assembly. >>> >>> I tried to de novo assemble RNA-seq data using trinity and ran maker >>> using my genome assembly and the trinity results. I did not get as >>> many transcripts as expected, only around 10,000 transcripts. >>> >>> So, I decided to try a different approach. I did a genome-assisted >>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline >>> generated 21,000 genes, 29,000 transcripts. I then ran maker using my >>> genome assembly and the cufflinks result. I get far fewer >>> transcripts as a result. >>> >>> If cufflinks found 29000 transcripts by mapping to the genome, I'm >>> confused as to why maker is not finding the same. >>> >>> Any suggestions would be appreciated. >>> >>> Thanks >>> Dhivya >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed...
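The out_of_bounds errors quoted above flag every exon of every model. That is what one would expect when the coordinates in genome.ann point past the end of sequences that genome.dna does not actually contain; in this thread the likely cause is that the FASTA section was stripped from the GFF3 given to maker2zff. As a hedged illustration of that kind of check (not fathom's actual code), here is a minimal Python sketch. It assumes ZFF records made of '>seqid' header lines followed by feature lines whose first three fields are label, begin and end and whose last field is the group name, with minus-strand features written as begin > end. The function names and the script name zff_bounds.py are hypothetical.

```python
"""Sketch: report ZFF features whose coordinates fall outside their sequence.

Illustrates the idea behind an out_of_bounds check; it is not fathom's code.
Assumed ZFF layout (short or long form):
    >seqid
    Label  begin  end  ...  group      (group name is the last field)
Minus-strand features are written with begin > end.
"""
import sys


def read_fasta_lengths(lines):
    """Return {seqid: sequence length} from FASTA text given as lines."""
    lengths = {}
    seqid = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            seqid = line[1:].split()[0]
            lengths[seqid] = 0
        elif seqid is not None:
            lengths[seqid] += len(line)
    return lengths


def find_out_of_bounds(ann_lines, lengths):
    """Return (seqid, group, label, begin, end) for each out-of-range feature."""
    bad = []
    seqid = None
    for line in ann_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith(">"):
            seqid = fields[0][1:]
            continue
        label, begin, end, group = fields[0], int(fields[1]), int(fields[2]), fields[-1]
        # A sequence missing from the FASTA counts as length 0, so all of its
        # features are reported; an empty genome.dna therefore flags everything.
        seq_len = lengths.get(seqid, 0)
        if min(begin, end) < 1 or max(begin, end) > seq_len:
            bad.append((seqid, group, label, begin, end))
    return bad


if __name__ == "__main__" and len(sys.argv) == 3:
    ann_path, dna_path = sys.argv[1], sys.argv[2]
    with open(dna_path) as fh:
        lengths = read_fasta_lengths(fh)
    if not lengths:
        print(f"warning: {dna_path} contains no sequences")
    with open(ann_path) as fh:
        for seqid, group, label, begin, end in find_out_of_bounds(fh, lengths):
            print(f"{group}\t{seqid}\t{label}\t{begin}\t{end}\tout_of_bounds")
```

For example, python zff_bounds.py genome.ann genome.dna prints no feature lines for a consistent pair. With an empty genome.dna the length table is empty, so every feature is reported, mirroring the fathom output above.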
From darasappan at gmail.com Thu Feb 6 10:01:44 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Thu, 6 Feb 2014 11:01:44 -0600 Subject: [maker-devel] maker annotation with cufflinks output In-Reply-To: References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> Message-ID: <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com> Oh yes, I did. I took just the non-sequence entries in the GFF file and used that as my input. I will rerun snap with the GFF file containing the sequences as well. I'm attaching a snippet of the GFF file that I used as input to maker2zff. Thanks for your help Dhivya On Feb 6, 2014, at 10:05 AM, Carson Holt wrote: > Your genome.dna file has no sequence? Did you by any chance strip > the fasta sequence from the GFF3 you are using as input to > maker2zff? There should be fasta sequence at the end of that file. > Also, can I see the GFF3 file you are using as input to maker2zff? > > Thanks, > Carson > > From: dhivya arasappan > Date: Thursday, February 6, 2014 at 7:47 AM > To: Carson Holt > Cc: Daniel Ence , "maker-devel at yandell-lab.org > " > Subject: Re: [maker-devel] maker annotation with cufflinks output > > Hello, > > It does appear that my genome.ann file from the maker2zff script has data > in it. However, the SNAP steps after that have created empty files.
-------------- next part -------------- A non-text attachment was scrubbed...
Name: head.cat.formatted.gff Type: application/octet-stream Size: 19905 bytes Desc: not available

From sjackman at gmail.com Thu Feb 6 17:22:57 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Thu, 6 Feb 2014 16:22:57 -0800 Subject: [maker-devel] Adding MAKER to Homebrew for ease of installation Message-ID: Hi MAKER developers, I'd like to add MAKER to Homebrew to make the installation of MAKER and its dependencies as straightforward as brew install maker. Homebrew is a system for installing software, originally developed for Mac OS, and now also for Linux through Linuxbrew. Homebrew/science is a collection of scientific software, which includes a lot of bioinformatics software. I've created a prototype for the MAKER installation script (called a formula, in Homebrew parlance). Is there a static URL for the source code of MAKER? The current formula won't work out of the box, because part of the URL depends on the user's unique ID: http://yandell.topaz.genetics.utah.edu/maker_downloads/$key/maker-2.28.tgz. Would you be interested in adding MAKER to Homebrew? I know MAKER must be licensed for commercial use. It is possible for Homebrew to display a notice of the MAKER license when it's installed. MAKER is not available for commercial use without a license. Those wishing to license MAKER for commercial use should contact Beth Drees at the University of Utah TCO to discuss your needs. Cheers, Shaun

From bioinformatics.umd at gmail.com Fri Feb 7 06:29:27 2014 From: bioinformatics.umd at gmail.com (UMD Bioinformatics) Date: Fri, 7 Feb 2014 08:29:27 -0500 Subject: [maker-devel] NCBI feature table Message-ID: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com> Hello Maker Developers, I have used this software with great success and I continue to look to it going forward.
However, as I'm getting ready to submit my annotations to NCBI with the genomes, I haven't found a straightforward method of turning the MAKER-produced GFF files into an NCBI feature table. What is the process for creating this table? It seems that the format NCBI is looking for is unique, and I haven't uncovered any scripts or tools to assist in the creation of this table from my annotation files. If anyone has any insight on this issue it would be greatly appreciated. Cheers Ian

From mike.thon at gmail.com Fri Feb 7 07:14:06 2014 From: mike.thon at gmail.com (Michael Thon) Date: Fri, 7 Feb 2014 15:14:06 +0100 Subject: [maker-devel] NCBI feature table In-Reply-To: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com> References: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com> Message-ID: <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com> Hi Ian - We've been struggling with this too and I started developing a script to convert the maker gff into ncbi's .tbl format. However, we found that some of the gene models required manual editing, so what we do is import the gff into a commercial application called Geneious where we do the edits. From there we export the data in genbank format and then convert it to .tbl format with a script. Our submission just passed the automated checks and we're waiting for the manual review. Probably none of my code will help you, and in any case it's kind of a mess. The only advice I can offer is to say that you'll probably need some manual editing in your workflow, if not Apollo, then some other app. In that case you'll need to convert the output of that app into .tbl format. > On Feb 7, 2014, at 2:29 PM, UMD Bioinformatics wrote: > > Hello Maker Developers, > > I have used this software with great success and I continue to look to it going forward. However, as I'm getting ready to submit my annotations to NCBI with the genomes, I haven't found a straightforward method of turning the MAKER-produced GFF files into an NCBI feature table.
What is the process for creating this table? It seems that the format NCBI is looking for is unique, and I haven't uncovered any scripts or tools to assist in the creation of this table from my annotation files. If anyone has any insight on this issue it would be greatly appreciated. > > Cheers > Ian > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From cexzurjimenezjr at gmail.com Thu Feb 6 22:27:13 2014 From: cexzurjimenezjr at gmail.com (Cexzur Jimenez Jr.) Date: Fri, 7 Feb 2014 13:27:13 +0800 Subject: [maker-devel] Testing MAKER After Installation Message-ID: Hello, I have finished installing MAKER, marked by "PERL Dependencies: INSTALLED, External Programs: INSTALLED, MPI SUPPORT: NOT CONFIGURED, MAKER: INSTALLED", and it seems everything's fine. I'm using MAKER 2.10 and I have followed the installation instructions both in its "README" and "INSTALL" files and in the 2012 GMOD MAKER Tutorial. After editing the three configuration files and running "maker", I saw the following error in my terminal. I have searched Google and tried the solutions offered there, but the error is still showing. Below is the error I got:

Can't locate package GDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package NDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287.
Can't locate package SDBM_File for @AnyDBM_File::ISA at /usr/lib/perl/5.14/DB_File.pm line 287.
A data structure will be created for you at:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore
To access files for individual sequences use the datastore index:
/home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_master_datastore_index.log
--Next Contig--
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: contig-dpp-500-500
Length: 32156
#---------------------------------------------------------------------
running repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb -species all -dir /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500 -pa 1
#-------------------------------#
Building general libraries in: /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed
FATAL ERROR
ERROR: Failed while doing repeat masking!!
ERROR: Chunk failed at level 2 !!
FAILED CONTIG:contig-dpp-500-500
--Next Contig--
Processing run.log file...
MAKER WARNING: The file dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out did not finish on the last run and must be erased
#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: contig-dpp-500-500
Length: 32156
Retry: 1!!
#---------------------------------------------------------------------
running repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
/usr/local/maker/exe/RepeatMasker/RepeatMasker /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb -species all -dir /home/cexzurjimenezjr/Documents/data/dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500 -pa 1
#-------------------------------#
Building general libraries in: /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general
RepeatMasker::createLib(): Error invoking /usr/local/blast/bin/makeblastdb on file /usr/local/maker/exe/RepeatMasker/Libraries/20120418/general/at.lib.
ERROR: RepeatMasker failed
FATAL ERROR
ERROR: Failed while doing repeat masking!!
ERROR: Chunk failed at level 2 !!
FAILED CONTIG:contig-dpp-500-500
--Next Contig--
Processing run.log file...
MAKER WARNING: The file dpp_contig.maker.output/dpp_contig_datastore/05/1F/contig-dpp-500-500//theVoid.contig-dpp-500-500/contig-dpp-500-500.0.all.rb.out did not finish on the last run and must be erased
Maker is now finished!!!

Can you tell me what the error is and which part of the installation I got wrong? Your help will be very much appreciated. Thank you. Attached herein are the configuration files I used for MAKER. Sincerely, CJ
-------------- next part -------------- A non-text attachment was scrubbed... Name: maker_bopts.ctl Type: application/octet-stream Size: 1502 bytes Desc: not available
-------------- next part -------------- A non-text attachment was scrubbed... Name: maker_exe.ctl Type: application/octet-stream Size: 1320 bytes Desc: not available
-------------- next part -------------- A non-text attachment was scrubbed...
Name: maker_opts.ctl Type: application/octet-stream Size: 4541 bytes Desc: not available

From carson.holt at genetics.utah.edu Fri Feb 7 11:11:44 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Fri, 7 Feb 2014 18:11:44 +0000 Subject: [maker-devel] Maker installation In-Reply-To: References: Message-ID: Hi Tracy, The older Apollo is pretty much deprecated. There are still people who like to use it though (myself among them). You can download and install it manually from here --> http://sourceforge.net/projects/gmod/files/Apollo/. If you want to let MAKER install it for you, you can edit the URL in the .../maker/src/locations file to be this --> http://weatherby.genetics.utah.edu/apollo/apollo.tar.gz You can also use Web-Apollo for your data if you want, and that is what I would recommend. On a side note, if you are trying to install the old Apollo as part of the optional web-based GUI, I'd recommend not doing that. The GUI is really only for demonstration purposes or very small datasets. It is not for production (that is why it is off by default). Thanks, Carson From: Tracy Smith > Date: Friday, February 7, 2014 at 10:48 AM To: Carson Holt > Cc: > Subject: Maker installation Hi, I am trying to install Maker and am running into the same problem noted on this page, namely I cannot install Apollo. https://groups.google.com/forum/#!msg/maker-devel/vrVa2mEsKbg/0e_25LvOvdEJ I tried using the new URL you provided, "Here is a new location for the source --> http://sourceforge.net/code-snapshots/svn/g/gm/gmod/svn/gmod-svn-25291-apollo-trunk.zip" but that URL now points nowhere. Is it possible to use WebApollo instead? Or do you know of another location where a copy of Apollo could be downloaded? Thank you so much. Best regards, Tracy -- Tracy Smith University of Wisconsin-Madison Pepperell Lab -------------- next part -------------- An HTML attachment was scrubbed...
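Returning to the NCBI feature table question raised earlier in this archive, the core of a GFF3-to-.tbl conversion can be sketched in Python. This is a minimal sketch under stated assumptions, not the gff32table script or the GAG tool discussed in this thread. It assumes MAKER-style GFF3 in which gene, mRNA, exon and CDS features are linked through ID and Parent attributes, with the genome sequence after a ##FASTA line. It writes the five-column feature table layout: a '>Feature seqid' line, 'start<TAB>stop<TAB>key' for a feature's first interval, bare 'start<TAB>stop' rows for further intervals, qualifiers after three tabs, and minus-strand intervals written with start > stop. The locus_tag prefix and the 'hypothetical protein' product are placeholders. A real submission needs further qualifiers (for example protein_id and transcript_id) and a tbl2asn check.

```python
"""Sketch: MAKER-style GFF3 gene models -> NCBI five-column feature table."""
import sys
from collections import defaultdict


def parse_attributes(col9):
    """Parse GFF3 column 9 ('key=value;key=value') into a dict."""
    attrs = {}
    for item in col9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def interval_rows(intervals, strand, key):
    """Feature-table rows for one feature; only the first row carries the key.

    Minus-strand intervals are listed 5'->3' (highest coordinate first) and
    each is written with start > stop.
    """
    minus = strand == "-"
    rows = []
    for i, (start, end) in enumerate(sorted(intervals, reverse=minus)):
        left, right = (end, start) if minus else (start, end)
        rows.append(f"{left}\t{right}\t{key}" if i == 0 else f"{left}\t{right}")
    return rows


def gff3_to_tbl(lines, locus_prefix="LOCUS"):
    """Convert gene/mRNA/exon/CDS records to feature-table text.

    locus_prefix is a hypothetical placeholder, not an NCBI-assigned prefix.
    """
    genes = {}                  # gene ID -> (seqid, start, end, strand)
    mrnas = defaultdict(list)   # gene ID -> [mRNA IDs]
    exons = defaultdict(list)   # mRNA ID -> [(start, end), ...]
    cds = defaultdict(list)     # mRNA ID -> [(start, end), ...]
    for line in lines:
        if line.startswith("##FASTA"):
            break  # sequence section: no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        seqid, _, ftype, start, end, _, strand, _, col9 = cols
        attrs = parse_attributes(col9)
        span = (int(start), int(end))
        if ftype == "gene":
            genes[attrs["ID"]] = (seqid, span[0], span[1], strand)
        elif ftype == "mRNA" and "Parent" in attrs:
            mrnas[attrs["Parent"]].append(attrs["ID"])
        elif ftype in ("exon", "CDS") and "Parent" in attrs:
            target = exons if ftype == "exon" else cds
            for parent in attrs["Parent"].split(","):
                target[parent].append(span)

    by_seqid = defaultdict(list)
    for gene_id, (seqid, start, _, _) in genes.items():
        by_seqid[seqid].append((start, gene_id))

    out, counter = [], 0
    for seqid in sorted(by_seqid):
        out.append(f">Feature {seqid}")
        for _, gene_id in sorted(by_seqid[seqid]):
            _, start, end, strand = genes[gene_id]
            counter += 1
            out += interval_rows([(start, end)], strand, "gene")
            out.append(f"\t\t\tlocus_tag\t{locus_prefix}_{counter:05d}")
            for mrna_id in mrnas[gene_id]:
                for key, parts in (("mRNA", exons[mrna_id]), ("CDS", cds[mrna_id])):
                    if parts:
                        out += interval_rows(parts, strand, key)
                        out.append("\t\t\tproduct\thypothetical protein")
    return "\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        sys.stdout.write(gff3_to_tbl(fh))
```

Run as python gff3_to_tbl.py genome.all.gff (hypothetical file names) to print the table to standard output; the result would still need curation and validation before submission.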
From carson.holt at genetics.utah.edu Fri Feb 7 11:28:29 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Fri, 7 Feb 2014 18:28:29 +0000 Subject: [maker-devel] NCBI feature table In-Reply-To: <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com> References: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8@gmail.com> <7239FEAE-64AF-4F91-B608-EDDF44B7B51D@gmail.com> Message-ID: Yes. The non-web version of Apollo can open GFF3 and then save to table format --> http://sourceforge.net/projects/gmod/files/Apollo/ I've also attached a script made by a lab member that can convert MAKER-derived GFF3 gene entries into raw table format, and I've CC'd the script's author (Michael Campbell) in case you have any questions. Thanks, Carson
-------------- next part -------------- A non-text attachment was scrubbed... Name: gff32table Type: application/octet-stream Size: 7511 bytes Desc: gff32table

From carson.holt at genetics.utah.edu Fri Feb 7 11:31:17 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Fri, 7 Feb 2014 18:31:17 +0000 Subject: [maker-devel] Testing MAKER After Installation In-Reply-To: References: Message-ID: That can happen on some systems with that very old version of MAKER. Use MAKER 2.28 or 2.30 instead --> http://www.yandell-lab.org/software/maker.html Thanks, Carson

From bhall7 at hawaii.edu Fri Feb 7 17:31:36 2014 From: bhall7 at hawaii.edu (Brian Hall) Date: Fri, 07 Feb 2014 14:31:36 -1000 Subject: [maker-devel] NCBI feature table In-Reply-To: References: Message-ID: <52F57AE8.5090002@hawaii.edu> Hi Ian, My colleagues are also working on preparing a genome for submission to the NCBI. The software we are developing for this task is still a work in progress, but you are welcome to give it a try: https://github.com/tedsta/GAG It's a console-based application and it requires Python 2.6. Its strength is in filtering and modifying large segments of the genome at once -- where Apollo is good for removing a few erroneous exons, we are dealing with lists of dozens or more. This program seeks to make such changes as painless as possible. My advice is to try the simplest gff3-to-tbl script you can find and then run tbl2asn. If it works out okay, great! If you get a massive error report, get in touch and we'll help you out if we can :) --Brian On 02/07/2014 05:16 AM, maker-devel-request at yandell-lab.org wrote: > Date: Fri, 7 Feb 2014 08:29:27 -0500 > From: UMD Bioinformatics > To: maker-devel at yandell-lab.org > Subject: [maker-devel] NCBI feature table > Message-ID: <22EBA1A9-1DE2-4898-8010-4856E67F3AF8 at gmail.com> > Content-Type: text/plain; charset=windows-1252 > > Hello Maker Developers, > > I have used this software with great success and I continue to look to it going forward. However, as I'm getting ready to submit my annotations to NCBI with the genomes, I haven't found a straightforward method of turning the MAKER-produced GFF files into an NCBI feature table. What is the process for creating this table?
It seems that the format NCBI is looking for is unique, and I haven't uncovered any scripts or tools to assist in creating this table from my annotation files. If anyone has any insight on this issue it would be greatly appreciated. > > Cheers > Ian > From tmsmith23 at wisc.edu Fri Feb 7 10:48:13 2014 From: tmsmith23 at wisc.edu (Tracy Smith) Date: Fri, 7 Feb 2014 11:48:13 -0600 Subject: [maker-devel] Maker installation Message-ID: Hi, I am trying to install Maker and am running into the same problem noted on this page, namely I cannot install Apollo. https://groups.google.com/forum/#!msg/maker-devel/vrVa2mEsKbg/0e_25LvOvdEJ I tried using the new URL you provided, "Here is a new location for the source --> http://sourceforge.net/code-snapshots/svn/g/gm/gmod/svn/gmod-svn-25291-apollo-trunk.zip " but that URL now points nowhere. Is it possible to use WebApollo instead? Or do you know of another location where a copy of Apollo could be downloaded? Thank you so much. Best regards, Tracy -- Tracy Smith University of Wisconsin-Madison Pepperell Lab From carsonhh at gmail.com Mon Feb 10 08:34:58 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Feb 2014 08:34:58 -0700 Subject: [maker-devel] MAKER presentation at PAG In-Reply-To: References: Message-ID: * maker_map_ids - Builds shorter IDs/Names for MAKER genes and transcripts following the NCBI-suggested naming format. * map_fasta_ids - Maps short IDs/Names generated by maker_map_ids to MAKER FASTA files. * map_gff_ids - Maps short IDs/Names generated by maker_map_ids to MAKER GFF3 files; old IDs/Names are mapped to the Alias attribute. * maker_functional_fasta - Maps putative functions identified from BLASTP against UniProt/SwissProt to the MAKER-produced transcript and protein FASTA files.
* maker_functional_gff - Maps putative functions identified from BLASTP against UniProt/SwissProt to the MAKER-produced GFF3 files in the Note attribute. * ipr_update_gff - Takes InterProScan (iprscan) output and maps domain IDs and GO terms to the Dbxref and Ontology_term attributes in the GFF3 file. This is metadata that shows up when you click on an annotation in JBrowse/GBrowse. * iprscan2gff3 - Takes InterProScan (iprscan) output and generates GFF3 features representing domains, which make an interesting tier for GBrowse. These are visible feature tracks that can be seen in JBrowse/GBrowse. Thanks, Carson From: Kevin Dorn Date: Sunday, February 9, 2014 at 9:23 PM To: Subject: MAKER presentation at PAG Hi Carson, I saw your MAKER presentation at PAG this year and have a quick question. I've used MAKER to annotate the plant genome we're working on, and am mostly done. I had to step out for a second during your talk, and when I came back, you were talking about how you can transfer meaningful annotations (getting rid of the 'ugly MAKER names' for genes). Is there an accessory script to do this? Thanks, Kevin Dorn From amitha at ccmb.res.in Mon Feb 10 00:04:37 2014 From: amitha at ccmb.res.in (AMITHA SAMPATH KUMAR) Date: Mon, 10 Feb 2014 12:34:37 +0530 (IST) Subject: [maker-devel] Falied to create new account In-Reply-To: Message-ID: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1> Hi, I am interested in using the MAKER online version, for which I tried to create a profile using the email ID 'amitha at ccmb.res.in', but unfortunately I could not log in. I am also pasting a link to the error here: http://weatherby.genetics.utah.edu/cgi-bin/mwas/maker.cgi. The error message is: Error executing run mode 'forgot_login': Can't call method "MailMsg" without a package or object reference at /var/www/cgi-bin/mwas/lib/MWAS_util.pm line 529. at /var/www/cgi-bin/mwas/maker.cgi line 21.
Kindly help me through the registration asap. Thanks Amitha. From listona at science.oregonstate.edu Sat Feb 8 19:08:42 2014 From: listona at science.oregonstate.edu (Aaron Liston) Date: Sat, 08 Feb 2014 18:08:42 -0800 Subject: [maker-devel] Re-using repeat masking in SNAP training Message-ID: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu> I am following the tutorial for training SNAP, and it works fine. However, the tutorial instructions have MAKER repeat the repeat masking. To avoid this, I concatenated my gff files from the first round of annotation and used maker_gff=round1.gff and rm_pass=1 but at the end of the process, the repeat annotations were not there. Any suggestions? Thanks, Aaron From caigh02 at gmail.com Sun Feb 9 20:26:57 2014 From: caigh02 at gmail.com (Guohong Cai) Date: Sun, 9 Feb 2014 21:26:57 -0600 Subject: [maker-devel] Fwd: Functional annotation of MAKER gene models In-Reply-To: References: Message-ID: I sent the following message to Carson but forgot to send to the maker-devel list Hi Carson, Again need your help! With your guidance, I have the gene models for my genomes. Now I am trying to assign functions to the gene models. I noticed that I can use maker_functional_gff/fasta or interproScan. I dig out some old messages in maker-devel google group, but still have a few questions: 1. Will maker_functional_gff/fasta take NCBI blastp results, or only wu-blast results? I do not have wu-blast. 2. Do I have to use Uniprot/Swiss_prot database or I can use something else? For example, may I add a few high-quality genome annotations of related species to the swiss_prot database? Or may I use Uniref90 or nr database instead of swiss_prot? 3. Do you have a script to integrate blast2go results to the maker gff/fasta? Thanks. Guohong Rutgers University -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Mon Feb 10 10:25:06 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Feb 2014 10:25:06 -0700 Subject: [maker-devel] Falied to create new account In-Reply-To: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1> References: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1> Message-ID: The SMTP server that sends e-mails out is currently down, so when you said you forgot your login, it couldn't e-mail you. I switched to a different server for the time being. --Carson On 2/10/14, 12:04 AM, "AMITHA SAMPATH KUMAR" wrote: >Hi, > >I an interested in using Maker online version, for which i tried to >create a profile using the email id 'amitha at ccmb.res.in', but >unfortunately, I did not successfully login. >I am also pasting a link of the error here, >http://weatherby.genetics.utah.edu/cgi-bin/mwas/maker.cgi. > >The error mentioned is: >Error executing run mode 'forgot_login': Can't call method "MailMsg" >without a package or object reference at >/var/www/cgi-bin/mwas/lib/MWAS_util.pm line 529. > at /var/www/cgi-bin/mwas/maker.cgi line 21. > >Kindly help me through the registration asap. > >Thanks >Amitha. From carsonhh at gmail.com Mon Feb 10 10:26:06 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 10 Feb 2014 10:26:06 -0700 Subject: [maker-devel] Fwd: Functional annotation of MAKER gene models In-Reply-To: References: Message-ID: 1. Yes. It should take NCBI BLAST+ results. 2. It has to be UniProt/Swiss-Prot, or you can modify the comments of another database to look like UniProt/Swiss-Prot. 3. ipr_update_gff can also take BLAST2GO results as an undocumented feature (or at least it could the last time I tested it, which was quite a long time ago).
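Carson's second answer says another protein database can be used if its comments are made to look like UniProt/Swiss-Prot. Exactly which header fields maker_functional_gff and maker_functional_fasta parse is not stated in this thread, so the Python sketch below rests on an assumption: it rewrites plain `>ID description` headers into the general UniProtKB layout (`>sp|ID|ENTRY description OS=... GN=... PE=... SV=...`). The helper names, the species-mnemonic rule, and the fixed PE/SV values are illustrative only.

```python
import re


def uniprot_style_header(seq_id, description, organism, gene=None, db="sp"):
    """Build a UniProtKB-like header: >db|ID|ENTRY description OS=... [GN=...] PE=... SV=..."""
    # Replace characters that would break the '|'-delimited ID field.
    safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", seq_id)
    # Illustrative species mnemonic: first three letters of the genus + first two of the species.
    words = organism.upper().split()
    mnemonic = words[0][:3] + (words[1][:2] if len(words) > 1 else "")
    header = f">{db}|{safe_id}|{safe_id}_{mnemonic} {description} OS={organism}"
    if gene:
        header += f" GN={gene}"
    # PE=4 is UniProt's "predicted" evidence level; SV=1 is a placeholder sequence version.
    return header + " PE=4 SV=1"


def rewrite_fasta(lines, organism):
    """Rewrite '>ID description' headers; sequence lines pass through unchanged."""
    for line in lines:
        if line.startswith(">"):
            seq_id, _, desc = line[1:].strip().partition(" ")
            desc = desc.strip() or "Uncharacterized protein"
            yield uniprot_style_header(seq_id, desc, organism) + "\n"
        else:
            yield line
```

If the rewritten database is then searched with NCBI BLAST+ as in Carson's first answer, it is worth spot-checking that the descriptions come through in the maker_functional output before processing a whole genome.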
Thanks, Carson From: Guohong Cai Date: Sunday, February 9, 2014 at 8:26 PM To: Subject: [maker-devel] Fwd: Functional annotation of MAKER gene models I sent the following message to Carson but forgot to send to the maker-devel list Hi Carson, Again need your help! With your guidance, I have the gene models for my genomes. Now I am trying to assign functions to the gene models. I noticed that I can use maker_functional_gff/fasta or interproScan. I dig out some old messages in maker-devel google group, but still have a few questions: 1. Will maker_functional_gff/fasta take NCBI blastp results, or only wu-blast results? I do not have wu-blast. 2. Do I have to use Uniprot/Swiss_prot database or I can use something else? For example, may I add a few high-quality genome annotations of related species to the swiss_prot database? Or may I use Uniref90 or nr database instead of swiss_prot? 3. Do you have a script to integrate blast2go results to the maker gff/fasta? Thanks. Guohong Rutgers University _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry.utah at gmail.com Mon Feb 10 12:21:31 2014 From: barry.utah at gmail.com (Barry Moore) Date: Mon, 10 Feb 2014 12:21:31 -0700 Subject: [maker-devel] Re-using repeat masking in SNAP training In-Reply-To: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu> References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu> Message-ID: <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu> Hi Arron, If you re-run maker and don't change the details about the repeat library (i.e. you only update the SNAP HMM file) then MAKER shouldn't redo any work with repeat masking it should reuse the work it has already done. Is this not what you are seeing? 
Barry On Feb 8, 2014, at 7:08 PM, Aaron Liston wrote: > I am following the tutorial for training SNAP, and it works fine. However, the tutorial instructions have MAKER repeat the repeat masking. To avoid this, I concatenated my gff files from the first round of annotation and used maker_gff=round1.gff and rm_pass=1 but at the end of the process, the repeat annotations were not there. Any suggestions? Thanks, Aaron > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From listona at science.oregonstate.edu Mon Feb 10 12:46:06 2014 From: listona at science.oregonstate.edu (Aaron Liston) Date: Mon, 10 Feb 2014 11:46:06 -0800 Subject: [maker-devel] Re-using repeat masking in SNAP training In-Reply-To: <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu> References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu> <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu> Message-ID: <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu> Hi Barry: I changed the name of the genome file, so that I could see the results at each step. However, it sounds like if I had kept the same name, MAKER would use the info from the previous run. Is that correct? Aaron From: Barry Moore [mailto:barry.utah at gmail.com] Sent: Monday, February 10, 2014 11:22 AM To: Aaron Liston Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Re-using repeat masking in SNAP training Hi Arron, If you re-run maker and don't change the details about the repeat library (i.e. you only update the SNAP HMM file) then MAKER shouldn't redo any work with repeat masking it should reuse the work it has already done. 
Is this not what you are seeing? Barry On Feb 8, 2014, at 7:08 PM, Aaron Liston wrote: I am following the tutorial for training SNAP, and it works fine. However, the tutorial instructions have MAKER repeat the repeat masking. To avoid this, I concatenated my gff files from the first round of annotation and used maker_gff=round1.gff and rm_pass=1 but at the end of the process, the repeat annotations were not there. Any suggestions? Thanks, Aaron _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry.utah at gmail.com Mon Feb 10 12:56:26 2014 From: barry.utah at gmail.com (Barry Moore) Date: Mon, 10 Feb 2014 12:56:26 -0700 Subject: [maker-devel] Re-using repeat masking in SNAP training In-Reply-To: <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu> References: <20140208180842.14348ulagb3squ5c@webmail.oregonstate.edu> <78D5D862-1758-4035-A58C-3E4BCC6382A7@genetics.utah.edu> <02b401cf2698$bd2a1550$377e3ff0$@science.oregonstate.edu> Message-ID: <19FC4633-46F6-4B32-820A-A68C242A1E77@gmail.com> Yep. If you want to keep the results from each step just copy the GFF3 file from your first run to a new name and then redo your run. B On Feb 10, 2014, at 12:46 PM, Aaron Liston wrote: > Hi Barry: I changed the name of the genome file, so that I could see the results at each step. However, it sounds like if I had kept the same name, MAKER would use the info from the previous run. Is that correct? 
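Barry's suggestion can be scripted in a few lines: keep the input file names and repeat settings unchanged so MAKER reuses its earlier work, and copy the first-round GFF3 aside before rerunning. This is a minimal Python sketch; the `genome.all.gff` name and the `round1` label are hypothetical examples, and `archive_round` is not a MAKER utility.

```python
import shutil
from pathlib import Path


def archive_round(gff3_path, round_label):
    """Copy e.g. genome.all.gff to genome.all.<round_label>.gff before a rerun overwrites it."""
    src = Path(gff3_path)
    dest = src.with_name(f"{src.stem}.{round_label}{src.suffix}")
    if dest.exists():
        # Refuse to clobber an earlier snapshot.
        raise FileExistsError(str(dest))
    shutil.copy2(src, dest)
    return dest
```

For example, `archive_round("genome.all.gff", "round1")` before changing only the SNAP HMM path and rerunning; because the genome file keeps its original name, the repeat-masking results from the first round should be reused.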
Aaron > > From: Barry Moore [mailto:barry.utah at gmail.com] > Sent: Monday, February 10, 2014 11:22 AM > To: Aaron Liston > Cc: maker-devel at yandell-lab.org > Subject: Re: [maker-devel] Re-using repeat masking in SNAP training > > Hi Arron, > > If you re-run maker and don't change the details about the repeat library (i.e. you only update the SNAP HMM file) then MAKER shouldn't redo any work with repeat masking it should reuse the work it has already done. Is this not what you are seeing? > > Barry > > > On Feb 8, 2014, at 7:08 PM, Aaron Liston wrote: > > > I am following the tutorial for training SNAP, and it works fine. However, the tutorial instructions have MAKER repeat the repeat masking. To avoid this, I concatenated my gff files from the first round of annotation and used maker_gff=round1.gff and rm_pass=1 but at the end of the process, the repeat annotations were not there. Any suggestions? Thanks, Aaron > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Tue Feb 11 11:37:36 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 11 Feb 2014 18:37:36 +0000 Subject: [maker-devel] Falied to create new account In-Reply-To: References: <11349995-a97a-43fd-9fd6-420dd067cd6b@node1> , Message-ID: Hossein, Ok. So since this error came up on a local install, I'm going to need some more information to understand what went wrong. 
Is it the same contig that always causes this error? If it is, then is that the only error or warning that MAKER encounters while running on this contig? Or, if multiple contigs fail, is it always the same error? If you can narrow it down to the smallest possible dataset that consistently gives the same error, then we can begin to understand what's wrong. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] Sent: Tuesday, February 11, 2014 11:20 AM To: Daniel Ence Subject: Re: [maker-devel] Falied to create new account Hi Daniel, I'm running it through the local server at my work. M. Hossein Borhan, Ph.D. Research Scientist/ Chercheur Scientifique Saskatoon Research Centre/Centre de Recherches de Saskatoon Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada 107 Science Place, Saskatoon, SK.,S7N 0X2 Telephone/Téléphone: (306) 385-9441 Facsimile/Télécopieur: (306) 385-9482 Hossein.borhan at agr.gc.ca On 14-02-11 12:16 PM, "Daniel Ence" wrote: >Hi Hossein, > >Did you encounter this error while you were running MAKER on your local >machine or through the MAKER web annotation service? > >Thanks, >Daniel > > >Daniel Ence >Graduate Student >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >________________________________________ >From: Carson Holt [carsonhh at gmail.com] >Sent: Tuesday, February 11, 2014 10:18 AM >To: Daniel Ence >Cc: Mark Yandell >Subject: FW: [maker-devel] Falied to create new account > >Hey Daniel, could you download his dataset and see if you can replicate >the error? Also check if this was an MWAS job or a local maker run (his >dataset will already be there for MWAS, you just need the job ID).
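To build the smallest dataset Daniel asks for, one option is to pull the single failing contig (the thread reports `PbPT3Sc00006`) out of the genome FASTA and run MAKER on just that record. A minimal Python sketch, assuming a plain multi-record FASTA file; `extract_record` and the file names in the usage comment are hypothetical.

```python
def extract_record(fasta_lines, wanted_id):
    """Yield the header and sequence lines of the FASTA record whose ID is wanted_id."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token after '>'.
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] == wanted_id
        if keep:
            yield line


# Hypothetical usage:
# with open("genome.fasta") as src, open("PbPT3Sc00006.fasta", "w") as dst:
#     dst.writelines(extract_record(src, "PbPT3Sc00006"))
```

Pointing a fresh MAKER run at the one-contig file, with the evidence settings otherwise unchanged, should show whether the chunk failure follows that contig.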
> >Thanks, >Carson > >On 2/11/14, 10:16 AM, "Borhan, Hossein" wrote: > >>Hi Carson >> >> >>I encountered this error while running maker >> >>FATAL ERROR >>ERROR: Failed while processing the chunk divide!! >> >>ERROR: Chunk failed at level 17 >>!! >>FAILED CONTIG:PbPT3Sc00006 >> >>HB > > From darasappan at gmail.com Tue Feb 11 11:48:23 2014 From: darasappan at gmail.com (dhivya arasappan) Date: Tue, 11 Feb 2014 12:48:23 -0600 Subject: [maker-devel] maker annotation with cufflinks output In-Reply-To: References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com> Message-ID: <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com> With your suggested changes (using a protein file not derived from the RNA-seq data and fixing the GFF file for training SNAP), I was able to increase the number of genes from 6000+ to 18116. I'm now trying to evaluate the quality of the annotation. I have a question about the usage of mpi_evaluator. In the MAKER tutorial, the usage is given as: mpi_evaluator [options] What files are being referred to in the input parameters eval_opts, eval_bopts and eval_exe? Thanks Dhivya On Feb 6, 2014, at 11:47 AM, Carson Holt wrote: > Ok. Content looks good. Just make sure to use gff3_merge to join > the GFF3's without stripping out the fasta sequence at the end when > training SNAP. > > Thanks, > Carson > > > From: dhivya arasappan > Date: Thursday, February 6, 2014 at 10:29 AM > To: Carson Holt > Cc: Daniel Ence > Subject: Re: [maker-devel] maker annotation with cufflinks output > > Sorry, I was just trying to make it small enough to be approved by > the mailing list. > > Here is the whole file: > > > cat.formatted.gff.tgz > > > > On Thu, Feb 6, 2014 at 11:04 AM, Carson Holt > wrote: >> Could you give me the file without using 'head' to trim it? It's >> cutting it before it reaches the part I'm interested in.
>> >> ?Carson >> >> >> From: dhivya arasappan >> Date: Thursday, February 6, 2014 at 10:01 AM >> >> To: Carson Holt >> Cc: Daniel Ence , "maker-devel at yandell-lab.org >> " >> Subject: Re: [maker-devel] maker annotation with cufflinks output >> >> Oh yes I did- I took just the non sequence entries in the gff file >> and used that as my input. I will rerun snap with the gff file >> containing the sequences as well. >> >> I'm attaching a snippet of the gff file that I used as input to >> maker2zff. >> >> Thanks for your help >> Dhivya >> >> >> >> >> On Feb 6, 2014, at 10:05 AM, Carson Holt wrote: >> >>> Your genome.dna file has no sequence? Did you by any chance strip >>> the fasta sequence from the GFF3 you are using as input to >>> maker2zff? There should be fasta sequence at the end of that >>> file. Also can I see the GFF3 file you are using as input to >>> maker2zff. >>> >>> Thanks, >>> Carson >>> >>> From: dhivya arasappan >>> Date: Thursday, February 6, 2014 at 7:47 AM >>> To: Carson Holt >>> Cc: Daniel Ence , "maker-devel at yandell-lab.org >>> " >>> Subject: Re: [maker-devel] maker annotation with cufflinks output >>> >>> Hello, >>> >>> I does appear than my genome.ann file from maker2zff script has >>> data in it. However, the SNAP steps after that have created empty >>> files. 
The following are all empty: >>> >>> alt.dna err.dna export.dna genome.dna olp.dna uni.dna wrn.dna >>> alt.ann err.ann export.ann genome.ann olp.ann uni.ann wrn.ann >>> >>> When I tried to get gene stats or validate genome.ann, I get >>> errors like this for all of them: >>> >>> fathom genome.ann genome.dna -gene-stats |more >>> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds >>> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds >>> exon-5:out_of_bounds exon-6:out_of_bounds >>> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds >>> exon-5:out_of_bounds exon-4:out_of_bounds exon-3:out_of_bounds >>> exon-2:out_of_bounds exon-1:out_of_bounds >>> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds >>> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds >>> exon-5:out_of_bounds >>> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds >>> exon-2:out_of_bounds exon-3:out_of_bounds exon-4:out_of_bounds >>> exon-5:out_of_bounds exon-6:out_of_bounds exon-7:out_of_bounds >>> exon-8:out_of_bounds exon-9:out_of_bounds exon-10:out_of_bounds >>> exon-11:out_of_bounds exon-12:out_of_bounds exon-13:out_of_bounds >>> exon-14:out_of_bounds exon-15:out_of_bounds exon-16:out_of_bounds >>> exon-17:out_of_bounds exon-18:out_of_bounds exon-19:out_of_bounds >>> exon-20:out_of_bounds exon-21:out_of_bounds >>> >>> I'm not sure why the annotations I'm seeing in genome.ann are all >>> showing up as errors. I realize this may be an issue with SNAP, >>> but are you familiar with anything like this? My genome.ann file >>> is attached for reference. >>> >>> Thanks >>> Dhivya >>> >>> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote: >>> >>>> Do you have any features of type snap in your results from step >>>> 3? We've had a couple of recent posts where, after training, SNAP >>>> was giving no results, and as a result MAKER couldn't give any >>>> genes. One cause of something like that may be your step 2. >>>> Make sure the ZFF you used to train with wasn't empty.
The >>>> maker2zff script uses filters to only put the best genes in the >>>> off file, and if all your genes fail the filtering then you are >>>> training with an empty ZFF. >>>> >>>> Also you should use proteins from a related species as your >>>> protein file. I see that you protein marches are varying wildly >>>> from run to run? So is your contig count? Were the subset of >>>> contigs you have results for long enough to contain genes? >>>> >>>> ?Carson >>>> >>>> From: dhivya arasappan >>>> Date: Monday, February 3, 2014 at 9:31 AM >>>> To: Daniel Ence >>>> Cc: "maker-devel at yandell-lab.org" >>>> Subject: Re: [maker-devel] maker annotation with cufflinks output >>>> >>>> Hi Daniel, >>>> >>>> I was able to check on some of those questions. >>>> >>>> 1. From trinity assembly: I started with 102000 contigs. I used >>>> trinotate to annotate proteins in this. >>>> >>>> I ran maker on this data with est2genome set to 1. The output >>>> looks like this (most important parts on top): >>>> >>>> 6653 gene >>>> 46675 exon >>>> 280534 protein_match >>>> 59934 CDS >>>> 969 contig >>>> 105388 expressed_sequence_match >>>> 12584 five_prime_UTR >>>> 78565 match >>>> 1401369 match_part >>>> 10180 mRNA >>>> 11545 three_prime_UTR >>>> >>>> 2. From cufflinks assembly: I started with 133380 entries (out of >>>> which there are 29,000 transcripts). I used the protein >>>> sequences from trinity assembly. >>>> >>>> I ran maker on this data with est2genome set to 1. The output >>>> looks like this: >>>> 29 gene >>>> 75 exon >>>> 573659 protein_match >>>> 67 CDS >>>> 1099 contig >>>> 269298 expressed_sequence_match >>>> 23 five_prime_UTR >>>> 173844 match >>>> 2221846 match_part >>>> 29 mRNA >>>> 23 three_prime_UTR >>>> >>>> The genes annotated using the trinity assembly is lower than >>>> expected, so I went the cufflinks route. I dont understand why >>>> when using the cufflinks transcripts, even less genes are being >>>> found. >>>> >>>> 3. 
Training SNAP: I used the results of maker from 1 to train >>>> SNAP. I then used that training set to rerun maker: >>>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/ >>>> maker_mpi_withAlltrinity/snap/RHA.hmm >>>> est2genome=0 >>>> >>>> And again I got results with no entries for gene, exon, CDS etc. >>>> 957 contig >>>> 46555 expressed_sequence_match >>>> 43651 match >>>> 553633 match_part >>>> 113738 protein_match >>>> >>>> As I mentioned in another email, cegma results indicated that the >>>> genome was more than 90% complete. Any suggestions would be >>>> helpful. >>>> >>>> Thank you >>>> Dhivya >>>> >>>> >>>> >>>> >>>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: >>>> >>>>> Hi Dhivya, >>>>> >>>>> I think there a few numbers that could be helpful to understand >>>>> what's happening here. >>>>> >>>>> How many transcripts did Trinity assembly the RNA-seq data into? >>>>> Also, you had 29,000 transcripts from cufflinks, but fewer from >>>>> MAKER when you gave it the cufflinks data. How many transcripts >>>>> did MAKER identify with the cufflinks data? Did you still get >>>>> more than the 10,000 transcripts that you found with just the >>>>> Trinity data? >>>>> >>>>> A key part of MAKER's approach to genome annotation that might >>>>> be affecting it's performance is that it only annotates a gene >>>>> where there is both evidence (like your RNA-seq data) and an ab- >>>>> initio prediction. If a prediction is unsupported by the >>>>> evidence, then MAKER won't annotate a gene and if evidence >>>>> aligns where there's no prediction, MAKER won't annotate a gene >>>>> either. What ab-initio predictors are you using and have they >>>>> been trained specific genome? >>>>> >>>>> You can force MAKER to automatically promote evidence alignments >>>>> to a gene model by setting the est2genome option to 1, but that >>>>> will usually give you many false positives. 
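The feature tallies Dhivya posted (gene, mRNA, exon, match_part, and so on) resemble a count of GFF3 column 3. The thread does not say how they were produced, so the Python sketch below is one assumed way to get the same kind of table. It stops at the `##FASTA` directive so appended sequence is not miscounted, and the optional `source` filter (column 2) separates final gene models from evidence alignments. The function name and the example source values are illustrative.

```python
from collections import Counter


def feature_type_counts(lines, source=None):
    """Tally GFF3 column 3 (feature type), optionally for one column-2 source only."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # the rest of the file is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        if source is None or cols[1] == source:
            counts[cols[2]] += 1
    return counts


# Hypothetical usage, printing "count<TAB>type" like the tables above:
# with open("genome.all.gff") as fh:
#     for ftype, n in feature_type_counts(fh).most_common():
#         print(f"{n}\t{ftype}")
```

Filtering on the `maker` source is a quick way to see whether zero gene/mRNA counts come from MAKER rejecting models rather than from a lack of evidence alignments.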
>>>>> >>>>> Try rerunning it with either the Trinity data or the Cufflinks >>>>> data and with est2genome set to 1, and let us know how that >>>>> affects the MAKER results. >>>>> >>>>> Thanks, >>>>> Daniel >>>>> >>>>> Daniel Ence >>>>> Graduate Student >>>>> Eccles Institute of Human Genetics >>>>> University of Utah >>>>> 15 North 2030 East, Room 2100 >>>>> Salt Lake City, UT 84112-5330 >>>>> ________________________________________ >>>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on >>>>> behalf of dhivya arasappan [darasappan at gmail.com] >>>>> Sent: Thursday, January 30, 2014 11:18 AM >>>>> To: maker-devel at yandell-lab.org >>>>> Subject: [maker-devel] maker annotation with cufflinks output >>>>> >>>>> Hello, >>>>> >>>>> I am trying to annotate a 200 mb plant genome for which I have a >>>>> very >>>>> good assembly. >>>>> >>>>> I tried to denovo assemble RNA-seq data using trinity and ran >>>>> maker >>>>> using my genome assembly and the trinity results. I did not get >>>>> as >>>>> many transcripts as expected, around 10,000 transcripts. >>>>> >>>>> So, I decided to try a different approach. I did a genome >>>>> assisted >>>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline >>>>> generated 21,000 genes, 29,000 transcripts. I then ran maker >>>>> using my >>>>> genome assembly and the cufflinks result. I get much less >>>>> number of >>>>> transcripts as a result. >>>>> >>>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm >>>>> confused as to why maker is not finding the same. >>>>> >>>>> Any suggestions would be appreciated. 
>>>>> >>>>> Thanks >>>>> Dhivya >>> >> > From carsonhh at gmail.com Tue Feb 11 11:55:38 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 11 Feb 2014 11:55:38 -0700 Subject: [maker-devel] maker annotation with cufflinks output In-Reply-To: <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com> References: <516896DE-1ACB-460B-9842-90E12CB72343@gmail.com> <0DF852EE-252D-471C-9AC4-B073E6B2DA58@gmail.com> <02F007BA-3FEA-4C85-8F7A-D177058BFF35@gmail.com> <0BB3E178-1CA3-46E7-8923-3E7C6B834665@gmail.com> Message-ID: I wouldn't use mpi_evaluator. It is buggy and has virtually no documentation. The AED values are the best way to identify which genes are higher and lower quality. You can also run InterProScan to identify protein domain content as an independent evaluation. Look at this paper: http://www.biomedcentral.com/1471-2105/12/491 Figure 4 has a nice example of how AED, domain content, and gene orthology correlate to show the quality of different subsets of genes in seven ant genomes. If you choose to try mpi_evaluator, it uses the -CTL option to generate empty files that you then fill in. Thanks, Carson From: dhivya arasappan Date: Tuesday, February 11, 2014 at 11:48 AM To: Carson Holt Cc: Daniel Ence , Subject: Re: [maker-devel] maker annotation with cufflinks output With your suggested changes (using a protein file not derived from the RNA-seq data and fixing the gff file for training SNAP), I was able to increase the number of genes from 6000+ to 18116.
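Carson recommends AED rather than mpi_evaluator for judging gene quality. The Python sketch below assumes MAKER records the value as an `_AED=` attribute on the mRNA lines of its GFF3, and summarizes it as the fraction of transcripts at or below a few thresholds (lower AED means closer agreement with the evidence). It also includes a simplified single-evidence calculation, AED = 1 - (SN + SP)/2 at the nucleotide level, to show what the number measures. MAKER's own computation pools all overlapping evidence, so treat this as an illustration rather than a reimplementation; all function names are hypothetical.

```python
def aed_values(lines):
    """Collect the _AED attribute from every mRNA line of a MAKER GFF3."""
    values = []
    for line in lines:
        if line.startswith("##FASTA"):
            break
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        for item in cols[8].split(";"):
            if item.startswith("_AED="):
                values.append(float(item[len("_AED="):]))
    return values


def cumulative_fraction(values, thresholds=(0.25, 0.5, 0.75, 1.0)):
    """Fraction of transcripts whose AED is at or below each threshold."""
    n = len(values)
    return {t: (sum(v <= t for v in values) / n if n else 0.0) for t in thresholds}


def covered_bases(intervals):
    """Set of 1-based positions covered by inclusive (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(model, evidence):
    """AED = 1 - (SN + SP) / 2 at the nucleotide level (simplified, single evidence set)."""
    m, e = covered_bases(model), covered_bases(evidence)
    overlap = len(m & e)
    sensitivity = overlap / len(e)  # share of evidence bases covered by the model
    specificity = overlap / len(m)  # share of model bases supported by evidence
    return 1 - (sensitivity + specificity) / 2
```

A model identical to its evidence scores 0.0, one with no overlap scores 1.0, and one sharing half its bases with an equally long evidence alignment scores 0.5.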
I'm now trying to evaluate the quality of the annotation. I have a question about the usage for mpi_evaluator. In the maker tutorial, the usage is given as: mpi_evaluator [options] What files are being referred to in the input parameters: eval_opts, eval_bopts and eval_exe? Thanks Dhivya On Feb 6, 2014, at 11:47 AM, Carson Holt wrote: > Ok. Content looks good. Just make sure to use gff3_merge to join the GFF3?s > without stripping out the fasta sequence at the end when training SNAP. > > Thanks, > Carson > > > From: dhivya arasappan > Date: Thursday, February 6, 2014 at 10:29 AM > To: Carson Holt > Cc: Daniel Ence > Subject: Re: [maker-devel] maker annotation with cufflinks output > > Sorry I was just trying to make it small enough to be approved by the mailing > list. > > Here is the whole file: > > > cat.formatted.gff.tgz > b> > > > > On Thu, Feb 6, 2014 at 11:04 AM, Carson Holt wrote: >> Could you give me the file without using 'head? to trim it, its cutting it >> before it reaches the part I?m interested in. >> >> ?Carson >> >> >> From: dhivya arasappan >> Date: Thursday, February 6, 2014 at 10:01 AM >> >> To: Carson Holt >> Cc: Daniel Ence , "maker-devel at yandell-lab.org" >> >> Subject: Re: [maker-devel] maker annotation with cufflinks output >> >> Oh yes I did- I took just the non sequence entries in the gff file and used >> that as my input. I will rerun snap with the gff file containing the >> sequences as well. >> >> I'm attaching a snippet of the gff file that I used as input to maker2zff. >> >> Thanks for your help >> Dhivya >> >> >> >> >> On Feb 6, 2014, at 10:05 AM, Carson Holt wrote: >> >>> Your genome.dna file has no sequence? Did you by any chance strip the fasta >>> sequence from the GFF3 you are using as input to maker2zff? There should be >>> fasta sequence at the end of that file. Also can I see the GFF3 file you >>> are using as input to maker2zff. 
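Carson's diagnosis is that the training GFF3 lost its trailing FASTA sequence, which leaves maker2zff without sequence for genome.dna. A quick check before running maker2zff is sketched below in Python: it reports whether the file has a `##FASTA` section and which feature seqids lack a sequence record there. The function name is hypothetical; per Carson, joining files with gff3_merge rather than trimming them by hand keeps that section intact.

```python
def check_gff3_fasta(lines):
    """Return (has_fasta, missing): missing lists feature seqids with no record in ##FASTA."""
    feature_seqids, fasta_ids = set(), set()
    in_fasta = False
    for line in lines:
        if line.startswith("##FASTA"):
            in_fasta = True
        elif in_fasta:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    fasta_ids.add(tokens[0])
        elif line.strip() and not line.startswith("#"):
            # Column 1 of a feature line is the seqid.
            feature_seqids.add(line.split("\t", 1)[0])
    return in_fasta, sorted(feature_seqids - fasta_ids)
```

A result of `(False, [...])` corresponds to the situation described in this thread, where only the non-sequence entries of the GFF3 were kept.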
>>> >>> Thanks, >>> Carson >>> >>> From: dhivya arasappan >>> Date: Thursday, February 6, 2014 at 7:47 AM >>> To: Carson Holt >>> Cc: Daniel Ence , "maker-devel at yandell-lab.org" >>> >>> Subject: Re: [maker-devel] maker annotation with cufflinks output >>> >>> Hello, >>> >>> I does appear than my genome.ann file from maker2zff script has data in it. >>> However, the SNAP steps after that have created empty files. The following >>> are all empty: >>> >>> alt.dna err.dna export.dna genome.dna olp.dna uni.dna wrn.dna >>> alt.ann err.ann export.ann genome.ann olp.ann uni.ann wrn.ann >>> >>> When I tried to get gene stats or validate genome.ann, I get errors like >>> this for all of them: >>> >>> fathom genome.ann genome.dna -gene-stats |more >>> MODEL5547 1 1 6 + errors(6): exon-1:out_of_bounds exon-2:out_of_bounds >>> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds >>> exon-6:out_of_bounds >>> MODEL5568 1 1 6 - errors(6): exon-6:out_of_bounds exon-5:out_of_bounds >>> exon-4:out_of_bounds exon-3:out_of_bounds exon-2:out_of_bounds >>> exon-1:out_of_bounds >>> MODEL5589 1 1 5 + errors(5): exon-1:out_of_bounds exon-2:out_of_bounds >>> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds >>> MODEL5195 1 1 21 + errors(21): exon-1:out_of_bounds exon-2:out_of_bounds >>> exon-3:out_of_bounds exon-4:out_of_bounds exon-5:out_of_bounds >>> exon-6:out_of_bounds exon-7:out_of_bounds exon-8:out_of_bounds >>> exon-9:out_of_bounds exon-10:out_of_bounds exon-11:out_of_bounds >>> exon-12:out_of_bounds exon-13:out_of_bounds exon-14:out_of_bounds >>> exon-15:out_of_bounds exon-16:out_of_bounds exon-17:out_of_bounds >>> exon-18:out_of_bounds exon-19:out_of_bounds exon-20:out_of_bounds >>> exon-21:out_of_bounds >>> >>> I'm not sure why the annotation I'm seeing in genome.ann are all showing up >>> as errors. I realize this may be an issue with snap, but are you familiar >>> with anything like this? My genome.ann file is attached for reference. 
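The `out_of_bounds` errors fathom reports are consistent with the empty genome.dna described above: if a sequence is missing, or shorter than an exon's coordinates, every exon on it falls outside the sequence. The Python sketch below reproduces that check so a .ann/.dna pair can be inspected before training. It assumes a ZFF layout of `>seqname` lines followed by feature rows whose second and third columns are coordinates and whose last column is the model name; that reading of the format, and the helper names, are assumptions rather than SNAP's own code.

```python
from collections import defaultdict


def fasta_lengths(lines):
    """Map each FASTA record ID to its sequence length."""
    lengths, current = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else None
            if current is not None:
                lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def zff_models(lines):
    """Map model name -> [(seqname, begin, end)]; assumes the model name is the last column."""
    models, seqname = defaultdict(list), None
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].startswith(">"):
            seqname = tokens[0][1:]
        elif seqname is not None:
            models[tokens[-1]].append((seqname, int(tokens[1]), int(tokens[2])))
    return models


def out_of_bounds(ann_lines, dna_lines):
    """Count, per model, exons extending past the end of their sequence (or on a missing one)."""
    lengths = fasta_lengths(dna_lines)
    bad = {}
    for model, exons in zff_models(ann_lines).items():
        # max() handles minus-strand features written with begin > end.
        n = sum(1 for seq, begin, end in exons if max(begin, end) > lengths.get(seq, 0))
        if n:
            bad[model] = n
    return bad
```

With an empty genome.dna, every exon of every model is reported, which matches the pattern in the fathom output quoted above.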
>>> >>> Thanks >>> Dhivya >>> >>> On Feb 5, 2014, at 12:38 PM, Carson Holt wrote: >>> >>>> Do you have any features of type snap in your results from step 3? We've >>>> had a couple of recent posts where after training snap was giving no >>>> results, and as a result maker couldn't give any genes. One cause of >>>> something like that may be your step 2. Make sure the ZFF you used to >>>> train with wasn't empty. The maker2zff script uses filters to only put the best >>>> genes in the ZFF file, and if all your genes fail the filtering then you >>>> are training with an empty ZFF. >>>> >>>> Also, you should use proteins from a related species as your protein file. >>>> I see that your protein matches are varying wildly from run to run. So is >>>> your contig count. Was the subset of contigs you have results for long >>>> enough to contain genes? >>>> >>>> -Carson >>>> >>>> From: dhivya arasappan >>>> Date: Monday, February 3, 2014 at 9:31 AM >>>> To: Daniel Ence >>>> Cc: "maker-devel at yandell-lab.org" >>>> Subject: Re: [maker-devel] maker annotation with cufflinks output >>>> >>>> Hi Daniel, >>>> >>>> I was able to check on some of those questions. >>>> >>>> 1. From trinity assembly: I started with 102000 contigs. I used trinotate >>>> to annotate proteins in this. >>>> >>>> I ran maker on this data with est2genome set to 1. The output looks like >>>> this (most important parts on top): >>>> >>>> 6653 gene >>>> 46675 exon >>>> 280534 protein_match >>>> 59934 CDS >>>> 969 contig >>>> 105388 expressed_sequence_match >>>> 12584 five_prime_UTR >>>> 78565 match >>>> 1401369 match_part >>>> 10180 mRNA >>>> 11545 three_prime_UTR >>>> >>>> 2. From cufflinks assembly: I started with 133380 entries (out of which >>>> there are 29,000 transcripts). I used the protein sequences from the trinity >>>> assembly. >>>> >>>> I ran maker on this data with est2genome set to 1.
The output looks like >>>> this: >>>> 29 gene >>>> 75 exon >>>> 573659 protein_match >>>> 67 CDS >>>> 1099 contig >>>> 269298 expressed_sequence_match >>>> 23 five_prime_UTR >>>> 173844 match >>>> 2221846 match_part >>>> 29 mRNA >>>> 23 three_prime_UTR >>>> >>>> The number of genes annotated using the trinity assembly is lower than expected, so I >>>> went the cufflinks route. I don't understand why even fewer genes are found >>>> when using the cufflinks transcripts. >>>> >>>> 3. Training SNAP: I used the results of maker from 1 to train SNAP. I >>>> then used that training set to rerun maker: >>>> snaphmm=/scratch/01184/daras/jansen/RHA/allpaths/maker_mpi_withAlltrinity/snap/RHA.hmm >>>> est2genome=0 >>>> >>>> And again I got results with no entries for gene, exon, CDS etc. >>>> 957 contig >>>> 46555 expressed_sequence_match >>>> 43651 match >>>> 553633 match_part >>>> 113738 protein_match >>>> >>>> As I mentioned in another email, cegma results indicated that the genome >>>> was more than 90% complete. Any suggestions would be helpful. >>>> >>>> Thank you >>>> Dhivya >>>> >>>> >>>> >>>> >>>> On Jan 30, 2014, at 2:51 PM, Daniel Ence wrote: >>>> >>>>> Hi Dhivya, >>>>> >>>>> I think there are a few numbers that could be helpful to understand what's >>>>> happening here. >>>>> >>>>> How many transcripts did Trinity assemble the RNA-seq data into? Also, you >>>>> had 29,000 transcripts from cufflinks, but fewer from MAKER when you gave >>>>> it the cufflinks data. How many transcripts did MAKER identify with the >>>>> cufflinks data? Did you still get more than the 10,000 transcripts that >>>>> you found with just the Trinity data? >>>>> >>>>> A key part of MAKER's approach to genome annotation that might be >>>>> affecting its performance is that it only annotates a gene where there is >>>>> both evidence (like your RNA-seq data) and an ab-initio prediction.
If a >>>>> prediction is unsupported by the evidence, then MAKER won't annotate a >>>>> gene, and if evidence aligns where there's no prediction, MAKER won't >>>>> annotate a gene either. What ab-initio predictors are you using, and have >>>>> they been trained on your specific genome? >>>>> >>>>> You can force MAKER to automatically promote evidence alignments to a gene >>>>> model by setting the est2genome option to 1, but that will usually give >>>>> you many false positives. >>>>> >>>>> Try rerunning it with either the Trinity data or the Cufflinks data and >>>>> with est2genome set to 1, and let us know how that affects the MAKER >>>>> results. >>>>> >>>>> Thanks, >>>>> Daniel >>>>> >>>>> Daniel Ence >>>>> Graduate Student >>>>> Eccles Institute of Human Genetics >>>>> University of Utah >>>>> 15 North 2030 East, Room 2100 >>>>> Salt Lake City, UT 84112-5330 >>>>> ________________________________________ >>>>> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>>>> dhivya arasappan [darasappan at gmail.com] >>>>> Sent: Thursday, January 30, 2014 11:18 AM >>>>> To: maker-devel at yandell-lab.org >>>>> Subject: [maker-devel] maker annotation with cufflinks output >>>>> >>>>> Hello, >>>>> >>>>> I am trying to annotate a 200 Mb plant genome for which I have a very >>>>> good assembly. >>>>> >>>>> I tried to de novo assemble RNA-seq data using trinity and ran maker >>>>> using my genome assembly and the trinity results. I did not get as >>>>> many transcripts as expected, only around 10,000 transcripts. >>>>> >>>>> So, I decided to try a different approach. I did a genome-assisted >>>>> assembly of the RNA-seq data using tophat/cufflinks. This pipeline >>>>> generated 21,000 genes and 29,000 transcripts. I then ran maker using my >>>>> genome assembly and the cufflinks result. I get far fewer >>>>> transcripts as a result.
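The per-type tallies posted in this thread (gene, exon, mRNA, match_part and so on) look like counts of column 3 of MAKER's GFF3 output. A minimal Python sketch of such a tally follows. It assumes a standard nine-column GFF3, stops at the `##FASTA` directive so sequence lines are not counted, and uses a hypothetical function name; it is not necessarily how the numbers in the thread were produced.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Tally GFF3 column 3 (feature type), skipping comments and the ##FASTA section."""
    counts: Counter = Counter()
    for raw in lines:
        if raw.startswith("##FASTA"):
            break  # the rest of the file is sequence, not features
        if not raw.strip() or raw.startswith("#"):
            continue  # blank lines, directives and comments
        cols = raw.rstrip("\n").split("\t")
        if len(cols) == 9:
            counts[cols[2]] += 1
    return counts
```

Calling `count_feature_types(open("genome.all.gff")).most_common()` (file name hypothetical) lists the types by frequency. Setting the `gene` and `mRNA` rows beside the `expressed_sequence_match` and `protein_match` rows shows how much evidence aligned compared with how many models MAKER actually kept, which is the gap Daniel's explanation addresses.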
>>>>> >>>>> If cufflinks found 29000 transcripts by mapping to the genome, I'm >>>>> confused as to why maker is not finding the same. >>>>> >>>>> Any suggestions would be appreciated. >>>>> >>>>> Thanks >>>>> Dhivya >>>>> >>>>> >>>>> _______________________________________________ >>>>> maker-devel mailing list >>>>> maker-devel at box290.bluehost.com >>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>> >> > From carson.holt at genetics.utah.edu Tue Feb 11 13:52:05 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Tue, 11 Feb 2014 20:52:05 +0000 Subject: [maker-devel] New MAKER release Message-ID: Hello all, MAKER has been updated to 2.31. There are no major new features over 2.30. It is primarily just bug fixes, and updates to the features that were added from MAKER-P like tRNAscan support. I also was able to remove the seg faults that sometimes happened on exit under OpenMPI. Thanks, Carson From carson.holt at genetics.utah.edu Tue Feb 11 14:19:17 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Tue, 11 Feb 2014 21:19:17 +0000 Subject: [maker-devel] New MAKER release In-Reply-To: References: Message-ID: URLs can be manually edited in the .../maker/src/locations file. I've also updated that file in the latest MAKER download to point to the new RepBase URL. Thanks, Carson From: Joanna Kelley > Date: Tuesday, February 11, 2014 at 2:00 PM To: Carson Holt > Subject: Re: [maker-devel] New MAKER release Hi Carson, The RepBase step is failing; it seems to be looking for the incorrect version. Where do I change the code to solve that?
Thanks, Joanna Downloading RepBase... --2014-02-11 12:59:38-- http://www.girinst.org/server/RepBase/protected/repeatmaskerlibraries/repeatmaskerlibraries-20130422.tar.gz Resolving www.girinst.org... 66.201.49.247 Connecting to www.girinst.org|66.201.49.247|:80... connected. HTTP request sent, awaiting response... 401 Authorization Required Connecting to www.girinst.org|66.201.49.247|:80... connected. HTTP request sent, awaiting response... 404 Not Found 2014-02-11 12:59:38 ERROR 404: Not Found. ERROR: Failed installing RepBase, now cleaning installation path... You may need to install RepBase manually. On Tue, Feb 11, 2014 at 12:52 PM, Carson Holt > wrote: Hello all, MAKER has been updated to 2.31. There are no major new features over 2.30. It is primarily just bug fixes, and updates to the features that were added from MAKER-P like tRNAscan support. I also was able to remove the seg faults that sometimes happened on exit under OpenMPI. Thanks, Carson _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -- Please update your address book; my new email address is joanna.l.kelley at wsu.edu From dence at genetics.utah.edu Tue Feb 11 15:59:57 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 11 Feb 2014 22:59:57 +0000 Subject: [maker-devel] ERROR: Failed while processing the chunk divide!! In-Reply-To: References: Message-ID: Hi Hossein, I think the most helpful thing right now would be for you to run MAKER on only one of the contigs that are failing and send me the entire error output along with the maker control files that you are using. It looks like the error is coming from the gff3 files that you are using as input.
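Daniel asks for a run on a single failing contig. One way to build that minimal test case is to copy the contig's FASTA record and the GFF3 rows on the same seqid into their own files. The Python below is a hypothetical helper, not a MAKER tool, and assumes that a FASTA ID is the first word after `>` and that column 1 of the GFF3 holds the seqid.

```python
from typing import Iterable, Tuple


def extract_contig(fasta_lines: Iterable[str], gff_lines: Iterable[str], contig: str) -> Tuple[str, str]:
    """Return (fasta_text, gff3_text) containing only `contig`, for a minimal test case."""
    fasta_out = []
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # FASTA ID = first whitespace-delimited word of the header
            keep = (line[1:].split() or [""])[0] == contig
        if keep:
            fasta_out.append(line if line.endswith("\n") else line + "\n")

    gff_out = ["##gff-version 3\n"]
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section, not features
        if line.startswith("#") or not line.strip():
            continue
        if line.split("\t", 1)[0] == contig:  # column 1 is the seqid
            gff_out.append(line if line.endswith("\n") else line + "\n")
    return "".join(fasta_out), "".join(gff_out)
```

For example, `extract_contig(open("genome.fasta"), open("pass_through.gff"), "PbPT3Sc00001")` (file names hypothetical) returns the two texts, which can be written to disk and used in place of the full genome and GFF3 in a copy of the control files.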
Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] Sent: Tuesday, February 11, 2014 3:51 PM To: Daniel Ence Subject: ERROR: Failed while processing the chunk divide!! Dear Daniel I restarted maker and it is still running. But in the error output file that has been generated so far, it seems that smaller contigs are affected. There are contigs of 2-4 kb with this error, but I also noticed a 30 kb contig having this error. I was wondering if I need to change the settings in the maker_opts file #-----MAKER Behavior Options max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage) min_contig=1 #skip genome contigs below this length (under 10kb are often useless) If I understand correctly, max_dna_len divides contigs of over 100 kb into smaller chunks. However, it is not clear to me: if the min_contig option concerns contigs of 10 kb or less, then why do I get an error message for 30 kb contigs? Should I change this to 0? Here is an example of the error message for one of the contigs #--------- command -------------# Widget::exonerate::est2genome: /usr/local/exonerate-2.2.0-x86_64/bin/exonerate -q /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta -t /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta -Q dna -T dna --model est2genome --minintron 20 --showcigar --percent 20 > /raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate #-------------------------------# cleaning blastn... cleaning tblastx... cleaning blastx... ERROR: Failed on PbPT3Sc00001_S_0.8_1-mRNA-1 Check your input GFF3 file for errors! (from GFFDB) FATAL ERROR ERROR: Failed while processing the chunk divide!! ERROR: Chunk failed at level 17 !! FAILED CONTIG:PbPT3Sc00001 --Next Contig-- Regards HB On 14-02-11 12:37 PM, "Daniel Ence" wrote: >Hossein, > >Ok. So since this error came up on a local install, I'm going to need >some more information to understand what went wrong. Is it the same >contig that always causes this error? If it is, then is that the only >error or warning that MAKER encounters while running on this contig? Or, >if multiple contigs fail, then is it always the same error? > >If you can narrow it down to the smallest possible dataset that >consistently gives the same error, then we can begin to understand what's >wrong. > >Thanks, >Daniel > > >Daniel Ence >Graduate Student >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >________________________________________ >From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >Sent: Tuesday, February 11, 2014 11:20 AM >To: Daniel Ence >Subject: Re: [maker-devel] Falied to create new account > >Hi Daniel > >I'm running it through the local server at my work > > > > > > >M. Hossein Borhan, Ph.D.
>Research Scientist/ Chercheur Scientifique >Saskatoon Research Centre/Centre de Recherches de Saskatoon >Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada >107 Science Place, Saskatoon, SK.,S7N 0X2 >Telephone/Téléphone: (306) 385-9441 >Facsimile/Télécopieur: (306) 385-9482 >Hossein.borhan at agr.gc.ca > > > > > > > > >On 14-02-11 12:16 PM, "Daniel Ence" wrote: > >>Hi Hossein, >> >>Did you encounter this error while you were running MAKER on your local >>machine or through the MAKER web annotation service? >> >>Thanks, >>Daniel >> >> >>Daniel Ence >>Graduate Student >>Eccles Institute of Human Genetics >>University of Utah >>15 North 2030 East, Room 2100 >>Salt Lake City, UT 84112-5330 >>________________________________________ >>From: Carson Holt [carsonhh at gmail.com] >>Sent: Tuesday, February 11, 2014 10:18 AM >>To: Daniel Ence >>Cc: Mark Yandell >>Subject: FW: [maker-devel] Falied to create new account >> >>Hey Daniel, could you download his dataset and see if you can replicate >>the error? Also check if this was an MWAS job or a local maker run (his >>dataset will already be there for MWAS; you just need the job ID). >> >>Thanks, >>Carson >> >>On 2/11/14, 10:16 AM, "Borhan, Hossein" wrote: >> >>>Hi Carson >>> >>> >>>I encountered this error while running maker >>> >>>FATAL ERROR >>>ERROR: Failed while processing the chunk divide!! >>> >>>ERROR: Chunk failed at level 17 >>>!! >>>FAILED CONTIG:PbPT3Sc00006 >>> >>> >>> >>> >>> >>>HB >>> >>> >>> >>> >>> >>> >>> >>>> >>> >> >> > From marc.hoeppner at imbim.uu.se Wed Feb 12 01:34:12 2014 From: marc.hoeppner at imbim.uu.se (Marc P.
Hoeppner) Date: Wed, 12 Feb 2014 09:34:12 +0100 Subject: [maker-devel] Annotations from protein alignments Message-ID: <52FB3204.60606@imbim.uu.se> Dear list, I have an annotation project with both protein data (it's a bird, so I've been using both vertebrates in general and chicken in particular) and huge amounts of somewhat dodgy (as in lots of pre-mRNA) RNA-seq data. The chicken augustus model seems to do a decent job in seeding gene loci, but it's not quite perfect. I want to use protein alignments to create a high-confidence set of exons and subsequently a set of gene loci (to train e.g. snap), but when I set protein2genome=1 I never get any annotations. This is also true for the test data set that is delivered together with Maker (hsap_). Is there anything I should know about the use of proteins to generate annotations? I left all settings in the config file at their defaults (except protein2genome=1). I've tried this with both Maker 2.30 and 2.31. All the best, Marc -- ----------- Marc P. Hoeppner, PhD Group leader BILS Genome annotation platform Department of Medical Biochemistry and Microbiology Uppsala University, Sweden marc.hoepner at imbim.uu.se From carsonhh at gmail.com Wed Feb 12 08:42:36 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 12 Feb 2014 08:42:36 -0700 Subject: [maker-devel] Annotations from protein alignments In-Reply-To: <52FB3204.60606@imbim.uu.se> References: <52FB3204.60606@imbim.uu.se> Message-ID: I updated the 2.31 tarball. Go ahead and download it again. protein2genome was turned off for eukaryotes and was only working for prokaryotic genomes. -Carson On 2/12/14, 1:34 AM, "Marc P. Hoeppner" wrote: >Dear list, > >I have an annotation project with both protein data (it's a bird, so >I've been using both vertebrates in general and chicken in particular) >and huge amounts of somewhat dodgy (as in lots of pre-mRNA) RNA-seq >data. The chicken augustus model seems to do a decent job in seeding >gene loci, but it's not quite perfect.
I want to use protein alignments >to create a high-confidence set of exons and subsequently a set of gene >loci (to train e.g. snap), but when I set protein2genome=1 I >never get any annotations. This is also true for the test data set that >is delivered together with Maker (hsap_). Is there anything I should know about >the use of proteins to generate annotations? I left all settings in the >config file at their defaults (except protein2genome=1). I've tried this >with both Maker 2.30 and 2.31. > >All the best, > >Marc > >-- >----------- >Marc P. Hoeppner, PhD >Group leader >BILS Genome annotation platform > >Department of Medical Biochemistry and Microbiology >Uppsala University, Sweden >marc.hoepner at imbim.uu.se > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dence at genetics.utah.edu Wed Feb 12 11:59:11 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 12 Feb 2014 18:59:11 +0000 Subject: [maker-devel] ERROR: Failed while processing the chunk divide!! In-Reply-To: References: , Message-ID: Hi Hossein, So, after looking at the gff3 and your control files, I had an idea. There's the part of the control file called "Re-annotation Using MAKER Derived GFF3", but you can also pass through features from a gff3 using the "est_gff", "protein_gff", "rm_gff", "pred_gff", "model_gff" lines. Sometimes we encounter problems with the MAKER passthrough. Could you try dividing the gff3 file into the different feature sources and passing them through the "est_gff" etc. options instead of the MAKER passthrough? That will tell us if the problem is with the gff3 file or with how MAKER is processing it. Another thing to check is that the contig names in the gff3 file match the contig names in the fasta file that you're annotating.
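Daniel's suggestion is to divide the MAKER-derived GFF3 by feature source and hand each piece to the matching `est_gff`, `protein_gff`, `rm_gff`, `pred_gff` or `model_gff` option instead of the re-annotation passthrough. A minimal Python sketch of the splitting step follows. It groups rows on GFF3 column 2 (source); the function names and the `<prefix>.<source>.gff` naming are hypothetical, and deciding which file belongs to which control-file option is left to the user.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, List


def split_by_source(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group GFF3 feature rows by column 2 (source); comments and ##FASTA sequence are dropped."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if line.startswith("##FASTA"):
            break  # sequence section, not features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            groups[cols[1]].append(line if line.endswith("\n") else line + "\n")
    return dict(groups)


def write_groups(groups: Dict[str, List[str]], prefix: str) -> Dict[str, str]:
    """Write one GFF3 per source as <prefix>.<source>.gff and return {source: path}."""
    paths = {}
    for source, rows in groups.items():
        # replace characters that would be awkward in a file name
        safe = source.replace(os.sep, "_").replace(":", "_")
        path = f"{prefix}.{safe}.gff"
        with open(path, "w") as fh:
            fh.write("##gff-version 3\n")
            fh.writelines(rows)
        paths[source] = path
    return paths
```

For example, `write_groups(split_by_source(open("run1.gff")), "run1")` (names hypothetical) writes one file per source present in the input and returns a map from source to path.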
Thanks, Daniel Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] Sent: Wednesday, February 12, 2014 8:49 AM To: Daniel Ence Subject: Re: ERROR: Failed while processing the chunk divide!! Dear Daniel I have generated the files that you requested. I chose Sc00009 from my genome, which is 30 kb and was one of the scaffolds coming up with the error. In addition to the ctl files and the error output file, I also attached the part of the gff file related to SC00009 that is indicated in the error message. Thanks for helping with this Regards HB On 14-02-11 4:59 PM, "Daniel Ence" wrote: >Hi Hossein, > >I think the most helpful thing right now would be for you to run MAKER on >only one of those contigs that are failing and send me the entire error >output along with the maker control files that you are using. It looks >like the error is coming from the gff3 files that you are using as input. > >Thanks, >Daniel > > > >Daniel Ence >Graduate Student >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >________________________________________ >From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >Sent: Tuesday, February 11, 2014 3:51 PM >To: Daniel Ence >Subject: ERROR: Failed while processing the chunk divide!! > >Dear Daniel > >I restarted maker and it is still running. But in the error output file that has >been generated so far, it seems that smaller contigs are affected.
There >are contigs of 2-4 kb with this error, but I also noticed a contig of 30 kb >length having this error. > >I was wondering if I need to change the settings in the maker_opts file > >#-----MAKER Behavior Options >max_dna_len=100000 #length for dividing up contigs into chunks >(increases/decreases memory usage) >min_contig=1 #skip genome contigs below this length (under 10kb are often >useless) > > >If I understand correctly, max_dna_len divides contigs of over 100 kb into >smaller chunks. However, it is not clear to me: if the min_contig option >concerns contigs of 10 kb or less, then why do I get an error >message for 30 kb contigs? Should I change this to 0? > >Here is an example of the error message for one of the contigs > > >#--------- command -------------# >Widget::exonerate::est2genome: >/usr/local/exonerate-2.2.0-x86_64/bin/exonerate -q >/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta >-t >/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta >-Q dna -T dna --model est2genome >--minintron 20 --showcigar --percent 20 > >/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate >#-------------------------------# >cleaning blastn... >cleaning tblastx... >cleaning blastx... >ERROR: Failed on >PbPT3Sc00001_S_0.8_1-mRNA-1 >Check your input GFF3 file for errors! >(from GFFDB) > >FATAL ERROR >ERROR: Failed while processing the chunk >divide!! > >ERROR: Chunk failed at level 17 >!!
>FAILED CONTIG:PbPT3Sc00001 > > > > >--Next Contig-- > > > > > > >Regards > > >HB > > > > > > > > > > >On 14-02-11 12:37 PM, "Daniel Ence" wrote: > >>Hossein, >> >>Ok. So since this error came up on a local install, I'm going to need >>some more information to understand what went wrong. Is it the same >>contig that always causes this error? If it is, then is that the only >>error or warning that MAKER encounters while running on this contig? Or, >>if multiple contigs fail, then is it always the same error? >> >>If you can narrow it down to the smallest possible dataset that >>consistently gives the same error, then we can begin to understand what's >>wrong. >> >>Thanks, >>Daniel >> >> >>Daniel Ence >>Graduate Student >>Eccles Institute of Human Genetics >>University of Utah >>15 North 2030 East, Room 2100 >>Salt Lake City, UT 84112-5330 >>________________________________________ >>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >>Sent: Tuesday, February 11, 2014 11:20 AM >>To: Daniel Ence >>Subject: Re: [maker-devel] Falied to create new account >> >>Hi Daniel >> >>I'm running it through the local server at my work >> >> >> >> >> >> >>M. Hossein Borhan, Ph.D. >>Research Scientist/ Chercheur Scientifique >>Saskatoon Research Centre/Centre de Recherches de Saskatoon >>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada >>107 Science Place, Saskatoon, SK.,S7N 0X2 >>Telephone/Téléphone: (306) 385-9441 >>Facsimile/Télécopieur: (306) 385-9482 >>Hossein.borhan at agr.gc.ca >> >> >> >> >> >> >> >> >>On 14-02-11 12:16 PM, "Daniel Ence" wrote: >> >>>Hi Hossein, >>> >>>Did you encounter this error while you were running MAKER on your local >>>machine or through the MAKER web annotation service?
>>> >>>Thanks, >>>Daniel >>> >>> >>>Daniel Ence >>>Graduate Student >>>Eccles Institute of Human Genetics >>>University of Utah >>>15 North 2030 East, Room 2100 >>>Salt Lake City, UT 84112-5330 >>>________________________________________ >>>From: Carson Holt [carsonhh at gmail.com] >>>Sent: Tuesday, February 11, 2014 10:18 AM >>>To: Daniel Ence >>>Cc: Mark Yandell >>>Subject: FW: [maker-devel] Falied to create new account >>> >>>Hey Daniel, could you download his dataset and see if you can replicate >>>the error? Also check if this was an MWAS job or a local maker run (his >>>dataset will already be there for MWAS; you just need the job ID). >>> >>>Thanks, >>>Carson >>> >>>On 2/11/14, 10:16 AM, "Borhan, Hossein" >>>wrote: >>> >>>>Hi Carson >>>> >>>> >>>>I encountered this error while running maker >>>> >>>>FATAL ERROR >>>>ERROR: Failed while processing the chunk divide!! >>>> >>>>ERROR: Chunk failed at level 17 >>>>!! >>>>FAILED CONTIG:PbPT3Sc00006 >>>> >>>> >>>> >>>> >>>> >>>>HB >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>>> >>>> >>> >>> >> > From dence at genetics.utah.edu Wed Feb 12 12:15:59 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 12 Feb 2014 19:15:59 +0000 Subject: [maker-devel] ERROR: Failed while processing the chunk divide!! In-Reply-To: References: , , Message-ID: Hi Hossein, One more question. How did you make the gff3 that you're passing through here? Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Daniel Ence [dence at genetics.utah.edu] Sent: Wednesday, February 12, 2014 11:59 AM To: Borhan, Hossein Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] ERROR: Failed while processing the chunk divide!! Hi Hossein, So, after looking at the gff3 and your control files, I had an idea.
There's the part of the control file called "Re-annotation Using MAKER Derived GFF3", but you can also pass through features from a gff3 using the "est_gff", "protein_gff", "rm_gff", "pred_gff", "model_gff" lines. Sometimes we encounter problems with the MAKER passthrough. Could you try dividing the gff3 file into the different feature sources and passing them through the "est_gff" etc. options instead of the MAKER passthrough? That will tell us if the problem is with the gff3 file or with how MAKER is processing it. Another thing to check is that the contig names in the gff3 file match the contig names in the fasta file that you're annotating. Thanks, Daniel Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] Sent: Wednesday, February 12, 2014 8:49 AM To: Daniel Ence Subject: Re: ERROR: Failed while processing the chunk divide!! Dear Daniel I have generated the files that you requested. I chose Sc00009 from my genome, which is 30 kb and was one of the scaffolds coming up with the error. In addition to the ctl files and the error output file, I also attached the part of the gff file related to SC00009 that is indicated in the error message. Thanks for helping with this Regards HB On 14-02-11 4:59 PM, "Daniel Ence" wrote: >Hi Hossein, > >I think the most helpful thing right now would be for you to run MAKER on >only one of those contigs that are failing and send me the entire error >output along with the maker control files that you are using. It looks >like the error is coming from the gff3 files that you are using as input.
> >Thanks, >Daniel > > > >Daniel Ence >Graduate Student >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >________________________________________ >From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >Sent: Tuesday, February 11, 2014 3:51 PM >To: Daniel Ence >Subject: ERROR: Failed while processing the chunk divide!! > >Dear Daniel > >I restarted maker and it is still running. But in the error output file that has >been generated so far, it seems that smaller contigs are affected. There >are contigs of 2-4 kb with this error, but I also noticed a contig of 30 kb >length having this error. > >I was wondering if I need to change the settings in the maker_opts file > >#-----MAKER Behavior Options >max_dna_len=100000 #length for dividing up contigs into chunks >(increases/decreases memory usage) >min_contig=1 #skip genome contigs below this length (under 10kb are often >useless) > > >If I understand correctly, max_dna_len divides contigs of over 100 kb into >smaller chunks. However, it is not clear to me: if the min_contig option >concerns contigs of 10 kb or less, then why do I get an error >message for 30 kb contigs? Should I change this to 0? > >Here is an example of the error message for one of the contigs > > >#--------- command -------------# >Widget::exonerate::est2genome: >/usr/local/exonerate-2.2.0-x86_64/bin/exonerate -q >/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta >-t >/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta >-Q dna -T dna --model est2genome >--minintron 20 --showcigar --percent 20 > >/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate >#-------------------------------# >cleaning blastn... >cleaning tblastx... >cleaning blastx... >ERROR: Failed on >PbPT3Sc00001_S_0.8_1-mRNA-1 >Check your input GFF3 file for errors! >(from GFFDB) > >FATAL ERROR >ERROR: Failed while processing the chunk >divide!! > >ERROR: Chunk failed at level 17 >!! >FAILED CONTIG:PbPT3Sc00001 > > > > >--Next Contig-- > > > > > > >Regards > > >HB > > > > > > > > > > >On 14-02-11 12:37 PM, "Daniel Ence" wrote: > >>Hossein, >> >>Ok. So since this error came up on a local install, I'm going to need >>some more information to understand what went wrong. Is it the same >>contig that always causes this error? If it is, then is that the only >>error or warning that MAKER encounters while running on this contig? Or, >>if multiple contigs fail, then is it always the same error? >> >>If you can narrow it down to the smallest possible dataset that >>consistently gives the same error, then we can begin to understand what's >>wrong. >> >>Thanks, >>Daniel >> >> >>Daniel Ence >>Graduate Student >>Eccles Institute of Human Genetics >>University of Utah >>15 North 2030 East, Room 2100 >>Salt Lake City, UT 84112-5330 >>________________________________________ >>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >>Sent: Tuesday, February 11, 2014 11:20 AM >>To: Daniel Ence >>Subject: Re: [maker-devel] Falied to create new account >> >>Hi Daniel >> >>I'm running it through the local server at my work >> >> >> >> >> >> >>M. Hossein Borhan, Ph.D.
>>Research Scientist/ Chercheur Scientifique >>Saskatoon Research Centre/Centre de Recherches de Saskatoon >>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada >>107 Science Place, Saskatoon, SK.,S7N 0X2 >>Telephone/Téléphone: (306) 385-9441 >>Facsimile/Télécopieur: (306) 385-9482 >>Hossein.borhan at agr.gc.ca >> >> >> >> >> >> >> >> >>On 14-02-11 12:16 PM, "Daniel Ence" wrote: >> >>>Hi Hossein, >>> >>>Did you encounter this error while you were running MAKER on your local >>>machine or through the MAKER web annotation service? >>> >>>Thanks, >>>Daniel >>> >>> >>>Daniel Ence >>>Graduate Student >>>Eccles Institute of Human Genetics >>>University of Utah >>>15 North 2030 East, Room 2100 >>>Salt Lake City, UT 84112-5330 >>>________________________________________ >>>From: Carson Holt [carsonhh at gmail.com] >>>Sent: Tuesday, February 11, 2014 10:18 AM >>>To: Daniel Ence >>>Cc: Mark Yandell >>>Subject: FW: [maker-devel] Falied to create new account >>> >>>Hey Daniel, could you download his dataset and see if you can replicate >>>the error? Also check if this was an MWAS job or a local maker run (his >>>dataset will already be there for MWAS; you just need the job ID). >>> >>>Thanks, >>>Carson >>> >>>On 2/11/14, 10:16 AM, "Borhan, Hossein" >>>wrote: >>> >>>>Hi Carson >>>> >>>> >>>>I encountered this error while running maker >>>> >>>>FATAL ERROR >>>>ERROR: Failed while processing the chunk divide!! >>>> >>>>ERROR: Chunk failed at level 17 >>>>!!
>>>>FAILED CONTIG:PbPT3Sc00006 >>>> >>>>HB >>>> >>> >> > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dence at genetics.utah.edu Wed Feb 12 13:42:03 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 12 Feb 2014 20:42:03 +0000 Subject: [maker-devel] ERROR: Failed while processing the chunk divide!! In-Reply-To: References: , Message-ID: Hi Hossein, So, those problems with passing through MAKER-derived gff3 have been addressed in newer versions of MAKER. The current version is 2.31 and is available for download now on our website. Try installing that version and rerunning with the same control files you started out using, and let me know if that fixes the problems. Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] Sent: Wednesday, February 12, 2014 12:55 PM To: Daniel Ence Subject: Re: ERROR: Failed while processing the chunk divide!! Hi Daniel, I am using maker 2.10. I also checked the naming of the scaffold in the genome file and the gff file for the failed example. Naming is the same. Thanks, Hossein M. Hossein Borhan, Ph.D. Research Scientist/ Chercheur Scientifique Saskatoon Research Centre/Centre de Recherches de Saskatoon Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada 107 Science Place, Saskatoon, SK.,S7N 0X2 Telephone/Téléphone: (306) 385-9441 Facsimile/Télécopieur: (306) 385-9482 Hossein.borhan at agr.gc.ca On 14-02-12 1:30 PM, "Daniel Ence" wrote: >Hi Hossein, > >And which version of MAKER are you using?
> >Thanks, >Daniel > >Daniel Ence >Graduate Student >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >________________________________________ >From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >Sent: Wednesday, February 12, 2014 12:25 PM >To: Daniel Ence >Subject: Re: ERROR: Failed while processing the chunk divide!! > >Hi Daniel > >The gff file was generated by the first run of maker. > >HB > >On 14-02-12 1:15 PM, "Daniel Ence" wrote: > >>Hi Hossein, >> >>One more question. How did you make the gff3 that you're passing through >>here? >> >>Thanks, >>Daniel >> >>Daniel Ence >>Graduate Student >>Eccles Institute of Human Genetics >>University of Utah >>15 North 2030 East, Room 2100 >>Salt Lake City, UT 84112-5330 >>________________________________________ >>From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>Daniel Ence [dence at genetics.utah.edu] >>Sent: Wednesday, February 12, 2014 11:59 AM >>To: Borhan, Hossein >>Cc: maker-devel at yandell-lab.org >>Subject: Re: [maker-devel] ERROR: Failed while processing the chunk divide!! >> >>Hi Hossein, >> >>So, after looking at the gff3 and your control files, I had an idea. >>There's a part of the control file called "Re-annotation Using MAKER >>Derived GFF3", but you can also pass through features from a gff3 using >>the "est_gff", "protein_gff", "rm_gff", "pred_gff", and "model_gff" lines. >> >>Sometimes we encounter problems with the MAKER passthrough. Could you try >>dividing the gff3 file into the different feature sources and passing them >>through the "est_gff", etc. options rather than the MAKER passthrough? >>That will tell us if the problem is with the gff3 file or with how MAKER >>is processing it. >> >>Another thing to check is that the contig names in the gff3 >>file match the contig names in the fasta file that you're annotating.
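Daniel's two checks above (splitting a MAKER-derived GFF3 by feature source so the pieces can go to options like est_gff or protein_gff, and confirming that the GFF3 contig names appear in the genome FASTA) can be sketched in Python as below. This is a minimal sketch, not part of MAKER. The helper names are hypothetical, and it assumes a standard nine-column, tab-separated GFF3 in which column 2 is the source, with each FASTA contig name taken to be the first word after '>'. Deciding which source belongs on which *_gff line is left to you.

```python
from collections import defaultdict


def split_by_source(gff_lines):
    """Group GFF3 feature lines by column 2 (the source, e.g. est2genome)."""
    groups = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            groups[cols[1]].append(line)
    return dict(groups)


def fasta_ids(fasta_lines):
    """Contig names: the first word after '>' on each header line."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.add(words[0])
    return ids


def missing_contigs(gff_lines, fasta_lines):
    """Sorted contig names used in GFF3 column 1 but absent from the FASTA."""
    known = fasta_ids(fasta_lines)
    used = {line.split("\t")[0]
            for lines in split_by_source(gff_lines).values()
            for line in lines}
    return sorted(used - known)
```

Each group returned by split_by_source could be written to its own file and listed on the matching control-file line. A non-empty result from missing_contigs would point at the naming mismatch Daniel mentions.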
>> >>Thanks, >>Daniel >> >>Graduate Student >>Eccles Institute of Human Genetics >>University of Utah >>15 North 2030 East, Room 2100 >>Salt Lake City, UT 84112-5330 >>________________________________________ >>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >>Sent: Wednesday, February 12, 2014 8:49 AM >>To: Daniel Ence >>Subject: Re: ERROR: Failed while processing the chunk divide!! >> >>Dear Daniel >> >>I have generated the files that you requested. I chose Sc00009 from my >>genome, which is 30 kb and was one of the scaffolds coming up with the error. >>In addition to the ctl files and the error output file, I also attached a part of >>the gff file related to SC00009 that is indicated in the error message. >> >>Thanks for helping with this >> >>Regards >> >>HB >> >>On 14-02-11 4:59 PM, "Daniel Ence" wrote: >> >>>Hi Hossein, >>> >>>I think that what would help most right now is if you ran MAKER on >>>only one of those contigs that are failing and sent me the entire error >>>output along with the maker control files that you are using. It looks >>>like the error is coming from the gff3 files that you are using as input. >>> >>>Thanks, >>>Daniel >>> >>>Daniel Ence >>>Graduate Student >>>Eccles Institute of Human Genetics >>>University of Utah >>>15 North 2030 East, Room 2100 >>>Salt Lake City, UT 84112-5330 >>>________________________________________ >>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >>>Sent: Tuesday, February 11, 2014 3:51 PM >>>To: Daniel Ence >>>Subject: ERROR: Failed while processing the chunk divide!! >>> >>>Dear Daniel >>> >>>I re-started maker and it is still running. But in the error output file that >>>has been generated so far, it seems that smaller contigs are affected.
There >>>are contigs of 2-4 kb with this error, but I also noticed a 30 kb contig >>>having this error. >>> >>>I was wondering if I need to change the settings in the maker_opts file: >>> >>>#-----MAKER Behavior Options >>>max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage) >>>min_contig=1 #skip genome contigs below this length (under 10kb are often useless) >>> >>>If I understand correctly, max_dna_len divides contigs of over 100 kb into >>>smaller chunks. However, it is not clear to me why, if the min_contig >>>option concerns contigs of 10 kb or less, I get the error >>>message for 30 kb contigs. Should I change this to 0? >>> >>>Here is an example of the error message for one of the contigs: >>> >>>#--------- command -------------# >>>Widget::exonerate::est2genome: >>>/usr/local/exonerate-2.2.0-x86_64/bin/exonerate -q >>>/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta >>>-t >>>/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta >>>-Q dna -T dna --model est2genome >>>--minintron 20 --showcigar --percent 20 > >>>/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate >>>#-------------------------------# >>>cleaning blastn... >>>cleaning tblastx... >>>cleaning blastx... >>>ERROR: Failed on >>>PbPT3Sc00001_S_0.8_1-mRNA-1 >>>Check your input GFF3 file for errors!
>>>(from GFFDB) >>> >>>FATAL ERROR >>>ERROR: Failed while processing the chunk divide!! >>> >>>ERROR: Chunk failed at level 17 >>>!! >>>FAILED CONTIG:PbPT3Sc00001 >>> >>>--Next Contig-- >>> >>>Regards >>> >>>HB >>> >>>On 14-02-11 12:37 PM, "Daniel Ence" wrote: >>> >>>>Hossein, >>>> >>>>Ok. So since this error came up on a local install, I'm going to need >>>>some more information to understand what went wrong. Is it the same >>>>contig that always causes this error? If it is, then is that the only >>>>error or warning that MAKER encounters while running on this contig? Or, >>>>if multiple contigs fail, then is it always the same error? >>>> >>>>If you can narrow it down to the smallest possible dataset that >>>>consistently gives the same error, then we can begin to understand what's >>>>wrong. >>>> >>>>Thanks, >>>>Daniel >>>> >>>>Daniel Ence >>>>Graduate Student >>>>Eccles Institute of Human Genetics >>>>University of Utah >>>>15 North 2030 East, Room 2100 >>>>Salt Lake City, UT 84112-5330 >>>>________________________________________ >>>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >>>>Sent: Tuesday, February 11, 2014 11:20 AM >>>>To: Daniel Ence >>>>Subject: Re: [maker-devel] Falied to create new account >>>> >>>>Hi Daniel >>>> >>>>I am running it through the local server at my work >>>> >>>>M. Hossein Borhan, Ph.D.
>>>>Research Scientist/ Chercheur Scientifique >>>>Saskatoon Research Centre/Centre de Recherches de Saskatoon >>>>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada >>>>107 Science Place, Saskatoon, SK.,S7N 0X2 >>>>Telephone/Téléphone: (306) 385-9441 >>>>Facsimile/Télécopieur: (306) 385-9482 >>>>Hossein.borhan at agr.gc.ca >>>> >>>>On 14-02-11 12:16 PM, "Daniel Ence" wrote: >>>> >>>>>Hi Hossein, >>>>> >>>>>Did you encounter this error while you were running MAKER on your local >>>>>machine or through the MAKER web annotation service? >>>>> >>>>>Thanks, >>>>>Daniel >>>>> >>>>>Daniel Ence >>>>>Graduate Student >>>>>Eccles Institute of Human Genetics >>>>>University of Utah >>>>>15 North 2030 East, Room 2100 >>>>>Salt Lake City, UT 84112-5330 >>>>>________________________________________ >>>>>From: Carson Holt [carsonhh at gmail.com] >>>>>Sent: Tuesday, February 11, 2014 10:18 AM >>>>>To: Daniel Ence >>>>>Cc: Mark Yandell >>>>>Subject: FW: [maker-devel] Falied to create new account >>>>> >>>>>Hey Daniel, could you download his dataset and see if you can replicate >>>>>the error? Also check if this was an MWAS job or a local maker run (his >>>>>dataset will already be there for MWAS, you just need the job ID). >>>>> >>>>>Thanks, >>>>>Carson >>>>> >>>>>On 2/11/14, 10:16 AM, "Borhan, Hossein" wrote: >>>>> >>>>>>Hi Carson >>>>>> >>>>>>I encountered this error while running maker >>>>>> >>>>>>FATAL ERROR >>>>>>ERROR: Failed while processing the chunk divide!! >>>>>> >>>>>>ERROR: Chunk failed at level 17 >>>>>>!!
>>>>>>FAILED CONTIG:PbPT3Sc00006 >>>>>> >>>>>>HB >>>>>> >>>>> >>>> >>> >> >>_______________________________________________ >>maker-devel mailing list >>maker-devel at box290.bluehost.com >>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From masa at bioinfo.hr Thu Feb 13 03:17:11 2014 From: masa at bioinfo.hr (Masa Roller) Date: Thu, 13 Feb 2014 11:17:11 +0100 Subject: [maker-devel] SNAP scores and AED scores Message-ID: <52FC9BA7.6060505@bioinfo.hr> Dear all, I ran snap2-based gene prediction through maker. In the resulting gff file, under the source "snap_masked", I can find a value in the score column of every snap prediction that did not get promoted to a maker gene. Is this the score of how well the prediction matches the HMM? It seems to me that those snap models that are given gene status no longer appear under the snap_masked source but only under the source "maker". Maker then removes the score column, instead giving AED and eAED scores (which are more about how the model corresponds to the evidence). When viewing the maker transcripts and SNAP predictions in a browser, they do not match (mostly, maker predictions are longer). I am interested in the scores of the individual gene predictions that underlie maker gene models. Where could I find that information? Many thanks! From carsonhh at gmail.com Thu Feb 13 13:11:22 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Feb 2014 13:11:22 -0700 Subject: [maker-devel] SNAP scores and AED scores In-Reply-To: <52FC9BA7.6060505@bioinfo.hr> References: <52FC9BA7.6060505@bioinfo.hr> Message-ID: No. SNAP genes do not disappear. All SNAP ab initio calls will always be kept as reference features marked snap_masked (for the repeat-masked genome) and snap (for the unmasked genome). MAKER then runs SNAP a second time, feeding it hints based on EST and protein alignment evidence.
These hint-based models can then compete against the ab initio SNAP models to be promoted to genes if their AED scores are better. Final models can also get UTR added based on EST evidence. That is why you can get models from MAKER that do not match the original SNAP ab initio calls. So in summary, all SNAP ab initio models will be in snap_masked. The MAKER models will consist of models from the hint-based SNAP rerun plus SNAP ab initio models processed to add UTR. Thanks, Carson On 2/13/14, 3:17 AM, "Masa Roller" wrote: >Dear all, > >I ran snap2-based gene prediction through maker. > >In the resulting gff file, under the source "snap_masked", I can find a value in the >score column of every snap prediction that did not get >promoted to a maker gene. Is this the score of how well the >prediction matches the HMM? > >It seems to me that those snap models that are given gene status no >longer appear under the snap_masked source but only under the source "maker". Maker >then removes the score column, instead giving AED and eAED scores (which >are more about how the model corresponds to the evidence). When viewing >the maker transcripts and SNAP predictions in a browser, they do not >match (mostly, maker predictions are longer). > >I am interested in the scores of the individual gene predictions that >underlie maker gene models. Where could I find that information? > >Many thanks! > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Thu Feb 13 13:23:07 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Feb 2014 13:23:07 -0700 Subject: [maker-devel] SNAP scores and AED scores In-Reply-To: References: <52FC9BA7.6060505@bioinfo.hr> Message-ID: On a side note.
Because the MAKER models involve either modifying the ab initio SNAP model or manipulating the underlying scoring scheme using hints, the SNAP score on those is virtually meaningless. However, Ian Korf has developed a tool that can take any gene structure and reverse-generate a score (i.e., what the score of this gene would have been if SNAP had called it that way in the first place). I believe the tool is called fathom and is part of the SNAP package. It is not well documented, so you might have to contact Ian Korf directly about it. You can use the maker2zff tool to generate the input to fathom. Thanks, Carson On 2/13/14, 1:11 PM, "Carson Holt" wrote: >No. SNAP genes do not disappear. All SNAP ab initio calls will always be >kept as reference features marked snap_masked (for the repeat-masked genome) >and snap (for the unmasked genome). MAKER then runs SNAP a second time, feeding it >hints based on EST and protein alignment evidence. These >hint-based models can then compete against the ab initio SNAP models to be >promoted to genes if their AED scores are better. Final models can also >get UTR added based on EST evidence. That is why you can get models from >MAKER that do not match the original SNAP ab initio calls. > >So in summary, all SNAP ab initio models will be in snap_masked. The >MAKER models will consist of models from the hint-based SNAP rerun plus SNAP ab initio >models processed to add UTR. > >Thanks, >Carson > >On 2/13/14, 3:17 AM, "Masa Roller" wrote: > >>Dear all, >> >>I ran snap2-based gene prediction through maker. >> >>In the resulting gff file, under the source "snap_masked", I can find a value in the >>score column of every snap prediction that did not get >>promoted to a maker gene. Is this the score of how well the >>prediction matches the HMM? >> >>It seems to me that those snap models that are given gene status no >>longer appear under the snap_masked source but only under the source "maker".
Maker >>then removes the score column, instead giving AED and eAED scores (which >>are more about how the model corresponds to the evidence). When viewing >>the maker transcripts and SNAP predictions in a browser, they do not >>match (mostly, maker predictions are longer). >> >>I am interested in the scores of the individual gene predictions that >>underlie maker gene models. Where could I find that information? >> >>Many thanks! >> >>_______________________________________________ >>maker-devel mailing list >>maker-devel at box290.bluehost.com >>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > From barry.utah at gmail.com Thu Feb 13 13:27:17 2014 From: barry.utah at gmail.com (Barry Moore) Date: Thu, 13 Feb 2014 13:27:17 -0700 Subject: [maker-devel] SNAP scores and AED scores In-Reply-To: References: <52FC9BA7.6060505@bioinfo.hr> Message-ID: <39AA5089-3E89-4067-A8DF-60B6716C98DF@genetics.utah.edu> Hi Masa, Also, if you want additional SNAP output that hasn't been passed forward in MAKER, you can always access the original SNAP output files in the MAKER datastore. This is a directory structure created by MAKER to store contig-specific data. There is a datastore directory (and a corresponding index file) in the MAKER output directory. The index file will provide the path to individual contigs, and in each contig-specific directory there is a directory called theVoid. This contains all of the output of each program that MAKER runs. B On Feb 13, 2014, at 1:11 PM, Carson Holt wrote: > No. SNAP genes do not disappear. All SNAP ab initio calls will always be > kept as reference features marked snap_masked (for the repeat-masked genome) > and snap (for the unmasked genome). MAKER then runs SNAP a second time, feeding it > hints based on EST and protein alignment evidence. These > hint-based models can then compete against the ab initio SNAP models to be > promoted to genes if their AED scores are better.
Final models can also > get UTR added based on EST evidence. That is why you can get models from > MAKER that do not match the original SNAP ab initio calls. > > So in summary, all SNAP ab initio models will be in snap_masked. The > MAKER models will consist of models from the hint-based SNAP rerun plus SNAP ab initio > models processed to add UTR. > > Thanks, > Carson > > On 2/13/14, 3:17 AM, "Masa Roller" wrote: > >> Dear all, >> >> I ran snap2-based gene prediction through maker. >> >> In the resulting gff file, under the source "snap_masked", I can find a value in the >> score column of every snap prediction that did not get >> promoted to a maker gene. Is this the score of how well the >> prediction matches the HMM? >> >> It seems to me that those snap models that are given gene status no >> longer appear under the snap_masked source but only under the source "maker". Maker >> then removes the score column, instead giving AED and eAED scores (which >> are more about how the model corresponds to the evidence). When viewing >> the maker transcripts and SNAP predictions in a browser, they do not >> match (mostly, maker predictions are longer). >> >> I am interested in the scores of the individual gene predictions that >> underlie maker gene models. Where could I find that information? >> >> Many thanks! >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mptrsen at uni-bonn.de Thu Feb 13 20:00:24 2014 From: mptrsen at uni-bonn.de (Malte Petersen) Date: Fri, 14 Feb 2014 04:00:24 +0100 Subject: [maker-devel] BLAST options error / should Maker check for file format? Message-ID: <52FD86C8.6040007@uni-bonn.de> Dear MAKER devs, I was running Maker version 2.30p-beta on an insect genome, and it didn't produce any output. I got these error messages: Widget::formater: /path/to/makeblastdb -dbtype nucl -in /tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0 #-------------------------------# BLAST options error: File /tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0 is empty ERROR: /path/to/makeblastdb failed in Widget::formater --> rank=NA, hostname=Jeanne-GBR ERROR: Failed while doing blastn of ESTs ERROR: Chunk failed at level:0, tier_type:3 FAILED CONTIG:scf7180005143343 ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:scf7180005143343 I figured out that this error is due to a non-FASTA file being fed to Maker as extrinsic evidence (I gave it a meta-info file). While I have the pipeline running now with the correct file, I think that it should complain (a lot earlier) if any of the input files are in the wrong format. More people might run into this problem and have no idea where to look for a solution. What do you think? Best, Malte From carsonhh at gmail.com Thu Feb 13 20:11:22 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 13 Feb 2014 20:11:22 -0700 Subject: [maker-devel] BLAST options error / should Maker check for file format? In-Reply-To: <52FD86C8.6040007@uni-bonn.de> References: <52FD86C8.6040007@uni-bonn.de> Message-ID: Hi Malte, Actually, there already is such a check. I'm very surprised your file made it that far. Normally it fails right away. Example: STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files...
ERROR: The fasta file /Users/cholt/Developer/maker/trunk/data/test1 appears to be empty. Another test file: STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files... ERROR: The nucleotide sequence file '/Users/cholt/Developer/maker/trunk/data/test2' appears to contain protein sequence or unrecognized characters. Note the following nucleotides may be valid but are unsupported [RYKMSWBDHV] Please check/fix the file before continuing, or set -fix_nucleotides on the command line to fix this automatically. Invalid Character: 'M' You seem to have found just the right formula of improper input to get past the filters on your run :-) Thanks, Carson On 2/13/14, 8:00 PM, "Malte Petersen" wrote: >Dear MAKER devs, > >I was running Maker version 2.30p-beta on an insect genome, and it >didn't produce any output. I got these error messages: > >Widget::formater: >/path/to/makeblastdb -dbtype nucl -in >/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0 >#-------------------------------# >BLAST options error: File >/tmp/maker_wwA6WO/0/blastprep/120215_I277_FCD0KP1ACXX_L7_INSjdsTAURAAPEI-62_e3%2Escaf.mpi.10.0 >is empty >ERROR: /path/to/makeblastdb failed in Widget::formater >--> rank=NA, hostname=Jeanne-GBR >ERROR: Failed while doing blastn of ESTs >ERROR: Chunk failed at level:0, tier_type:3 >FAILED CONTIG:scf7180005143343 > >ERROR: Chunk failed at level:4, tier_type:0 >FAILED CONTIG:scf7180005143343 > >I figured out that this error is due to a non-FASTA file being >fed to Maker as extrinsic evidence (I gave it a meta-info file). While >I have the pipeline running now with the correct file, I think that it >should complain (a lot earlier) if any of the input files are in >the wrong format. More people might run into this problem and have no >idea where to look for a solution. > >What do you think?
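Carson's examples show MAKER rejecting an empty FASTA file and a nucleotide file with unsupported characters before any BLAST step runs. A comparable pre-flight check that you could run yourself on evidence files is sketched below in Python. It is illustrative only and does not reproduce MAKER's own validation code. The function name is hypothetical, and the accepted alphabet (A, C, G, T, U, N) is an assumption. A tabular meta-info file like the one Malte supplied would fail both the header check and the alphabet check.

```python
ALLOWED = set("ACGTUN")  # assumed nucleotide alphabet; adjust as needed


def check_nucleotide_fasta(text):
    """Return a list of problems found in FASTA-formatted text."""
    problems = []
    records = 0
    seen_content = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are tolerated
        if not seen_content and not line.startswith(">"):
            problems.append(f"line {number}: expected a '>' header")
        seen_content = True
        if line.startswith(">"):
            records += 1
            continue
        # Soft-masked (lowercase) bases are accepted by upper-casing first.
        bad = sorted(set(line.upper()) - ALLOWED)
        if bad:
            problems.append(
                f"line {number}: unsupported character(s) {''.join(bad)!r}")
    if records == 0:
        problems.insert(0, "no FASTA records found (empty or not FASTA)")
    return problems
```

An empty list means the file passed these particular checks. It says nothing about whether the sequences are the right evidence type.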
> >Best, >Malte > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dence at genetics.utah.edu Fri Feb 14 12:09:08 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 14 Feb 2014 19:09:08 +0000 Subject: [maker-devel] ERROR: Failed while processing the chunk divide!! In-Reply-To: References: , Message-ID: Hi Hossein, So, this is what is going on. The problem is with the GFF3 file: the exon features in that GFF3 should have the mRNA as their parent instead of the gene. When you deleted the "-mRNA-1", the Name of the mRNA became the same as the Name of the gene, which restored the proper relationship between the features. The same problem exists for the CDS features. The solution is to make the exon and CDS parents "point" to the mRNA and not the gene. Since MAKER has very regular rules for making names, this should be pretty straightforward. You should be OK just adding "-mRNA-1" to the end of all the exon and CDS lines. This will work unless there are some mRNAs with alternative splice forms, because then the mRNAs will end with something like "-mRNA-2". I've attached a script that should do this for you. Run it with this command "perl fix_gff3_script.pl > " And then run MAKER with the fixed gff3 file in place of the old gff3 file. Let me know if that works, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] Sent: Thursday, February 13, 2014 3:27 PM To: Daniel Ence Subject: Re: ERROR: Failed while processing the chunk divide!! Dear Daniel I downloaded maker 2.31 and ran the same scaffold. Again it gave an error on the gff file. I then removed the word mRNA-1 from my gff file and ran it again.
It seems to have worked this time. Attached are the standard error files for the first try, std-err (the one that failed), and for the second one, named std-err-wo-mRNA (which apparently worked). Since the gff file is used as evidence only, I thought it should not matter to remove the mRNA-1 naming from the gff file. Cheers HB On 14-02-12 12:59 PM, "Daniel Ence" wrote: >Hi Hossein, > >So, after looking at the gff3 and your control files, I had an idea. >There's a part of the control file called "Re-annotation Using MAKER >Derived GFF3", but you can also pass through features from a gff3 using >the "est_gff", "protein_gff", "rm_gff", "pred_gff", and "model_gff" lines. > >Sometimes we encounter problems with the MAKER passthrough. Could you try >dividing the gff3 file into the different feature sources and passing them >through the "est_gff", etc. options rather than the MAKER passthrough? >That will tell us if the problem is with the gff3 file or with how MAKER >is processing it. > >Another thing to check is that the contig names in the gff3 >file match the contig names in the fasta file that you're annotating. > >Thanks, >Daniel > >Graduate Student >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >________________________________________ >From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >Sent: Wednesday, February 12, 2014 8:49 AM >To: Daniel Ence >Subject: Re: ERROR: Failed while processing the chunk divide!! > >Dear Daniel > >I have generated the files that you requested. I chose Sc00009 from my >genome, which is 30 kb and was one of the scaffolds coming up with the error. >In addition to the ctl files and the error output file, I also attached a part of >the gff file related to SC00009 that is indicated in the error message.
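The fix Daniel describes in his reply (re-pointing the Parent attributes of exon and CDS features from the gene to its mRNA) can be sketched in Python as follows. This is a hypothetical re-implementation, not the attached fix_gff3_script.pl, whose contents are not in the archive. Instead of blindly appending "-mRNA-1", it looks up each gene's mRNA IDs and rewrites a Parent only when that gene has exactly one mRNA. Genes with alternative splice forms (the "-mRNA-2" case Daniel warns about) are therefore left unchanged for manual review. It assumes standard nine-column GFF3 with ID/Parent attributes in column 9.

```python
def parse_attrs(field):
    """Split a GFF3 column-9 string into (key, value) pairs, keeping order."""
    return [tuple(kv.split("=", 1)) for kv in field.split(";") if "=" in kv]


def fix_parents(gff_lines):
    """Re-point exon/CDS Parent IDs from a gene to its single mRNA."""
    rows = [line.rstrip("\n").split("\t") for line in gff_lines]
    # Map each gene ID to the IDs of the mRNAs whose Parent is that gene.
    mrnas_of = {}
    for cols in rows:
        if len(cols) == 9 and cols[2] == "mRNA":
            attrs = dict(parse_attrs(cols[8]))
            mrna_id = attrs.get("ID")
            if not mrna_id:
                continue
            for gene in filter(None, attrs.get("Parent", "").split(",")):
                mrnas_of.setdefault(gene, []).append(mrna_id)
    fixed = []
    for cols in rows:
        if len(cols) == 9 and cols[2] in ("exon", "CDS"):
            pairs = []
            for key, value in parse_attrs(cols[8]):
                if key == "Parent":
                    new_parents = []
                    for parent in value.split(","):
                        children = mrnas_of.get(parent, [])
                        # Rewrite only when the parent is a gene with one mRNA.
                        new_parents.append(
                            children[0] if len(children) == 1 else parent)
                    value = ",".join(new_parents)
                pairs.append(f"{key}={value}")
            cols = cols[:8] + [";".join(pairs)]
        fixed.append("\t".join(cols))  # other lines pass through unchanged
    return fixed
```

Exon and CDS lines whose Parent already names an mRNA are left as they are, so running the sketch twice should not change its output.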
> > >Thanks for helping with this > >Regards > >HB > >On 14-02-11 4:59 PM, "Daniel Ence" wrote: > >>Hi Hossein, >> >>I think that what would help most right now is if you ran MAKER on >>only one of those contigs that are failing and sent me the entire error >>output along with the maker control files that you are using. It looks >>like the error is coming from the gff3 files that you are using as input. >> >>Thanks, >>Daniel >> >>Daniel Ence >>Graduate Student >>Eccles Institute of Human Genetics >>University of Utah >>15 North 2030 East, Room 2100 >>Salt Lake City, UT 84112-5330 >>________________________________________ >>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >>Sent: Tuesday, February 11, 2014 3:51 PM >>To: Daniel Ence >>Subject: ERROR: Failed while processing the chunk divide!! >> >>Dear Daniel >> >>I re-started maker and it is still running. But in the error output file that >>has been generated so far, it seems that smaller contigs are affected. There >>are contigs of 2-4 kb with this error, but I also noticed a 30 kb contig >>having this error. >> >>I was wondering if I need to change the settings in the maker_opts file: >> >>#-----MAKER Behavior Options >>max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage) >>min_contig=1 #skip genome contigs below this length (under 10kb are often useless) >> >>If I understand correctly, max_dna_len divides contigs of over 100 kb into >>smaller chunks. However, it is not clear to me why, if the min_contig >>option concerns contigs of 10 kb or less, I get the error >>message for 30 kb contigs.
Should I change this to 0? >> >>Here is an example of the error message for one of the contigs: >> >>#--------- command -------------# >>Widget::exonerate::est2genome: >>/usr/local/exonerate-2.2.0-x86_64/bin/exonerate -q >>/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/comp14545_c0_seq1.fasta >>-t >>/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.fasta >>-Q dna -T dna --model est2genome >>--minintron 20 --showcigar --percent 20 > >>/raid01/projects/Plasmodiophora/brassicae/PT3/version2/Maker-config/P.brassicae.PT3.v1.genome.maker.output/P.brassicae.PT3.v1.genome_datastore/35/17/PbPT3Sc00001//theVoid.PbPT3Sc00001/PbPT3Sc00001.235-1136.comp14545_c0_seq1.est_exonerate >>#-------------------------------# >>cleaning blastn... >>cleaning tblastx... >>cleaning blastx... >>ERROR: Failed on >>PbPT3Sc00001_S_0.8_1-mRNA-1 >>Check your input GFF3 file for errors! >>(from GFFDB) >> >>FATAL ERROR >>ERROR: Failed while processing the chunk divide!! >> >>ERROR: Chunk failed at level 17 >>!! >>FAILED CONTIG:PbPT3Sc00001 >> >>--Next Contig-- >> >>Regards >> >>HB >> >>On 14-02-11 12:37 PM, "Daniel Ence" wrote: >> >>>Hossein, >>> >>>Ok. So since this error came up on a local install, I'm going to need >>>some more information to understand what went wrong. Is it the same >>>contig that always causes this error? If it is, then is that the only >>>error or warning that MAKER encounters while running on this contig? Or, >>>if multiple contigs fail, then is it always the same error?
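Daniel's questions (is it always the same contig, and is it always the same error?) can be answered from a saved MAKER error log. The Python sketch below is a hypothetical helper, not a MAKER tool. It assumes the log contains lines shaped like "FAILED CONTIG:<name>", "ERROR: ..." and "--Next Contig--", as in the output quoted in this thread. It counts failures per contig and groups the ERROR lines logged before each failure.

```python
import re
from collections import Counter

FAILED = re.compile(r"^FAILED CONTIG:(\S+)")


def failed_contigs(log_text):
    """Count how often each contig is reported as failed."""
    counts = Counter()
    for raw in log_text.splitlines():
        m = FAILED.match(raw.strip())
        if m:
            counts[m.group(1)] += 1
    return counts


def errors_by_contig(log_text):
    """Map each failed contig to the ERROR lines logged just before it failed."""
    result = {}
    pending = []
    for raw in log_text.splitlines():
        line = raw.strip()
        m = FAILED.match(line)
        if m:
            result.setdefault(m.group(1), []).extend(pending)
            pending = []
        elif line.startswith("--Next Contig--"):
            pending = []  # a new contig starts; forget earlier errors
        elif line.startswith("ERROR:"):
            pending.append(line)
    return result
```

Comparing the results from two successive runs shows whether the same contigs keep failing, which can also help narrow the problem down to the smallest reproducible dataset.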
>>> >>>If you can narrow it down to the smallest possible dataset that >>>consistently gives the same error, then we can begin to understand what's >>>wrong. >>> >>>Thanks, >>>Daniel >>> >>>Daniel Ence >>>Graduate Student >>>Eccles Institute of Human Genetics >>>University of Utah >>>15 North 2030 East, Room 2100 >>>Salt Lake City, UT 84112-5330 >>>________________________________________ >>>From: Borhan, Hossein [Hossein.Borhan at AGR.GC.CA] >>>Sent: Tuesday, February 11, 2014 11:20 AM >>>To: Daniel Ence >>>Subject: Re: [maker-devel] Falied to create new account >>> >>>Hi Daniel >>> >>>I am running it through the local server at my work >>> >>>M. Hossein Borhan, Ph.D. >>>Research Scientist/ Chercheur Scientifique >>>Saskatoon Research Centre/Centre de Recherches de Saskatoon >>>Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada >>>107 Science Place, Saskatoon, SK.,S7N 0X2 >>>Telephone/Téléphone: (306) 385-9441 >>>Facsimile/Télécopieur: (306) 385-9482 >>>Hossein.borhan at agr.gc.ca >>> >>>On 14-02-11 12:16 PM, "Daniel Ence" wrote: >>> >>>>Hi Hossein, >>>> >>>>Did you encounter this error while you were running MAKER on your local >>>>machine or through the MAKER web annotation service? >>>> >>>>Thanks, >>>>Daniel >>>> >>>>Daniel Ence >>>>Graduate Student >>>>Eccles Institute of Human Genetics >>>>University of Utah >>>>15 North 2030 East, Room 2100 >>>>Salt Lake City, UT 84112-5330 >>>>________________________________________ >>>>From: Carson Holt [carsonhh at gmail.com] >>>>Sent: Tuesday, February 11, 2014 10:18 AM >>>>To: Daniel Ence >>>>Cc: Mark Yandell >>>>Subject: FW: [maker-devel] Falied to create new account >>>> >>>>Hey Daniel, could you download his dataset and see if you can replicate >>>>the error? Also check if this was an MWAS job or a local maker run (his >>>>dataset will already be there for MWAS, you just need the job ID).
>>>> >>>>Thanks, >>>>Carson >>>> >>>>On 2/11/14, 10:16 AM, "Borhan, Hossein" >>>>wrote: >>>> >>>>>Hi Carson >>>>> >>>>>I encountered this error while running maker >>>>> >>>>>FATAL ERROR >>>>>ERROR: Failed while processing the chunk divide!! >>>>> >>>>>ERROR: Chunk failed at level 17 >>>>>!! >>>>>FAILED CONTIG:PbPT3Sc00006 >>>>> >>>>>HB >>>>> >>>> >>> >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: fix_gff3_script.pl Type: application/octet-stream Size: 349 bytes Desc: fix_gff3_script.pl URL: From claudio.valero at wur.nl Mon Feb 17 02:23:21 2014 From: claudio.valero at wur.nl (Valero Jimenez, Claudio) Date: Mon, 17 Feb 2014 09:23:21 +0000 Subject: [maker-devel] Maker not predicting many genes Message-ID: Dear list, I'm trying to annotate a fungal genome, and I'm surprised that Maker predicts so few genes (3,697). I have trained SNAP and followed all the available tutorials. The ab initio predictors predict between 8,000 and 10,000 genes. Is something in my configuration file wrong? I attach the opts file and the SOBA summary of the annotation. Regards, Claudio -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.log Type: application/octet-stream Size: 4776 bytes Desc: maker_opts.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: SOBA.pdf Type: application/pdf Size: 210262 bytes Desc: SOBA.pdf URL: From carson.holt at genetics.utah.edu Mon Feb 17 12:22:13 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Mon, 17 Feb 2014 19:22:13 +0000 Subject: [maker-devel] Maker not predicting many genes In-Reply-To: References: Message-ID: You also need to look at the contigs in a browser like Apollo.
That will allow you to see both the predictions and the evidence in context. You can then see whether genes are being dropped because they are supported only by single-exon evidence, because they have no evidence support whatsoever, or because they are being excluded due to UTR overlap. That last one is a common problem for fungi when using assembled mRNA-seq reads. Fungal genes are so closely spaced that they often overlap in the UTR. As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts. The result is very long UTRs on some of your gene models, which force other models to be excluded. If this is the case, rerun something like Trinity with the Jaccard clip option (--jaccard_clip) set to avoid transcript fusion. Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off. If it is a lack of evidence overlap, make sure you provided at least one proteome from a related species to the protein= option. At least two proteomes are recommended, though (these are not proteins from the same species but rather complete proteomes from related species). Also, comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but can supplement the other proteome data. Also, are you providing EST data? Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient (because both the quality and the comprehensiveness of EST/mRNA-seq datasets vary so widely, and they may capture as little as 30% of the genes). Another thing that comes into play is single-exon evidence. In anything but fungi, single-exon evidence is mostly caused by spurious alignments. But fungi have so many single-exon genes that this is not the case for them. Make sure single_exon=1 is set to allow that evidence to be kept, and set the minimum length of single-exon evidence to keep to something like 250 bp.
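The settings recommended above all live in maker_opts.ctl. A small helper like the one below can apply them without hand-editing. It is a sketch that assumes the file's `key=value #comment` layout and preserves trailing comments; keys that are missing are appended at the end. The `single_length` key is an assumption about the option that holds the minimum single-exon evidence length in your MAKER version, so confirm it against the comments in your own control file. `set_ctl_options` is a hypothetical name.

```python
import re

# key=value with an optional trailing "#comment" (MAKER control-file style).
CTL_LINE = re.compile(r"^(\w+)=([^#]*?)(\s*#.*)?$")


def set_ctl_options(ctl_text, updates):
    """Return ctl_text with the given keys set, keeping trailing comments.

    Only the first occurrence of each key is replaced; keys not found
    in the file are appended as plain key=value lines.
    """
    remaining = dict(updates)
    out = []
    for line in ctl_text.splitlines():
        match = CTL_LINE.match(line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            comment = match.group(3) or ""
            out.append(f"{key}={remaining.pop(key)}{comment}")
        else:
            out.append(line)  # comments, blank lines, untouched options
    for key, value in remaining.items():
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

For example, `set_ctl_options(text, {"single_exon": 1, "single_length": 250, "correct_est_fusion": 1, "protein": "proteomeA.fasta,proteomeB.fasta"})`, where the two proteome file names are placeholders for complete proteomes from related species and the comma-separated list assumes your MAKER version accepts several files for protein=.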
Thanks, Carson -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Feb 17 12:26:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 17 Feb 2014 12:26:05 -0700 Subject: [maker-devel] Maker not predicting many genes Message-ID: From your control file, it looks like not setting single_exon=1, and using only UniProt rather than supplying complete proteomes of a related species, are your primary shortcomings. I'd set correct_est_fusion=1 as well. -Carson _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From claudio.valero at wur.nl Wed Feb 19 01:20:04 2014 From: claudio.valero at wur.nl (Valero Jimenez, Claudio) Date: Wed, 19 Feb 2014 08:20:04 +0000 Subject: [maker-devel] Maker not predicting many genes In-Reply-To: References: Message-ID: Hi Carson, Thank you for your suggestions. I ran Maker again and it predicted many more genes. However, I now have a different problem. When I run gff3_merge I get the following error: Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67. The same thing happens when I run fasta_merge: Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52. I never had this problem before with these commands. Regards, Claudio -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Feb 19 08:34:33 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 19 Feb 2014 08:34:33 -0700 Subject: [maker-devel] Maker not predicting many genes In-Reply-To: References: Message-ID: You provided a directory rather than a file to the -d option ('d' stands for datastore log). You must provide the location of the datastore index log file, not the datastore directory. 
Example: ./dpp_contig.maker.output/dpp_contig_master_datastore_index.log Thanks, Carson -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Wed Feb 19 09:04:08 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 19 Feb 2014 16:04:08 +0000 Subject: [maker-devel] Maker not predicting many genes In-Reply-To: References: , Message-ID: Hi Claudio, What was the command line you used for gff3_merge? Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 -------------- next part -------------- An HTML attachment was scrubbed... URL: From claudio.valero at wur.nl Wed Feb 19 09:33:36 2014 From: claudio.valero at wur.nl (Valero Jimenez, Claudio) Date: Wed, 19 Feb 2014 16:33:36 +0000 Subject: [maker-devel] Maker not predicting many genes In-Reply-To: References: , Message-ID: Hi, Thanks, I had made a mistake in the command line! Regards, Claudio -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry.utah at gmail.com Wed Feb 19 11:03:47 2014 From: barry.utah at gmail.com (Barry Moore) Date: Wed, 19 Feb 2014 11:03:47 -0700 Subject: [maker-devel] Maker not predicting many genes In-Reply-To: References: , Message-ID: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu> Hi Daniel, Could you add an error message to those two scripts that detects when a filename is missing or a directory was given instead, and gives the user a suggested solution? Thanks, B Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From carson.holt at genetics.utah.edu Wed Feb 19 11:06:52 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Wed, 19 Feb 2014 18:06:52 +0000 Subject: [maker-devel] Maker not predicting many genes In-Reply-To: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu> References: <0F5B5A10-4B50-47EC-847B-0223E4CCF612@genetics.utah.edu> Message-ID: You only need to swap a single character in the script. Just change the -e (exists) test to a -f (is file) test.
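Carson's fix amounts to testing that the -d argument is a regular file (Perl's -f) rather than merely that it exists (-e). The same check, plus the kind of suggested-solution message Barry asked for, is sketched here in Python for illustration only. gff3_merge and fasta_merge themselves are Perl scripts, the function name `check_datastore_log` is hypothetical, and the `*_master_datastore_index.log` pattern is an assumption based on the example path in Carson's reply.

```python
import glob
import os

# Assumed naming pattern, based on dpp_contig_master_datastore_index.log.
INDEX_PATTERN = "*_master_datastore_index.log"


def check_datastore_log(path):
    """Return an error message for a bad -d argument, or None if usable."""
    if not path:
        return "ERROR: no datastore index log was given to -d"
    # Equivalent of Perl's -f test: the path must be a regular file.
    if os.path.isfile(path):
        return None
    if os.path.isdir(path):
        # A directory was passed; look for an index log in it or its parent.
        parent = os.path.dirname(os.path.abspath(path))
        candidates = []
        for folder in (path, parent):
            candidates += sorted(glob.glob(os.path.join(folder, INDEX_PATTERN)))
        hint = f" Try: -d {candidates[0]}" if candidates else ""
        return (f"ERROR: {path} is a directory; -d expects the "
                f"{INDEX_PATTERN} file.{hint}")
    return f"ERROR: {path} does not exist"
```

Checking the parent directory as well covers the case where the user passes the `*_datastore` directory itself, since the index log normally sits one level up in the `.maker.output` directory.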
Thanks, Carson From: Barry Moore Date: Wednesday, February 19, 2014 at 11:03 AM To: Daniel Ence Cc: "Valero Jimenez, Claudio", Carson Holt, Carson Holt, 'maker-devel at yandell-lab.org' Subject: Re: [maker-devel] Maker not predicting many genes Hi Daniel, Could you add an error message to those two scripts that detects when a filename is missing or a directory was given instead, and suggests a solution to the user? Thanks, B On Feb 19, 2014, at 9:04 AM, Daniel Ence wrote: Hi Claudio, What was the command line you used for gff3_merge? Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Valero Jimenez, Claudio [claudio.valero at wur.nl] Sent: Wednesday, February 19, 2014 1:20 AM To: 'Carson Holt'; Carson Holt; 'maker-devel at yandell-lab.org' Subject: Re: [maker-devel] Maker not predicting many genes Hi Carson, Thank you for your suggestions. I ran Maker again and it predicted many more genes. However, I have a different problem now. When I run gff3_merge, I get the following error: Use of uninitialized value $outfile in substitution (s///) at ./gff3_merge line 67. A similar thing happens when I run fasta_merge: Use of uninitialized value $outfile in substitution (s///) at ./fasta_merge line 52. I never had this problem before with these commands. Regards, Claudio From: Carson Holt [mailto:carsonhh at gmail.com] Sent: Monday, February 17, 2014 20:26 To: Carson Holt; Valero Jimenez, Claudio; 'maker-devel at yandell-lab.org' Subject: Re: [maker-devel] Maker not predicting many genes From your control file, it looks like your primary shortcomings are not setting single_exon=1 and using only UniProt rather than supplying complete proteomes from a related species. I'd set correct_est_fusion=1 as well.
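Carson's three recommendations map onto plain key=value lines in maker_opts.ctl. Below is a minimal Python sketch for applying them without hand-editing, assuming the key=value #comment layout of MAKER's control files. The single_length key (the minimum length for single-exon evidence, which Carson suggests setting to something like 250 bp) and the comma-separated protein list are assumptions to check against the maker_opts.ctl generated by your MAKER version. The proteome file names are hypothetical.

```python
import re

# Matches "key=value #optional comment" lines as written in MAKER control files.
CTL_LINE = re.compile(r"^(\w+)=([^#]*)(#.*)?$")


def set_ctl_options(ctl_text, updates):
    """Return ctl_text with the given keys set, keeping each line's trailing comment.

    Keys that do not appear in ctl_text are appended at the end.
    """
    remaining = dict(updates)
    out = []
    for line in ctl_text.splitlines():
        m = CTL_LINE.match(line)
        if m and m.group(1) in remaining:
            key, comment = m.group(1), m.group(3)
            line = f"{key}={remaining.pop(key)}"
            if comment:
                line += f" {comment}"
        out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Abbreviated, illustrative excerpt of a maker_opts.ctl file.
opts = """protein= #protein sequence file in fasta format
single_exon=0 #consider single exon EST evidence
single_length=250 #min length required for single exon ESTs
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
"""

patched = set_ctl_options(opts, {
    # Hypothetical file names: complete proteomes of two related species.
    "protein": "related_sp1_proteome.fa,related_sp2_proteome.fa",
    "single_exon": 1,
    "correct_est_fusion": 1,
})
print(patched)
```

The sketch leaves lines it does not recognize untouched, so section headers and comments in a real control file survive the edit.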
--Carson From: Carson Holt Date: Monday, February 17, 2014 at 12:22 PM To: "Valero Jimenez, Claudio", 'maker-devel at yandell-lab.org' Subject: Re: [maker-devel] Maker not predicting many genes You also need to look at the contigs in a browser like Apollo. That will allow you to see both the predictions and the evidence in context. You can then see whether genes are being dropped because they are supported only by single-exon evidence, because they have no evidence support whatsoever, or because they are being excluded due to UTR overlap. That last one is a common problem for fungi when using assembled mRNA-seq reads. Fungal genes are so close together that they often overlap in the UTR. As a result, mRNA-seq assemblers falsely assemble neighboring genes into single transcripts. The result is really long UTRs on some of your gene models, which force other models to be excluded. If this is the case, rerun something like Trinity with the jaccard_clip option set to avoid transcript fusion. Then set correct_est_fusion=1 in the MAKER control files to get those long false UTRs clipped off. If it is a lack of evidence overlap, make sure you provided at least one proteome from a related species to the protein= option. At least two proteomes are recommended, though (these are not proteins from the same species, but rather complete proteomes from related species). Also, comprehensive databases like UniProt/Swiss-Prot are not sufficient on their own, but can supplement the other proteome data. Also, are you providing EST data? Note that EST/mRNA-seq data without a proteome from a related species is also not sufficient (because the quality and completeness of EST/mRNA-seq datasets vary so widely, and they may capture as little as 30% of the genes). Another thing that comes into play is single-exon evidence. In anything but fungi, single-exon evidence is mostly caused by spurious alignments. But fungi have so many single-exon genes that this is not the case for them.
Make sure single_exon=1 is set to allow that evidence to be kept, and set the minimum length of single-exon evidence to keep to something like 250 bp. Thanks, Carson
From gtaylor at bcgsc.ca Fri Feb 21 11:48:42 2014 From: gtaylor at bcgsc.ca (Greg Taylor) Date: Fri, 21 Feb 2014 10:48:42 -0800 Subject: [maker-devel] Maker jobs hanging Message-ID: Hello, I'm having a problem with Maker_2.28 jobs hanging. I am annotating a 3 Gb genome with the predictors SNAP and GeneMark, using ABySS-assembled RNA-seq data. To do this I am using 480 processors on our local cluster. Once a run begins, 479 contigs are started, as noted in the *_master_datastore_index.log file; the standard error log for the whole job looks normal, as do the run.log and run.log.child.0 files for the daughter processes.
This seems to be sequence dependent: re-running contigs that hang doesn't help, because the same contigs always hang. I'm still looking into this myself, but it seems that most, if not all, of the jobs are stuck at the BLASTX stage. If you have any suggestions, your help would be greatly appreciated. Sincerely, Greg Taylor
From dence at genetics.utah.edu Fri Feb 21 11:54:17 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 21 Feb 2014 18:54:17 +0000 Subject: [maker-devel] Maker jobs hanging In-Reply-To: References: Message-ID: Hi Greg, Since this is probably going to be a more complicated situation, would you upload your data and control files to this URL so that we can try to replicate the error on our machines? http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=166 Also, which version of MPI are you using? And you might want to try updating MAKER; I think version 2.31 was updated just a few weeks ago. Thanks, Daniel
From carsonhh at gmail.com Fri Feb 21 11:56:50 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 21 Feb 2014 11:56:50 -0700 Subject: [maker-devel] Maker jobs hanging Message-ID: Use 2.31. It has been tested to work without issue on several thousand CPUs. Also, use OpenMPI for any job larger than 100 CPUs. In addition, OpenMPI can freeze on some systems when running Perl-based MPI programs without the following flag --> -mca btl ^openib Example --> mpiexec -mca btl ^openib -n 200 maker Finally, never use MVAPICH2. It doesn't play well with Perl, and it freezes whenever Perl-based MPI jobs extend across nodes (they run fine within a single node, though). --Carson
From dence at genetics.utah.edu Fri Feb 21 15:04:34 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Fri, 21 Feb 2014 22:04:34 +0000 Subject: [maker-devel] FW: Maker jobs hanging In-Reply-To: References: Message-ID: Hi Greg, You should be able to have the new MAKER work on the old datastore. Note the following advice from the main MAKER developer, Carson Holt. Thanks, Daniel ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com] Sent: Friday, February 21, 2014 11:56 AM To: Greg Taylor; maker-devel at yandell-lab.org Subject: Re: [maker-devel] Maker jobs hanging Use 2.31. It has been tested to work without issue on several thousand CPUs. Also, use OpenMPI for any job larger than 100 CPUs. In addition, OpenMPI can freeze on some systems when running Perl-based MPI programs without the following flag --> -mca btl ^openib Example --> mpiexec -mca btl ^openib -n 200 maker Finally, never use MVAPICH2. It doesn't play well with Perl, and it freezes whenever Perl-based MPI jobs extend across nodes (they run fine within a single node, though). --Carson
From dence at genetics.utah.edu Fri Feb 21 19:38:59 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Sat, 22 Feb 2014 02:38:59 +0000 Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2 In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu> References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>, <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu> Message-ID: Hi Joe, Will you upload your control files and data to this URL? http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169 Also, what version of MAKER and BLAST are you using? And which file are you using for the known Arabidopsis gene? I've copied this email to the maker-development list, which is a really good resource for troubleshooting MAKER issues. Thanks, Daniel ________________________________________ From: Mark Yandell Sent: Friday, February 21, 2014 7:32 PM To: Daniel Ence Subject: FW: I am a PhD candidate at NMSU and have a question about maker2 Mark Yandell Professor of Human Genetics H.A.
& Edna Benning Presidential Endowed Chair Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:801-587-7707 ________________________________________ From: Joseph Said [joesaid at nmsu.edu] Sent: Friday, February 21, 2014 5:18 PM To: Mark Yandell Subject: I am a PhD candidate at NMSU and have a question about maker2 Dear Dr. Yandell, I am a molecular biologist at NMSU. I am trying to use maker2 with the cotton genome, and search an Arabidopsis gene against it. I think there is a problem with the BLAST component, because zero results are returned. I tried troubleshooting by searching a known gene and still got zero results. Is this maybe a common problem with the pipeline? I would appreciate any ideas you might have to help me. Thank you, Joe Sent from my iPad
From dence at genetics.utah.edu Fri Feb 21 21:27:10 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Sat, 22 Feb 2014 04:27:10 +0000 Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2 In-Reply-To: References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu>, <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu> Message-ID: Hi Joe, MAKER runs BLAST from your local system (or the server where MAKER is installed), and it BLASTs the evidence that the user supplies in the "est" and "protein" settings. The est and protein settings are set in the maker_opts.ctl file. The path to BLAST is set in the "maker_exe.ctl" file, and the specific BLAST settings are in the "maker_bopts.ctl" file. Will you attach those files to your reply, so we can make sure that the settings are set up correctly?
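MAKER BLASTs only the evidence named in maker_opts.ctl, using the executables named in maker_exe.ctl. A blank entry in either file is therefore one plausible way to end up with zero hits. Below is a minimal Python sketch for checking that, assuming the key=value #comment layout of the control files. The key names blastn, blastx and blast_type are assumptions based on typical MAKER 2 control files, and all file names and paths are made up.

```python
def parse_ctl(text):
    """Parse MAKER-style 'key=value #comment' lines into {key: value}.

    Everything after the first '#' on a line is treated as a comment.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def blank_settings(settings, keys):
    """Return the keys that are missing or left empty."""
    return [key for key in keys if not settings.get(key)]


# Illustrative excerpts; in practice, read maker_opts.ctl, maker_exe.ctl and
# maker_bopts.ctl from the run directory. File names and paths are made up.
opts = parse_ctl("genome=cotton.fa #genome sequence\nest= #EST evidence\nprotein=gun1.fa #protein evidence\n")
exe = parse_ctl("blastn= #location of NCBI+ blastn\nblastx=/usr/bin/blastx #location of NCBI+ blastx\n")
bopts = parse_ctl("blast_type=ncbi+ #set to 'ncbi+', 'ncbi' or 'wublast'\n")

print(blank_settings(opts, ["genome", "est", "protein"]))  # ['est']
print(blank_settings(exe, ["blastn", "blastx"]))  # ['blastn']
print(bopts["blast_type"])  # ncbi+
```

A blank est= is not an error by itself when protein evidence is supplied. The check only shows which inputs MAKER actually has to work with, which is worth ruling out before looking deeper.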
Thanks, Daniel ________________________________________ From: Joseph Said [joesaid at nmsu.edu] Sent: Friday, February 21, 2014 7:44 PM To: Daniel Ence Subject: RE: I am a PhD candidate at NMSU and have a question about maker2 Hi Daniel, Thank you for getting back to me so quickly. I am using the cotton Gossypium raimondii D genome from NCBI, and the Arabidopsis gene is the GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I believe maker2 just calls BLAST from NCBI's page. So when I search the cotton genome, it returns zero hits. But then I used a known cotton gene as a test, ran a search, and it also returned zero hits. I am not sure what the problem is, but it seems like the protocol that should be returning the results of NCBI's BLAST is returning 0 to Maker2, which then reports 0 hits. I ran a standalone BLAST and got hits for both my gene of interest and the control test gene. Thanks, Joe
From dence at genetics.utah.edu Sat Feb 22 15:51:48 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Sat, 22 Feb 2014 22:51:48 +0000 Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2 In-Reply-To: References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu> <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu> <6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu> Message-ID: Hi, Will you send me the long file that you were trying to blast against?
Thanks, Daniel ________________________________ From: Hua Zhong [zh9118 at gmail.com] Sent: Saturday, February 22, 2014 10:46 AM To: Daniel Ence Cc: Joe Song; Joseph Said Subject: Re: I am a PhD candidate at NMSU and have a question about maker2 Hi all, Attached are the three configuration files and the two input files, which we used to try to find matches between the genome and the protein. For a simple test, we used one short sequence of about 60 bp and its translated protein sequence as inputs, but got nothing back. We also tested a long genome sequence as input, but still got nothing. I am not sure what is causing this result. Thanks a lot for your help. Hua On Fri, Feb 21, 2014 at 9:31 PM, Joseph Said wrote: Hi Daniel, I do not have the exact files with me right now, but my coauthors on the paper I am working on have been copied on this email. Hua can send you those files. Thank you for being so helpful, especially on a Friday night. Thanks, Joe Sent from my iPad
From dence at genetics.utah.edu Sat Feb 22 16:21:51 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Sat, 22 Feb 2014 23:21:51 +0000 Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2 In-Reply-To: References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu> <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu> <6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu> Message-ID: Hi Hua, will you upload the genome file to this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=170 I am more concerned that MAKER didn't find the gene in the whole genome than that it didn't find it in the 60 bp substring. I think MAKER needs more sequence than that to annotate a gene model. Will you also upload the MAKER output and datastore from the MAKER run? Thanks, Daniel
From mikael.durling at slu.se Sun Feb 23 09:57:09 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Sun, 23 Feb 2014 16:57:09 +0000 Subject: [maker-devel] Maker predicting fusion genes? Message-ID: <4CFD158A-DE75-4756-AD05-4CBF99BAF72D@slu.se> Dear list and maker developers, I was browsing the results of a recent maker run, focusing on differences between this run, made with a recent maker (svn r1067), and a previous run made with svn revision 1022 (as I recall). One of the differences I found was a gene lost in the new prediction set, but replaced by an extended version of a previous neighbor (see http://figshare.com/articles/Maker_prediction_comparison/942300). As you can see, there is no support for the join in the evidence. Do you have any idea what might cause this? Best regards, Mikael Durling
From carsonhh at gmail.com Sun Feb 23 13:00:50 2014 From: carsonhh at gmail.com (Carson Holt) Date: Sun, 23 Feb 2014 13:00:50 -0700 Subject: [maker-devel] Maker predicting fusion genes? Message-ID: The image doesn't show all evidence sources, but the short answer is that one of your evidence sources (est2genome, protein2genome, or blastx) bridges the two regions, and when given the bridging hint, one of the gene predictors decides it makes sense to create a single model instead. My guess is that it's blastx evidence. --Carson
From mikael.durling at slu.se Sun Feb 23 14:14:00 2014 From: mikael.durling at slu.se (Mikael Brandström Durling) Date: Sun, 23 Feb 2014 21:14:00 +0000 Subject: [maker-devel] Maker predicting fusion genes? In-Reply-To: References: Message-ID: <7CCC5270-93B9-4E5A-9687-26A1BF0EB1F8@slu.se> OK, do you mean by that that the predictions from the ab initio predictors (snap_masked, augustus_masked, and genemark) that end up in the gff3 output are not the final hinted predictions? Otherwise, I'm sorry, but I can't follow your reasoning. I checked my gff file, and as far as I can tell there is no evidence there to support the bridge (see the attached gff of the region, or http://figshare.com/articles/Maker_prediction/942301, where all evidence is plotted). Mikael -------------- next part -------------- A non-text attachment was scrubbed... Name: region.gff3 Type: application/octet-stream Size: 19612 bytes Desc: region.gff3 URL:
From hedgyx at yahoo.com Mon Feb 24 00:02:41 2014 From: hedgyx at yahoo.com (Megan) Date: Sun, 23 Feb 2014 23:02:41 -0800 (PST) Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides Message-ID: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com> Maker folks, I am re-annotating a single contig and I am having a few problems. First, I am having trouble passing through a Maker-derived gff (from Maker 2.09, with some modifications to gene names and functional information added). The gff file passes the modENCODE validator, but Maker always fails on the first gene in the file, regardless of which gene comes first, so it appears to be a systematic error across the entire file. The Maker error is "Check your input GFF3 file for errors! (from GFFDB)". I have tried Maker 2.10 and 2.31, using both genome_gff with model_pass=1 and pred_gff. Attached is a gff with the first 2 genes. Second, since I updated to Maker 2.31, Maker complains that my EST fasta file has nucleotides that are not supported [RYKMSWBDHV]. It suggests "set -fix_nucleotides on the command line to fix this automatically". Is -fix_nucleotides a Maker flag?
What exactly does it do? Does it remove the entire sequence or replace ambiguous bases with a randomly selected one? Half of my 20k ESTs contain these characters, so I don't want to throw them out entirely. Also, just curious: has Maker never supported these characters, but just never complained? I used this EST data set with Maker 2.09. I did note poor EST coverage, but thought it was an issue with the EST data itself. I appreciate any suggestions. Thanks, Megan -------------- next part -------------- A non-text attachment was scrubbed... Name: part_passthru.gff Type: application/octet-stream Size: 4363 bytes Desc: not available URL: From zh9118 at gmail.com Sat Feb 22 16:00:28 2014 From: zh9118 at gmail.com (Hua Zhong) Date: Sat, 22 Feb 2014 16:00:28 -0700 Subject: [maker-devel] I am a PhD candidate at NMSU and have a question about maker2 In-Reply-To: References: <8E40368A-AEC9-4BC9-BEEE-699E378D119A@nmsu.edu> <7A60AB257EFF2B48B1F4C814817EA05365F298FF@mxb2.hg.genetics.utah.edu> <6FA1C2F9-68A0-4154-8825-7B502E4762BF@nmsu.edu> Message-ID: The long file we used is a whole genome. It is quite a huge file, and I am not able to send it. Sorry. But in the simple test I told you about, the nucleotide sequence I sent you is treated as the genome file, and the protein sequence is the other input. These two are what we want to blast against each other to see whether Maker2 works well. Thanks. On Feb 22, 2014 3:51 PM, "Daniel Ence" wrote: > Hi, > > Will you send me the long file that you were trying to blast against?
> > Thanks, > Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ------------------------------ > From: Hua Zhong [zh9118 at gmail.com] > Sent: Saturday, February 22, 2014 10:46 AM > To: Daniel Ence > Cc: Joe Song; Joseph Said > Subject: Re: I am a PhD candidate at NMSU and have a question about > maker2 > > Hi all, > Attached are the three configuration files and two input files, which are > used to predict something between the genome and protein. For a simple > test, we used one short sequence of about 60 bp and its translated protein > sequence as inputs, but got nothing back. We also tested a long > genome sequence as one input, but still got nothing. I am not sure > what is causing this result. > Thanks a lot for your help. > > Hua > > > > > On Fri, Feb 21, 2014 at 9:31 PM, Joseph Said wrote: > >> Hi Daniel, >> >> I do not have the exact files with me right now, but my coauthors on the >> paper I am working on have been copied on this email. Hua can send you >> those files. Thank you for being very helpful, especially on a Friday night. >> >> Thanks, >> Joe >> >> Sent from my iPad >> >> > On Feb 21, 2014, at 9:27 PM, "Daniel Ence" >> wrote: >> > >> > Hi Joe, >> > >> > MAKER runs blast from your local system (or your server where MAKER is >> installed), and it blasts evidence that the user supplies in the "est" and >> "protein" settings. The est and protein settings are set in the >> maker_opts.ctl file. The path to blast is set in the "maker_exe.ctl" file >> and the specific blast settings are in the "maker_bopts.ctl" file. >> > >> > Will you attach those files to your reply, so we can make sure that the >> settings are set up correctly?
>> > >> > Thanks, >> > Daniel >> > >> > >> > Daniel Ence >> > Graduate Student >> > Eccles Institute of Human Genetics >> > University of Utah >> > 15 North 2030 East, Room 2100 >> > Salt Lake City, UT 84112-5330 >> > ________________________________________ >> > From: Joseph Said [joesaid at nmsu.edu] >> > Sent: Friday, February 21, 2014 7:44 PM >> > To: Daniel Ence >> > Subject: RE: I am a PhD candidate at NMSU and have a question about >> maker2 >> > >> > Hi Daniel, >> > >> > Thank you for getting back to me so quickly. I am using the cotton >> Gossypium raimondii D genome from NCBI, and the Arabidopsis gene is the >> GUN1 gene with ID UGID:8241, UniGene At.20815. I am using Maker2, and I >> believe maker2 just calls BLAST from NCBI's page. So when I search the >> cotton genome, it returns zero hits. But then I used a known cotton gene as >> a test, ran a search, and it also returned zero hits. I am not sure what the >> problem is, but it seems like the protocol that should be returning the >> results of NCBI's BLAST is returning 0 to Maker2, which then reports 0 hits. >> I ran a standalone BLAST and got hits for both my gene of interest >> and the control test gene. >> > >> > Thanks, >> > Joe >> > ________________________________________ >> > From: Daniel Ence >> > Sent: Friday, February 21, 2014 7:38 PM >> > To: Joseph Said >> > Cc: maker-devel at yandell-lab.org >> > Subject: RE: I am a PhD candidate at NMSU and have a question about >> maker2 >> > >> > Hi Joe, >> > >> > Will you upload your control files and data at this URL? >> > http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=169 >> > >> > Also, what version of MAKER and blast are you using? And which file are >> you using for the known Arabidopsis gene? >> > >> > I've copied this email to the maker-development list, which is a really >> good resource for troubleshooting MAKER issues.
>> > >> > Thanks, >> > Daniel >> > >> > >> > Daniel Ence >> > Graduate Student >> > Eccles Institute of Human Genetics >> > University of Utah >> > 15 North 2030 East, Room 2100 >> > Salt Lake City, UT 84112-5330 >> > ________________________________________ >> > From: Mark Yandell >> > Sent: Friday, February 21, 2014 7:32 PM >> > To: Daniel Ence >> > Subject: FW: I am a PhD candidate at NMSU and have a question about >> maker2 >> > >> > Mark Yandell >> > Professor of Human Genetics >> > H.A. & Edna Benning Presidential Endowed Chair >> > Eccles Institute of Human Genetics >> > University of Utah >> > 15 North 2030 East, Room 2100 >> > Salt Lake City, UT 84112-5330 >> > ph:801-587-7707 >> > >> > ________________________________________ >> > From: Joseph Said [joesaid at nmsu.edu] >> > Sent: Friday, February 21, 2014 5:18 PM >> > To: Mark Yandell >> > Subject: I am a PhD candidate at NMSU and have a question about maker2 >> > >> > Dear Dr. Yandell, >> > >> > I am a molecular biologist at NMSU. I am trying to use maker2 with the >> cotton genome, and search an Arabidopsis gene against it. I think there is >> a problem with the blast component, because zero results are returned. I >> tried troubleshooting by searching a known gene and still got zero >> results. Is this perhaps a common problem with the pipeline? I would >> appreciate any ideas you might have to help me. >> > >> > Thank you, >> > Joe >> > >> > Sent from my iPad >> > > From carsonhh at gmail.com Mon Feb 24 11:18:18 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 24 Feb 2014 11:18:18 -0700 Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides In-Reply-To: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com> References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com> Message-ID: The -fix_nucleotides flag is added to the command line (i.e. maker -fix_nucleotides).
It is there so you are aware that there is an issue with your fasta file that will cause things downstream to fail. MAKER can fix the errors for you, but first it gives a warning designed to make you look at the file and validate it. Why would you want to do this? For example, if you accidentally provided protein sequence to the EST option, you wouldn't want MAKER to just proceed. You want a warning so you can check first. If your file is in fact EST data, then set the flag and those characters will be changed to N's in the fixed fasta sequence. Otherwise those characters will cause errors in downstream tools like exonerate, and even in some downstream GMOD tools, so they can't be allowed to remain as is. For the GFF3 file, there is almost certainly a logic issue in the file (the modENCODE validator won't check for those). This can come from prior manipulation of the GFF3 file, for example gene IDs that are the same across two contigs (technically valid, but a logic error). The GFF3 error message will normally give the ID of the feature causing the issue. I could also take a look for you. You can upload the GFF3 file here -> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi Click on 'new guest account', then e-mail me back your guest ID so I know which files to review. Thanks, Carson From dence at genetics.utah.edu Mon Feb 24 11:31:47 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Mon, 24 Feb 2014 18:31:47 +0000 Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides In-Reply-To: References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com>, Message-ID: Hi Megan, One problem with the GFF3 that you attached is that the IDs for the CDS features are being made wrong. All of the CDS features for a given mRNA or transcript should have the same ID. The CDS features in your GFF3 have IDs that use the exon name. 
You can fix it with this command-line perl: cat part_passthru.gff | perl -ane 'if(/\tCDS\t/){ chomp; /Parent=([^;\s]+)/; my $parent=$1; s/ID=([^\;]+)/ID=$parent-cds/; print "$_\n"}else{print $_}' > fixed.gff3 It just fixes the ID attributes in all of the CDS features. Try it on the test gff3 you sent and let me know if it works. I can't test it myself without the fasta file that you are annotating.
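For readers who would rather not use a Perl one-liner, here is a minimal Python sketch of the same CDS ID rewrite. It assumes tab-separated, nine-column GFF3 with semicolon-separated key=value attributes; rewrite_cds_ids is a hypothetical helper, not part of MAKER. It parses the attribute column field by field instead of matching against the whole line, so a Parent tag followed by other attributes is handled, and CDS lines without a Parent pass through unchanged.

```python
from typing import Iterable, Iterator


def rewrite_cds_ids(gff_lines: Iterable[str]) -> Iterator[str]:
    """Yield GFF3 lines, giving every CDS feature the ID '<Parent>-cds'.

    Hypothetical helper mirroring the Perl one-liner: only the ID attribute
    of CDS lines is changed; all other lines are passed through as-is.
    """
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        # Only touch well-formed 9-column feature lines whose type is CDS.
        if len(cols) == 9 and cols[2] == "CDS":
            attrs = cols[8].split(";")
            parent = next(
                (a[len("Parent="):] for a in attrs if a.startswith("Parent=")),
                None,
            )
            if parent is not None:
                # Replace the existing ID attribute, keeping attribute order.
                attrs = [f"ID={parent}-cds" if a.startswith("ID=") else a for a in attrs]
                cols[8] = ";".join(attrs)
            yield "\t".join(cols) + "\n"
        else:
            yield line


# Example (file names taken from the thread):
# with open("part_passthru.gff") as src, open("fixed.gff3", "w") as dst:
#     dst.writelines(rewrite_cds_ids(src))
```

Later in the thread Carson notes that MAKER accepts CDS features with either shared or distinct IDs, so this rewrite is optional rather than a fix for the GFFDB error.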
Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 From carsonhh at gmail.com Mon Feb 24 11:34:28 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 24 Feb 2014 11:34:28 -0700 Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides In-Reply-To: References: <1393225361.62255.YahooMailBasic@web162206.mail.bf1.yahoo.com> Message-ID: Actually, that is not true. CDS IDs can be the same or different; MAKER doesn't care either way, and both are valid in GFF3. Having the same ID just allows them to be put together by some GMOD viewers without having to go through a container feature. --Carson From carsonhh at gmail.com Mon Feb 24 13:59:12 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 24 Feb 2014 13:59:12 -0700 Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides In-Reply-To: <1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com> References: <1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com> Message-ID: I found the issue. You have non-ascii characters at the end of almost every line. Because they are happening within the Parent= tag, they then become part of the Parent ID when the file is read. So instead of "HERA000031-RA" you get "HERA000031-RA\cM" as the Parent ID. "\cM" is a meta-return. I ran the attached script to remove these characters (perl purify ), and then it works. Make sure to remove the .../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db file to force the GFF3 database to be rebuilt after fixing the file when you rerun MAKER. Thanks, Carson On 2/24/14, 1:32 PM, "Megan" wrote: >Hi Carson and Daniel, > >Thanks for your suggestions. I have looked at the gff file, but I do not >see any obvious errors. I have uploaded the files to your website. The >reference fasta is there, the full gff, and a single gene gff that also >causes an error. 
If I remove that gene from the full gff, then the error >is on the next gene in the file, so it appears to be a systematic problem >throughout the gff. The gff was generated by Maker, but I may have >messed it up when I modified it to rename genes and add functional >information. I checked with cat -te, but don't see any obvious >formatting errors. > >Thanks! >Megan -------------- next part -------------- A non-text attachment was scrubbed...
Name: purify Type: application/octet-stream Size: 1966 bytes Desc: not available URL: From carsonhh at gmail.com Mon Feb 24 14:03:00 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 24 Feb 2014 14:03:00 -0700 Subject: [maker-devel] gff pass thru problem and unsupported EST nucleotides In-Reply-To: References: <1393273971.41635.YahooMailBasic@web162205.mail.bf1.yahoo.com> Message-ID: One more thing. You must give the file to pred_gff or model_gff. It is no longer strictly a MAKER file, as many of the source columns read ?.? meaning it has been edited by Apollo or another editor. So it will not be guaranteed to be recognized by genome_gff, because many of the source tags have changed. Thanks, Carson On 2/24/14, 1:59 PM, "Carson Holt" wrote: >I found the issue. You have non-ascii characters at the end of almost >every line. Because they are happening within the Parent= tag, they then >become part of the Parent ID when the file is read. > >So instead of "HERA000031-RA? you get ?> "HERA000031-RA\cM? as the Parent >ID. > >?\cM? is a meta-return. > >I ran the attached script to remove these characters (perl purify >), and then it works. Make sure to remove the >.../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db file >to force the GFF3 database to be rebuilt after fixing the file when you >rerun MAKER. > >Thanks, >Carson > > > > >On 2/24/14, 1:32 PM, "Megan" wrote: > >>Hi Carson and Daniel, >> >>Thanks for your suggestions. I have looked at the gff file, but I do not >>see any obvious errors. I have uploaded the files to your website. The >>reference fasta is there, the full gff, and a single gene gff that also >>causes an error. If I remove that gene from the full gff, then the error >>is on the next gene in the file, so it appears to be a systematic problem >>throughout the gff. The gff was generated by Maker, but I may have >>messed it up when I modified it to rename genes and add functional >>information. 
I checked with cat -te, but don't see any obvious >>formatting errors. >> >>Thanks! >>Megan >> >> >>-------------------------------------------- >>On Mon, 2/24/14, Carson Holt wrote: >> >> Subject: Re: [maker-devel] gff pass thru problem and unsupported EST >>nucleotides >> To: "Megan" , maker-devel at yandell-lab.org >> Date: Monday, February 24, 2014, 10:18 AM >> >> The -fix_nucleotides flag is added to >> the command line (I.e. maker >> -fix_nucleotides flag). It is there so you are aware >> that there is an >> issue with your fasta file, that will cause things >> downstream to fail. >> MAKER can fix the errors for you, but first it gives a >> warning designed to >> make you look at the file and validate it. Why would >> you want to do this? >> For example, what if you provided protein sequence to the >> EST option >> accidentally, you wouldn?t want MAKER to just >> proceed. You want a warning >> so you can check first. If your file is in fact EST >> data, then set the >> flag and those characters will be changed to N?s in the >> fixed fasta >> sequence, otherwise those characters will cause errors in >> downstream tools >> like exonerate, and even some downstream GMOD tools, so they >> can?t be >> allowed to remain as is. >> >> For the GFF3 file, there is almost definitely a logic issue >> in the file >> (mod encode validator won?t check for those). This >> can be from prior >> manipulation of the GFF3 file. For example, IDs for a >> gene that are the >> same across two contigs (technically valid but a logic >> error). The GFF3 >> error message will normally give the ID of the feature >> causing the issue. >> >> I could also take a look for you. You can upload the >> GFF3 file here ?> >> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi >> Click on 'new guest account' then e-mail me back you guest >> ID, so I know >> which files to review. 
>> >> Thanks, >> Carson >> >> >> >> On 2/24/14, 12:02 AM, "Megan" >> wrote: >> >> >Maker folks, >> >I am re-annotating a single contig and I am having a few >> problems. >> > >> >First, I am having trouble passing through a Maker >> derived gff (from >> >Maker 2.09, with some modifications to gene names and >> functional >> >information added). The gff file passes the >> modencode validator but >> >Maker always fails on the first gene in the file, >> regardless of which >> >gene comes first. So it appears to be a systematic >> error across the >> >entire file. The Maker error is "Check your input >> GFF3 file for errors! >> >(from GFFDB)". I have tried Maker 2.10 >> and 2.31, using both genome_gff >> >with model_pass=1 and pred_gff. Attached is a gff >> with the first 2 >> >genes. >> > >> >Second, when I updated to Maker 2.31, Maker now >> complains that my EST >> >fasta file has nucleotides that are not supported >> [RYKMSWBDHV]. It >> >suggests "set -fix_nucleotides on the command line to >> fix this >> >automatically". Is the -fix_nucleotides a Maker >> flag? What exactly does >> >it do? Does it remove the entire sequence or >> replace ambiguous bases >> >with a randomly selected one? Half of my 20k ESTs >> contain these >> >characters, so I don't want to throw them out entirely. >> > >> >Also, just curious, has Maker never supported these >> characters but just >> >never complained? I used this EST data set with >> Maker 2.09. I did note >> >poor EST coverage, but thought it was an issue with the >> EST data itself. >> > >> >I appreciate any suggestions. 
>> >Thanks,
>> >Megan
>> >_______________________________________________
>> >maker-devel mailing list
>> >maker-devel at box290.bluehost.com
>> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>

From rbharris at uw.edu Tue Feb 25 14:49:57 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Tue, 25 Feb 2014 13:49:57 -0800
Subject: [maker-devel] error in snap training

Hey -

I'm trying to train SNAP and am running into errors. I don't have any EST evidence, just protein. My .gff file reports 10865 genes, but when I run maker2zff -c0 -e0 I get back empty genome files. When I run maker2zff -n, a ton of overlap_prev_exon errors get written to the screen, and then when I get to the forge step I get an "impossible error5". Any help would be greatly appreciated.

Thanks!
Rebecca

From carsonhh at gmail.com Tue Feb 25 15:12:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 15:12:14 -0700
Subject: Re: [maker-devel] error in snap training
Message-ID: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>

Make sure you are using 2.31, and then try the maker2zff filters individually. If the protein models are not working well, use CEGMA to generate models. It's from the same group as SNAP. Use cegma2zff for the conversion.

--Carson

Sent from my iPhone

From sjackman at gmail.com Tue Feb 25 17:06:03 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Tue, 25 Feb 2014 16:06:03 -0800
Subject: [maker-devel] Mapping gene names

Hi,

I'm annotating a genome using a closely related genome from GenBank, using the .frn (RNA) and .faa (protein) files from GenBank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the *map_forward* option, which applies to the *model_gff* parameter. Is there a similar option for *est* and *protein*?

*maker_opts.ctl*
est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

From hedgyx at yahoo.com Tue Feb 25 17:26:11 2014
From: hedgyx at yahoo.com (Megan)
Date: Tue, 25 Feb 2014 16:26:11 -0800 (PST)
Subject: Re: [maker-devel] gff pass thru problem and unsupported EST nucleotides
Message-ID: <1393374371.45210.YahooMailBasic@web162201.mail.bf1.yahoo.com>

Carson,

Everything ran through smoothly after removing the ^Ms. Thanks for the help.

Megan

--------------------------------------------
On Mon, 2/24/14, Carson Holt wrote:

Subject: Re: [maker-devel] gff pass thru problem and unsupported EST nucleotides
To: "Megan" , "Daniel Ence"
Cc: "maker-devel at yandell-lab.org"
Date: Monday, February 24, 2014, 12:59 PM

I found the issue. You have non-ascii characters at the end of almost every line. Because they are happening within the Parent= tag, they then become part of the Parent ID when the file is read. So instead of "HERA000031-RA" you get "HERA000031-RA\cM" as the Parent ID. "\cM" is a meta-return.
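\cM is Perl notation for control-M, the carriage return that Windows-style (CRLF) line endings typically leave at the end of each line; cat -te shows it as ^M. Carson's purify script was an attachment and is not reproduced in the archive, so the following is only a minimal Python sketch of the same idea. It assumes carriage returns are the only thing to remove, and reports any other non-ASCII characters for review instead of silently deleting them:

```python
def strip_carriage_returns(text):
    """Drop every carriage return character (shown as ^M by cat -te)."""
    return text.replace("\r", "")


def non_ascii_lines(text):
    """1-based numbers of lines that still contain non-ASCII characters, for manual review."""
    # Split on "\n" only: str.splitlines() would also break on "\r" and hide it.
    return [n for n, line in enumerate(text.split("\n"), start=1) if not line.isascii()]


def gff3_attributes(gff_line):
    """Parse column 9 of one GFF3 line into a dict (no percent-decoding)."""
    column9 = gff_line.rstrip("\n").split("\t")[8]
    attrs = {}
    for field in column9.split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value
    return attrs
```

With a CRLF file, gff3_attributes() returns a Parent value ending in "\r", which is exactly the mismatch Carson describes; after strip_carriage_returns() the Parent matches the mRNA ID again.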
I ran the attached script to remove these characters (perl purify ), and then it works. Make sure to remove the .../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db file to force the GFF3 database to be rebuilt after fixing the file when you rerun MAKER.

Thanks,
Carson

On 2/24/14, 1:32 PM, "Megan" wrote:

>Hi Carson and Daniel,
>
>Thanks for your suggestions. I have looked at the gff file, but I do not see any obvious errors. I have uploaded the files to your website. The reference fasta is there, the full gff, and a single gene gff that also causes an error. If I remove that gene from the full gff, then the error is on the next gene in the file, so it appears to be a systematic problem throughout the gff. The gff was generated by Maker, but I may have messed it up when I modified it to rename genes and add functional information. I checked with cat -te, but don't see any obvious formatting errors.
>
>Thanks!
>Megan

From carsonhh at gmail.com Tue Feb 25 17:58:08 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 17:58:08 -0700
Subject: Re: [maker-devel] Mapping gene names

There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there, so you'll have to type it in.

There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene; maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1.

This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
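Carson names the tags (gene_id= and maker_coor=) but the thread does not spell out the exact header layout. A minimal Python sketch for writing and reading such headers, assuming the tags follow the sequence ID as space-separated key=value pairs; that layout is an assumption, so confirm it on a small test set before tagging a whole file:

```python
def tagged_header(seq_id, gene_id=None, chrom=None, start=None, end=None):
    """Build a FASTA header carrying gene_id= / maker_coor= hints (layout assumed)."""
    tags = []
    if gene_id is not None:
        tags.append(f"gene_id={gene_id}")  # isoforms sharing this value cluster into one gene
    if chrom is not None:
        if start is not None and end is not None:
            if start > end:
                raise ValueError("start must not exceed end")
            tags.append(f"maker_coor={chrom}:{start}-{end}")  # map only within this range
        else:
            tags.append(f"maker_coor={chrom}")  # map only against this sequence
    return " ".join([">" + seq_id] + tags)


def parse_header(header):
    """Split a tagged header back into (seq_id, {tag: value})."""
    tokens = header.lstrip(">").split()
    tags = dict(token.split("=", 1) for token in tokens[1:] if "=" in token)
    return tokens[0], tags
```

The est_forward=1 line itself still has to be typed into maker_opts.ctl by hand, as Carson notes.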
--Carson

From carsonhh at gmail.com Tue Feb 25 18:04:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 18:04:48 -0700
Subject: Re: [maker-devel] Mapping gene names

One more note. When using this option, the score column of mRNA features will represent how completely this gene matches the source EST/protein (fraction coverage multiplied by % identity). So a value of 100 means there is a perfect match. This way, if the same transcript maps to multiple locations, you can identify which location is the closest match (this also works for identifying likely orthologs vs. paralogs).

--Carson

From weckalba at asu.edu Tue Feb 25 18:36:21 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Tue, 25 Feb 2014 17:36:21 -0800
Subject: [maker-devel] invalid gff3 format issues

Hi all,

I am trying to update maker annotations with PASA and encountered errors stemming from file format issues in the gff3 file.

I put a few lines from the gff3 to highlight the issue below. Basically, the problem is that there are non-unique IDs for a number of the annotations.
Is there anything that can be done to right this problem?

Thanks,

Walter

Lines from GFF3 file, repeated IDs are highlighted:

chr1 maker gene 9377440 9432028 . - . ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16
chr1 maker mRNA 9377440 9432028 . - . ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234
chr1 maker exon 9431899 9432028 . - . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1
chr1 maker exon 9431698 9431808 . - . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1

chr1 maker gene 8894975 9021577 . + . ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53
chr1 maker mRNA 8894975 9021577 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007
chr1 maker exon 8894975 8895153 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
chr1 maker exon 8942215 8942531 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
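The GFF3 specification requires each ID to be unique within a file; the sanctioned exception is a discontinuous feature split across several lines, and those lines share the same type and parent. A minimal Python sketch that flags IDs reused with a different seqid, type, or Parent, which is the pattern in the mRNA lines above (maker-chr1-snap-gene-4.53-mRNA-1 under two different genes). It expects the real tab-separated file; the lines as quoted in this archive have lost their tabs:

```python
from collections import defaultdict


def parse_attributes(column9):
    """Parse GFF3 column 9 into a dict, tolerating stray spaces after ';'."""
    attrs = {}
    for field in column9.split(";"):
        field = field.strip()
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def conflicting_ids(gff_text):
    """Return IDs that appear on lines with a different seqid, type, or Parent."""
    signatures = defaultdict(set)  # ID -> {(seqid, type, Parent), ...}
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # only sequences follow; no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            signatures[attrs["ID"]].add((cols[0], cols[2], attrs.get("Parent", "")))
    return sorted(fid for fid, sigs in signatures.items() if len(sigs) > 1)
```

Multi-line CDS features that share an ID and Parent produce a single signature, so they are not reported.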
From dence at genetics.utah.edu Tue Feb 25 19:02:04 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 26 Feb 2014 02:02:04 +0000
Subject: Re: [maker-devel] invalid gff3 format issues

Hi Walter,

Will you upload the full GFF3 and the control files that you used to this URL?
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189
Also, what version of MAKER are you running this with?

Thanks,
Daniel

From weckalba at asu.edu Tue Feb 25 19:11:12 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Tue, 25 Feb 2014 18:11:12 -0800
Subject: Re: [maker-devel] invalid gff3 format issues

Hi Daniel, those have been uploaded and I'm using version 2.28.

Walter

From carsonhh at gmail.com Tue Feb 25 21:10:27 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 25 Feb 2014 21:10:27 -0700
Subject: Re: [maker-devel] invalid gff3 format issues

Could you try version 2.31 (the current version)? I believe this is happening because you are passing in MAKER genes as pred_gff, so the transcripts ended up with the same Names and IDs as the genes being generated by the MAKER run via SNAP etc. This shouldn't happen with model_gff, and shouldn't happen in 2.31 (IDs and names are generated slightly differently in 2.30+).

Thanks,
Carson

From marc.hoeppner at imbim.uu.se Wed Feb 26 01:26:35 2014
From: marc.hoeppner at imbim.uu.se (Marc Höppner)
Date: Wed, 26 Feb 2014 08:26:35 +0000
Subject: [maker-devel] Functional annotation options
Message-ID: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>

Dear List,

I have finished a gene build now, and I would like to go over to functional annotation. I understand that maker includes a few scripts to facilitate such analyses. However, I have a few questions about this:

1) iprscan
It seems maker includes an MPI wrapper for InterProScan, but requests "iprscan" to be in $PATH.
The latest versions of InterProScan I have worked with are Java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else. But what?

2) maker_functional_gff
This script seems to be very useful, but the description suggests that it requires WU-BLAST tabular output '2', which I think looks quite different from the NCBI BLAST tabular output. Since WU-BLAST is not really available anymore (except this very old, frozen binary bundle), I was wondering how to address this issue.

3) maker_functional
This just throws an error about a missing Job ID, so no clue what this would be used for.

I guess what I am after is some suggestion as to how to use the scripts included with Maker to achieve a reasonable functional annotation.

With kind regards,

Marc Hoeppner

Marc P. Hoeppner, PhD
Team Leader
BILS Genome Annotation Platform
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at imbim.uu.se

From mikael.durling at slu.se Wed Feb 26 02:43:43 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 09:43:43 +0000
Subject: Re: [maker-devel] Functional annotation options
In-Reply-To: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se>
Message-ID: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>

On 26 Feb 2014, at 09:26, Marc Höppner wrote:

> 1) iprscan
> It seems maker includes an MPI wrapper for InterProScan, but requests "iprscan" to be in $PATH. The latest versions of InterProScan I have worked with are Java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else. But what?

I don't believe it works with interproscan5.
What I usually do is to split the maker protein file into chunks, then run these chunks as separate jobs on our cluster, and finally merge the results. The TSV file from iprscan5 can be input into the maker tool ipr_update_gff. I have not tried iprscan2gff3, as I haven't figured out how to get an iprscan4 raw file from iprscan5.

> 2) maker_functional_gff
> This script seems to be very useful, but the description suggests that it requires WU-BLAST tabular output '2', which I think looks quite different from the NCBI BLAST tabular output. Since WU-BLAST is not really available anymore (except this very old, frozen binary bundle), I was wondering how to address this issue.

It works fine with ncbiblast+ and the blastp command with -outfmt 6.

cheers,
Mikael

PS. You're welcome to visit me at SLU if you would like to discuss experiences of genome annotation.

From mikael.durling at slu.se Wed Feb 26 02:55:56 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 09:55:56 +0000
Subject: Re: [maker-devel] Functional annotation options
In-Reply-To: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se> <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
Message-ID: <29357689-D616-465F-BCC4-66AF5B1D5D2E@slu.se>

On 26 Feb 2014, at 10:43, Mikael Brandström Durling wrote:

> I don't believe it works with interproscan5.

I should clarify this and say that mpi_iprscan doesn't seem to work with iprscan5. ipr_update_gff3 does, however.

Mikael
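Mikael's workaround (split the MAKER protein FASTA, run InterProScan 5 on each chunk as a separate cluster job, then concatenate the TSV outputs for ipr_update_gff) can be sketched in Python as below. The round-robin chunking, the chunk_NNN.fasta file names, and the helper names are assumptions for illustration; submitting the InterProScan jobs is left to your cluster's scheduler:

```python
from pathlib import Path


def read_fasta_records(text):
    """Split FASTA text into one string per record (header plus sequence lines)."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith(">") and current:
            records.append("\n".join(current) + "\n")
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")
    return records


def split_records(records, n_chunks):
    """Deal records round-robin into at most n_chunks FASTA strings with similar record counts."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    buckets = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        buckets[i % n_chunks].append(record)
    return ["".join(b) for b in buckets if b]


def write_chunks(fasta_path, n_chunks, out_dir):
    """Write chunk_001.fasta, chunk_002.fasta, ... and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunks = split_records(read_fasta_records(Path(fasta_path).read_text()), n_chunks)
    paths = []
    for i, chunk in enumerate(chunks, start=1):
        path = out / f"chunk_{i:03d}.fasta"
        path.write_text(chunk)
        paths.append(path)
    return paths


def merge_tsv(tsv_paths, merged_path):
    """Concatenate per-chunk InterProScan TSV files, skipping blank lines."""
    with open(merged_path, "w") as merged:
        for path in tsv_paths:
            for line in Path(path).read_text().splitlines():
                if line.strip():
                    merged.write(line + "\n")
```

Round-robin assignment keeps the number of proteins per job even without sorting by length; if run times vary a lot, balancing on total residue count would be the next refinement.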
URL: From mikael.durling at slu.se Wed Feb 26 05:30:44 2014 From: mikael.durling at slu.se (=?Windows-1252?Q?Mikael_Brandstr=F6m_Durling?=) Date: Wed, 26 Feb 2014 12:30:44 +0000 Subject: [maker-devel] Mapping gene names In-Reply-To: References: Message-ID: Can this use of maker_coor be used only to hint about the placement of the ests, without affecting the naming of the final genes? Ie if I have a database of EST where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1? Thanks, Mikael 26 feb 2014 kl. 01:58 skrev Carson Holt >: There is a way. It?s not a standard option and it?s undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won?t already be there so you?ll have to type it in. There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp and just using maker_coor=chr1 will force it to only be mapped against chr1. This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide. ?Carson From: Shaun Jackman > Reply-To: Shaun Jackman > Date: Tuesday, February 25, 2014 at 5:06 PM To: > Subject: [maker-devel] Mapping gene names Hi, I?m annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I?ve run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. 
Is there a similar option for est and protein?

maker_opts.ctl
est=NC_123456.frn
protein=NC_123456.faa
est2genome=1
protein2genome=1

Thanks,
Shaun

From carsonhh at gmail.com Wed Feb 26 06:22:34 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 06:22:34 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References:
Message-ID: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>

Yes. That should work as well, as an accidental feature.

--Carson

Sent from my iPhone
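The FASTA-header tags Carson describes in this thread (gene_id= to cluster isoforms into one gene; maker_coor=chr1:1-10000 or a bare maker_coor=chr1 to restrict where a sequence is mapped) can be added to evidence files with a few lines of Python. This is a hedged sketch: the thread does not spell out where in the header MAKER expects the tags, so appending them as space-separated key=value tokens after the sequence ID is an assumption, and `format_coor`, `tag_header` and `parse_tags` are hypothetical helpers. Check the result on a small run before tagging a whole database.

```python
"""Hypothetical helper for adding maker_coor=/gene_id= tags to FASTA headers.

Assumption: tags are appended to the header line as space-separated
key=value tokens after the sequence ID (the thread does not specify layout).
"""
from typing import Dict, Optional


def format_coor(seqid: str, start: Optional[int] = None, end: Optional[int] = None) -> str:
    """Return 'chr1' or 'chr1:1-10000', the two forms quoted in the thread."""
    if start is None and end is None:
        return seqid
    if start is None or end is None or not (1 <= start <= end):
        raise ValueError("need 1 <= start <= end, or neither")
    return f"{seqid}:{start}-{end}"


def tag_header(header: str, coor: Optional[str] = None, gene_id: Optional[str] = None) -> str:
    """Append gene_id= and/or maker_coor= tokens to a FASTA header line."""
    if not header.startswith(">"):
        raise ValueError("FASTA header must start with '>'")
    tokens = [header.rstrip()]
    if gene_id:
        tokens.append(f"gene_id={gene_id}")
    if coor:
        tokens.append(f"maker_coor={coor}")
    return " ".join(tokens)


def parse_tags(header: str) -> Dict[str, str]:
    """Collect key=value tokens from a header (the first token, the ID, is skipped)."""
    tags = {}
    for token in header.split()[1:]:
        key, sep, value = token.partition("=")
        if sep:
            tags[key] = value
    return tags


if __name__ == "__main__":
    # 'txA.1' and 'geneA' are made-up example identifiers.
    h = tag_header(">txA.1", coor=format_coor("chr1", 1, 10000), gene_id="geneA")
    print(h)  # >txA.1 gene_id=geneA maker_coor=chr1:1-10000
```

Per Carson's replies in this thread, without est_forward=1 such a tag only constrains where an existing BLAST seed is polished; with it, MAKER searches the tagged region even when BLAST finds nothing.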
From mikael.durling at slu.se Wed Feb 26 06:37:29 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 13:37:29 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
Message-ID: <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>

That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor, but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on set_forward=1, right?

Mikael

From nextgen.usfs at gmail.com Wed Feb 26 09:21:33 2014
From: nextgen.usfs at gmail.com (USFS Ion PGM)
Date: Wed, 26 Feb 2014 10:21:33 -0600
Subject: [maker-devel] change program locations in maker_exe
Message-ID:

Hello,
I was wondering if there is a way to make permanent changes to the maker_exe.ctl file. It seems that on install MAKER didn't find the GeneMark or ProBuild locations correctly, which means that I have to manually edit the maker_exe.ctl file every time and add that information. Where can I modify this permanently so that the maker -CTL command creates the appropriate maker_exe.ctl file? Thank you.
- Jon

From carsonhh at gmail.com Wed Feb 26 08:38:47 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 08:38:47 -0700
Subject: [maker-devel] Functional annotation options
In-Reply-To: <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
References: <08C5680E-0385-4AB4-9497-5349D7CA0501@imbim.uu.se> <63EF1C00-9495-4401-BF75-5C1347C1ABB3@slu.se>
Message-ID:

maker_functional is a script that gets called by another script; it is not meant to be called directly by the user, so ignore that. Just run iprscan directly; it already works pretty well. The mpi_iprscan and iprscan_wrap scripts just add some logging functionality by wrapping the iprscan call. In most cases there is no advantage over just running iprscan directly.

--Carson

On 2/26/14, 2:43 AM, "Mikael Brandström Durling" wrote:

>On 26 Feb 2014, at 09:26, Marc Höppner wrote:
>
>> Dear List,
>>
>> I have finished a gene build now, and I would like to move on to functional annotation. I understand that maker includes a few scripts to facilitate such analyses. However, I have a few questions about this:
>>
>> 1) iprscan
>> It seems maker includes an MPI wrapper for InterProScan, but requires 'iprscan' to be in $PATH. The latest versions of InterProScan I have worked with are Java applications, and even though I put their location in $PATH, mpi_iprscan seems to want something else. But what?
>
>I don't believe it works with InterProScan 5. What I usually do is to split the maker protein file into chunks, then run these chunks as separate jobs on our cluster, and finally merge the results. The TSV file from iprscan5 can be input into the maker tool ipr_update_gff. I have not tried iprscan2gff3, as I haven't figured out how to get an iprscan4 raw file from iprscan5.
>
>> 2) maker_functional_gff
>> This script seems to be very useful, but the description suggests that it requires WU-BLAST tabular output '2', which I think looks quite different from the NCBI BLAST tabular output. Since WU-BLAST is not really available anymore (except for this very old, frozen binary bundle), I was wondering how to address this issue.
>
>It works fine with NCBI BLAST+ and the blastp command with -outfmt 6.
>
>Cheers,
>Mikael
>
>PS. You're welcome to visit me at SLU if you would like to discuss experiences with genome annotation.
>
>> 3) maker_functional
>> This just throws an error about a missing Job ID, so I have no clue what this would be used for.
>>
>> I guess what I am after is some suggestion as to how to use the scripts included with Maker to achieve a reasonable functional annotation.
>>
>> With kind regards,
>>
>> Marc Hoeppner
>>
>> Marc P. Hoeppner, PhD
>> Team Leader
>> BILS Genome Annotation Platform
>> Department for Medical Biochemistry and Microbiology
>> Uppsala University, Sweden
>> marc.hoeppner at imbim.uu.se

From carsonhh at gmail.com Wed Feb 26 09:09:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 09:09:14 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline.
Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very, very slow search, but it can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e., it allows single-base-pair introns, perhaps caused by assembly errors). The logic here is that, since I have already told MAKER that I expect sequence A to map to location X with some degree of confidence, it will try its hardest to make it match.

Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result carries the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will have the search space for alignment polishing adjusted to match maker_coor exactly. Also, the exonerate match parameters will not be relaxed as they are with est_forward.

As you can see, the behavior is slightly different (because it's an accidental feature).

Thanks,
Carson

From carson.holt at genetics.utah.edu Wed Feb 26 09:38:37 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 26 Feb 2014 16:38:37 +0000
Subject: [maker-devel] change program locations in maker_exe
In-Reply-To:
References:
Message-ID:

MAKER first looks inside .../maker/exe/ for any executables. Then it uses the system's 'which' command to identify executables in your PATH environment variable. If MAKER is not finding the one you want, you can either put the program in the .../maker/exe/ folder (i.e., create .../maker/exe/bin/ and put soft links there to the executables you want to be used first), or you can rearrange the order of entries in your PATH environment variable so that 'which ' returns the location you want. If MAKER always leaves the locations of those programs empty, it is because you need to add them to your PATH environment variable.

Thanks,
Carson

From nextgen.usfs at gmail.com Wed Feb 26 09:58:11 2014
From: nextgen.usfs at gmail.com (USFS Ion PGM)
Date: Wed, 26 Feb 2014 10:58:11 -0600
Subject: [maker-devel] change program locations in maker_exe
In-Reply-To:
References:
Message-ID: <2FA61AAE-0548-4030-9F4A-6964A631703C@gmail.com>

Hi Carson,

Thank you - that did it; I didn't have them in the PATH. All working now.

Cheers,
Jon

From weckalba at asu.edu Wed Feb 26 13:05:05 2014
From: weckalba at asu.edu (Walter Eckalbar)
Date: Wed, 26 Feb 2014 12:05:05 -0800
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To:
References:
Message-ID:

Hi Carson,

Thanks, that seems to have mostly resolved the issue. Oddly enough, though, PASA still complains about the GFF3 file that comes directly from gff3_merge, but if I first transform it with maker2eval_gtf and then use PASA's gtf_to_gff3_format.pl script, everything seems to run fine.

On 25 February 2014 20:10, Carson Holt wrote:
> Could you try version 2.31 (the current version)? I believe this is happening because you are passing in MAKER genes as pred_gff; the transcripts thus ended up with the same Names and IDs as the genes being generated by the MAKER run via SNAP etc. This shouldn't happen with model_gff, and shouldn't happen in 2.31 (IDs and names are generated slightly differently in 2.30+).
>
> Thanks,
> Carson
>
> From: Walter Eckalbar
> Date: Tuesday, February 25, 2014 at 7:11 PM
> To: Daniel Ence
> Cc: ""
> Subject: Re: [maker-devel] invalid gff3 format issues
>
> Hi Daniel, those have been uploaded and I'm using version 2.28.
>
> Walter
>
> On 25 February 2014 18:02, Daniel Ence wrote:
>> Hi Walter,
>>
>> Will you upload the full GFF3 and the control files that you used to this URL?
>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi?guest_id=189
>> Also, what version of MAKER are you running this with?
>>
>> Thanks,
>> Daniel
>>
>> On Feb 25, 2014, at 6:36 PM, Walter Eckalbar wrote:
>>
>> Hi all,
>>
>> I am trying to update maker annotations with PASA and encountered errors stemming from file format issues in the gff3 file.
>>
>> I put a few lines from the gff3 below to highlight the issue. Basically, the problem is that there are non-unique IDs for a number of the annotations.
>>
>> Is there anything that can be done to right this problem?
>>
>> Thanks,
>>
>> Walter
>>
>> Lines from GFF3 file, repeated IDs are highlighted:
>>
>> chr1 maker gene 9377440 9432028 . - . ID=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-pred_gff_maker-gene-4.16
>> chr1 maker mRNA 9377440 9432028 . - . ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-pred_gff_maker-gene-4.16;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.17;_eAED=0.17;_QI=66|0.88|0.82|1|1|1|28|1680|1234
>> chr1 maker exon 9431899 9432028 . - . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:698;Parent=maker-chr1-snap-gene-4.53-mRNA-1
>> chr1 maker exon 9431698 9431808 . - . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:697;Parent=maker-chr1-snap-gene-4.53-mRNA-1
>>
>> chr1 maker gene 8894975 9021577 . + . ID=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53
>> chr1 maker mRNA 8894975 9021577 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1;Parent=maker-chr1-snap-gene-4.53;Name=maker-chr1-snap-gene-4.53-mRNA-1;_AED=0.16;_eAED=0.17;_QI=229|0.73|0.74|1|0.84|0.88|27|503|2007
>> chr1 maker exon 8894975 8895153 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:558;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11
>> chr1 maker exon 8942215 8942531 . + . ID=maker-chr1-snap-gene-4.53-mRNA-1:exon:559;Parent=maker-chr1-snap-gene-4.53-mRNA-1,maker-chr1-snap-gene-4.53-mRNA-2,maker-chr1-snap-gene-4.53-mRNA-3,maker-chr1-snap-gene-4.53-mRNA-4,maker-chr1-snap-gene-4.53-mRNA-5,maker-chr1-snap-gene-4.53-mRNA-6,maker-chr1-snap-gene-4.53-mRNA-7,maker-chr1-snap-gene-4.53-mRNA-8,maker-chr1-snap-gene-4.53-mRNA-9,maker-chr1-snap-gene-4.53-mRNA-10,maker-chr1-snap-gene-4.53-mRNA-11

From carsonhh at gmail.com Wed Feb 26 14:12:23 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 14:12:23 -0700
Subject: [maker-devel] invalid gff3 format issues
In-Reply-To:
References:
Message-ID:

Could you put the file into this GFF3 validator to see if anything comes up?
http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online

Maybe it's just PASA, but I'd like to know there's no issue being caused by something else.

Thanks,
Carson
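Walter's problem (the same ID, maker-chr1-snap-gene-4.53-mRNA-1, attached to two different mRNA records) is the kind of thing a GFF3 validator reports. As a rough local pre-check, a Python sketch like the one below can list IDs that are reused across lines that disagree on sequence, feature type or Parent. It is a heuristic under stated assumptions (tab-separated nine-column input, attributes not URL-decoded, an ID legitimately shared only by the parts of one discontinuous feature such as a multi-line CDS). It is not a replacement for a full validator or for the fix Carson suggests, and the function names are hypothetical.

```python
"""Heuristic check for GFF3 IDs reused by unrelated features (sketch).

GFF3 lets one ID span several lines only for a single discontinuous
feature (e.g. CDS parts). Here an ID is flagged when its lines disagree
on seqid, type or Parent; that rule is an assumed heuristic, not the spec.
"""
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (seqid, type, Parent)


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split 'key=value;key=value' into a dict (no URL-unescaping)."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_conflicting_ids(gff_text: str) -> Dict[str, List[Key]]:
    """Return {ID: [(seqid, type, Parent), ...]} for IDs whose lines disagree."""
    seen: Dict[str, List[Key]] = defaultdict(list)
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows rather than guess their layout
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            seen[attrs["ID"]].append((cols[0], cols[2], attrs.get("Parent", "")))
    return {fid: keys for fid, keys in seen.items() if len(set(keys)) > 1}


if __name__ == "__main__":
    # Shortened, made-up IDs modeled on the pattern in Walter's excerpt.
    rows = [
        ["chr1", "maker", "mRNA", "9377440", "9432028", ".", "-", ".", "ID=tx-1;Parent=geneA"],
        ["chr1", "maker", "mRNA", "8894975", "9021577", ".", "+", ".", "ID=tx-1;Parent=geneB"],
        ["chr1", "maker", "CDS", "100", "200", ".", "+", "0", "ID=cds-1;Parent=tx-2"],
        ["chr1", "maker", "CDS", "300", "400", ".", "+", "2", "ID=cds-1;Parent=tx-2"],
    ]
    text = "\n".join("\t".join(r) for r in rows)
    print(find_conflicting_ids(text))
```

On the demo rows only tx-1 is reported; the two CDS parts sharing cds-1 pass because they agree on seqid, type and Parent.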
From mikael.durling at slu.se Wed Feb 26 15:04:37 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Wed, 26 Feb 2014 22:04:37 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

It seems that this could be a very useful option in cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it, I noticed that est_forward implicitly turns on the est2genome predictor. Is this necessary for this to work? I'm after the behavior you described, where exonerate is made to try really hard to align an EST within a limited region, but I would not like MAKER to produce est2genome predictions.

In general, I think the maker_coor and est_forward feature set is worth promoting to a documented feature.

Thanks,
Mikael
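Carson's earlier explanation of what happens when maker_coor is present but est_forward is not set (BLAST must seed the region first; seeds completely outside maker_coor are ignored; overlapping seeds have their alignment-polishing search space set to exactly the maker_coor region) can be modeled as a simple interval filter. The Python below is a conceptual sketch of that described behavior only, not MAKER's GI.pm code. The `Interval` type, the function names and the use of a sequence-length table to expand a bare `maker_coor=chr1` are assumptions made for illustration.

```python
"""Conceptual model of how a maker_coor region constrains BLAST seeds.

Based only on the behavior described in the thread for runs WITHOUT
est_forward: seeds entirely outside the region are ignored, and an
overlapping seed is polished over exactly the region. Names are
hypothetical; this is not MAKER's GI.pm code.
"""
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Interval:
    """A 1-based, inclusive span on one sequence."""
    seqid: str
    start: int
    end: int


def parse_maker_coor(value: str, seq_lengths: Dict[str, int]) -> Interval:
    """'chr1:1-10000' -> that span; a bare 'chr1' -> the whole sequence."""
    seqid, _, span = value.partition(":")
    if not span:
        return Interval(seqid, 1, seq_lengths[seqid])
    start_text, _, end_text = span.partition("-")
    return Interval(seqid, int(start_text), int(end_text))


def overlaps(a: Interval, b: Interval) -> bool:
    """True when both spans are on the same sequence and share at least one base."""
    return a.seqid == b.seqid and a.start <= b.end and b.start <= a.end


def polishing_windows(seeds: List[Interval], region: Optional[Interval]) -> List[Interval]:
    """Return the window each retained seed would be polished in."""
    if region is None:
        # No maker_coor tag: seeds pass through unchanged (not modeled further).
        return list(seeds)
    # Drop seeds entirely outside the region; overlapping seeds use the region.
    return [region for seed in seeds if overlaps(seed, region)]


if __name__ == "__main__":
    lengths = {"chr1": 50000, "chr2": 40000}  # made-up sequence lengths
    region = parse_maker_coor("chr1:1-10000", lengths)
    seeds = [
        Interval("chr1", 9000, 12000),   # overlaps the region
        Interval("chr1", 20000, 21000),  # entirely outside: ignored
        Interval("chr2", 100, 500),      # different sequence: ignored
    ]
    print(polishing_windows(seeds, region))
    # prints [Interval(seqid='chr1', start=1, end=10000)]
```

The sketch deliberately leaves out the est_forward path, where, as described in the thread, MAKER searches the tagged region even without a BLAST seed and relaxes the exonerate parameters.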
From carsonhh at gmail.com Wed Feb 26 15:50:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 15:50:30 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff, and use the map_forward option to then filter the results based on mRNA score. That would copy names onto the new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because, if people commonly use it mixed with things like SNAP, I can start to get some very weird behaviors.

Thanks,
Carson

From: Mikael Brandström Durling
Date: Wednesday, February 26, 2014 at 3:04 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

It seems that this could be a very useful option in cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it, I noticed that est_forward implicitly turns on the est2genome predictor. Is this necessary for it to work? I'm after the behavior you describe below, where exonerate is made to try really hard to align an EST within a limited region, but I would not like maker to produce est2genome predictions.

In general, I think maker_coor and est_forward form a feature set worthy of being promoted to a documented feature.

Thanks,
Mikael
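Carson's two-pass suggestion amounts to running MAKER twice with different control-file settings. The following is a minimal Python sketch for deriving both control files from one template. It assumes the one-option-per-line "key=value #comment" layout of maker_opts.ctl. The names update_ctl, PASS1 and PASS2 and the file name pass1.filtered.gff are all hypothetical. Keys already present are rewritten in place and keep their comments. Missing keys are appended, because, as Carson notes, est_forward won't already be in the file.

```python
import re
from typing import Dict

# One option per line: key=value, optionally followed by "#comment".
OPTION_RE = re.compile(r"^(\s*)(\w+)=([^#]*)(#.*)?$")

# Pass 1: evidence-driven models plus the undocumented est_forward option.
PASS1 = {"est_forward": "1", "est2genome": "1", "protein2genome": "1"}
# Pass 2: feed the (prefiltered) pass-1 GFF3 back in and forward its names.
# "pass1.filtered.gff" is a hypothetical file name.
PASS2 = {"model_gff": "pass1.filtered.gff", "map_forward": "1"}


def update_ctl(text: str, overrides: Dict[str, str]) -> str:
    """Return control-file text with overrides applied.

    Existing keys are rewritten in place (their comments are kept);
    keys missing from the file are appended at the end.
    """
    remaining = dict(overrides)
    out = []
    for line in text.splitlines():
        m = OPTION_RE.match(line)
        if m and m.group(2) in remaining:
            indent, key, _old, comment = m.groups()
            new = f"{indent}{key}={remaining.pop(key)}"
            out.append(f"{new} {comment}" if comment else new)
        else:
            out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"
```

Each pass would start from the same template, e.g. update_ctl(open("maker_opts.ctl").read(), PASS1), written to a separate control file per run. PASS2 sets only model_gff and map_forward. The thread does not spell out what else should change for the second, "standard" run.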
From carsonhh at gmail.com Wed Feb 26 16:45:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2014 16:45:30 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Sorry, I meant to say: prefilter on the score in the mRNA column before passing the GFF3 to model_gff.

--Carson

Sent from my iPhone
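Carson's correction means the filtering happens on the first-pass GFF3 before it is handed to model_gff. The sketch below is one way to do that prefilter. It rests on an interpretation: "the score in the mRNA column" is read as the GFF3 score column (column 6) of mRNA features. Check that your MAKER output actually carries numeric scores there, because mRNAs with a "." score are dropped. prefilter_by_mrna_score and min_score are hypothetical names, and the threshold is yours to choose.

The function keeps:

* passing mRNAs;
* their parent genes;
* their child features (exon, CDS, UTRs), with Parent lists trimmed of dropped mRNAs.

It drops evidence features such as match/match_part and ignores any trailing ##FASTA section.

```python
from typing import Dict, Iterable, List


def _attrs(col9: str) -> Dict[str, str]:
    """Parse a GFF3 attribute column (key=value;key=value) into an ordered dict."""
    pairs = (part.split("=", 1) for part in col9.strip().strip(";").split(";") if "=" in part)
    return {key: value for key, value in pairs}


def prefilter_by_mrna_score(lines: Iterable[str], min_score: float) -> List[str]:
    """Keep gene models whose mRNA score (column 6) is >= min_score."""
    rows = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue
        rows.append(cols)

    kept_mrna, kept_gene = set(), set()
    for cols in rows:
        attrs = _attrs(cols[8])
        if cols[2] == "mRNA" and "ID" in attrs and cols[5] != "." and float(cols[5]) >= min_score:
            kept_mrna.add(attrs["ID"])
            kept_gene.update(p for p in attrs.get("Parent", "").split(",") if p)

    out = ["##gff-version 3"]
    for cols in rows:
        attrs = _attrs(cols[8])
        if cols[2] == "gene":
            keep = attrs.get("ID") in kept_gene
        elif cols[2] == "mRNA":
            keep = attrs.get("ID") in kept_mrna
        else:
            # Children (exon, CDS, UTRs): keep if any parent mRNA survived, and
            # drop references to parents that were filtered out. Features whose
            # parents are not kept mRNAs (e.g. match_part) are discarded.
            parents = [p for p in attrs.get("Parent", "").split(",") if p in kept_mrna]
            keep = bool(parents)
            if keep:
                attrs["Parent"] = ",".join(parents)
                cols = cols[:8] + [";".join(f"{k}={v}" for k, v in attrs.items())]
        if keep:
            out.append("\t".join(cols))
    return out
```

The returned lines can be written to the file named in the second run's model_gff option.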
From bioinformatics.umd at gmail.com Thu Feb 27 09:46:44 2014
From: bioinformatics.umd at gmail.com (UMD Bioinformatics)
Date: Thu, 27 Feb 2014 11:46:44 -0500
Subject: [maker-devel] Problem with OpenFabrics and infiniband
Message-ID: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com>

Hello,

I've had my IT folks install MAKER on our cluster at UMD. I'm getting a segmentation fault (SEGFAULT) when running MAKER on InfiniBand nodes, but not on gigE nodes. According to the logs, this appears to be an issue with fork() calls, but I'm not sure how to fix it. I could simply use the gigE nodes, but we are in the process of moving everything to InfiniBand, so I'll need to address this issue at some point. I've attached the error log from the MPI run as well as commentary from my HPCC team.

IT suggestions

If you look at the top of the error log for the problematic job, it clearly warns about calling fork() within the Open MPI/OpenFabrics framework. The fork system call is only partially supported in the OpenFabrics software (i.e., the drivers, etc. for the InfiniBand connections). See e.g. http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork for more information.
In particular, see the paragraphs beginning with the sentence highlighted in red, "it does not mean that your fork()-calling application is safe". (The kernel, Open MPI and OFED versions are recent enough that there is _some_ fork support.) The job runs over gigE but not over IB, and Open MPI issues a warning. Together these strongly suggest that this is the issue you are encountering. I suspect that MAKER touches registered memory before the fork, which would result in a segfault (matching what was observed). You can try adding the arguments --mca mpi_warn_on_fork 0 to the mpirun command, in case the crash was somehow caused by Open MPI's warning, but I would not hold out much hope for that.

###UPDATE### This does not fix the problem. Basically, it looks like MAKER uses system calls such as fork in a manner that is incompatible with the current OpenFabrics software, and thus it will not work over InfiniBand. This situation is likely to remain until either MAKER changes to be compatible with OFED, or OFED's support for the fork system call is broadened.

-------------- next part --------------
STATUS: Parsing control files...
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the "fork()" system call to create a child process. Open MPI is currently operating in a condition that could result in memory corruption or other system errors; your MPI job may hang, crash, or produce silent data corruption. The use of fork() (or system() or other calls that create child processes) is strongly discouraged.
The process that invoked fork was: Local host: compute-g20-7.deepthought.umd.edu (PID 28015) MPI_COMM_WORLD rank: 0 If you are *absolutely sure* that your application will successfully and correctly survive a call to fork(), you may disable this warning by setting the mpi_warn_on_fork MCA parameter to 0. -------------------------------------------------------------------------- [compute-g20-8:09542] *** Process received signal *** [compute-g20-8:09542] Signal: Segmentation fault (11) [compute-g20-8:09542] Signal code: Address not mapped (1) [compute-g20-8:09542] Failing at address: 0xee00350 [compute-g20-8:09543] *** Process received signal *** [compute-g20-8:09543] Signal: Segmentation fault (11) [compute-g20-8:09543] Signal code: Address not mapped (1) [compute-g20-8:09543] Failing at address: 0xf020c90 [compute-g20-8:09544] *** Process received signal *** [compute-g20-8:09544] Signal: Segmentation fault (11) [compute-g20-8:09544] Signal code: Address not mapped (1) [compute-g20-8:09544] Failing at address: 0x1ad68f10 [compute-g20-8:09545] *** Process received signal *** [compute-g20-8:09545] Signal: Segmentation fault (11) [compute-g20-8:09545] Signal code: Address not mapped (1) [compute-g20-8:09545] Failing at address: 0x84a3188 [compute-g20-8:09545] [ 0] /lib64/libpthread.so.0 [0x2b98fac5eca0] [compute-g20-8:09545] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_int_malloc+0x530) [0x2b98f9ea4ec0] [compute-g20-8:09545] [ 2] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_malloc+0x4a) [0x2b98f9ea60ca] [compute-g20-8:09545] [ 3] perl(Perl_safesysmalloc+0x12) [0x481602] [compute-g20-8:09545] [ 4] perl(Perl_savepvn+0x26) [0x4816b6] [compute-g20-8:09545] [ 5] perl(Perl_do_exec3+0x31e) [0x4f715e] [compute-g20-8:09545] [ 6] perl(Perl_my_popen+0x403) [0x484d63] [compute-g20-8:09545] [ 7] perl(Perl_do_openn+0x1696) [0x4f9536] [compute-g20-8:09545] [ 8] perl(Perl_pp_open+0x184) [0x4efc44] [compute-g20-8:09545] 
[ 9] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-8:09545] [10] perl(perl_run+0x243) [0x4340f3] [compute-g20-8:09545] [11] perl(main+0x135) [0x41b485] [compute-g20-8:09545] [12] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b98fae899c4] [compute-g20-8:09545] [13] perl [0x41b299] [compute-g20-8:09545] *** End of error message *** [compute-g20-8:09546] *** Process received signal *** [compute-g20-8:09546] Signal: Segmentation fault (11) [compute-g20-8:09546] Signal code: Address not mapped (1) [compute-g20-8:09546] Failing at address: 0x8240850 [compute-g20-8:09547] *** Process received signal *** [compute-g20-8:09547] Signal: Segmentation fault (11) [compute-g20-8:09547] Signal code: Address not mapped (1) [compute-g20-8:09547] Failing at address: 0xd5c8850 [compute-g20-8:09548] *** Process received signal *** [compute-g20-8:09548] Signal: Segmentation fault (11) [compute-g20-8:09548] Signal code: Address not mapped (1) [compute-g20-8:09548] Failing at address: 0x8c80850 [compute-g20-8:09549] *** Process received signal *** [compute-g20-8:09549] Signal: Segmentation fault (11) [compute-g20-8:09549] Signal code: Address not mapped (1) [compute-g20-8:09549] Failing at address: 0x18d72850 [compute-g20-10:07087] *** Process received signal *** [compute-g20-10:07087] Signal: Segmentation fault (11) [compute-g20-10:07087] Signal code: Address not mapped (1) [compute-g20-10:07087] Failing at address: 0x6659f10 [compute-g20-10:07088] *** Process received signal *** [compute-g20-10:07088] Signal: Segmentation fault (11) [compute-g20-10:07088] Signal code: Address not mapped (1) [compute-g20-10:07088] Failing at address: 0x1fe3b5d0 [compute-g20-10:07089] *** Process received signal *** [compute-g20-10:07089] Signal: Segmentation fault (11) [compute-g20-10:07089] Signal code: Address not mapped (1) [compute-g20-10:07089] Failing at address: 0x9870350 [compute-g20-10:07090] *** Process received signal *** [compute-g20-10:07090] Signal: Segmentation fault (11) 
[compute-g20-10:07090] Signal code: Address not mapped (1) [compute-g20-10:07090] Failing at address: 0x17bad350 STATUS: Processing and indexing input FASTA files... [compute-g20-8:09567] *** Process received signal *** [compute-g20-8:09567] Signal: Segmentation fault (11) [compute-g20-8:09567] Signal code: Address not mapped (1) [compute-g20-8:09567] Failing at address: 0x1ad5aa10 [compute-g20-8:09567] [ 0] /lib64/libpthread.so.0 [0x2b6de3ce1ca0] [compute-g20-8:09567] [ 1] /lib64/libc.so.6(strlen+0x30) [0x2b6de3f67f40] [compute-g20-8:09567] [ 2] perl(Perl_do_exec3+0x3a) [0x4f6e7a] [compute-g20-8:09567] [ 3] perl(Perl_my_popen+0x403) [0x484d63] [compute-g20-8:09567] [ 4] perl(Perl_do_openn+0x1696) [0x4f9536] [compute-g20-8:09567] [ 5] perl(Perl_pp_open+0x184) [0x4efc44] [compute-g20-8:09567] [ 6] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-8:09567] [ 7] perl(perl_run+0x243) [0x4340f3] [compute-g20-8:09567] [ 8] perl(main+0x135) [0x41b485] [compute-g20-8:09567] [ 9] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b6de3f0c9c4] [compute-g20-8:09567] [10] perl [0x41b299] [compute-g20-8:09567] *** End of error message *** [compute-g20-7:28123] *** Process received signal *** [compute-g20-7:28123] Signal: Segmentation fault (11) [compute-g20-7:28123] Signal code: Address not mapped (1) [compute-g20-7:28123] Failing at address: 0x19ad9f10 STATUS: Setting up database for any GFF3 input... A data structure will be created for you at: /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore To access files for individual sequences use the datastore index: /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_master_datastore_index.log STATUS: Now running MAKER... 
examining contents of the fasta file and run log [compute-g20-10:07107] *** Process received signal *** [compute-g20-10:07107] Signal: Segmentation fault (11) [compute-g20-10:07107] Signal code: Address not mapped (1) [compute-g20-10:07107] Failing at address: 0x9870362 [compute-g20-10:07107] [ 0] /lib64/libpthread.so.0 [0x2b50c5c8cca0] [compute-g20-10:07107] [ 1] perl [0x487218] [compute-g20-10:07107] [ 2] perl(Perl_hv_common+0xe67) [0x499dd7] [compute-g20-10:07107] [ 3] perl [0x49d9dc] [compute-g20-10:07107] [ 4] perl(Perl_pp_method_named+0x6e) [0x49dd4e] [compute-g20-10:07107] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-10:07107] [ 6] perl(perl_run+0x243) [0x4340f3] [compute-g20-10:07107] [ 7] perl(main+0x135) [0x41b485] [compute-g20-10:07107] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b50c5eb79c4] [compute-g20-10:07107] [ 9] perl [0x41b299] [compute-g20-10:07107] *** End of error message *** examining contents of the fasta file and run log examining contents of the fasta file and run log [compute-g20-10:07108] *** Process received signal *** [compute-g20-10:07108] Signal: Segmentation fault (11) [compute-g20-10:07108] Signal code: Address not mapped (1) [compute-g20-10:07108] Failing at address: 0x1fe3b5c8 examining contents of the fasta file and run log [compute-g20-10:07108] [ 0] /lib64/libpthread.so.0 [0x2b88f6f8dca0] [compute-g20-10:07108] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_free+0x22) [0x2b88f61d55b2] [compute-g20-10:07108] [ 2] /lib64/libc.so.6(cfree+0xd1) [0x2b88f7210ad1] [compute-g20-10:07108] [ 3] perl(Perl_sv_setsv_flags+0xb49) [0x4ad919] [compute-g20-10:07108] [ 4] perl(Perl_pp_aassign+0x209) [0x4a3a19] [compute-g20-10:07108] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-10:07108] [ 6] perl(perl_run+0x243) [0x4340f3] [compute-g20-10:07108] [ 7] perl(main+0x135) [0x41b485] [compute-g20-10:07108] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b88f71b89c4] 
[compute-g20-10:07108] [ 9] perl [0x41b299] [compute-g20-10:07108] *** End of error message *** examining contents of the fasta file and run log [compute-g20-10:07109] *** Process received signal *** [compute-g20-10:07109] Signal: Segmentation fault (11) [compute-g20-10:07109] Signal code: Address not mapped (1) [compute-g20-10:07109] Failing at address: 0x6664ad0 [compute-g20-10:07109] [ 0] /lib64/libpthread.so.0 [0x2b0809664ca0] [compute-g20-10:07109] [ 1] /lib64/libc.so.6 [0x2b08098edada] [compute-g20-10:07109] [ 2] /lib64/libc.so.6(memmove+0x75) [0x2b08098ec095] [compute-g20-10:07109] [ 3] perl(Perl_sv_setpvn+0x7a) [0x4b775a] [compute-g20-10:07109] [ 4] perl(Perl_pp_concat+0xc9) [0x4a5739] [compute-g20-10:07109] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-10:07109] [ 6] perl(Perl_call_sv+0x160) [0x4333a0] [compute-g20-10:07109] [ 7] perl(Perl_magic_methcall+0x182) [0x488c22] [compute-g20-10:07109] [ 8] perl(Perl_magic_setpack+0x52) [0x489292] [compute-g20-10:07109] [ 9] perl(Perl_mg_set+0x66) [0x48aca6] [compute-g20-10:07109] [10] perl(Perl_pp_sassign+0x19c) [0x4a5c8c] [compute-g20-10:07109] [11] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-10:07109] [12] perl(perl_run+0x243) [0x4340f3] [compute-g20-10:07109] [13] perl(main+0x135) [0x41b485] [compute-g20-10:07109] [14] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b080988f9c4] [compute-g20-10:07109] [15] perl [0x41b299] [compute-g20-10:07109] *** End of error message *** examining contents of the fasta file and run log examining contents of the fasta file and run log examining contents of the fasta file and run log examining contents of the fasta file and run log --Next Contig-- --Next Contig-- --Next Contig-- examining contents of the fasta file and run log --Next Contig-- Processing run.log file... Processing run.log file... examining contents of the fasta file and run log Processing run.log file... Processing run.log file... 
--Next Contig-- --Next Contig-- --Next Contig-- --Next Contig-- --Next Contig-- Processing run.log file... Processing run.log file... --Next Contig-- --Next Contig-- Processing run.log file... #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_2 Length: 2857 #--------------------------------------------------------------------- Processing run.log file... MAKER WARNING: The file UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/D5/5A/Gc_UCSC1_contig_17//theVoid.Gc_UCSC1_contig_17/0/Gc_UCSC1_contig_17.0.all.rb.out did not finish on the last run and must be erased Processing run.log file... setting up GFF3 output and fasta chunks Processing run.log file... #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_7 Length: 972 #--------------------------------------------------------------------- [compute-g20-8:09576] *** Process received signal *** [compute-g20-8:09576] Signal: Segmentation fault (11) [compute-g20-8:09576] Signal code: Address not mapped (1) [compute-g20-8:09576] Failing at address: 0x1ad68f08 examining contents of the fasta file and run log #--------------------------------------------------------------------- Now starting the contig!! 
SeqID: Gc_UCSC1_contig_3 Length: 2316 #--------------------------------------------------------------------- [compute-g20-8:09576] [ 0] /lib64/libpthread.so.0 [0x2b6de3ce1ca0] [compute-g20-8:09576] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_free+0x22) [0x2b6de2f295b2] [compute-g20-8:09576] [ 2] /lib64/libc.so.6(cfree+0xd1) [0x2b6de3f64ad1] [compute-g20-8:09576] [ 3] perl(Perl_sv_setsv_flags+0xb49) [0x4ad919] [compute-g20-8:09576] [ 4] perl(Perl_pp_aassign+0x209) [0x4a3a19] [compute-g20-8:09576] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-8:09576] [ 6] perl(perl_run+0x243) [0x4340f3] [compute-g20-8:09576] [ 7] perl(main+0x135) [0x41b485] [compute-g20-8:09576] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b6de3f0c9c4] [compute-g20-8:09576] [ 9] perl [0x41b299] [compute-g20-8:09576] *** End of error message *** #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_4 Length: 1230 #--------------------------------------------------------------------- examining contents of the fasta file and run log examining contents of the fasta file and run log examining contents of the fasta file and run log examining contents of the fasta file and run log examining contents of the fasta file and run log [compute-g20-8:09578] *** Process received signal *** [compute-g20-8:09578] Signal: Segmentation fault (11) [compute-g20-8:09578] Signal code: Address not mapped (1) [compute-g20-8:09578] Failing at address: 0xee0af18 [compute-g20-8:09578] [ 0] /lib64/libpthread.so.0 [0x2b03d0637ca0] [compute-g20-8:09578] [ 1] perl(Perl_av_fetch+0x5b) [0x49cf8b] [compute-g20-8:09578] [ 2] perl(Perl_pp_aelem+0x26e) [0x49e48e] [compute-g20-8:09578] [ 3] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-8:09578] [ 4] perl(perl_run+0x243) [0x4340f3] [compute-g20-8:09578] [ 5] perl(main+0x135) [0x41b485] [compute-g20-8:09578] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) 
[0x2b03d08629c4] [compute-g20-8:09578] [ 7] perl [0x41b299] [compute-g20-8:09578] *** End of error message *** setting up GFF3 output and fasta chunks Processing run.log file... [compute-g20-8:09583] *** Process received signal *** [compute-g20-8:09583] Signal: Segmentation fault (11) [compute-g20-8:09583] Signal code: Address not mapped (1) [compute-g20-8:09583] Failing at address: 0x822b0e2 [compute-g20-8:09582] *** Process received signal *** [compute-g20-8:09582] Signal: Segmentation fault (11) [compute-g20-8:09582] Signal code: Address not mapped (1) [compute-g20-8:09582] Failing at address: 0x8c6b0e2 [compute-g20-8:09583] [ 0] /lib64/libpthread.so.0 [0x2ab7f114dca0] [compute-g20-8:09583] [ 1] perl [0x487218] [compute-g20-8:09583] [ 2] perl(Perl_hv_common+0xe67) [0x499dd7] [compute-g20-8:09583] [ 3] perl [0x49d9dc] [compute-g20-8:09583] [ 4] perl(Perl_pp_method_named+0x6e) [0x49dd4e] [compute-g20-8:09583] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-8:09583] [ 6] perl(perl_run+0x243) [0x4340f3] [compute-g20-8:09583] [ 7] perl(main+0x135) [0x41b485] [compute-g20-8:09583] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2ab7f13789c4] [compute-g20-8:09583] [ 9] perl [0x41b299] [compute-g20-8:09583] *** End of error message *** [compute-g20-8:09582] [ 0] /lib64/libpthread.so.0 [0x2b4eace23ca0] [compute-g20-8:09582] [ 1] perl [0x487218] [compute-g20-8:09582] [ 2] perl(Perl_hv_common+0xe67) [0x499dd7] [compute-g20-8:09582] [ 3] perl [0x49d9dc] [compute-g20-8:09582] [ 4] perl(Perl_pp_method_named+0x6e) [0x49dd4e] [compute-g20-8:09582] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-8:09582] [ 6] perl(perl_run+0x243) [0x4340f3] [compute-g20-8:09582] [ 7] perl(main+0x135) [0x41b485] [compute-g20-8:09582] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b4ead04e9c4] [compute-g20-8:09582] [ 9] perl [0x41b299] [compute-g20-8:09582] *** End of error message *** examining contents of the fasta file and run log [compute-g20-8:09581] *** Process 
received signal *** [compute-g20-8:09581] Signal: Segmentation fault (11) [compute-g20-8:09581] Signal code: Address not mapped (1) [compute-g20-8:09581] Failing at address: 0x848da08 #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_17 Length: 1413 #--------------------------------------------------------------------- #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_13 Length: 2019 #--------------------------------------------------------------------- [compute-g20-8:09581] [ 0] /lib64/libpthread.so.0 [0x2b98fac5eca0] [compute-g20-8:09581] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_free+0x22) [0x2b98f9ea65b2] [compute-g20-8:09581] [ 2] /lib64/libc.so.6(cfree+0xd1) [0x2b98faee1ad1] [compute-g20-8:09581] [ 3] perl(Perl_sv_setsv_flags+0xb49) [0x4ad919] [compute-g20-8:09581] [ 4] perl(Perl_pp_aassign+0x209) [0x4a3a19] [compute-g20-8:09581] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-8:09581] [ 6] perl(perl_run+0x243) [0x4340f3] [compute-g20-8:09581] [ 7] perl(main+0x135) [0x41b485] [compute-g20-8:09581] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b98fae899c4] [compute-g20-8:09581] [ 9] perl [0x41b299] [compute-g20-8:09577] *** Process received signal *** [compute-g20-8:09581] *** End of error message *** [compute-g20-8:09577] Signal: Segmentation fault (11) [compute-g20-8:09577] Signal code: Address not mapped (1) [compute-g20-8:09577] Failing at address: 0xd5b30e2 [compute-g20-8:09577] [ 0] /lib64/libpthread.so.0 [0x2b79d382aca0] [compute-g20-8:09577] [ 1] perl [0x487218] [compute-g20-8:09577] [ 2] perl(Perl_hv_common+0xe67) [0x499dd7] [compute-g20-8:09577] [ 3] perl [0x49d9dc] [compute-g20-8:09577] [ 4] perl(Perl_pp_method_named+0x6e) [0x49dd4e] [compute-g20-8:09577] [ 5] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-8:09577] [ 6] perl(perl_run+0x243) 
[0x4340f3] [compute-g20-8:09577] [ 7] perl(main+0x135) [0x41b485] [compute-g20-8:09577] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b79d3a559c4] [compute-g20-8:09577] [ 9] perl [0x41b299] [compute-g20-8:09577] *** End of error message *** #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_1 Length: 1446 #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks [compute-g20-8:09579] *** Process received signal *** [compute-g20-8:09579] Signal: Segmentation fault (11) [compute-g20-8:09579] Signal code: Address not mapped (1) [compute-g20-8:09579] Failing at address: 0x18d64350 examining contents of the fasta file and run log [compute-g20-8:09579] [ 0] /lib64/libpthread.so.0 [0x2b31b670fca0] [compute-g20-8:09579] [ 1] /usr/local/BerkeleyDB/lib/libdb-4.7.so(__ham_get_meta+0x4c) [0x2b31bbd1bccc] [compute-g20-8:09579] [ 2] /usr/local/BerkeleyDB/lib/libdb-4.7.so [0x2b31bbd103fb] [compute-g20-8:09579] [ 3] /usr/local/BerkeleyDB/lib/libdb-4.7.so(__dbc_get+0x1fa) [0x2b31bbd81f3a] [compute-g20-8:09579] [ 4] /usr/local/BerkeleyDB/lib/libdb-4.7.so(__dbc_get_pp+0xb4) [0x2b31bbd8db04] [compute-g20-8:09579] [ 5] /usr/local/BerkeleyDB/lib/libdb-4.7.so [0x2b31bbce4b85] [compute-g20-8:09579] [ 6] /usr/local/perl/5.16.3-threaded/lib/site_perl/5.16.3/x86_64-linux-thread-multi/auto/DB_File/DB_File.so [0x2b31bbabafc9] [compute-g20-8:09579] [ 7] perl(Perl_pp_entersub+0x58f) [0x49ee4f] [compute-g20-8:09579] [ 8] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-8:09579] [ 9] perl(perl_run+0x243) [0x4340f3] [compute-g20-8:09579] [10] perl(main+0x135) [0x41b485] [compute-g20-8:09579] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b31b693a9c4] [compute-g20-8:09579] [12] perl [0x41b299] [compute-g20-8:09579] *** End of error message *** --Next Contig-- setting up GFF3 output and fasta chunks setting up GFF3 output and fasta chunks --Next Contig-- setting up 
GFF3 output and fasta chunks --Next Contig-- --Next Contig-- Processing run.log file... MAKER WARNING: The file UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/3B/F3/Gc_UCSC1_contig_26//theVoid.Gc_UCSC1_contig_26/0/Gc_UCSC1_contig_26.0.all.rb.out did not finish on the last run and must be erased --Next Contig-- --Next Contig-- --Next Contig-- #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_18 Length: 937 #--------------------------------------------------------------------- Processing run.log file... Processing run.log file... Processing run.log file... --Next Contig-- FATAL: Thread terminated, causing all processes to fail --> rank=17, hostname=compute-g20-10.deepthought.umd.edu setting up GFF3 output and fasta chunks Processing run.log file... Processing run.log file... #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_14 Length: 6745 #--------------------------------------------------------------------- #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_9 Length: 554 #--------------------------------------------------------------------- MAKER WARNING: The file UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/FB/E4/Gc_UCSC1_contig_22//theVoid.Gc_UCSC1_contig_22/0/Gc_UCSC1_contig_22.0.all.rb.out did not finish on the last run and must be erased setting up GFF3 output and fasta chunks Processing run.log file... setting up GFF3 output and fasta chunks #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_16 Length: 995 #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks #--------------------------------------------------------------------- Now starting the contig!! 
SeqID: Gc_UCSC1_contig_26 Length: 1895 #--------------------------------------------------------------------- FATAL: Thread terminated, causing all processes to fail --> rank=16, hostname=compute-g20-10.deepthought.umd.edu #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_23 Length: 618 #--------------------------------------------------------------------- #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_31 Length: 506 #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks setting up GFF3 output and fasta chunks #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_28 Length: 5246 #--------------------------------------------------------------------- MAKER WARNING: The file UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/E5/53/Gc_UCSC1_contig_29//theVoid.Gc_UCSC1_contig_29/0/Gc_UCSC1_contig_29.0.all.rb.out did not finish on the last run and must be erased setting up GFF3 output and fasta chunks setting up GFF3 output and fasta chunks setting up GFF3 output and fasta chunks #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_19 Length: 880 #--------------------------------------------------------------------- #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_22 Length: 831 #--------------------------------------------------------------------- #--------------------------------------------------------------------- Now starting the contig!! 
SeqID: Gc_UCSC1_contig_21 Length: 12421 #--------------------------------------------------------------------- doing repeat masking FATAL: Thread terminated, causing all processes to fail --> rank=18, hostname=compute-g20-10.deepthought.umd.edu #--------------------------------------------------------------------- Now starting the contig!! SeqID: Gc_UCSC1_contig_29 Length: 1161 #--------------------------------------------------------------------- doing repeat masking DBD::SQLite::db do failed: disk I/O error at /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/bin/../lib/GFFDB.pm line 105. DBD::SQLite::db do failed: disk I/O error at /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/bin/../lib/GFFDB.pm line 106. DBD::SQLite::db selectcol_arrayref failed: disk I/O error at /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/bin/../lib/GFFDB.pm line 108. DBD::SQLite::db do failed: disk I/O error at /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/bin/../lib/GFFDB.pm line 110. [compute-g20-7.deepthought.umd.edu:28014] 19 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork [compute-g20-7.deepthought.umd.edu:28014] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages doing repeat masking running repeat masker. #--------- command -------------# Widget::RepeatMasker: cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/0F/67/Gc_UCSC1_contig_9//theVoid.Gc_UCSC1_contig_9/0/Gc_UCSC1_contig_9.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/0F/67/Gc_UCSC1_contig_9//theVoid.Gc_UCSC1_contig_9/0 -pa 1 #-------------------------------# SIGTERM received doing repeat masking running repeat masker. 
#--------- command -------------# Widget::RepeatMasker: cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/D5/5A/Gc_UCSC1_contig_17//theVoid.Gc_UCSC1_contig_17/0/Gc_UCSC1_contig_17.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/D5/5A/Gc_UCSC1_contig_17//theVoid.Gc_UCSC1_contig_17/0 -pa 1 #-------------------------------# SIGTERM received SIGTERM received [compute-g20-7:28161] *** Process received signal *** [compute-g20-7:28161] Signal: Segmentation fault (11) [compute-g20-7:28161] Signal code: Address not mapped (1) [compute-g20-7:28161] Failing at address: 0x19a33ad0 [compute-g20-7:28161] [ 0] /lib64/libpthread.so.0 [0x2b9e1cd6bca0] [compute-g20-7:28161] [ 1] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_int_malloc+0xb0) [0x2b9e1bfb1a40] [compute-g20-7:28161] [ 2] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_memory_ptmalloc2_malloc+0x4a) [0x2b9e1bfb30ca] [compute-g20-7:28161] [ 3] perl(Perl_safesysmalloc+0x12) [0x481602] [compute-g20-7:28161] [ 4] perl(Perl_do_exec3+0x46) [0x4f6e86] [compute-g20-7:28161] [ 5] perl(Perl_my_popen+0x403) [0x484d63] [compute-g20-7:28161] [ 6] perl(Perl_pp_backtick+0xc2) [0x4f0752] [compute-g20-7:28161] [ 7] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-7:28161] [ 8] perl(Perl_call_sv+0x4d1) [0x433711] [compute-g20-7:28161] [ 9] perl(Perl_sighandler+0x208) [0x4876c8] [compute-g20-7:28161] [10] /lib64/libpthread.so.0 [0x2b9e1cd6bca0] [compute-g20-7:28161] [11] /usr/local/ofed/1.5.4/lib64/libmthca-rdmav2.so [0x2b9e29187bbc] [compute-g20-7:28161] [12] /cell_root/software/openmpi/1.6/gnu/sys/lib/openmpi/mca_btl_openib.so [0x2b9e2686a8dd] [compute-g20-7:28161] [13] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(opal_progress+0x5b) [0x2b9e1bfc93cb] [compute-g20-7:28161] [14] 
/cell_root/software/openmpi/1.6/gnu/sys/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv+0x205) [0x2b9e25e22005] [compute-g20-7:28161] [15] /cell_root/software/openmpi/1.6/gnu/sys/lib/libmpi.so(PMPI_Recv+0x14f) [0x2b9e1bf2927f] [compute-g20-7:28161] [16] /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/perl/lib/auto/Parallel/Application/MPI/MPI.so(_MPI_Recv+0x59) [0x2b9e23ba8d69] [compute-g20-7:28161] [17] /export/rel50_shadow/glue.umd.edu/software/maker/2.28/.amd64_rel50/perl/lib/auto/Parallel/Application/MPI/MPI.so [0x2b9e23ba8f58] [compute-g20-7:28161] [18] perl(Perl_pp_entersub+0x58f) [0x49ee4f] [compute-g20-7:28161] [19] perl(Perl_runops_standard+0xe) [0x49d5ce] [compute-g20-7:28161] [20] perl(perl_run+0x243) [0x4340f3] [compute-g20-7:28161] [21] perl(main+0x135) [0x41b485] [compute-g20-7:28161] [22] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b9e1cf969c4] [compute-g20-7:28161] [23] perl [0x41b299] [compute-g20-7:28161] *** End of error message *** running repeat masker. #--------- command -------------# Widget::RepeatMasker: cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/DC/D5/Gc_UCSC1_contig_18//theVoid.Gc_UCSC1_contig_18/0/Gc_UCSC1_contig_18.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/DC/D5/Gc_UCSC1_contig_18//theVoid.Gc_UCSC1_contig_18/0 -pa 1 #-------------------------------# SIGTERM received doing repeat masking running repeat masker. 
#--------- command -------------# Widget::RepeatMasker: cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/BE/77/Gc_UCSC1_contig_16//theVoid.Gc_UCSC1_contig_16/0/Gc_UCSC1_contig_16.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/BE/77/Gc_UCSC1_contig_16//theVoid.Gc_UCSC1_contig_16/0 -pa 1 #-------------------------------# SIGTERM received doing repeat masking running repeat masker. #--------- command -------------# Widget::RepeatMasker: cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/1C/8A/Gc_UCSC1_contig_14//theVoid.Gc_UCSC1_contig_14/0/Gc_UCSC1_contig_14.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/1C/8A/Gc_UCSC1_contig_14//theVoid.Gc_UCSC1_contig_14/0 -pa 1 #-------------------------------# SIGTERM received doing repeat masking running repeat masker. #--------- command -------------# Widget::RepeatMasker: cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/CB/E5/Gc_UCSC1_contig_13//theVoid.Gc_UCSC1_contig_13/0/Gc_UCSC1_contig_13.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/CB/E5/Gc_UCSC1_contig_13//theVoid.Gc_UCSC1_contig_13/0 -pa 1 #-------------------------------# SIGTERM received Perl exited with active threads: 1 running and unjoined 0 finished and unjoined 0 running and detached doing repeat masking running repeat masker. 
#--------- command -------------# Widget::RepeatMasker: cd /tmp/maker_amJ13c; /a/g20-fs1/software/dt-sw0/RepeatMasker/4.0.3/RepeatMasker /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/AA/A6/Gc_UCSC1_contig_1//theVoid.Gc_UCSC1_contig_1/0/Gc_UCSC1_contig_1.0.all.rb -species all -dir /export/lustre_1/imisner/Maker/UCSC1_CLC_de_novo.maker.output/UCSC1_CLC_de_novo_datastore/AA/A6/Gc_UCSC1_contig_1//theVoid.Gc_UCSC1_contig_1/0 -pa 1 #-------------------------------# SIGTERM received -------------------------------------------------------------------------- mpirun has exited due to process rank 17 with PID 7052 on node compute-g20-10.deepthought.umd.edu exiting improperly. There are two reasons this could occur: 1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for all processes to call "init". By rule, if one process calls "init", then ALL processes must call "init" prior to termination. 2. this process called "init", but exited without calling "finalize". By rule, all processes that call "init" MUST call "finalize" prior to exiting or it will be considered an "abnormal termination" This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here). 
-------------------------------------------------------------------------- SIGTERM received SIGTERM received Perl exited with active threads: 0 running and unjoined 1 finished and unjoined 0 running and detached FATAL: Thread terminated, causing all processes to fail --> rank=14, hostname=compute-g20-8.deepthought.umd.edu Perl exited with active threads: 0 running and unjoined 1 finished and unjoined 0 running and detached FATAL: Thread terminated, causing all processes to fail --> rank=12, hostname=compute-g20-8.deepthought.umd.edu [compute-g20-8:09470] *** Process received signal *** [compute-g20-8:09470] Signal: Segmentation fault (11) [compute-g20-8:09470] Signal code: Address not mapped (1) [compute-g20-8:09470] Failing at address: 0x4b0 [compute-g20-8:09470] [ 0] /lib64/libpthread.so.0 [0x2b03d0637ca0] [compute-g20-8:09470] [ 1] perl(Perl_csighandler+0x23) [0x488103] [compute-g20-8:09470] [ 2] /lib64/libpthread.so.0 [0x2b03d0637ca0] [compute-g20-8:09470] [ 3] /lib64/libc.so.6(__select+0x62) [0x2b03d0913402] [compute-g20-8:09470] [ 4] /cell_root/software/openmpi/1.6/gnu/sys/lib/openmpi/mca_btl_openib.so [0x2b03da142ff3] [compute-g20-8:09470] [ 5] /lib64/libpthread.so.0 [0x2b03d062f83d] [compute-g20-8:09470] [ 6] /lib64/libc.so.6(clone+0x6d) [0x2b03d091a26d] [compute-g20-8:09470] *** End of error message *** Perl exited with active threads: 0 running and unjoined 1 finished and unjoined 0 running and detached FATAL: Thread terminated, causing all processes to fail --> rank=11, hostname=compute-g20-8.deepthought.umd.edu setting up GFF3 output and fasta chunks FATAL: Thread terminated, causing all processes to fail --> rank=10, hostname=compute-g20-8.deepthought.umd.edu setting up GFF3 output and fasta chunks FATAL: Thread terminated, causing all processes to fail --> rank=13, hostname=compute-g20-8.deepthought.umd.edu FATAL: Thread terminated, causing all processes to fail --> rank=15, hostname=compute-g20-8.deepthought.umd.edu
From carson.holt at genetics.utah.edu Thu Feb 27 11:09:21 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Thu, 27 Feb 2014 18:09:21 +0000 Subject: [maker-devel] Problem with OpenFabrics and infiniband In-Reply-To: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com> References: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com> Message-ID: It's a little more complicated than that. MAKER is written in Perl, and Perl doesn't give me the low-level access that a language like C would for controlling memory access (I don't control that). All I get is Perl's standard implementation of forks. So it's not really a matter of MAKER changing; it would be a matter of changing Perl itself (which I have no power over, and I don't think will be changing anytime soon). For now you just have to add this flag to OpenMPI when running MAKER with mpiexec: -mca btl ^openib Example: mpiexec -mca btl ^openib -n 20 maker Thanks, Carson From: UMD Bioinformatics > Date: Thursday, February 27, 2014 at 9:46 AM To: > Subject: Problem with OpenFabrics and infiniband Hello, I've had my IT folks install maker on our cluster at UMD. I'm getting a SEGFAULT error when running maker on infiniband nodes but not on gigE nodes. According to the logs this appears to be an issue with forks, but I'm not sure how to fix it. I would simply use the gigE nodes, but we are in the process of updating everything to infiniband, so I'll need to address this issue at some point. I've attached the error log from the MPI run as well as commentary from my HPCC team. IT suggestions If you look at the top of the error log for the problematic job, it clearly warns of an issue with doing 'fork's within the openmpi/openfabrics framework. In particular, the use of the fork system call is only partially supported in the OpenFabrics software (this is the drivers, etc. for the infiniband connections). See e.g.
http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork for more information. In particular, see the paragraphs starting with the red-highlighted sentence "it does not mean that your fork()-calling application is safe". (The kernel, openMPI version, and OFED version are sufficiently recent to mean that there is _some_ fork support.) The fact that the job runs over gigE but not IB, in conjunction with the warning from openmpi, strongly suggests that this is the issue that you are encountering. I suspect that maker touches registered memory before the fork, which would result in a segfault (matching what was observed). You can try adding the arguments --mca mpi_warn_on_fork 0 to the mpirun command, just in case the crash was somehow caused by openmpi's warning, but I would not hold out much hope for that. ###UPDATE### This does not fix the problem. Basically, it looks like maker uses some system calls like fork in a manner which is incompatible with the current OpenFabrics software, and thus will not work with infiniband. This situation is likely to remain until either maker changes to be compatible with OFED, or OFED's support for the fork system call is broadened. From bioinformatics.umd at gmail.com Thu Feb 27 11:55:34 2014 From: bioinformatics.umd at gmail.com (UMD Bioinformatics) Date: Thu, 27 Feb 2014 13:55:34 -0500 Subject: [maker-devel] Problem with OpenFabrics and infiniband In-Reply-To: References: <0D6CCF05-A126-445F-9F13-1E111CCDAA8A@gmail.com> Message-ID: <2840BC1C-70CC-4A0D-AB44-AEFD718C7B8C@gmail.com> Hi Carson, Thanks, that fixed the issue. Cheers Ian On Feb 27, 2014, at 1:09 PM, Carson Holt wrote: > It's a little more complicated than that. MAKER is written in Perl, and Perl doesn't give me the low-level access that a language like C would for controlling memory access (I don't control that). All I get is Perl's standard implementation of forks.
So it's not really a matter of MAKER changing; it would be a matter of changing Perl itself (which I have no power over, and I don't think will be changing anytime soon). > > For now you just have to add this flag to OpenMPI when running MAKER with mpiexec: -mca btl ^openib > > Example: >> mpiexec -mca btl ^openib -n 20 maker > > > Thanks, > Carson > > > From: UMD Bioinformatics > Date: Thursday, February 27, 2014 at 9:46 AM > To: > Subject: Problem with OpenFabrics and infiniband > > Hello, > > I've had my IT folks install maker on our cluster at UMD. I'm getting a SEGFAULT error when running maker on infiniband nodes but not on gigE nodes. According to the logs this appears to be an issue with forks, but I'm not sure how to fix it. I would simply use the gigE nodes, but we are in the process of updating everything to infiniband, so I'll need to address this issue at some point. I've attached the error log from the MPI run as well as commentary from my HPCC team. > > IT suggestions > > If you look at the top of the error log for the problematic job, it clearly > warns of an issue with doing 'fork's within the openmpi/openfabrics framework. > > In particular, the use of the fork system call is only partially supported > in the OpenFabrics software (this is the drivers, etc. for the infiniband > connections). See e.g. > http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork > for more information. In particular, see the paragraphs starting with the > red-highlighted sentence "it does not mean that your fork()-calling > application is safe". (The kernel, openMPI version, and OFED version are > sufficiently recent to mean that there is _some_ fork support.) > > The fact that the job runs over gigE but not IB, in conjunction with the > warning from openmpi, strongly suggests that this is the issue that you are > encountering. I suspect that maker touches registered memory before the fork, > which would result in a segfault (matching what was observed).
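Carson's workaround in this thread is to exclude Open MPI's openib byte-transfer layer (BTL), so the OpenFabrics/InfiniBand transport, which only partially supports fork(), is never used by MAKER. If MAKER is launched from a Python wrapper, a minimal sketch of building that command line might look like this. The helper names `maker_mpi_command` and `run_maker` are hypothetical; only the `mpiexec -mca btl ^openib -n 20 maker` form comes from the thread.

```python
import shlex
import subprocess
from typing import List, Optional


def maker_mpi_command(n_procs: int,
                      maker_args: Optional[List[str]] = None,
                      mpiexec: str = "mpiexec") -> List[str]:
    """Build an Open MPI launch line for MAKER with the openib BTL excluded.

    "-mca btl ^openib" asks Open MPI to use every byte-transfer layer
    except openib (the OpenFabrics/InfiniBand one).
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    cmd = [mpiexec, "-mca", "btl", "^openib", "-n", str(n_procs), "maker"]
    cmd.extend(maker_args or [])  # extra MAKER arguments, if any
    return cmd


def run_maker(n_procs: int, dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or execute it, raising on a non-zero exit."""
    cmd = maker_mpi_command(n_procs)
    if dry_run:
        print(shlex.join(cmd))  # shlex.join needs Python 3.8+
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    run_maker(20)
```

Run as a script, this prints `mpiexec -mca btl '^openib' -n 20 maker`; `shlex.join` quotes `^openib`, which is harmless in bash and avoids surprises in shells where `^` is special. Open MPI also reads MCA parameters from environment variables named `OMPI_MCA_<param>`, so exporting `OMPI_MCA_btl=^openib` in the job script should have the same effect without touching the launch line.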
> > You can try adding the arguments > --mca mpi_warn_on_fork 0 > to the mpirun command, just in case the crash was somehow caused by openmpi's > warning, but I would not hold out much hope for that. > > ###UPDATE### This does not fix the problem. > > > Basically, it looks like maker uses some system calls like fork in a manner > which is incompatible with the current OpenFabrics software, and thus will > not work with infiniband. This situation is likely to remain until either > maker changes to be compatible with OFED, or OFED's support for the fork > system call is broadened. From sjackman at gmail.com Thu Feb 27 16:17:22 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Thu, 27 Feb 2014 15:17:22 -0800 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome? Cheers, Shaun On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com) wrote: Sorry, I meant to say: prefilter on the score in the mRNA column before passing the gff3 to model_gff. --Carson Sent from my iPhone On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score, and that would copy names onto the new genes under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly mix it with things like SNAP, I can start to get some very weird behaviors.
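Carson's two-pass recipe needs the first-pass GFF3 to be prefiltered on mRNA score before it is handed to model_gff with the map_forward option. A minimal sketch of such a prefilter follows. It assumes the score sits in column 6 of each mRNA line and that features are linked through standard GFF3 `ID`/`Parent` attributes; mRNAs with a `.` score are treated as failing, and the function names and threshold are hypothetical. It is worth checking first whether your MAKER output actually carries a numeric score in that column.

```python
from typing import Dict, Iterable, List, Set, Tuple


def parse_attributes(col9: str) -> Dict[str, str]:
    """Parse a GFF3 column-9 string such as 'ID=m1;Parent=g1'."""
    attrs: Dict[str, str] = {}
    for field in col9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def filter_mrnas_by_score(lines: Iterable[str], min_score: float) -> List[str]:
    """Drop mRNAs scoring below min_score, their children, and emptied genes.

    Assumes the mRNA score is in GFF3 column 6; '.' counts as failing.
    Features that are not genes, mRNAs, or mRNA children pass through.
    """
    lines = list(lines)
    body, fasta = lines, []
    for i, line in enumerate(lines):
        if line.startswith("##FASTA"):  # trailing FASTA block is not GFF rows
            body, fasta = lines[:i], lines[i:]
            break

    rows: List[Tuple[str, List[str], Dict[str, str]]] = []
    kept: Set[str] = set()
    dropped: Set[str] = set()
    genes_with_kept_mrna: Set[str] = set()
    for line in body:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            rows.append((line, [], {}))  # comments/blank/malformed: keep as-is
            continue
        attrs = parse_attributes(cols[8])
        rows.append((line, cols, attrs))
        if cols[2] == "mRNA":
            passed = cols[5] != "." and float(cols[5]) >= min_score
            (kept if passed else dropped).add(attrs.get("ID", ""))
            if passed:
                genes_with_kept_mrna.update(attrs.get("Parent", "").split(","))

    out: List[str] = []
    for line, cols, attrs in rows:
        if not cols:
            out.append(line)
            continue
        parents = set(attrs.get("Parent", "").split(",")) - {""}
        if cols[2] == "gene":
            if attrs.get("ID") in genes_with_kept_mrna:
                out.append(line)
        elif cols[2] == "mRNA":
            if attrs.get("ID") in kept:
                out.append(line)
        elif parents & dropped and not parents & kept:
            continue  # exon/CDS/UTR belonging only to dropped mRNAs
        else:
            out.append(line)
    return out + fasta
```

Writing the returned lines back to a file gives a GFF3 that could then be passed as model_gff; choosing `min_score` after looking at the score distribution of the first pass seems safer than picking a value blindly.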
Thanks, Carson From: Mikael Brandström Durling Date: Wednesday, February 26, 2014 at 3:04 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an EST, but I would not like maker to produce est2genome predictions. In general, I think this maker_coor and est_forward is a feature set that is worthy to be promoted into a documented feature. Thanks, Mikael On 26 Feb 2014 at 17:09, Carson Holt wrote: It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard-to-find missing genes after reassembly of a genome. If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). The logic here is that, given that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.
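The maker_coor behavior Carson describes in this thread can be modeled in a few lines for the case without est_forward: BLAST seeds lying completely outside the tagged region are ignored, and seeds that overlap it have their polishing window set to the region exactly. The following is a hypothetical illustration of that described logic, not MAKER's code in GI.pm. `Region`, `parse_maker_coor`, and `seed_search_space` are invented names, and the untagged case simply returns the seed span rather than modeling how MAKER normally pads seeds.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    seqid: str
    start: Optional[int]  # None means the whole sequence
    end: Optional[int]


# Matches maker_coor=chr1 or maker_coor=chr1:1-10000 inside a FASTA header.
_MAKER_COOR = re.compile(r"maker_coor=([^\s:]+)(?::(\d+)-(\d+))?")


def parse_maker_coor(header: str) -> Optional[Region]:
    match = _MAKER_COOR.search(header)
    if match is None:
        return None
    seqid, start, end = match.groups()
    if start is None:
        return Region(seqid, None, None)
    lo, hi = sorted((int(start), int(end)))
    return Region(seqid, lo, hi)


def seed_search_space(region: Optional[Region], seed_seqid: str,
                      seed_start: int, seed_end: int) -> Optional[Region]:
    """Return the polishing window for a BLAST seed, or None if it is ignored."""
    if region is None:
        return Region(seed_seqid, seed_start, seed_end)  # untagged: seed span
    if seed_seqid != region.seqid:
        return None  # seed lies on a different sequence
    if region.start is None:
        return region  # whole-sequence tag: search all of it
    if seed_end < region.start or seed_start > region.end:
        return None  # completely outside maker_coor
    return region  # overlapping seed: window snapped to maker_coor exactly
```

With est_forward set, the thread says the region is searched even when no BLAST seed exists, which this sketch does not model.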
Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition, any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also, match parameters for exonerate will not be relaxed as they were with est_forward. As you can see, the behavior is slightly different (because it's an accidental feature). Thanks, Carson From: Mikael Brandström Durling Date: Wednesday, February 26, 2014 at 6:37 AM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names That might be a useful and time-saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward, for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right? Mikael On 26 Feb 2014 at 14:22, Carson Holt wrote: Yes. That should work as well as an accidental feature. --Carson Sent from my iPhone On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling wrote: Can this use of maker_coor be used only to hint about the placement of the ESTs, without affecting the naming of the final genes? I.e., if I have a database of ESTs where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1? Thanks, Mikael On 26 Feb 2014 at 01:58, Carson Holt wrote: There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there, so you'll have to type it in. There is also a feature designed to work with this option.
If you add tags to your fasta headers, those can be used to guide the mapping and naming. For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp, and just using maker_coor=chr1 will force it to only be mapped against chr1. This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide. Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Tuesday, February 25, 2014 at 5:06 PM To: Subject: [maker-devel] Mapping gene names Hi, I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein? maker_opts.ctl est=NC_123456.frn protein=NC_123456.faa est2genome=1 protein2genome=1 Thanks, Shaun _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
From sjackman at gmail.com Thu Feb 27 17:27:30 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Thu, 27 Feb 2014 16:27:30 -0800 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you! The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5, rrn5, and tRNA genes didn't make it into the all.gff file. They are in the blastn output and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40), and a sufficient E-value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits? organism_type=prokaryotic est2genome=1 protein2genome=1 est_forward=1 Cheers, Shaun On 27 February 2014 15:17, Shaun Jackman wrote: > Is there a corresponding protein_forward=1 option to map forward protein > names from protein2genome? > > Cheers, > Shaun > > On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com) > wrote: > > Sorry, I meant to say: prefilter on the score in the mRNA column before > passing the gff3 to model_gff. > > --Carson > > Sent from my iPhone > > On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: > > What you can do is run it once with just est_forward=1 and > est2genome/protein2genome set to 1. Then take those results, pass them in > as model_gff and use the map_forward option to then filter the results > based on mRNA score, and that would copy names onto the new genes under the > standard MAKER pipeline. Eventually it's really supposed to go into a > separate tool that will map genes onto new assemblies (but under the hood > the tool will just be calling MAKER with certain parameters restricted). I > do this because if people commonly mix it with things like SNAP, I can > start to get some very weird behaviors.
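Shaun's question about which filter removes the rrn5 hit is easier to reason about when each BLASTN cutoff is checked separately. The sketch below reports which thresholds a single hit fails. bit_blastn and eval_blastn come from the thread; pid_blastn and pcov_blastn are the identity and coverage cutoffs that should also appear in maker_bopts.ctl (check your copy), and the values in `CUTOFFS` are placeholders to replace with your own. Treating a value exactly at the cutoff as passing is an assumption. If a hit passes every check here, whatever removes it presumably acts at a later stage, such as exonerate polishing or model building.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BlastnHit:
    bits: float
    evalue: float
    pid: float   # fraction of identical positions, 0-1
    pcov: float  # fraction of the query covered by the alignment, 0-1


def failed_filters(hit: BlastnHit, cutoffs: Dict[str, float]) -> List[str]:
    """List the cutoffs a hit fails; an empty list means it passes all of them."""
    failed = []
    if hit.bits < cutoffs["bit_blastn"]:
        failed.append(f"bit_blastn: {hit.bits} < {cutoffs['bit_blastn']}")
    if hit.evalue > cutoffs["eval_blastn"]:
        failed.append(f"eval_blastn: {hit.evalue:g} > {cutoffs['eval_blastn']:g}")
    if hit.pid < cutoffs["pid_blastn"]:
        failed.append(f"pid_blastn: {hit.pid} < {cutoffs['pid_blastn']}")
    if hit.pcov < cutoffs["pcov_blastn"]:
        failed.append(f"pcov_blastn: {hit.pcov} < {cutoffs['pcov_blastn']}")
    return failed


# Placeholder cutoffs; copy the real values from your maker_bopts.ctl.
CUTOFFS = {"bit_blastn": 40, "eval_blastn": 1e-10,
           "pid_blastn": 0.85, "pcov_blastn": 0.8}

if __name__ == "__main__":
    # bits, E-value, and identity from the thread; coverage is assumed.
    rrn5 = BlastnHit(bits=242, evalue=2e-66, pid=1.0, pcov=1.0)
    print(failed_filters(rrn5, CUTOFFS) or "passes all BLASTN cutoffs")
```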
> > Thanks, > Carson > > From: Mikael Brandström Durling > Date: Wednesday, February 26, 2014 at 3:04 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Mapping gene names > > It seems that this could be a very useful option in those cases where > you have firm a priori knowledge of the placement of ESTs. However, while > trying it I note that est_forward implies that the est2genome predictor is > turned on, implicitly. Is this necessary for this to work? I'm after the > behavior you describe below where exonerate is made to try really hard > within a limited region to align an EST, but I would not like maker to > produce est2genome predictions. > > In general, I think this maker_coor and est_forward is a feature set that > is worthy to be promoted into a documented feature. > > Thanks, > Mikael > > On 26 Feb 2014 at 17:09, Carson Holt wrote: > > It will still work without est_forward. It just works a little > differently. Keep in mind this was a hidden feature I used to find > stubborn or hard-to-find missing genes after reassembly of a genome. > > If est_forward is provided, MAKER will parse the database to look for the > maker_coor tags early in the pipeline. Then it will create a list of > locations to search, and it will search them even if there are no BLAST > results to seed the search (normally MAKER gets a BLAST result first and > then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to > look for a match using all of chr1 as the input to exonerate even when > BLAST finds nothing (this is a very very slow search, but can help pick up > one or two stubborn genes that don't remap well). To allow this, MAKER > gives exonerate looser matching parameters (i.e. allows for single base > pair introns perhaps caused by assembly errors).
> The logic here is that, since I already told MAKER that with some degree
> of confidence I expect sequence A to map to location X, it will try its
> hardest to make it match.
>
> Without est_forward set, the maker_coor= flag still gets read in GI.pm at
> line 1563, but only after a BLAST alignment has already seeded it to the
> region (that BLAST result has the information in its description
> parameter). MAKER will then ignore seeds completely outside of maker_coor.
> In addition, any BLAST seeds that overlap maker_coor will get the search
> space for alignment polishing adjusted to match maker_coor exactly. Also,
> match parameters for exonerate will not be relaxed as they were with
> est_forward.
>
> As you can see, the behavior is slightly different (because it's an
> accidental feature).
>
> Thanks,
> Carson
>
> From: Mikael Brandström Durling
> Date: Wednesday, February 26, 2014 at 6:37 AM
> To: Carson Holt
> Cc: "maker-devel at yandell-lab.org"
> Subject: Re: [maker-devel] Mapping gene names
>
> That might be a useful and time-saving accidental feature. But, reading
> the code, it seems that I need to supply maker_coor but not gene_id, as
> well as the configuration option est_forward, for this to work. Any
> occurrences of maker_coor in GI.pm seem to be conditioned on
> est_forward=1, right?
>
> Mikael
>
> On 26 Feb 2014, at 14:22, Carson Holt wrote:
>
> Yes. That should work as well as an accidental feature.
>
> --Carson
>
> Sent from my iPhone
>
> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling
> <mikael.durling at slu.se> wrote:
>
> Can this use of maker_coor be used only to hint about the placement of the
> ESTs, without affecting the naming of the final genes? I.e., if I have a
> database of ESTs where I have a priori knowledge of their rough placement,
> can this placement be given to maker without providing est_forward=1?
>
> Thanks,
> Mikael
>
> On 26 Feb 2014, at 01:58, Carson Holt wrote:
>
> There is a way.
> It's not a standard option and it's undocumented, but if you add
> est_forward=1 to the maker_opts.ctl file, then it will do just that. The
> option won't already be there, so you'll have to type it in.
>
> There is also a feature designed to work with this option. If you add
> tags to your fasta headers, those can be used to guide the mapping and
> naming. For example, gene_id= will ensure different isoforms
> that share a common gene_id get clustered into the same gene,
> and maker_coor=chr1:1-10000 in the fasta header will force a particular
> sequence to only be mapped against chr1 within the range of 1-10000 bp, and
> just using maker_coor=chr1 will force it to only be mapped against chr1.
>
> This is an undocumented way to remap genes onto new assemblies using blast
> alignments of earlier transcript or protein annotations as a guide.
>
> --Carson
>
> From: Shaun Jackman
> Reply-To: Shaun Jackman
> Date: Tuesday, February 25, 2014 at 5:06 PM
> To:
> Subject: [maker-devel] Mapping gene names
>
> Hi,
>
> I'm annotating a genome using a closely related genome from Genbank, using
> the .frn (RNA) and .faa (protein) files from Genbank as evidence to
> annotate my genome. I've run Maker, and the annotation seems to have worked
> well. Is it possible to map the names of the genes from the related species
> to my annotation? I see the map_forward option, which applies to the
> model_gff parameter. Is there a similar option for est and protein?
>
> maker_opts.ctl
>
> est=NC_123456.frn
> protein=NC_123456.faa
> est2genome=1
> protein2genome=1
>
> Thanks,
> Shaun
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Thu Feb 27 18:13:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 27 Feb 2014 18:13:06 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Set single_exon=1, and set the minimum size to a smaller value. I think it's set to 250 right now. Also, est2genome is looking for an ORF, so if there is none (as with tRNAs) they probably won't get picked up.

--Carson

Sent from my iPhone

> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote:
>
> Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!
>
> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?
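Carson's two points (single-exon ESTs are only used with single_exon=1 and above a minimum length, and est2genome needs an ORF) can be checked against a sequence before rerunning. The Python sketch below illustrates that reasoning under assumptions and is not MAKER's code: it assumes the minimum-size option Carson refers to is single_length in maker_opts.ctl (250, matching his recollection), uses a forward-strand longest-ORF scan as a stand-in for whatever ORF test est2genome actually applies, and uses an arbitrary, hypothetical min_orf cutoff.

```python
"""Hypothetical pre-check mirroring Carson's advice: single-exon ESTs need
single_exon=1 and must meet the minimum length, and est2genome wants an ORF.
This illustrates the reasoning only; it is not MAKER's implementation."""

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> int:
    """Length in nt (stop codon included) of the longest ATG...stop ORF
    on the forward strand; 0 if there is none."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG in the current open frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def single_exon_problems(seq: str, single_exon: bool, single_length: int = 250,
                         min_orf: int = 90) -> list:
    """Reasons a single-exon EST would be dropped (empty list: no objection).
    single_length is assumed to be the option Carson means; min_orf is an
    arbitrary illustrative cutoff, not a MAKER option."""
    problems = []
    if not single_exon:
        problems.append("single_exon=0: single-exon ESTs are ignored")
    elif len(seq) < single_length:
        problems.append(f"length {len(seq)} < single_length={single_length}")
    if longest_orf(seq) < min_orf:
        problems.append(f"no ORF of at least {min_orf} nt for est2genome")
    return problems


if __name__ == "__main__":
    # A 76-nt tRNA-like sequence: short, and without any ATG start codon.
    trna_like = ("GCGGATTTAGCTCAGTTGGGAGAGCGCCAGACTGAAGATCTGGAGG"
                 "TCCTGTGTTCGATCCACAGAATTCGCACCA")
    for problem in single_exon_problems(trna_like, single_exon=True):
        print(problem)
```

Run on the 76-nt tRNA-like sequence, the sketch reports both a length problem and a missing ORF, which is consistent with Carson's explanation of why tRNAs drop out even with est2genome=1: lowering the minimum length alone would not rescue them.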
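The maker_coor behaviour Carson describes in this thread can be written down as a small interval sketch: with est_forward=1 the tagged region is searched even without a BLAST seed, and without it, seeds entirely outside maker_coor are ignored while overlapping seeds get the maker_coor region as their polishing search space. The Python below only paraphrases that description and is not the logic in GI.pm. The header layout (space-separated key=value tags after the sequence ID), 1-based inclusive coordinates, and all function names are assumptions; the gene_id tag is simply ignored here.

```python
"""Sketch of the maker_coor behaviour described in the thread.
Not MAKER code; header layout and 1-based inclusive coordinates are assumed."""

from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    seqid: str
    start: Optional[int]  # None means the whole sequence (maker_coor=chr1)
    end: Optional[int]


def parse_maker_coor(header: str) -> Optional[Region]:
    """Read a maker_coor=chr1:1-10000 or maker_coor=chr1 tag from a FASTA header."""
    for token in header.lstrip(">").split():
        if token.startswith("maker_coor="):
            value = token.split("=", 1)[1]
            if ":" in value:
                seqid, span = value.split(":", 1)
                start, end = (int(x) for x in span.split("-", 1))
                return Region(seqid, start, end)
            return Region(value, None, None)
    return None


def constrain_seeds(seeds: List[Region], coor: Region) -> List[Region]:
    """Without est_forward: drop BLAST seeds entirely outside maker_coor and
    give every overlapping seed the maker_coor region as its search space."""
    spaces = []
    for seed in seeds:
        if seed.seqid != coor.seqid:
            continue  # different sequence, so entirely outside
        whole_sequence = coor.start is None
        if whole_sequence or (seed.start <= coor.end and seed.end >= coor.start):
            spaces.append(coor)
    return spaces


def search_spaces(seeds: List[Region], coor: Region, est_forward: bool) -> List[Region]:
    """With est_forward the tagged region is searched even with no BLAST seed
    (slow, and exonerate runs with looser settings); otherwise seeds decide."""
    if est_forward:
        return [coor]
    return constrain_seeds(seeds, coor)


if __name__ == "__main__":
    coor = parse_maker_coor(">est42 gene_id=geneA maker_coor=chr1:1-10000")
    seeds = [Region("chr1", 9500, 10400), Region("chr1", 20000, 21000)]
    print(search_spaces(seeds, coor, est_forward=False))
    print(search_spaces([], coor, est_forward=True))
```

The two-branch structure reflects the point Mikael ran into: without est_forward, an EST whose BLAST seeds all land elsewhere produces no search space at all, so maker_coor cannot pull it back to the expected locus.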
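The two-pass route Carson outlines earlier in this thread (first run with est_forward=1 and est2genome/protein2genome=1, then prefilter on the mRNA score and pass the result back in as model_gff with map_forward) needs a filtering step between the runs. A minimal Python sketch of that step follows. It assumes standard 9-column GFF3 in which the score sits in column 6 of the mRNA lines and genes, mRNAs and exons/CDS are linked by ID/Parent attributes; the script name and cutoff in the usage comment are hypothetical, and the thread does not say what cutoff is sensible.

```python
"""Sketch of the prefilter step: keep genes whose mRNA score (GFF3 column 6)
meets a cutoff, before handing the file to model_gff. Assumes 9-column GFF3
with ID/Parent links gene -> mRNA -> exon/CDS; '.' scores count as failing."""

import sys


def parse_attrs(field: str) -> dict:
    """'ID=a;Parent=b' -> {'ID': 'a', 'Parent': 'b'}"""
    items = (part.split("=", 1) for part in field.strip().split(";") if "=" in part)
    return {key: value for key, value in items}


def filter_by_mrna_score(lines, min_score: float):
    # Drop comments/directives and anything that is not a 9-column feature
    # line (this also skips sequence lines of a trailing ##FASTA section).
    rows = [line.rstrip("\n").split("\t") for line in lines
            if line.strip() and not line.startswith("#")]
    rows = [r for r in rows if len(r) == 9]

    keep_mrna, keep_gene = set(), set()
    for r in rows:
        if r[2] != "mRNA":
            continue
        try:
            score = float(r[5])
        except ValueError:  # '.' means no score
            continue
        if score >= min_score:
            attrs = parse_attrs(r[8])
            keep_mrna.add(attrs.get("ID"))
            keep_gene.update(attrs.get("Parent", "").split(","))

    yield "##gff-version 3"
    for r in rows:
        attrs = parse_attrs(r[8])
        parents = set(attrs.get("Parent", "").split(","))
        if r[2] == "gene":
            keep = attrs.get("ID") in keep_gene
        elif r[2] == "mRNA":
            keep = attrs.get("ID") in keep_mrna
        else:
            # exon/CDS/UTR children of a kept mRNA; evidence features such as
            # match/match_part are dropped because only gene models are wanted
            keep = bool(parents & keep_mrna)
        if keep:
            yield "\t".join(r)


if __name__ == "__main__" and len(sys.argv) == 3:
    # hypothetical usage: python mrna_score_filter.py round1.gff 0.5 > model.gff
    with open(sys.argv[1]) as handle:
        for out_line in filter_by_mrna_score(handle, float(sys.argv[2])):
            print(out_line)
```

Filtering per mRNA rather than per gene means a gene keeps only its passing isoforms; a gene whose mRNAs all fail disappears together with its exons and CDS.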
From mikael.durling at slu.se Fri Feb 28 03:40:30 2014
From: mikael.durling at slu.se (Mikael Brandström Durling)
Date: Fri, 28 Feb 2014 10:40:30 +0000
Subject: [maker-devel] maker_coor behaviour
Message-ID: <8CA99854-CF5B-4533-B625-0EDD5DFFCE8B@slu.se>

Hi,

in a previous thread, the maker_coor feature for ESTs was mentioned. I have been trying it out without using it for mapping gene names. I have placed these ESTs by other means, and thought the maker_coor feature would be a good use of this a priori knowledge. The main problem I am trying to solve is that some ESTs, where I know where they should be aligned, are not recruited to that position by maker's blastn->exonerate method (I find them on other scaffolds). So I thought maker_coor with the est_forward behavior (as described) would be a good option to force my evidence onto the correct position, instead of ending up supporting or breaking other models.

However, as soon as I run with maker_coor-tagged EST sequences, no est2genome evidence appears in the final gff3 file. The blastn evidence is there when est_forward is disabled, but, as expected, there is no blastn evidence when est_forward is turned on. It seems, though, that the evidence is used, as the QI lines indicate EST support for both splice sites and exon alignments, but I have no way to visualize and/or evaluate the congruence of evidence and models. Would it be possible to tweak Maker into outputting the est2genome alignments when est_forward/maker_coor is used? I couldn't figure out myself where in the code this was handled.

I could of course do my own exonerate alignments of these ESTs and feed them into maker as est_gff, but if maker already has the machinery to do this, I thought it would be a good idea to use it.
Thanks,
Mikael

From carsonhh at gmail.com Fri Feb 28 07:09:09 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 28 Feb 2014 07:09:09 -0700
Subject: [maker-devel] maker_coor behaviour
Message-ID:

I wouldn't use those options for standard de novo annotation. There are other, more appropriate things that should be used instead. Both maker_coor and est_forward are destined to be part of a separate tool that will secretly just be calling MAKER, but will allow me to control what other parameters MAKER sees, to avoid certain logic incompatibilities: these options make sense when mapping entire genes onto a new assembly, but not really for de novo annotation using ESTs.

You should instead try modifying these options in the maker_bopts.ctl file:

pcov_blastn=     #Blastn Percent Coverage Threshold EST-Genome Alignments
pid_blastn=      #Blastn Percent Identity Threshold EST-Genome Alignments
eval_blastn=     #Blastn eval cutoff
bit_blastn=      #Blastn bit cutoff
depth_blastn=    #Blastn depth cutoff (0 to disable cutoff). For trimming high evidence overlap regions
en_score_limit=  #Exonerate nucleotide percent of maximal score threshold

If either blastn or est2genome results disappear, it is because they don't meet one of these thresholds (blastn results that don't meet the thresholds but are borderline are kept if exonerate does meet the thresholds, but if exonerate misses a threshold they will be thrown out). That is why the EST in question gets thrown out, and it's why the blastn result disappears when you try to anchor it with maker_coor.

You can visualize everything with a browser when you're done. I still recommend the old version of Apollo for this (it's just easier). You can try to install it using the './Build apollo' option from the .../maker/src/ directory, and it will be installed in .../maker/exe/apollo. It requires that you have Apache Ant installed to do this. Otherwise, just download it from the GMOD SourceForge page and install it manually.
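The threshold list above also bears on the "which filter is removing these hits?" question from the "Mapping gene names" thread. Below is a hedged Python sketch of a per-threshold check; it is an illustration, not MAKER code. The option names are the ones Carson lists; eval_blastn and bit_blastn use the values Shaun quoted, the other numbers are placeholders to be replaced with the values in your own maker_bopts.ctl, the hit dictionary is a made-up structure rather than MAKER output, and the keep/drop rule only paraphrases Carson's description (it does not model what counts as "borderline").

```python
"""Which maker_bopts.ctl-style threshold does one EST alignment fail?
Option names come from Carson's list; numbers marked 'placeholder' should be
replaced with the values in your own maker_bopts.ctl. Hit fields are made up."""

THRESHOLDS = {
    "pcov_blastn": 0.8,    # placeholder: min fraction of the EST covered
    "pid_blastn": 0.85,    # placeholder: min fraction identity
    "eval_blastn": 1e-10,  # value quoted by Shaun
    "bit_blastn": 40,      # value quoted by Shaun
    "en_score_limit": 20,  # placeholder: exonerate % of maximal score
}


def blastn_failures(hit: dict, t: dict = THRESHOLDS) -> list:
    """Names of the blastn thresholds this hit misses."""
    failures = []
    if hit["coverage"] < t["pcov_blastn"]:
        failures.append("pcov_blastn")
    if hit["identity"] < t["pid_blastn"]:
        failures.append("pid_blastn")
    if hit["evalue"] > t["eval_blastn"]:
        failures.append("eval_blastn")
    if hit["bits"] < t["bit_blastn"]:
        failures.append("bit_blastn")
    return failures


def diagnose(hit: dict, exonerate_pct_of_max: float, t: dict = THRESHOLDS):
    """Return (failed thresholds, kept?). Paraphrases Carson: a borderline blastn
    hit survives if exonerate passes; if exonerate fails, the evidence is dropped.
    depth_blastn is ignored because it depends on overlapping hits, not one hit."""
    failures = blastn_failures(hit, t)
    exonerate_ok = exonerate_pct_of_max >= t["en_score_limit"]
    if not exonerate_ok:
        failures.append("en_score_limit")
    return failures, exonerate_ok


if __name__ == "__main__":
    # Shaun's rrn5 numbers; full coverage is assumed, not reported in the thread.
    rrn5 = {"coverage": 1.0, "identity": 1.0, "evalue": 2e-66, "bits": 242}
    print(blastn_failures(rrn5))  # prints []
```

With Shaun's rrn5 numbers (and an assumed full coverage) no blastn threshold fails, which fits Carson's reply in that thread that those hits are lost at the single-exon length and ORF stage rather than at the blastn cutoffs.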
Thanks,
Carson

From rbharris at uw.edu Fri Feb 28 13:14:55 2014
From: rbharris at uw.edu (Rebecca Harris)
Date: Fri, 28 Feb 2014 12:14:55 -0800
Subject: [maker-devel] error in snap training
In-Reply-To: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>
References: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>
Message-ID:

Hi -

I tried this and ran cegma --genome on my original fasta file. I then tried to use cegma2zff to convert, then fathom and forge. However, when I try to generate new parameters with forge, I get the same error that I got when trying to train SNAP without CEGMA: "ZOE ERROR (from forge): impossible error5 KOG1342.20". Any suggestions would be great, thanks!

Cheers,
Rebecca

On Tue, Feb 25, 2014 at 2:12 PM, Carson Holt wrote:

> Make sure you are using 2.31, and then try the maker2zff filters
> individually. If the protein models are not working well, use CEGMA to
> generate models. It's from the same group as SNAP. Use cegma2zff for the
> conversion.
>
> --Carson
>
> Sent from my iPhone
>
> > On Feb 25, 2014, at 2:49 PM, Rebecca Harris wrote:
> >
> > Hey -
> >
> > I'm trying to train SNAP and am running into errors. I don't have any
> > EST evidence, just protein. My .gff file reports 10865 genes, but when I
> > run maker2zff -c0 -e0 I get back empty genome files. When I run
> > maker2zff -n, a ton of overlap_prev_exon errors get written to the
> > screen, and then when I get to the forge step I get an "impossible
> > error5". Any help would be greatly appreciated.
> >
> > Thanks!
> > Rebecca

From carsonhh at gmail.com Fri Feb 28 13:22:12 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 28 Feb 2014 13:22:12 -0700
Subject: [maker-devel] error in snap training
In-Reply-To:
References: <16FFC38F-7652-4A34-8AF0-B3631BF8F6D9@gmail.com>
Message-ID:

If it's failing both ways, I'm thinking this may be SNAP itself. Try these two different versions of SNAP:

-> http://korflab.ucdavis.edu/Software/snap-2013-02-16.tar.gz
and
-> http://korflab.ucdavis.edu/Software/snap-2013-11-29.tar.gz

If they both fail, then contact the SNAP development group:

-> korflab AT ucdavis DOT edu

Thanks,
Carson
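Carson's earlier advice in Rebecca's thread, to try the maker2zff filters individually, amounts to asking how many gene models pass each evidence criterion on its own. The Python sketch below counts that from the _QI attribute MAKER writes on mRNA lines. The QI field positions, the 0.5 cutoffs, and the pairing with maker2zff's -c, -e and -o options are assumptions to check against your MAKER version; the sketch does not reproduce maker2zff itself. With protein-only evidence, as in Rebecca's case, the two EST-only counts would be expected to be zero.

```python
"""Count MAKER mRNAs that pass each QI-based criterion on its own, in the
spirit of 'try the maker2zff filters individually'. Assumed QI layout
(0-based): [1] fraction of splice sites confirmed by EST, [2] fraction of
exons overlapping EST, [3] fraction of exons overlapping EST or protein.
The pairing with maker2zff's -c/-e/-o options is also an assumption."""

import re
import sys

CRITERIA = {
    "est_splice_sites": (1, 0.5),        # compare maker2zff -c
    "est_exon_overlap": (2, 0.5),        # compare maker2zff -e
    "est_or_protein_overlap": (3, 0.5),  # compare maker2zff -o
}

QI_PATTERN = re.compile(r"_QI=([^;\s]+)")


def count_passes(gff_lines, criteria=CRITERIA):
    """Return (number of mRNAs with a _QI tag, {criterion: number passing})."""
    counts = {name: 0 for name in criteria}
    total = 0
    for line in gff_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "mRNA":
            continue
        match = QI_PATTERN.search(fields[8])
        if match is None:
            continue
        qi = [float(value) for value in match.group(1).split("|")]
        total += 1
        for name, (index, minimum) in criteria.items():
            if qi[index] >= minimum:
                counts[name] += 1
    return total, counts


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        total, counts = count_passes(handle)
    print(f"{total} mRNAs with _QI")
    for name, passed in counts.items():
        print(f"{name}: {passed}")
```

If one criterion leaves almost nothing while the others pass most models, that criterion is the likely reason a combined filter produces empty training files.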