From jxf1023 at gmail.com Thu Aug 8 20:11:37 2013
From: jxf1023 at gmail.com (Xiaofang Jiang)
Date: Thu, 08 Aug 2013 21:11:37 -0400
Subject: [maker-devel] annotation comparison
Message-ID: <520441C9.30407@gmail.com>

Dear Maker Developers,

I am annotating a mosquito genome using Maker. I have two questions
regarding Maker annotation.

1. Are there any scripts available to compare two sets of annotations?
We know about SOBA but were wondering if there is something more
comprehensive that you guys use.

2. I am expecting around 13,000 genes, however maker only predicted
9,000 genes. I used both the gff3 from cufflinks, protein, and ESTs as
evidence and SNAP as the ab initio predictor. I changed "keep_preds" to 1
and that resulted in 17,000 genes, it seems that shouldn't have happened.
So in order to get more genes, should I try changing "single_length" to
100, and changing "pred_flank" to 100?

Best,

Xiaofang

From carsonhh at gmail.com Fri Aug 23 11:38:51 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 Aug 2013 12:38:51 -0400
Subject: [maker-devel] unknown error
In-Reply-To: <602FA505-8A78-4006-8D8F-D3EC81A73FEC@uga.edu>
Message-ID:

Sorry for the slow response, but this message was caught by my spam
filter so I never saw it till now.

Which version of MAKER are you using? Have you tried the most recent
online version?

--Carson

On 7/30/13 3:13 PM, "Gaelen Burke" wrote:

>Hello,
>I have an error that occurs in my MAKER run for 52 of several thousand
>scaffolds. The annotation of these scaffolds fails, even after 3
>re-tries. I also pulled out the failed sequences and started a run from
>scratch, with the same result. Could anyone tell me what this error means
>and how I could possibly fix it?
>Thanks,
>Gaelen Burke
>
>Here is the message that occurs:
>
>Processing transcripts into genes
>
>------------- EXCEPTION: Bio::Root::Exception -------------
>MSG: Calling translate without a seq argument!
>STACK: Error::throw
>STACK: Bio::Root::Root::throw /usr/local/perl/5.14.1/lib/site_perl/5.14.1/Bio/Root/Root.pm:472
>STACK: Bio::Tools::CodonTable::translate /usr/local/perl/5.14.1/lib/site_perl/5.14.1/Bio/Tools/CodonTable.pm:411
>STACK: PhatHit_utils::_adjust /panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/PhatHit_utils.pm:880
>STACK: PhatHit_utils::adjust_start_stop /panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/PhatHit_utils.pm:776
>STACK: maker::auto_annotator::load_transcript_struct /panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/maker/auto_annotator.pm:1808
>STACK: maker::auto_annotator::group_transcripts /panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/maker/auto_annotator.pm:2163
>STACK: maker::auto_annotator::annotate_genes /panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/maker/auto_annotator.pm:877
>STACK: Process::MpiChunk::_go /panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/Process/MpiChunk.pm:2159
>STACK: Process::MpiChunk::run /panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/Process/MpiChunk.pm:257
>STACK: Process::MpiTiers::run_all /panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/Process/MpiTiers.pm:193
>STACK: /usr/local/maker/latest/bin/maker:276
>-----------------------------------------------------------
>
>FATAL ERROR
>ERROR: Failed while clustering transcripts into genes for annotations!!
>
>ERROR: Chunk failed at level 20
>!!
>FAILED CONTIG:scaffold_0040
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From Michael.Li3 at AGR.GC.CA Fri Aug 23 11:46:17 2013
From: Michael.Li3 at AGR.GC.CA (Li, Michael)
Date: Fri, 23 Aug 2013 16:46:17 +0000
Subject: [maker-devel] annotation comparison
In-Reply-To: <520441C9.30407@gmail.com>
References: <520441C9.30407@gmail.com>
Message-ID: <229AF11430CC544B8987653593A750A92FB4AAFE@ONOTTAXES3.AGR.GC.CA>

I'd be interested in knowing about this too. I'm writing an internship
work report on this topic, so it would be great to hear about it.

I personally use ParsEval to compare two sets of annotation. It seems
pretty thorough, but it's still in its early stages of development. You
can read a bit more about it here: https://github.com/standage/AEGeAn

Cheers,
Michael

-----Original Message-----
From: maker-devel [mailto:maker-devel-bounces at yandell-lab.org] On Behalf Of Xiaofang Jiang
Sent: Thursday, August 08, 2013 9:12 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] annotation comparison

Dear Maker Developers,

I am annotating a mosquito genome using Maker. I have two questions
regarding Maker annotation.

1. Are there any scripts available to compare two sets of annotations?
We know about SOBA but were wondering if there is something more
comprehensive that you guys use.

2. I am expecting around 13,000 genes, however maker only predicted
9,000 genes. I used both the gff3 from cufflinks, protein, and ESTs as
evidence and SNAP as the ab initio predictor. I changed "keep_preds" to 1
and that resulted in 17,000 genes, it seems that shouldn't have happened.
So in order to get more genes, should I try changing "single_length" to
100, and changing "pred_flank" to 100?

Best,

Xiaofang

From carsonhh at gmail.com Fri Aug 23 11:46:35 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 Aug 2013 12:46:35 -0400
Subject: [maker-devel] annotation comparison
In-Reply-To: <520441C9.30407@gmail.com>
Message-ID:

Sorry for the slow response. This was caught by my spam filter for some
reason, so I am only seeing it now.

Eval from WashU is probably the best tool for comparing two annotation
sets. MAKER comes with a script that will convert MAKER's GFF3 output
into Eval GTF input.

If you are getting fewer genes than expected then you probably need to
add more evidence. MAKER will by default reject genes that are not
supported by either a protein or an EST. The keep_preds=1 option makes it
keep everything even if there is no support. Usually people make the
mistake of not providing sufficient protein evidence. For example, just
using something like UniProt may not be sufficient; you may want to pick
a couple of other species with annotated genomes (fruit fly, for example,
or another mosquito species) and provide every protein from their genomes
as evidence.

Also if this is a newly sequenced genome, run CEGMA to see how complete
the assembly is. If the assembly is 80% complete, for example, you would
only expect to retrieve 80% of expected genes.

--Carson

On 8/8/13 9:11 PM, "Xiaofang Jiang" wrote:

>Dear Maker Developers,
>
>I am annotating a mosquito genome using Maker. I have two questions
>regarding Maker annotation.
>
>1. Are there any scripts available to compare two sets of annotations?
>We know about SOBA but were wondering if there is something more
>comprehensive that you guys use.
>
>2. I am expecting around 13,000 genes, however maker only predicted
>9,000 genes. I used both the gff3 from cufflinks, protein, and ESTs as
>evidence and SNAP as the ab initio predictor. I changed "keep_preds" to 1
>and that resulted in 17,000 genes, it seems that shouldn't have happened.
>So in order to get more genes, should I try changing "single_length" to
>100, and changing "pred_flank" to 100?
>
>
>Best,
>
>Xiaofang

From neil at cs.uky.edu Fri Aug 23 12:19:53 2013
From: neil at cs.uky.edu (Neil Moore)
Date: Fri, 23 Aug 2013 13:19:53 -0400
Subject: [maker-devel] [PATCH] Corrupted exon table file when running FGeneSH
Message-ID: <21015.39353.259641.458113@dirac.s-z.org>

I encountered problems with FGeneSH failing on some contigs (small ones
with few if any genes). Investigating the logs, I found that fgenesh was
complaining about a "Corrupted exon table file"; it turns out that MAKER
had omitted the header line from the exon table. I think this happened
when there were predictions or evidence for that contig, but none of them
contained introns; I haven't been able to verify that, though.

The following patch corrected the problem and allowed FGeneSH to run, but
I don't know the code well so perhaps there is a better way.

--- lib/Widget/fgenesh.pm 2013-05-22 12:34:00.000000000 -0400
+++ /home/neil/fgenesh.pm 2013-08-08 14:41:40.585139335 -0400
@@ -562,18 +562,18 @@
         push(@xdef, $l);
     }
 
-    return \@xdef if(!@$i_coors);
-
-    foreach my $i (@$i_coors){
-        my $i_b = ($i->[0] - $offset) + ($i_flank-1);
-        my $i_e = ($i->[1] - $offset) - ($i_flank-1);
-
-        next if abs($i_b - $i_e) < 2*$i_flank;
-        next if abs($i_b - $i_e) < 25;
-
-        my $l = "$i_b $i_e -1000";
-
-        push(@xdef, $l);
+    if(@$i_coors) {
+        foreach my $i (@$i_coors){
+            my $i_b = ($i->[0] - $offset) + ($i_flank-1);
+            my $i_e = ($i->[1] - $offset) - ($i_flank-1);
+
+            next if abs($i_b - $i_e) < 2*$i_flank;
+            next if abs($i_b - $i_e) < 25;
+
+            my $l = "$i_b $i_e -1000";
+
+            push(@xdef, $l);
+        }
     }
 
     my $num = @xdef;

--
Dr Neil Moore, neil at cs.uky.edu, neil at uky.edu, neil at s-z.org

From dence at genetics.utah.edu Fri Aug 23 12:41:41 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 23 Aug 2013 17:41:41 +0000
Subject: [maker-devel] annotation comparison
In-Reply-To: <520441C9.30407@gmail.com>
References: <520441C9.30407@gmail.com>
Message-ID:

Hi Xiaofang,

1. People use various ad-hoc scripts for comparing genome annotations.
If you look in the MAKER2 paper (Holt and Yandell, BMC Bioinformatics
2011), you'll see the AED score, which is something we use to compare the
support from the evidence.

2. It actually isn't surprising that you got that big of an increase in
the number of genes when you turned "keep_preds" to 1. Ab-initio
predictors give a great many false positives and "keep_preds=1" will turn
all of those into gene models.

To get more genes, you should probably train an additional gene
predictor, like GeneMark and maybe Augustus. I think that Augustus should
already have a config file for mosquito.

Changing the "pred_flank" will increase the number of genes if you think
that you have many merged genes.

What protein dataset did you use?
We often use a set from a comprehensive protein database like UniProt in
addition to proteins from closely related species.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Xiaofang Jiang [jxf1023 at gmail.com]
Sent: Thursday, August 08, 2013 7:11 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] annotation comparison

Dear Maker Developers,

I am annotating a mosquito genome using Maker. I have two questions
regarding Maker annotation.

1. Are there any scripts available to compare two sets of annotations?
We know about SOBA but were wondering if there is something more
comprehensive that you guys use.

2. I am expecting around 13,000 genes, however maker only predicted
9,000 genes. I used both the gff3 from cufflinks, protein, and ESTs as
evidence and SNAP as the ab initio predictor. I changed "keep_preds" to 1
and that resulted in 17,000 genes, it seems that shouldn't have happened.
So in order to get more genes, should I try changing "single_length" to
100, and changing "pred_flank" to 100?

Best,

Xiaofang

From barry.moore at genetics.utah.edu Fri Aug 23 16:05:18 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 23 Aug 2013 15:05:18 -0600
Subject: [maker-devel] annotation comparison
In-Reply-To:
References: <520441C9.30407@gmail.com>
Message-ID: <6CAEFC94-1C2C-4CB1-886E-32C4A4D5EA36@genetics.utah.edu>

Also Xiaofang, you mentioned that you have used SOBA; there is a
command-line version (called SOBAcl) that hasn't been advertised too well
and that does a lot more than the web version in terms of comparison and
reporting. You can find it here:

http://www.sequenceontology.org/resources/sobacl.html

It can prepare tables and graphs on multiple GFF3 files for feature
counts, lengths, footprints etc.

Mike Campbell here in the lab has a script to generate the data necessary
for AED figures like this one and he's going to add it to maker/bin some
time in the near future.

B

On Aug 23, 2013, at 11:41 AM, Daniel Ence wrote:

> Hi Xiaofang,
>
> 1. People use various ad-hoc scripts for comparing genome annotations.
> If you look in the MAKER2 paper (Holt and Yandell, BMC Bioinformatics
> 2011), you'll see the AED score, which is something we use to compare
> the support from the evidence.
>
> 2. It actually isn't surprising that you got that big of an increase in
> the number of genes when you turned "keep_preds" to 1. Ab-initio
> predictors give a great many false positives and "keep_preds=1" will
> turn all of those into gene models.
>
> To get more genes, you should probably train an additional gene
> predictor, like GeneMark and maybe Augustus. I think that Augustus
> should already have a config file for mosquito.
>
> Changing the "pred_flank" will increase the number of genes if you
> think that you have many merged genes.
>
> What protein dataset did you use? We often use a set from a
> comprehensive protein database like UniProt in addition to proteins
> from closely related species.
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
> ________________________________________
> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Xiaofang Jiang [jxf1023 at gmail.com]
> Sent: Thursday, August 08, 2013 7:11 PM
> To: maker-devel at yandell-lab.org
> Subject: [maker-devel] annotation comparison
>
> Dear Maker Developers,
>
> I am annotating a mosquito genome using Maker. I have two questions
> regarding Maker annotation.
>
> 1. Are there any scripts available to compare two sets of annotations?
> We know about SOBA but were wondering if there is something more
> comprehensive that you guys use.
>
> 2. I am expecting around 13,000 genes, however maker only predicted
> 9,000 genes. I used both the gff3 from cufflinks, protein, and ESTs as
> evidence and SNAP as the ab initio predictor. I changed "keep_preds" to 1
> and that resulted in 17,000 genes, it seems that shouldn't have happened.
> So in order to get more genes, should I try changing "single_length" to
> 100, and changing "pred_flank" to 100?
>
>
> Best,
>
> Xiaofang

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From brubin at fieldmuseum.org Mon Aug 26 13:20:31 2013
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Mon, 26 Aug 2013 13:20:31 -0500
Subject: [maker-devel] Unexpected results with correct_est_fusion
Message-ID:

Hello developers,

I am using MAKER 2.28 to annotate an ant genome. I provide protein
sequence evidence from all seven of the other sequenced ant genomes and a
*de novo* assembled transcriptome as EST evidence. I assembled the
transcriptome using Trinity with the jaccard_clip option turned on to
reduce gene fusions. Despite using this set of hopefully non-fused ESTs,
I still have substantial fusion problems with the final annotation.
Therefore, I reduced pred_flank to 100 and turned on correct_est_fusion.
However, correct_est_fusion leads to the prediction of a much smaller
number of genes (~5,000 instead of ~14,000). I am initially training both
SNAP and Augustus using CEGMA genes and then retraining based on the
first round of annotation. Both rounds of annotation yield the same low
number (~5,000) of genes. It may also be worth mentioning that the number
of exons is also far lower when using correct_est_fusion (~26,000 instead
of ~90,000).

Is this the expected behavior of correct_est_fusion? I was surprised that
it reduced the predicted number of genes by such a large margin. I am
concerned that I am using it incorrectly. Do you have any other
suggestions for reducing gene merging?

Thanks,
Ben

--
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
http://www.moreaulab.org/Benjamin_Rubin.html

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776

From jfierst at uoregon.edu Mon Aug 26 13:54:20 2013
From: jfierst at uoregon.edu (Janna Fierst)
Date: Mon, 26 Aug 2013 11:54:20 -0700
Subject: [maker-devel] exon/intron boundaries
Message-ID:

Hi,
I am using MAKER 2.28 to annotate a Caenorhabditid worm genome, and the
initial results appear fairly good but we seem to be annotating too many
exons for multiple genes. I was wondering which parameters should be
tuned to change the threshold for exon/intron boundaries?
Thanks for your help
-Janna Fierst

From carsonhh at gmail.com Mon Aug 26 14:00:55 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 26 Aug 2013 15:00:55 -0400
Subject: [maker-devel] Unexpected results with correct_est_fusion
In-Reply-To:
Message-ID:

The correct_est_fusion option just clips UTR on overlapping genes. I
suspect the real problem is setting pred_flank too low. If your lead-in
sequence to a gene is too short, ab initio predictors won't call it. So
you are probably getting empty reports from SNAP/Augustus for the
hint-based predictions. Try increasing pred_flank to at least 150.
Setting pred_flank too low will also limit how far MAKER will walk out
along the edges of initial alignments during the polishing step
(exonerate). So setting it too low may also be causing you to lose some
EST and protein alignments.

--Carson

From: Benjamin Rubin
Date: Monday, August 26, 2013 2:20 PM
To:
Subject: [maker-devel] Unexpected results with correct_est_fusion

Hello developers,

I am using MAKER 2.28 to annotate an ant genome. I provide protein
sequence evidence from all seven of the other sequenced ant genomes and a
de novo assembled transcriptome as EST evidence. I assembled the
transcriptome using Trinity with the jaccard_clip option turned on to
reduce gene fusions. Despite using this set of hopefully non-fused ESTs,
I still have substantial fusion problems with the final annotation.
Therefore, I reduced pred_flank to 100 and turned on correct_est_fusion.
However, correct_est_fusion leads to the prediction of a much smaller
number of genes (~5,000 instead of ~14,000). I am initially training both
SNAP and Augustus using CEGMA genes and then retraining based on the
first round of annotation. Both rounds of annotation yield the same low
number (~5,000) of genes. It may also be worth mentioning that the number
of exons is also far lower when using correct_est_fusion (~26,000 instead
of ~90,000).

Is this the expected behavior of correct_est_fusion? I was surprised that
it reduced the predicted number of genes by such a large margin. I am
concerned that I am using it incorrectly. Do you have any other
suggestions for reducing gene merging?

Thanks,
Ben

--
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
http://www.moreaulab.org/Benjamin_Rubin.html

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776

From carsonhh at gmail.com Mon Aug 26 14:21:27 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 26 Aug 2013 15:21:27 -0400
Subject: [maker-devel] exon/intron boundaries
In-Reply-To:
Message-ID:

Are you getting gene fusions or just more exons? Gene fusions can be
reduced by setting correct_est_fusion=1, or reducing pred_flank, although
reducing pred_flank can cause other issues (but those generally only
appear if setting the value below 150). Also, if you have the maximum
intron size set too high (split_hit option), you may be generating
bridging alignments that make evidence align across distant paralogous
genes as well (this can result in gene merging).

You should also look at your results manually in a viewer like Apollo.
Then see if the extra exons are supported by something such as protein
alignments from another species. If this is the case, you may have a
poorly annotated protein set that is being used as evidence and that is
carrying over its erroneous exons into the species you are annotating. If
the extra exons are supported by EST evidence, then perhaps you should
try to rebuild the EST assembly (for example, Trinity has an option to
use a Jaccard similarity coefficient to avoid fusing transcripts).

Another option is to retrain SNAP or Augustus. MAKER does not actually
produce any of the models itself (it is a pipeline, not a predictor). The
models are all generated using these other algorithms; MAKER just feeds
them hints based on protein and transcript alignments, so making sure
training is sufficient is important for those programs to produce their
best models.

Finally, make sure your repeat database is sufficient; you may need to
generate a species-specific repeat library using something like
RepeatModeler. Repeats can end up being included as extra exons in gene
models because they may contain reading frames that do code for proteins
(e.g., reverse transcriptases).

If you have any questions on any of the above, just let us know.

Thanks,
Carson

From: Janna Fierst
Date: Monday, August 26, 2013 2:54 PM
To:
Subject: [maker-devel] exon/intron boundaries

Hi,
I am using MAKER 2.28 to annotate a Caenorhabditid worm genome, and the
initial results appear fairly good but we seem to be annotating too many
exons for multiple genes. I was wondering which parameters should be
tuned to change the threshold for exon/intron boundaries?
Thanks for your help
-Janna Fierst

From brubin at fieldmuseum.org Tue Aug 27 09:59:50 2013
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Tue, 27 Aug 2013 09:59:50 -0500
Subject: [maker-devel] Unexpected results with correct_est_fusion
In-Reply-To:
References:
Message-ID:

Hi Carson,

I increased pred_flank to 200 and reran MAKER with correct_est_fusion,
but I still only get ~5,000 genes (5,082 instead of the 5,020 with
pred_flank at 100). This is using only the first round with SNAP and
Augustus trained on the CEGMA genes. Is there anything else that I might
be doing wrong? I have attached my control file in case that could be
useful.

Thanks for the help!
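
One way to put gene and exon counts like these side by side is to tally
features directly from each run's GFF3 output. What follows is a minimal
Python sketch, not part of MAKER: it assumes the final models carry
"maker" in the source column (column 2), and count_features and the demo
records are hypothetical names made up for illustration.

```python
import io
from collections import Counter


def count_features(handle, source="maker"):
    """Tally GFF3 feature types (column 3) for rows whose source (column 2) matches."""
    counts = Counter()
    for line in handle:
        # An embedded FASTA section, if present, ends the annotation rows.
        if line.startswith("##FASTA"):
            break
        # Skip directives, comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed 9-column GFF3 row
        if cols[1] == source:
            counts[cols[2]] += 1
    return counts


# Tiny made-up example: one gene with two exons plus one evidence row.
DEMO_GFF3 = (
    "##gff-version 3\n"
    "scaffold_1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\n"
    "scaffold_1\tmaker\texon\t100\t300\t.\t+\t.\tID=g1:e1;Parent=t1\n"
    "scaffold_1\tmaker\texon\t500\t900\t.\t+\t.\tID=g1:e2;Parent=t1\n"
    "scaffold_1\tsnap_masked\tmatch\t100\t900\t.\t+\t.\tID=m1\n"
    "##FASTA\n"
    ">scaffold_1\n"
    "ACGT\n"
)

demo_counts = count_features(io.StringIO(DEMO_GFF3))
print(demo_counts["gene"], demo_counts["exon"])
```

Running the same tally on the GFF3 from a run with correct_est_fusion and
from one without gives per-run gene and exon totals that can be compared
directly.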
Ben

On Mon, Aug 26, 2013 at 2:00 PM, Carson Holt wrote:

> The correct_est_fusion option just clips UTR on overlapping genes. I
> suspect the real problem is setting pred_flank too low. If your lead-in
> sequence to a gene is too short, ab initio predictors won't call it. So
> you are probably getting empty reports from SNAP/Augustus for the
> hint-based predictions. Try increasing pred_flank to at least 150.
> Setting pred_flank too low will also limit how far MAKER will walk out
> along the edges of initial alignments during the polishing step
> (exonerate). So setting it too low may also be causing you to lose some
> EST and protein alignments.
>
> --Carson
>
>
> From: Benjamin Rubin
> Date: Monday, August 26, 2013 2:20 PM
> To:
> Subject: [maker-devel] Unexpected results with correct_est_fusion
>
> Hello developers,
>
> I am using MAKER 2.28 to annotate an ant genome. I provide protein
> sequence evidence from all seven of the other sequenced ant genomes and a
> *de novo* assembled transcriptome as EST evidence. I assembled the
> transcriptome using Trinity with the jaccard_clip option turned on to
> reduce gene fusions. Despite using this set of hopefully non-fused ESTs, I
> still have substantial fusion problems with the final annotation.
> Therefore, I reduced pred_flank to 100 and turned on correct_est_fusion.
> However, correct_est_fusion leads to the prediction of a much smaller
> number of genes (~5,000 instead of ~14,000). I am initially training both
> SNAP and Augustus using CEGMA genes and then retraining based on the first
> round of annotation. Both rounds of annotation yield the same low number
> (~5,000) of genes. It may also be worth mentioning that the number of exons
> is also far lower when using correct_est_fusion (~26,000 instead of
> ~90,000).
>
> Is this the expected behavior of correct_est_fusion? I was surprised that
> it reduced the predicted number of genes by such a large margin. I am
> concerned that I am using it incorrectly. Do you have any other suggestions
> for reducing gene merging?
>
> Thanks,
> Ben
>
> --
> _____________________________________________________
> Benjamin ER Rubin
> PhD Candidate
> Committee on Evolutionary Biology
> University of Chicago
> http://www.moreaulab.org/Benjamin_Rubin.html
>
> Division of Insects
> Zoology Department
> Field Museum of Natural History
> 1400 South Lake Shore Drive
> Chicago, IL 60605
> USA
> Office: (312) 665-7776
>

--
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
http://www.moreaulab.org/Benjamin_Rubin.html

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4810 bytes
Desc: not available
URL:

From carsonhh at gmail.com Wed Aug 28 08:09:06 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 28 Aug 2013 09:09:06 -0400
Subject: [maker-devel] Unexpected results with correct_est_fusion
In-Reply-To:
Message-ID:

Could you pick one contig where the number of genes shifts dramatically
and upload that contig fasta together with your control files and any
evidence datasets used to one of our servers (I'm going to send you
connection details in a separate e-mail). I can then run with and without
correct_est_fusion to see if there is anything unexpected going on.
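
Pulling a single contig out of a multi-FASTA assembly, as requested
above, can be sketched in a few lines of Python. This is an illustration
only, not a MAKER utility: extract_contig and the demo records are
hypothetical, and the contig ID is assumed to be the first
whitespace-delimited token on the header line.

```python
def extract_contig(fasta_lines, contig_id):
    """Return the header and sequence lines of one record from FASTA lines."""
    keep = False
    record = []
    for line in fasta_lines:
        if line.startswith(">"):
            # The record ID is the first token after '>' on the header line.
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] == contig_id
        if keep:
            record.append(line.rstrip("\n"))
    return record


# Made-up three-record assembly used only to exercise the function.
DEMO_FASTA = [
    ">scaffold_0001 len=8\n",
    "ACGTACGT\n",
    ">scaffold_0040\n",
    "TTTT\n",
    "GGGG\n",
    ">scaffold_0041\n",
    "CCCC\n",
]

print("\n".join(extract_contig(DEMO_FASTA, "scaffold_0040")))
```

With a real assembly the same function can be fed an open file handle,
and the returned lines written to a new FASTA file for upload.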

--Carson

From: Benjamin Rubin
Date: Tuesday, August 27, 2013 10:59 AM
To: Carson Holt
Cc:
Subject: Re: [maker-devel] Unexpected results with correct_est_fusion

Hi Carson,

I increased pred_flank to 200 and reran MAKER with correct_est_fusion,
but I still only get ~5,000 genes (5,082 instead of the 5,020 with
pred_flank at 100). This is using only the first round with SNAP and
Augustus trained on the CEGMA genes. Is there anything else that I might
be doing wrong? I have attached my control file in case that could be
useful.

Thanks for the help!
Ben

On Mon, Aug 26, 2013 at 2:00 PM, Carson Holt wrote:
> The correct_est_fusion option just clips UTR on overlapping genes. I suspect
> the real problem is setting pred_flank too low. If your lead-in sequence to a
> gene is too short, ab initio predictors won't call it. So you are probably
> getting empty reports from SNAP/Augustus for the hint-based predictions. Try
> increasing pred_flank to at least 150. Setting pred_flank too low will also
> limit how far MAKER will walk out along the edges of initial alignments during
> the polishing step (exonerate). So setting it too low may also be causing you
> to lose some EST and protein alignments.
>
> --Carson
>
>
> From: Benjamin Rubin
> Date: Monday, August 26, 2013 2:20 PM
> To:
> Subject: [maker-devel] Unexpected results with correct_est_fusion
>
> Hello developers,
>
> I am using MAKER 2.28 to annotate an ant genome. I provide protein sequence
> evidence from all seven of the other sequenced ant genomes and a de novo
> assembled transcriptome as EST evidence. I assembled the transcriptome using
> Trinity with the jaccard_clip option turned on to reduce gene fusions. Despite
> using this set of hopefully non-fused ESTs, I still have substantial fusion
> problems with the final annotation. Therefore, I reduced pred_flank to 100 and
> turned on correct_est_fusion. However, correct_est_fusion leads to the
> prediction of a much smaller number of genes (~5,000 instead of ~14,000).
I am > initially training both SNAP and Augustus using CEGMA genes and then > retraining based on the first round of annotation. Both rounds of annotation > yield the same low number (~5,000) of genes. It may also be worth mentioning > that the number of exons is also far lower when using correct_est_fusion > (~26,000 instead of ~90,000). > > Is this the expected behavior of correct_est_fusion? I was surprised that it > reduced the predicted number of genes by such a large margin. I am concerned > that I am using it incorrectly. Do you have any other suggestions for reducing > gene merging? > > Thanks, > Ben > > -- > _____________________________________________________ > Benjamin ER Rubin > PhD Candidate > Committee on Evolutionary Biology > University of Chicago > http://www.moreaulab.org/Benjamin_Rubin.html > > Division of Insects > Zoology Department > Field Museum of Natural History > 1400 South Lake Shore Drive > Chicago, IL 60605 > USA > Office: (312) 665-7776 > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -- _____________________________________________________ Benjamin ER Rubin PhD Candidate Committee on Evolutionary Biology University of Chicago http://www.moreaulab.org/Benjamin_Rubin.html Division of Insects Zoology Department Field Museum of Natural History 1400 South Lake Shore Drive Chicago, IL 60605 USA Office: (312) 665-7776 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jxf1023 at gmail.com Thu Aug 8 19:11:37 2013 From: jxf1023 at gmail.com (Xiaofang Jiang) Date: Thu, 08 Aug 2013 21:11:37 -0400 Subject: [maker-devel] annotation comparison Message-ID: <520441C9.30407@gmail.com> Dear Maker Developers, I am annotating a mosquitoes genome using Maker. I have two questions regarding Maker annotation. 1. Are there any scripts available to compare two sets of annotations? 
We know about SOBA but were wondering if there is something more comprehensive that you guys use.

2. I am expecting around 13,000 genes, however maker only predicted 9,000 genes. I used the gff3 from cufflinks, protein, and ESTs as evidence and SNAP as the ab initio predictor. I changed "keep_preds" to 1 and that resulted in 17,000 genes, which seems like it shouldn't have happened. So in order to get more genes, should I try changing "single_length" to 100 and "pred_flank" to 100?

Best,
Xiaofang

From carsonhh at gmail.com Fri Aug 23 10:38:51 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 Aug 2013 12:38:51 -0400
Subject: [maker-devel] unknown error
In-Reply-To: <602FA505-8A78-4006-8D8F-D3EC81A73FEC@uga.edu>
Message-ID:

Sorry for the slow response, but this message was caught by my spam filter so I never saw it till now. Which version of MAKER are you using? Have you tried the most recent online version?

--Carson

On 7/30/13 3:13 PM, "Gaelen Burke" wrote:

>Hello,
>I have an error that occurs in my MAKER run for 52 of several thousand
>scaffolds. The annotation of these scaffolds fails, even after 3
>re-tries. I also pulled out the failed sequences and started a run from
>scratch, with the same result. Could anyone tell me what this error means
>and how I could possibly fix it?
>Thanks,
>Gaelen Burke
>
>Here is the message that occurs:
>
>Processing transcripts into genes
>
>------------- EXCEPTION: Bio::Root::Exception -------------
>MSG: Calling translate without a seq argument!
>STACK: Error::throw
>STACK: Bio::Root::Root::throw /usr/local/perl/5.14.1/lib/site_perl/5.14.1/Bio/Root/Root.pm:472
>STACK: Bio::Tools::CodonTable::translate /usr/local/perl/5.14.1/lib/site_perl/5.14.1/Bio/Tools/CodonTable.pm:411
>STACK: PhatHit_utils::_adjust /panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/PhatHit_utils.pm:880
>STACK: PhatHit_utils::adjust_start_stop /panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/PhatHit_utils.pm:776
>STACK: maker::auto_annotator::load_transcript_struct /panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/maker/auto_annotator.pm:1808
>STACK: maker::auto_annotator::group_transcripts /panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/maker/auto_annotator.pm:2163
>STACK: maker::auto_annotator::annotate_genes /panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/maker/auto_annotator.pm:877
>STACK: Process::MpiChunk::_go /panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/Process/MpiChunk.pm:2159
>STACK: Process::MpiChunk::run /panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/Process/MpiChunk.pm:257
>STACK: Process::MpiTiers::run_all /panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/Process/MpiTiers.pm:193
>STACK: /usr/local/maker/latest/bin/maker:276
>-----------------------------------------------------------
>
>FATAL ERROR
>ERROR: Failed while clustering transcripts into genes for annotations!!
>
>ERROR: Chunk failed at level 20
>!!
>FAILED CONTIG:scaffold_0040 > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From Michael.Li3 at AGR.GC.CA Fri Aug 23 10:46:17 2013 From: Michael.Li3 at AGR.GC.CA (Li, Michael) Date: Fri, 23 Aug 2013 16:46:17 +0000 Subject: [maker-devel] annotation comparison In-Reply-To: <520441C9.30407@gmail.com> References: <520441C9.30407@gmail.com> Message-ID: <229AF11430CC544B8987653593A750A92FB4AAFE@ONOTTAXES3.AGR.GC.CA> I'd be interested in knowing about this too. I'm writing an internship work report on this topic, so it would be great to hear about it. I personally use ParsEval to compare two sets of annotation. It seems pretty thorough, but it's still in its early stages of development. You can read a bit more about it here: https://github.com/standage/AEGeAn Cheers, Michael -----Original Message----- From: maker-devel [mailto:maker-devel-bounces at yandell-lab.org] On Behalf Of Xiaofang Jiang Sent: Thursday, August 08, 2013 9:12 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] annotation comparison Dear Maker Developers, I am annotating a mosquitoes genome using Maker. I have two questions regarding Maker annotation. 1. Are there any scripts available to compare two sets of annotations? We know about SOBA but were wondering if there is something more comprehensive that you guys use. 2. I am expecting around 13,000 genes, however maker only predicted 9,000 genes. I used both the gff3 from cufflinks, protein, and ESTs as evidence and SNAP as the ab inito predictor. I changed "keep_preds" to 1 and that resulted 17,000 genes, it seems that shouldn't have happened. So in order to get more genes, should I try change "single_length" to 100, and change "pred_flanks" to 100? 
Best,

Xiaofang

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Fri Aug 23 10:46:35 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 Aug 2013 12:46:35 -0400
Subject: [maker-devel] annotation comparison
In-Reply-To: <520441C9.30407@gmail.com>
Message-ID:

Sorry for the slow response. This was caught by my spam filter for some reason, so I am only seeing it now.

Eval from WashU is probably the best tool for comparing two annotation sets. MAKER comes with a script that will convert MAKER's GFF3 output into Eval GTF input.

If you are getting fewer genes than expected then you probably need to add more evidence. MAKER will by default reject genes that are not supported by either a protein or an EST. The keep_preds=1 option makes it keep everything even if there is no support. Usually people make the mistake of not providing sufficient protein evidence. For example, just using something like UniProt may not be sufficient; you may want to pick a couple of other species with annotated genomes (fruit fly, for example, or another mosquito species) and provide every protein from their genomes as evidence.

Also, if this is a newly sequenced genome, run CEGMA to see how complete the assembly is. If the assembly is 80% complete, for example, you would only expect to retrieve 80% of the expected genes.

--Carson

On 8/8/13 9:11 PM, "Xiaofang Jiang" wrote:

>Dear Maker Developers,
>
>I am annotating a mosquito genome using Maker. I have two questions
>regarding Maker annotation.
>
>1. Are there any scripts available to compare two sets of annotations?
>We know about SOBA but were wondering if there is something more
>comprehensive that you guys use.
>
>2. I am expecting around 13,000 genes, however maker only predicted
>9,000 genes.
I used both the gff3 from cufflinks, protein, and ESTs as >evidence and SNAP as the ab inito predictor. I changed "keep_preds" to 1 >and that resulted 17,000 genes, it seems that shouldn't have happened. >So in order to get more genes, should I try change "single_length" to >100, and change "pred_flanks" to 100? > > >Best, > >Xiaofang > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From neil at cs.uky.edu Fri Aug 23 11:19:53 2013 From: neil at cs.uky.edu (Neil Moore) Date: Fri, 23 Aug 2013 13:19:53 -0400 Subject: [maker-devel] [PATCH] Corrupted exon table file when running FGeneSH Message-ID: <21015.39353.259641.458113@dirac.s-z.org> I encountered problems with FGeneSH failing on some contigs (small ones with few if any genes). Investigating the logs, I found that fgenesh was complaining about a "Corrupted exon table file"; it turns out that MAKER had omitted the header line from the exon table. I think this happened when there were predictions or evidence for that contig, but none of them contained introns; I haven't been able to verify that, though. The following patch corrected the problem and allowed FGeneSH to run, but I don't know the code well so perhaps there is a better way. 

--- lib/Widget/fgenesh.pm  2013-05-22 12:34:00.000000000 -0400
+++ /home/neil/fgenesh.pm  2013-08-08 14:41:40.585139335 -0400
@@ -562,18 +562,18 @@
         push(@xdef, $l);
     }
 
-    return \@xdef if(!@$i_coors);
-
-    foreach my $i (@$i_coors){
-        my $i_b = ($i->[0] - $offset) + ($i_flank-1);
-        my $i_e = ($i->[1] - $offset) - ($i_flank-1);
-
-        next if abs($i_b - $i_e) < 2*$i_flank;
-        next if abs($i_b - $i_e) < 25;
-
-        my $l = "$i_b $i_e -1000";
-
-        push(@xdef, $l);
+    if(@$i_coors) {
+        foreach my $i (@$i_coors){
+            my $i_b = ($i->[0] - $offset) + ($i_flank-1);
+            my $i_e = ($i->[1] - $offset) - ($i_flank-1);
+
+            next if abs($i_b - $i_e) < 2*$i_flank;
+            next if abs($i_b - $i_e) < 25;
+
+            my $l = "$i_b $i_e -1000";
+
+            push(@xdef, $l);
+        }
     }
 
     my $num = @xdef;

--
Dr Neil Moore, neil at cs.uky.edu, neil at uky.edu, neil at s-z.org

From dence at genetics.utah.edu Fri Aug 23 11:41:41 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 23 Aug 2013 17:41:41 +0000
Subject: [maker-devel] annotation comparison
In-Reply-To: <520441C9.30407@gmail.com>
References: <520441C9.30407@gmail.com>
Message-ID:

Hi Xiaofang,

1. People use various ad-hoc scripts for comparing genome annotations. If you look in the MAKER2 paper (Holt and Yandell, BMC Bioinformatics 2011), you'll see the AED score, which is something we use to compare the support from the evidence.

2. It actually isn't surprising that you got that big of an increase in the number of genes when you turned "keep_preds" to 1. Ab initio predictors give a great many false positives and "keep_preds=1" will turn all of those into gene models.

To get more genes, you should probably train an additional gene predictor, like GeneMark and maybe Augustus. I think that Augustus should already have a config file for mosquito.

Changing "pred_flank" will increase the number of genes if you think that you have many merged genes.

What protein dataset did you use?
We often use a set from a comprehensive protein database like UniProt in addition to proteins from closely related species.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Xiaofang Jiang [jxf1023 at gmail.com]
Sent: Thursday, August 08, 2013 7:11 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] annotation comparison

Dear Maker Developers,

I am annotating a mosquito genome using Maker. I have two questions regarding Maker annotation.

1. Are there any scripts available to compare two sets of annotations? We know about SOBA but were wondering if there is something more comprehensive that you guys use.

2. I am expecting around 13,000 genes, however maker only predicted 9,000 genes. I used the gff3 from cufflinks, protein, and ESTs as evidence and SNAP as the ab initio predictor. I changed "keep_preds" to 1 and that resulted in 17,000 genes, which seems like it shouldn't have happened. So in order to get more genes, should I try changing "single_length" to 100 and "pred_flank" to 100?

Best,

Xiaofang

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From barry.moore at genetics.utah.edu Fri Aug 23 15:05:18 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 23 Aug 2013 15:05:18 -0600
Subject: [maker-devel] annotation comparison
In-Reply-To:
References: <520441C9.30407@gmail.com>
Message-ID: <6CAEFC94-1C2C-4CB1-886E-32C4A4D5EA36@genetics.utah.edu>

Also Xiaofang, you mentioned that you have used SOBA. There is a command-line version (called SOBAcl) that hasn't been advertised too well and does a lot more than the web version in terms of comparison and reporting.
You can find it here: http://www.sequenceontology.org/resources/sobacl.html It can prepare tables and graphs on multiple GFF3 for feature counts, lengths, footprints etc. Mike Campbell here in the lab has a script to generate the data necessary for AED figures like this one and he's going to add it to maker/bin some time in the near future. B On Aug 23, 2013, at 11:41 AM, Daniel Ence wrote: > Hi Xiaofang, > > 1. People use various ad-hoc scripts for comparing genome genome annotations. If you look in the MAKER2 paper(Holt and Yandell, BMC Bioinformatics 2011), you'll see the AED score, which is some we use to compare the support from the evidence. > > 2. > It actually isn't surprising that you got that big of an increase in the number of genes when you turned "keep_preds" to 1. Ab-initio predictors give a great many false positives and "keep_preds=1" will turn all of those into gene models. > > To get more genes, you should probably train an additional gene predictor, like GeneMark and maybe augustus. I think that augustus should already have a config file for mosquito. > > Changing the "pred_flanks" will increase the number of genes if you think that you have many merged genes. > > What protein dataset did you use? We often use a set from a comprehensive protein database like uniprot in addition to proteins from closely related species. > > Thanks, > Daniel > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ________________________________________ > From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Xiaofang Jiang [jxf1023 at gmail.com] > Sent: Thursday, August 08, 2013 7:11 PM > To: maker-devel at yandell-lab.org > Subject: [maker-devel] annotation comparison > > Dear Maker Developers, > > I am annotating a mosquitoes genome using Maker. I have two questions > regarding Maker annotation. > > 1. 
Are there any scripts available to compare two sets of annotations? > We know about SOBA but were wondering if there is something more > comprehensive that you guys use. > > 2. I am expecting around 13,000 genes, however maker only predicted > 9,000 genes. I used both the gff3 from cufflinks, protein, and ESTs as > evidence and SNAP as the ab inito predictor. I changed "keep_preds" to 1 > and that resulted 17,000 genes, it seems that shouldn't have happened. > So in order to get more genes, should I try change "single_length" to > 100, and change "pred_flanks" to 100? > > > Best, > > Xiaofang > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From brubin at fieldmuseum.org Mon Aug 26 12:20:31 2013 From: brubin at fieldmuseum.org (Benjamin Rubin) Date: Mon, 26 Aug 2013 13:20:31 -0500 Subject: [maker-devel] Unexpected results with correct_est_fusion Message-ID: Hello developers, I am using MAKER 2.28 to annotate an ant genome. I provide protein sequence evidence from all seven of the other sequenced ant genomes and a *de novo* assembled transcriptome as EST evidence. I assembled the transcriptome using Trinity with the jaccard_clip option turned on to reduce gene fusions. Despite using this set of hopefully non-fused ESTs, I still have substantial fusion problems with the final annotation. Therefore, I reduced pred_flank to 100 and turned on correct_est_fusion. 
However, correct_est_fusion leads to the prediction of a much smaller number of genes (~5,000 instead of ~14,000). I am initially training both SNAP and Augustus using CEGMA genes and then retraining based on the first round of annotation. Both rounds of annotation yield the same low number (~5,000) of genes. It may also be worth mentioning that the number of exons is also far lower when using correct_est_fusion (~26,000 instead of ~90,000). Is this the expected behavior of correct_est_fusion? I was surprised that it reduced the predicted number of genes by such a large margin. I am concerned that I am using it incorrectly. Do you have any other suggestions for reducing gene merging? Thanks, Ben -- _____________________________________________________ Benjamin ER Rubin PhD Candidate Committee on Evolutionary Biology University of Chicago http://www.moreaulab.org/Benjamin_Rubin.html Division of Insects Zoology Department Field Museum of Natural History 1400 South Lake Shore Drive Chicago, IL 60605 USA Office: (312) 665-7776 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfierst at uoregon.edu Mon Aug 26 12:54:20 2013 From: jfierst at uoregon.edu (Janna Fierst) Date: Mon, 26 Aug 2013 11:54:20 -0700 Subject: [maker-devel] exon/intron boundaries Message-ID: Hi, I am using MAKER 2.28 to annotate a Caenorhabditid worm genome, and the initial results appear fairly good but we seem to be be annotating too many exons for multiple genes. I was wondering which parameters should be tuned to change the threshold for exon/intron boundaries? Thanks for your help -Janna Fierst -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Aug 26 13:00:55 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 26 Aug 2013 15:00:55 -0400 Subject: [maker-devel] Unexpected results with correct_est_fusion In-Reply-To: Message-ID: The correct_est_fusion option just clips UTR on overlapping genes. 
I suspect the real problem is setting pred_flank too low. If your lead in sequence to a gene is too short, ab initio predictors won't call it. So you are probably getting empty reports from SNAP/Augustus for the hint based predictions. Try increasing pred_flank to at least 150. Setting pred_flank too low will also limit how far MAKER will walk out along the edges initial alignments during the polishing step (exonerate). So setting it too low may also be causing you to lose some EST and protein alignments. --Carson From: Benjamin Rubin Date: Monday, August 26, 2013 2:20 PM To: Subject: [maker-devel] Unexpected results with correct_est_fusion Hello developers, I am using MAKER 2.28 to annotate an ant genome. I provide protein sequence evidence from all seven of the other sequenced ant genomes and a de novo assembled transcriptome as EST evidence. I assembled the transcriptome using Trinity with the jaccard_clip option turned on to reduce gene fusions. Despite using this set of hopefully non-fused ESTs, I still have substantial fusion problems with the final annotation. Therefore, I reduced pred_flank to 100 and turned on correct_est_fusion. However, correct_est_fusion leads to the prediction of a much smaller number of genes (~5,000 instead of ~14,000). I am initially training both SNAP and Augustus using CEGMA genes and then retraining based on the first round of annotation. Both rounds of annotation yield the same low number (~5,000) of genes. It may also be worth mentioning that the number of exons is also far lower when using correct_est_fusion (~26,000 instead of ~90,000). Is this the expected behavior of correct_est_fusion? I was surprised that it reduced the predicted number of genes by such a large margin. I am concerned that I am using it incorrectly. Do you have any other suggestions for reducing gene merging? 
Thanks,
Ben

--
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
http://www.moreaulab.org/Benjamin_Rubin.html

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Mon Aug 26 13:21:27 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 26 Aug 2013 15:21:27 -0400
Subject: [maker-devel] exon/intron boundaries
In-Reply-To:
Message-ID:

Are you getting gene fusions or just more exons? Gene fusions can be reduced by setting correct_est_fusion=1 or by reducing pred_flank, although reducing pred_flank can cause other issues (but those generally only appear if you set the value below 150). Also, if you have the maximum intron size set too high (split_hit option), you may be generating bridging alignments that make evidence align across distant paralogous genes (this can result in gene merging).

You should also look at your results manually in a viewer like Apollo. Then see if the extra exons are supported by something such as protein alignments from another species. If this is the case, you may have a poorly annotated protein set being used as evidence that is carrying over its erroneous exons into the species you are annotating. If the extra exons are supported by EST evidence, then perhaps you should try to rebuild the EST assembly (for example, Trinity has an option to use a Jaccard similarity coefficient to avoid fusing transcripts).

Another option is to retrain SNAP or Augustus. MAKER does not actually produce any of the models itself (it is a pipeline, not a predictor).
The models are all generated using these other algorithms; MAKER just feeds them hints based on protein and transcript alignments, so making sure training is sufficient is important for those programs to produce their best models.

Finally, make sure your repeat database is sufficient. You may need to generate a species-specific repeat library using something like RepeatModeler. Repeats can end up being included as extra exons in gene models because they may contain reading frames that do code for proteins (e.g. reverse transcriptases).

If you have any questions on any of the above, just let us know.

Thanks,
Carson

From: Janna Fierst
Date: Monday, August 26, 2013 2:54 PM
To:
Subject: [maker-devel] exon/intron boundaries

Hi,
I am using MAKER 2.28 to annotate a Caenorhabditid worm genome, and the initial results appear fairly good but we seem to be annotating too many exons for multiple genes. I was wondering which parameters should be tuned to change the threshold for exon/intron boundaries?
Thanks for your help
-Janna Fierst
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From brubin at fieldmuseum.org Tue Aug 27 08:59:50 2013
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Tue, 27 Aug 2013 09:59:50 -0500
Subject: [maker-devel] Unexpected results with correct_est_fusion
In-Reply-To:
References:
Message-ID:

Hi Carson,

I increased pred_flank to 200 and reran MAKER with correct_est_fusion, but I still only get ~5,000 genes (5,082 instead of the 5,020 with pred_flank at 100). This is using only the first round with SNAP and Augustus trained on the CEGMA genes. Is there anything else that I might be doing wrong? I have attached my control file in case that could be useful.

Thanks for the help!
Ben On Mon, Aug 26, 2013 at 2:00 PM, Carson Holt wrote: > The correct_est_fusion option just clips UTR on overlapping genes. I > suspect the real problem is setting pred_flank too low. If your lead in > sequence to a gene is too short, ab initio predictors won't call it. So > you are probably getting empty reports from SNAP/Augustus for the hint > based predictions. Try increasing pred_flank to at least 150. Setting > pred_flank too low will also limit how far MAKER will walk out along the > edges initial alignments during the polishing step (exonerate). So setting > it too low may also be causing you to lose some EST and protein alignments. > > --Carson > > > From: Benjamin Rubin > Date: Monday, August 26, 2013 2:20 PM > To: > Subject: [maker-devel] Unexpected results with correct_est_fusion > > Hello developers, > > I am using MAKER 2.28 to annotate an ant genome. I provide protein > sequence evidence from all seven of the other sequenced ant genomes and a > *de novo* assembled transcriptome as EST evidence. I assembled the > transcriptome using Trinity with the jaccard_clip option turned on to > reduce gene fusions. Despite using this set of hopefully non-fused ESTs, I > still have substantial fusion problems with the final annotation. > Therefore, I reduced pred_flank to 100 and turned on correct_est_fusion. > However, correct_est_fusion leads to the prediction of a much smaller > number of genes (~5,000 instead of ~14,000). I am initially training both > SNAP and Augustus using CEGMA genes and then retraining based on the first > round of annotation. Both rounds of annotation yield the same low number > (~5,000) of genes. It may also be worth mentioning that the number of exons > is also far lower when using correct_est_fusion (~26,000 instead of > ~90,000). > > Is this the expected behavior of correct_est_fusion? I was surprised that > it reduced the predicted number of genes by such a large margin. I am > concerned that I am using it incorrectly. 
Do you have any other suggestions > for reducing gene merging? > > Thanks, > Ben > > -- > _____________________________________________________ > Benjamin ER Rubin > PhD Candidate > Committee on Evolutionary Biology > University of Chicago > http://www.moreaulab.org/Benjamin_Rubin.html > > Division of Insects > Zoology Department > Field Museum of Natural History > 1400 South Lake Shore Drive > Chicago, IL 60605 > USA > Office: (312) 665-7776 > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- _____________________________________________________ Benjamin ER Rubin PhD Candidate Committee on Evolutionary Biology University of Chicago http://www.moreaulab.org/Benjamin_Rubin.html Division of Insects Zoology Department Field Museum of Natural History 1400 South Lake Shore Drive Chicago, IL 60605 USA Office: (312) 665-7776 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4810 bytes Desc: not available URL: From carsonhh at gmail.com Wed Aug 28 07:09:06 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 28 Aug 2013 09:09:06 -0400 Subject: [maker-devel] Unexpected results with correct_est_fusion In-Reply-To: Message-ID: Could you pick one contig where the number of genes shift dramatically and upload that contig fasta together with your control files and any evidence datasets used to one of our servers (I'm going to send you connection details in a separate e-mail). I can then run with and without correct_est_fusion to see if there is anything unexpected going on. 
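
A rough way to find such a contig is to count gene features per sequence in the GFF3 output of each run and rank the sequences by how much the count changes. The Python sketch below is only an illustration and is not part of MAKER: the function names `count_features` and `largest_shifts` are made up here, and it assumes standard nine-column, tab-separated GFF3 in which annotated genes carry the type `gene` in column 3.

```python
from collections import Counter


def count_features(gff3_lines, feature_type="gene", source=None):
    """Count features of one type per sequence ID (GFF3 column 1).

    If source is given, only rows whose column 2 equals it are counted.
    """
    counts = Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no feature rows after this
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        if source is not None and cols[1] != source:
            continue
        counts[cols[0]] += 1
    return counts


def largest_shifts(counts_a, counts_b, top=5):
    """Return (seqid, count_a, count_b) tuples, largest absolute change first."""
    seqids = set(counts_a) | set(counts_b)
    # Counter returns 0 for sequences that have no features in one of the runs
    rows = [(s, counts_a[s], counts_b[s]) for s in seqids]
    rows.sort(key=lambda r: (-abs(r[1] - r[2]), r[0]))
    return rows[:top]
```

Given the merged GFF3 from each run opened as file objects, calling `largest_shifts(count_features(run_without), count_features(run_with))` would list the sequences whose gene counts differ most; passing `feature_type="exon"` does the same for exon counts.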
--Carson From: Benjamin Rubin Date: Tuesday, August 27, 2013 10:59 AM To: Carson Holt Cc: Subject: Re: [maker-devel] Unexpected results with correct_est_fusion Hi Carson, I increased pred_flank to 200 and reran MAKER with correct_est_fusion, but I still only get ~5,000 genes (5,082 instead of the 5,020 with pred_flank at 100). This is using only the first round with SNAP and Augustus trained on the CEGMA genes. Is there anything else that I might be doing wrong? I have attached my control file in case that could be useful. Thanks for the help! Ben On Mon, Aug 26, 2013 at 2:00 PM, Carson Holt wrote: > The correct_est_fusion option just clips UTR on overlapping genes. I suspect > the real problem is setting pred_flank too low. If your lead in sequence to a > gene is too short, ab initio predictors won't call it. So you are probably > getting empty reports from SNAP/Augustus for the hint based predictions. Try > increasing pred_flank to at least 150. Setting pred_flank too low will also > limit how far MAKER will walk out along the edges initial alignments during > the polishing step (exonerate). So setting it too low may also be causing you > to lose some EST and protein alignments. > > --Carson > > > From: Benjamin Rubin > Date: Monday, August 26, 2013 2:20 PM > To: > Subject: [maker-devel] Unexpected results with correct_est_fusion > > Hello developers, > > I am using MAKER 2.28 to annotate an ant genome. I provide protein sequence > evidence from all seven of the other sequenced ant genomes and a de novo > assembled transcriptome as EST evidence. I assembled the transcriptome using > Trinity with the jaccard_clip option turned on to reduce gene fusions. Despite > using this set of hopefully non-fused ESTs, I still have substantial fusion > problems with the final annotation. Therefore, I reduced pred_flank to 100 and > turned on correct_est_fusion. However, correct_est_fusion leads to the > prediction of a much smaller number of genes (~5,000 instead of ~14,000). 
I am > initially training both SNAP and Augustus using CEGMA genes and then > retraining based on the first round of annotation. Both rounds of annotation > yield the same low number (~5,000) of genes. It may also be worth mentioning > that the number of exons is also far lower when using correct_est_fusion > (~26,000 instead of ~90,000). > > Is this the expected behavior of correct_est_fusion? I was surprised that it > reduced the predicted number of genes by such a large margin. I am concerned > that I am using it incorrectly. Do you have any other suggestions for reducing > gene merging? > > Thanks, > Ben > > -- > _____________________________________________________ > Benjamin ER Rubin > PhD Candidate > Committee on Evolutionary Biology > University of Chicago > http://www.moreaulab.org/Benjamin_Rubin.html > > Division of Insects > Zoology Department > Field Museum of Natural History > 1400 South Lake Shore Drive > Chicago, IL 60605 > USA > Office: (312) 665-7776 > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -- _____________________________________________________ Benjamin ER Rubin PhD Candidate Committee on Evolutionary Biology University of Chicago http://www.moreaulab.org/Benjamin_Rubin.html Division of Insects Zoology Department Field Museum of Natural History 1400 South Lake Shore Drive Chicago, IL 60605 USA Office: (312) 665-7776 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jxf1023 at gmail.com Thu Aug 8 19:11:37 2013 From: jxf1023 at gmail.com (Xiaofang Jiang) Date: Thu, 08 Aug 2013 21:11:37 -0400 Subject: [maker-devel] annotation comparison Message-ID: <520441C9.30407@gmail.com> Dear Maker Developers, I am annotating a mosquitoes genome using Maker. I have two questions regarding Maker annotation. 1. Are there any scripts available to compare two sets of annotations? 
We know about SOBA but were wondering if there is something more comprehensive that you guys use. 2. I am expecting around 13,000 genes, however maker only predicted 9,000 genes. I used both the gff3 from cufflinks, protein, and ESTs as evidence and SNAP as the ab inito predictor. I changed "keep_preds" to 1 and that resulted 17,000 genes, it seems that shouldn't have happened. So in order to get more genes, should I try change "single_length" to 100, and change "pred_flanks" to 100? Best, Xiaofang From carsonhh at gmail.com Fri Aug 23 10:38:51 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 23 Aug 2013 12:38:51 -0400 Subject: [maker-devel] unknown error In-Reply-To: <602FA505-8A78-4006-8D8F-D3EC81A73FEC@uga.edu> Message-ID: Sorry for the slow response, but this message was caught by my spam filter so I never saw it till now. Which version of MAKER are you using? Have you tied the most recent online version/ --Carson On 7/30/13 3:13 PM, "Gaelen Burke" wrote: >Hello, >I have an error that occurs in my MAKER run for 52 of several thousand >scaffolds. The annotation of these scaffolds fails, even after 3 >re-tries. I also pulled out the failed sequences and started a run from >scratch, with the same result. Could anyone tell me what this error means >and how I could possibly fix it? >Thanks, >Gaelen Burke > >Here is the message that occurs: > >Processing transcripts into genes > >------------- EXCEPTION: Bio::Root::Exception ------------- >MSG: Calling translate without a seq argument! 
>STACK: Error::throw >STACK: Bio::Root::Root::throw >/usr/local/perl/5.14.1/lib/site_perl/5.14.1/Bio/Root/Root.pm:472 >STACK: Bio::Tools::CodonTable::translate >/usr/local/perl/5.14.1/lib/site_perl/5.14.1/Bio/Tools/CodonTable.pm >:411 >STACK: PhatHit_utils::_adjust >/panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/PhatHit_utils >.pm: >880 >STACK: PhatHit_utils::adjust_start_stop >/panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/PhatHit >_utils.pm:776 >STACK: maker::auto_annotator::load_transcript_struct >/panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/. >./lib/maker/auto_annotator.pm:1808 >STACK: maker::auto_annotator::group_transcripts >/panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib >/maker/auto_annotator.pm:2163 >STACK: maker::auto_annotator::annotate_genes >/panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/ma >ker/auto_annotator.pm:877 >STACK: Process::MpiChunk::_go >/panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/Process/MpiCh >unk. >pm:2159 >STACK: Process::MpiChunk::run >/panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/Process/MpiCh >unk. >pm:257 >STACK: Process::MpiTiers::run_all >/panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/Process/MpiTi >ers.pm:193 >STACK: /usr/local/maker/latest/bin/maker:276 >----------------------------------------------------------- > >FATAL ERROR >ERROR: Failed while clustering transcripts into genes for annotations!! > >ERROR: Chunk failed at level 20 >!! 
>FAILED CONTIG:scaffold_0040 > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From Michael.Li3 at AGR.GC.CA Fri Aug 23 10:46:17 2013 From: Michael.Li3 at AGR.GC.CA (Li, Michael) Date: Fri, 23 Aug 2013 16:46:17 +0000 Subject: [maker-devel] annotation comparison In-Reply-To: <520441C9.30407@gmail.com> References: <520441C9.30407@gmail.com> Message-ID: <229AF11430CC544B8987653593A750A92FB4AAFE@ONOTTAXES3.AGR.GC.CA> I'd be interested in knowing about this too. I'm writing an internship work report on this topic, so it would be great to hear about it. I personally use ParsEval to compare two sets of annotation. It seems pretty thorough, but it's still in its early stages of development. You can read a bit more about it here: https://github.com/standage/AEGeAn Cheers, Michael -----Original Message----- From: maker-devel [mailto:maker-devel-bounces at yandell-lab.org] On Behalf Of Xiaofang Jiang Sent: Thursday, August 08, 2013 9:12 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] annotation comparison Dear Maker Developers, I am annotating a mosquitoes genome using Maker. I have two questions regarding Maker annotation. 1. Are there any scripts available to compare two sets of annotations? We know about SOBA but were wondering if there is something more comprehensive that you guys use. 2. I am expecting around 13,000 genes, however maker only predicted 9,000 genes. I used both the gff3 from cufflinks, protein, and ESTs as evidence and SNAP as the ab inito predictor. I changed "keep_preds" to 1 and that resulted 17,000 genes, it seems that shouldn't have happened. So in order to get more genes, should I try change "single_length" to 100, and change "pred_flanks" to 100? 
Best,

Xiaofang

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Fri Aug 23 10:46:35 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 Aug 2013 12:46:35 -0400
Subject: [maker-devel] annotation comparison
In-Reply-To: <520441C9.30407@gmail.com>
Message-ID:

Sorry for the slow response. This was caught by my spam filter for some reason, so I am only seeing it now.

Eval from WashU is probably the best tool for comparing two annotation sets. MAKER comes with a script that will convert MAKER's GFF3 output into Eval GTF input.

If you are getting fewer genes than expected then you probably need to add more evidence. MAKER will by default reject genes that are not supported by either a protein or an EST. The keep_preds=1 option makes it keep everything even if there is no support. Usually people make the mistake of not providing sufficient protein evidence. For example, just using something like UniProt may not be sufficient; you may want to pick a couple of other species with annotated genomes (fruit fly for example, or another mosquito species) and provide every protein from their genomes as evidence.

Also, if this is a newly sequenced genome, run CEGMA to see how complete the assembly is. If the assembly is 80% complete, for example, you would only expect to retrieve 80% of the expected genes.

--Carson

On 8/8/13 9:11 PM, "Xiaofang Jiang" wrote:

>Dear Maker Developers,
>
>I am annotating a mosquitoes genome using Maker. I have two questions
>regarding Maker annotation.
>
>1. Are there any scripts available to compare two sets of annotations?
>We know about SOBA but were wondering if there is something more
>comprehensive that you guys use.
>
>2. I am expecting around 13,000 genes, however maker only predicted
>9,000 genes.
I used both the gff3 from cufflinks, protein, and ESTs as >evidence and SNAP as the ab inito predictor. I changed "keep_preds" to 1 >and that resulted 17,000 genes, it seems that shouldn't have happened. >So in order to get more genes, should I try change "single_length" to >100, and change "pred_flanks" to 100? > > >Best, > >Xiaofang > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From neil at cs.uky.edu Fri Aug 23 11:19:53 2013 From: neil at cs.uky.edu (Neil Moore) Date: Fri, 23 Aug 2013 13:19:53 -0400 Subject: [maker-devel] [PATCH] Corrupted exon table file when running FGeneSH Message-ID: <21015.39353.259641.458113@dirac.s-z.org> I encountered problems with FGeneSH failing on some contigs (small ones with few if any genes). Investigating the logs, I found that fgenesh was complaining about a "Corrupted exon table file"; it turns out that MAKER had omitted the header line from the exon table. I think this happened when there were predictions or evidence for that contig, but none of them contained introns; I haven't been able to verify that, though. The following patch corrected the problem and allowed FGeneSH to run, but I don't know the code well so perhaps there is a better way. 
--- lib/Widget/fgenesh.pm 2013-05-22 12:34:00.000000000 -0400
+++ /home/neil/fgenesh.pm 2013-08-08 14:41:40.585139335 -0400
@@ -562,18 +562,18 @@
         push(@xdef, $l);
     }
 
-    return \@xdef if(!@$i_coors);
-
-    foreach my $i (@$i_coors){
-        my $i_b = ($i->[0] - $offset) + ($i_flank-1);
-        my $i_e = ($i->[1] - $offset) - ($i_flank-1);
-
-        next if abs($i_b - $i_e) < 2*$i_flank;
-        next if abs($i_b - $i_e) < 25;
-
-        my $l = "$i_b $i_e -1000";
-
-        push(@xdef, $l);
+    if(@$i_coors) {
+        foreach my $i (@$i_coors){
+            my $i_b = ($i->[0] - $offset) + ($i_flank-1);
+            my $i_e = ($i->[1] - $offset) - ($i_flank-1);
+
+            next if abs($i_b - $i_e) < 2*$i_flank;
+            next if abs($i_b - $i_e) < 25;
+
+            my $l = "$i_b $i_e -1000";
+
+            push(@xdef, $l);
+        }
     }
 
     my $num = @xdef;

-- 
Dr Neil Moore, neil at cs.uky.edu, neil at uky.edu, neil at s-z.org

From dence at genetics.utah.edu Fri Aug 23 11:41:41 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 23 Aug 2013 17:41:41 +0000
Subject: [maker-devel] annotation comparison
In-Reply-To: <520441C9.30407@gmail.com>
References: <520441C9.30407@gmail.com>
Message-ID:

Hi Xiaofang,

1. People use various ad-hoc scripts for comparing genome annotations. If you look in the MAKER2 paper (Holt and Yandell, BMC Bioinformatics 2011), you'll see the AED score, which is something we use to compare the support from the evidence.

2. It actually isn't surprising that you got that big of an increase in the number of genes when you turned "keep_preds" to 1. Ab initio predictors give a great many false positives and "keep_preds=1" will turn all of those into gene models.

To get more genes, you should probably train an additional gene predictor, like GeneMark and maybe Augustus. I think that Augustus should already have a config file for mosquito.

Changing "pred_flank" will increase the number of genes if you think that you have many merged genes.

What protein dataset did you use?
We often use a set from a comprehensive protein database like UniProt in addition to proteins from closely related species.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Xiaofang Jiang [jxf1023 at gmail.com]
Sent: Thursday, August 08, 2013 7:11 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] annotation comparison

Dear Maker Developers,

I am annotating a mosquitoes genome using Maker. I have two questions regarding Maker annotation.

1. Are there any scripts available to compare two sets of annotations? We know about SOBA but were wondering if there is something more comprehensive that you guys use.

2. I am expecting around 13,000 genes, however maker only predicted 9,000 genes. I used both the gff3 from cufflinks, protein, and ESTs as evidence and SNAP as the ab inito predictor. I changed "keep_preds" to 1 and that resulted 17,000 genes, it seems that shouldn't have happened. So in order to get more genes, should I try change "single_length" to 100, and change "pred_flanks" to 100?

Best,

Xiaofang

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From barry.moore at genetics.utah.edu Fri Aug 23 15:05:18 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 23 Aug 2013 15:05:18 -0600
Subject: [maker-devel] annotation comparison
In-Reply-To:
References: <520441C9.30407@gmail.com>
Message-ID: <6CAEFC94-1C2C-4CB1-886E-32C4A4D5EA36@genetics.utah.edu>

Also Xiaofang, you mentioned that you have used SOBA. There is a command-line version (called SOBAcl) that hasn't been advertised too well and that does a lot more than the web version in terms of comparison and reporting.
You can find it here: http://www.sequenceontology.org/resources/sobacl.html It can prepare tables and graphs on multiple GFF3 for feature counts, lengths, footprints etc. Mike Campbell here in the lab has a script to generate the data necessary for AED figures like this one and he's going to add it to maker/bin some time in the near future. B On Aug 23, 2013, at 11:41 AM, Daniel Ence wrote: > Hi Xiaofang, > > 1. People use various ad-hoc scripts for comparing genome genome annotations. If you look in the MAKER2 paper(Holt and Yandell, BMC Bioinformatics 2011), you'll see the AED score, which is some we use to compare the support from the evidence. > > 2. > It actually isn't surprising that you got that big of an increase in the number of genes when you turned "keep_preds" to 1. Ab-initio predictors give a great many false positives and "keep_preds=1" will turn all of those into gene models. > > To get more genes, you should probably train an additional gene predictor, like GeneMark and maybe augustus. I think that augustus should already have a config file for mosquito. > > Changing the "pred_flanks" will increase the number of genes if you think that you have many merged genes. > > What protein dataset did you use? We often use a set from a comprehensive protein database like uniprot in addition to proteins from closely related species. > > Thanks, > Daniel > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ________________________________________ > From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Xiaofang Jiang [jxf1023 at gmail.com] > Sent: Thursday, August 08, 2013 7:11 PM > To: maker-devel at yandell-lab.org > Subject: [maker-devel] annotation comparison > > Dear Maker Developers, > > I am annotating a mosquitoes genome using Maker. I have two questions > regarding Maker annotation. > > 1. 
Are there any scripts available to compare two sets of annotations? > We know about SOBA but were wondering if there is something more > comprehensive that you guys use. > > 2. I am expecting around 13,000 genes, however maker only predicted > 9,000 genes. I used both the gff3 from cufflinks, protein, and ESTs as > evidence and SNAP as the ab inito predictor. I changed "keep_preds" to 1 > and that resulted 17,000 genes, it seems that shouldn't have happened. > So in order to get more genes, should I try change "single_length" to > 100, and change "pred_flanks" to 100? > > > Best, > > Xiaofang > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From brubin at fieldmuseum.org Mon Aug 26 12:20:31 2013 From: brubin at fieldmuseum.org (Benjamin Rubin) Date: Mon, 26 Aug 2013 13:20:31 -0500 Subject: [maker-devel] Unexpected results with correct_est_fusion Message-ID: Hello developers, I am using MAKER 2.28 to annotate an ant genome. I provide protein sequence evidence from all seven of the other sequenced ant genomes and a *de novo* assembled transcriptome as EST evidence. I assembled the transcriptome using Trinity with the jaccard_clip option turned on to reduce gene fusions. Despite using this set of hopefully non-fused ESTs, I still have substantial fusion problems with the final annotation. Therefore, I reduced pred_flank to 100 and turned on correct_est_fusion. 
However, correct_est_fusion leads to the prediction of a much smaller number of genes (~5,000 instead of ~14,000). I am initially training both SNAP and Augustus using CEGMA genes and then retraining based on the first round of annotation. Both rounds of annotation yield the same low number (~5,000) of genes. It may also be worth mentioning that the number of exons is also far lower when using correct_est_fusion (~26,000 instead of ~90,000).

Is this the expected behavior of correct_est_fusion? I was surprised that it reduced the predicted number of genes by such a large margin. I am concerned that I am using it incorrectly. Do you have any other suggestions for reducing gene merging?

Thanks,
Ben

-- 
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
http://www.moreaulab.org/Benjamin_Rubin.html

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jfierst at uoregon.edu Mon Aug 26 12:54:20 2013
From: jfierst at uoregon.edu (Janna Fierst)
Date: Mon, 26 Aug 2013 11:54:20 -0700
Subject: [maker-devel] exon/intron boundaries
Message-ID:

Hi,

I am using MAKER 2.28 to annotate a Caenorhabditid worm genome, and the initial results appear fairly good but we seem to be annotating too many exons for multiple genes. I was wondering which parameters should be tuned to change the threshold for exon/intron boundaries?

Thanks for your help
-Janna Fierst
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Aug 26 13:00:55 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 26 Aug 2013 15:00:55 -0400
Subject: [maker-devel] Unexpected results with correct_est_fusion
In-Reply-To:
Message-ID:

The correct_est_fusion option just clips UTR on overlapping genes.
I suspect the real problem is setting pred_flank too low. If your lead-in sequence to a gene is too short, ab initio predictors won't call it. So you are probably getting empty reports from SNAP/Augustus for the hint-based predictions. Try increasing pred_flank to at least 150. Setting pred_flank too low will also limit how far MAKER will walk out along the edges of initial alignments during the polishing step (exonerate). So setting it too low may also be causing you to lose some EST and protein alignments.

--Carson

From: Benjamin Rubin
Date: Monday, August 26, 2013 2:20 PM
To:
Subject: [maker-devel] Unexpected results with correct_est_fusion

Hello developers,

I am using MAKER 2.28 to annotate an ant genome. I provide protein sequence evidence from all seven of the other sequenced ant genomes and a de novo assembled transcriptome as EST evidence. I assembled the transcriptome using Trinity with the jaccard_clip option turned on to reduce gene fusions. Despite using this set of hopefully non-fused ESTs, I still have substantial fusion problems with the final annotation. Therefore, I reduced pred_flank to 100 and turned on correct_est_fusion. However, correct_est_fusion leads to the prediction of a much smaller number of genes (~5,000 instead of ~14,000). I am initially training both SNAP and Augustus using CEGMA genes and then retraining based on the first round of annotation. Both rounds of annotation yield the same low number (~5,000) of genes. It may also be worth mentioning that the number of exons is also far lower when using correct_est_fusion (~26,000 instead of ~90,000).

Is this the expected behavior of correct_est_fusion? I was surprised that it reduced the predicted number of genes by such a large margin. I am concerned that I am using it incorrectly. Do you have any other suggestions for reducing gene merging?
Thanks,
Ben

-- 
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
http://www.moreaulab.org/Benjamin_Rubin.html

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Aug 26 13:21:27 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 26 Aug 2013 15:21:27 -0400
Subject: [maker-devel] exon/intron boundaries
In-Reply-To:
Message-ID:

Are you getting gene fusions or just more exons? Gene fusions can be reduced by setting correct_est_fusion=1, or by reducing pred_flank, although reducing pred_flank can cause other issues (but those generally only appear if you set the value below 150). Also, if you have the maximum intron size set too high (split_hit option), you may be generating bridging alignments that make evidence align across distant paralogous genes (this can result in gene merging).

You should also look at your results manually in a viewer like Apollo. Then see if the extra exons are supported by something such as protein alignments from another species. If this is the case, you may have a poorly annotated protein set that is being used as evidence and is carrying over its erroneous exons into the species you are annotating. If the extra exons are supported by EST evidence, then perhaps you should try to rebuild the EST assembly (for example, Trinity has an option to use a Jaccard similarity coefficient to avoid fusing transcripts).

Another option is to retrain SNAP or Augustus. MAKER does not actually produce any of the models itself (it is a pipeline, not a predictor).
The models are all generated using these other algorithms; MAKER just feeds them hints based on protein and transcript alignments, so making sure training is sufficient is important for those programs to produce their best models.

Finally, make sure your repeat database is sufficient; you may need to generate a species-specific repeat library using something like RepeatModeler. Repeats can end up being included as extra exons in gene models because they may contain reading frames that do code for proteins (e.g. reverse transcriptases).

If you have any questions on any of the above, just let us know.

Thanks,
Carson

From: Janna Fierst
Date: Monday, August 26, 2013 2:54 PM
To:
Subject: [maker-devel] exon/intron boundaries

Hi,

I am using MAKER 2.28 to annotate a Caenorhabditid worm genome, and the initial results appear fairly good but we seem to be be annotating too many exons for multiple genes. I was wondering which parameters should be tuned to change the threshold for exon/intron boundaries?

Thanks for your help
-Janna Fierst

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From brubin at fieldmuseum.org Tue Aug 27 08:59:50 2013
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Tue, 27 Aug 2013 09:59:50 -0500
Subject: [maker-devel] Unexpected results with correct_est_fusion
In-Reply-To:
References:
Message-ID:

Hi Carson,

I increased pred_flank to 200 and reran MAKER with correct_est_fusion, but I still only get ~5,000 genes (5,082 instead of the 5,020 with pred_flank at 100). This is using only the first round with SNAP and Augustus trained on the CEGMA genes. Is there anything else that I might be doing wrong? I have attached my control file in case that could be useful.

Thanks for the help!
Ben On Mon, Aug 26, 2013 at 2:00 PM, Carson Holt wrote: > The correct_est_fusion option just clips UTR on overlapping genes. I > suspect the real problem is setting pred_flank too low. If your lead in > sequence to a gene is too short, ab initio predictors won't call it. So > you are probably getting empty reports from SNAP/Augustus for the hint > based predictions. Try increasing pred_flank to at least 150. Setting > pred_flank too low will also limit how far MAKER will walk out along the > edges initial alignments during the polishing step (exonerate). So setting > it too low may also be causing you to lose some EST and protein alignments. > > --Carson > > > From: Benjamin Rubin > Date: Monday, August 26, 2013 2:20 PM > To: > Subject: [maker-devel] Unexpected results with correct_est_fusion > > Hello developers, > > I am using MAKER 2.28 to annotate an ant genome. I provide protein > sequence evidence from all seven of the other sequenced ant genomes and a > *de novo* assembled transcriptome as EST evidence. I assembled the > transcriptome using Trinity with the jaccard_clip option turned on to > reduce gene fusions. Despite using this set of hopefully non-fused ESTs, I > still have substantial fusion problems with the final annotation. > Therefore, I reduced pred_flank to 100 and turned on correct_est_fusion. > However, correct_est_fusion leads to the prediction of a much smaller > number of genes (~5,000 instead of ~14,000). I am initially training both > SNAP and Augustus using CEGMA genes and then retraining based on the first > round of annotation. Both rounds of annotation yield the same low number > (~5,000) of genes. It may also be worth mentioning that the number of exons > is also far lower when using correct_est_fusion (~26,000 instead of > ~90,000). > > Is this the expected behavior of correct_est_fusion? I was surprised that > it reduced the predicted number of genes by such a large margin. I am > concerned that I am using it incorrectly. 
Do you have any other suggestions
> for reducing gene merging?
>
> Thanks,
> Ben
>
> --
> _____________________________________________________
> Benjamin ER Rubin
> PhD Candidate
> Committee on Evolutionary Biology
> University of Chicago
> http://www.moreaulab.org/Benjamin_Rubin.html
>
> Division of Insects
> Zoology Department
> Field Museum of Natural History
> 1400 South Lake Shore Drive
> Chicago, IL 60605
> USA
> Office: (312) 665-7776
> _______________________________________________ maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

-- 
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
http://www.moreaulab.org/Benjamin_Rubin.html

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4811 bytes
Desc: not available
URL:

From carsonhh at gmail.com Wed Aug 28 07:09:06 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 28 Aug 2013 09:09:06 -0400
Subject: [maker-devel] Unexpected results with correct_est_fusion
In-Reply-To:
Message-ID:

Could you pick one contig where the number of genes shifts dramatically and upload that contig FASTA, together with your control files and any evidence datasets used, to one of our servers? (I'm going to send you connection details in a separate e-mail.) I can then run with and without correct_est_fusion to see if there is anything unexpected going on.
--Carson From: Benjamin Rubin Date: Tuesday, August 27, 2013 10:59 AM To: Carson Holt Cc: Subject: Re: [maker-devel] Unexpected results with correct_est_fusion Hi Carson, I increased pred_flank to 200 and reran MAKER with correct_est_fusion, but I still only get ~5,000 genes (5,082 instead of the 5,020 with pred_flank at 100). This is using only the first round with SNAP and Augustus trained on the CEGMA genes. Is there anything else that I might be doing wrong? I have attached my control file in case that could be useful. Thanks for the help! Ben On Mon, Aug 26, 2013 at 2:00 PM, Carson Holt wrote: > The correct_est_fusion option just clips UTR on overlapping genes. I suspect > the real problem is setting pred_flank too low. If your lead in sequence to a > gene is too short, ab initio predictors won't call it. So you are probably > getting empty reports from SNAP/Augustus for the hint based predictions. Try > increasing pred_flank to at least 150. Setting pred_flank too low will also > limit how far MAKER will walk out along the edges initial alignments during > the polishing step (exonerate). So setting it too low may also be causing you > to lose some EST and protein alignments. > > --Carson > > > From: Benjamin Rubin > Date: Monday, August 26, 2013 2:20 PM > To: > Subject: [maker-devel] Unexpected results with correct_est_fusion > > Hello developers, > > I am using MAKER 2.28 to annotate an ant genome. I provide protein sequence > evidence from all seven of the other sequenced ant genomes and a de novo > assembled transcriptome as EST evidence. I assembled the transcriptome using > Trinity with the jaccard_clip option turned on to reduce gene fusions. Despite > using this set of hopefully non-fused ESTs, I still have substantial fusion > problems with the final annotation. Therefore, I reduced pred_flank to 100 and > turned on correct_est_fusion. However, correct_est_fusion leads to the > prediction of a much smaller number of genes (~5,000 instead of ~14,000). 
I am > initially training both SNAP and Augustus using CEGMA genes and then > retraining based on the first round of annotation. Both rounds of annotation > yield the same low number (~5,000) of genes. It may also be worth mentioning > that the number of exons is also far lower when using correct_est_fusion > (~26,000 instead of ~90,000). > > Is this the expected behavior of correct_est_fusion? I was surprised that it > reduced the predicted number of genes by such a large margin. I am concerned > that I am using it incorrectly. Do you have any other suggestions for reducing > gene merging? > > Thanks, > Ben > > -- > _____________________________________________________ > Benjamin ER Rubin > PhD Candidate > Committee on Evolutionary Biology > University of Chicago > http://www.moreaulab.org/Benjamin_Rubin.html > > Division of Insects > Zoology Department > Field Museum of Natural History > 1400 South Lake Shore Drive > Chicago, IL 60605 > USA > Office: (312) 665-7776 > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -- _____________________________________________________ Benjamin ER Rubin PhD Candidate Committee on Evolutionary Biology University of Chicago http://www.moreaulab.org/Benjamin_Rubin.html Division of Insects Zoology Department Field Museum of Natural History 1400 South Lake Shore Drive Chicago, IL 60605 USA Office: (312) 665-7776 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jxf1023 at gmail.com Thu Aug 8 19:11:37 2013 From: jxf1023 at gmail.com (Xiaofang Jiang) Date: Thu, 08 Aug 2013 21:11:37 -0400 Subject: [maker-devel] annotation comparison Message-ID: <520441C9.30407@gmail.com> Dear Maker Developers, I am annotating a mosquitoes genome using Maker. I have two questions regarding Maker annotation. 1. Are there any scripts available to compare two sets of annotations? 
We know about SOBA but were wondering if there is something more comprehensive that you guys use. 2. I am expecting around 13,000 genes, however maker only predicted 9,000 genes. I used both the gff3 from cufflinks, protein, and ESTs as evidence and SNAP as the ab inito predictor. I changed "keep_preds" to 1 and that resulted 17,000 genes, it seems that shouldn't have happened. So in order to get more genes, should I try change "single_length" to 100, and change "pred_flanks" to 100? Best, Xiaofang From carsonhh at gmail.com Fri Aug 23 10:38:51 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 23 Aug 2013 12:38:51 -0400 Subject: [maker-devel] unknown error In-Reply-To: <602FA505-8A78-4006-8D8F-D3EC81A73FEC@uga.edu> Message-ID: Sorry for the slow response, but this message was caught by my spam filter so I never saw it till now. Which version of MAKER are you using? Have you tied the most recent online version/ --Carson On 7/30/13 3:13 PM, "Gaelen Burke" wrote: >Hello, >I have an error that occurs in my MAKER run for 52 of several thousand >scaffolds. The annotation of these scaffolds fails, even after 3 >re-tries. I also pulled out the failed sequences and started a run from >scratch, with the same result. Could anyone tell me what this error means >and how I could possibly fix it? >Thanks, >Gaelen Burke > >Here is the message that occurs: > >Processing transcripts into genes > >------------- EXCEPTION: Bio::Root::Exception ------------- >MSG: Calling translate without a seq argument! 
>STACK: Error::throw >STACK: Bio::Root::Root::throw >/usr/local/perl/5.14.1/lib/site_perl/5.14.1/Bio/Root/Root.pm:472 >STACK: Bio::Tools::CodonTable::translate >/usr/local/perl/5.14.1/lib/site_perl/5.14.1/Bio/Tools/CodonTable.pm >:411 >STACK: PhatHit_utils::_adjust >/panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/PhatHit_utils >.pm: >880 >STACK: PhatHit_utils::adjust_start_stop >/panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/PhatHit >_utils.pm:776 >STACK: maker::auto_annotator::load_transcript_struct >/panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/. >./lib/maker/auto_annotator.pm:1808 >STACK: maker::auto_annotator::group_transcripts >/panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib >/maker/auto_annotator.pm:2163 >STACK: maker::auto_annotator::annotate_genes >/panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/ma >ker/auto_annotator.pm:877 >STACK: Process::MpiChunk::_go >/panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/Process/MpiCh >unk. >pm:2159 >STACK: Process::MpiChunk::run >/panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/Process/MpiCh >unk. >pm:257 >STACK: Process::MpiTiers::run_all >/panfs/pstor.storage/rcclocal/zcluster/maker/2.10/bin/../lib/Process/MpiTi >ers.pm:193 >STACK: /usr/local/maker/latest/bin/maker:276 >----------------------------------------------------------- > >FATAL ERROR >ERROR: Failed while clustering transcripts into genes for annotations!! > >ERROR: Chunk failed at level 20 >!! 
>FAILED CONTIG:scaffold_0040 > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From Michael.Li3 at AGR.GC.CA Fri Aug 23 10:46:17 2013 From: Michael.Li3 at AGR.GC.CA (Li, Michael) Date: Fri, 23 Aug 2013 16:46:17 +0000 Subject: [maker-devel] annotation comparison In-Reply-To: <520441C9.30407@gmail.com> References: <520441C9.30407@gmail.com> Message-ID: <229AF11430CC544B8987653593A750A92FB4AAFE@ONOTTAXES3.AGR.GC.CA> I'd be interested in knowing about this too. I'm writing an internship work report on this topic, so it would be great to hear about it. I personally use ParsEval to compare two sets of annotation. It seems pretty thorough, but it's still in its early stages of development. You can read a bit more about it here: https://github.com/standage/AEGeAn Cheers, Michael -----Original Message----- From: maker-devel [mailto:maker-devel-bounces at yandell-lab.org] On Behalf Of Xiaofang Jiang Sent: Thursday, August 08, 2013 9:12 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] annotation comparison Dear Maker Developers, I am annotating a mosquitoes genome using Maker. I have two questions regarding Maker annotation. 1. Are there any scripts available to compare two sets of annotations? We know about SOBA but were wondering if there is something more comprehensive that you guys use. 2. I am expecting around 13,000 genes, however maker only predicted 9,000 genes. I used both the gff3 from cufflinks, protein, and ESTs as evidence and SNAP as the ab inito predictor. I changed "keep_preds" to 1 and that resulted 17,000 genes, it seems that shouldn't have happened. So in order to get more genes, should I try change "single_length" to 100, and change "pred_flanks" to 100? 
Best,
Xiaofang

From carsonhh at gmail.com Fri Aug 23 10:46:35 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 Aug 2013 12:46:35 -0400
Subject: [maker-devel] annotation comparison
In-Reply-To: <520441C9.30407@gmail.com>
Message-ID:

Sorry for the slow response. This was caught by my spam filter for some reason, so I am only seeing it now.

Eval from WashU is probably the best tool for comparing two annotation sets. MAKER comes with a script that will convert MAKER's GFF3 output into Eval GTF input.

If you are getting fewer genes than expected, then you probably need to add more evidence. By default, MAKER rejects genes that are not supported by either a protein or an EST. The keep_preds=1 option makes it keep everything even if there is no support. Usually people make the mistake of not providing sufficient protein evidence. For example, just using something like UniProt may not be sufficient; you may want to pick a couple of other species with annotated genomes (fruit fly, for example, or another mosquito species) and provide every protein from their genomes as evidence.

Also, if this is a newly sequenced genome, run CEGMA to see how complete the assembly is. If the assembly is 80% complete, for example, you would only expect to retrieve 80% of the expected genes.

--Carson

On 8/8/13 9:11 PM, "Xiaofang Jiang" wrote:

>Dear Maker Developers,
>
>I am annotating a mosquitoes genome using Maker. I have two questions
>regarding Maker annotation.
>
>1. Are there any scripts available to compare two sets of annotations?
>We know about SOBA but were wondering if there is something more
>comprehensive that you guys use.
>
>2. I am expecting around 13,000 genes, however maker only predicted
>9,000 genes.
I used both the gff3 from cufflinks, protein, and ESTs as >evidence and SNAP as the ab inito predictor. I changed "keep_preds" to 1 >and that resulted 17,000 genes, it seems that shouldn't have happened. >So in order to get more genes, should I try change "single_length" to >100, and change "pred_flanks" to 100? > > >Best, > >Xiaofang > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From neil at cs.uky.edu Fri Aug 23 11:19:53 2013 From: neil at cs.uky.edu (Neil Moore) Date: Fri, 23 Aug 2013 13:19:53 -0400 Subject: [maker-devel] [PATCH] Corrupted exon table file when running FGeneSH Message-ID: <21015.39353.259641.458113@dirac.s-z.org> I encountered problems with FGeneSH failing on some contigs (small ones with few if any genes). Investigating the logs, I found that fgenesh was complaining about a "Corrupted exon table file"; it turns out that MAKER had omitted the header line from the exon table. I think this happened when there were predictions or evidence for that contig, but none of them contained introns; I haven't been able to verify that, though. The following patch corrected the problem and allowed FGeneSH to run, but I don't know the code well so perhaps there is a better way. 
--- lib/Widget/fgenesh.pm 2013-05-22 12:34:00.000000000 -0400
+++ /home/neil/fgenesh.pm 2013-08-08 14:41:40.585139335 -0400
@@ -562,18 +562,18 @@
         push(@xdef, $l);
     }
 
-    return \@xdef if(!@$i_coors);
-
-    foreach my $i (@$i_coors){
-        my $i_b = ($i->[0] - $offset) + ($i_flank-1);
-        my $i_e = ($i->[1] - $offset) - ($i_flank-1);
-
-        next if abs($i_b - $i_e) < 2*$i_flank;
-        next if abs($i_b - $i_e) < 25;
-
-        my $l = "$i_b $i_e -1000";
-
-        push(@xdef, $l);
+    if(@$i_coors) {
+        foreach my $i (@$i_coors){
+            my $i_b = ($i->[0] - $offset) + ($i_flank-1);
+            my $i_e = ($i->[1] - $offset) - ($i_flank-1);
+
+            next if abs($i_b - $i_e) < 2*$i_flank;
+            next if abs($i_b - $i_e) < 25;
+
+            my $l = "$i_b $i_e -1000";
+
+            push(@xdef, $l);
+        }
     }
 
     my $num = @xdef;
--
Dr Neil Moore, neil at cs.uky.edu, neil at uky.edu, neil at s-z.org

From dence at genetics.utah.edu Fri Aug 23 11:41:41 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 23 Aug 2013 17:41:41 +0000
Subject: [maker-devel] annotation comparison
In-Reply-To: <520441C9.30407@gmail.com>
References: <520441C9.30407@gmail.com>
Message-ID:

Hi Xiaofang,

1. People use various ad hoc scripts for comparing genome annotations. If you look in the MAKER2 paper (Holt and Yandell, BMC Bioinformatics 2011), you'll see the AED score, which is something we use to compare the support from the evidence.

2. It actually isn't surprising that you got that big of an increase in the number of genes when you turned "keep_preds" to 1. Ab initio predictors give a great many false positives, and "keep_preds=1" will turn all of those into gene models.

To get more genes, you should probably train an additional gene predictor, like GeneMark and maybe Augustus. I think that Augustus should already have a config file for mosquito.

Changing the "pred_flanks" will increase the number of genes if you think that you have many merged genes.

What protein dataset did you use?
We often use a set from a comprehensive protein database like UniProt in addition to proteins from closely related species.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Xiaofang Jiang [jxf1023 at gmail.com]
Sent: Thursday, August 08, 2013 7:11 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] annotation comparison

Dear Maker Developers,

I am annotating a mosquitoes genome using Maker. I have two questions regarding Maker annotation.

1. Are there any scripts available to compare two sets of annotations? We know about SOBA but were wondering if there is something more comprehensive that you guys use.

2. I am expecting around 13,000 genes, however maker only predicted 9,000 genes. I used both the gff3 from cufflinks, protein, and ESTs as evidence and SNAP as the ab inito predictor. I changed "keep_preds" to 1 and that resulted 17,000 genes, it seems that shouldn't have happened. So in order to get more genes, should I try change "single_length" to 100, and change "pred_flanks" to 100?

Best,
Xiaofang

From barry.moore at genetics.utah.edu Fri Aug 23 15:05:18 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 23 Aug 2013 15:05:18 -0600
Subject: [maker-devel] annotation comparison
In-Reply-To:
References: <520441C9.30407@gmail.com>
Message-ID: <6CAEFC94-1C2C-4CB1-886E-32C4A4D5EA36@genetics.utah.edu>

Also, Xiaofang, you mentioned that you have used SOBA. There is a command-line version (called SOBAcl) that hasn't been advertised too well and that does a lot more than the web version in terms of comparison and reporting.
You can find it here: http://www.sequenceontology.org/resources/sobacl.html It can prepare tables and graphs on multiple GFF3 for feature counts, lengths, footprints etc. Mike Campbell here in the lab has a script to generate the data necessary for AED figures like this one and he's going to add it to maker/bin some time in the near future. B On Aug 23, 2013, at 11:41 AM, Daniel Ence wrote: > Hi Xiaofang, > > 1. People use various ad-hoc scripts for comparing genome genome annotations. If you look in the MAKER2 paper(Holt and Yandell, BMC Bioinformatics 2011), you'll see the AED score, which is some we use to compare the support from the evidence. > > 2. > It actually isn't surprising that you got that big of an increase in the number of genes when you turned "keep_preds" to 1. Ab-initio predictors give a great many false positives and "keep_preds=1" will turn all of those into gene models. > > To get more genes, you should probably train an additional gene predictor, like GeneMark and maybe augustus. I think that augustus should already have a config file for mosquito. > > Changing the "pred_flanks" will increase the number of genes if you think that you have many merged genes. > > What protein dataset did you use? We often use a set from a comprehensive protein database like uniprot in addition to proteins from closely related species. > > Thanks, > Daniel > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > ________________________________________ > From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Xiaofang Jiang [jxf1023 at gmail.com] > Sent: Thursday, August 08, 2013 7:11 PM > To: maker-devel at yandell-lab.org > Subject: [maker-devel] annotation comparison > > Dear Maker Developers, > > I am annotating a mosquitoes genome using Maker. I have two questions > regarding Maker annotation. > > 1. 
Are there any scripts available to compare two sets of annotations? > We know about SOBA but were wondering if there is something more > comprehensive that you guys use. > > 2. I am expecting around 13,000 genes, however maker only predicted > 9,000 genes. I used both the gff3 from cufflinks, protein, and ESTs as > evidence and SNAP as the ab inito predictor. I changed "keep_preds" to 1 > and that resulted 17,000 genes, it seems that shouldn't have happened. > So in order to get more genes, should I try change "single_length" to > 100, and change "pred_flanks" to 100? > > > Best, > > Xiaofang > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From brubin at fieldmuseum.org Mon Aug 26 12:20:31 2013 From: brubin at fieldmuseum.org (Benjamin Rubin) Date: Mon, 26 Aug 2013 13:20:31 -0500 Subject: [maker-devel] Unexpected results with correct_est_fusion Message-ID: Hello developers, I am using MAKER 2.28 to annotate an ant genome. I provide protein sequence evidence from all seven of the other sequenced ant genomes and a *de novo* assembled transcriptome as EST evidence. I assembled the transcriptome using Trinity with the jaccard_clip option turned on to reduce gene fusions. Despite using this set of hopefully non-fused ESTs, I still have substantial fusion problems with the final annotation. Therefore, I reduced pred_flank to 100 and turned on correct_est_fusion. 
However, correct_est_fusion leads to the prediction of a much smaller number of genes (~5,000 instead of ~14,000). I am initially training both SNAP and Augustus using CEGMA genes and then retraining based on the first round of annotation. Both rounds of annotation yield the same low number (~5,000) of genes. It may be worth mentioning that the number of exons is also far lower when using correct_est_fusion (~26,000 instead of ~90,000).

Is this the expected behavior of correct_est_fusion? I was surprised that it reduced the predicted number of genes by such a large margin. I am concerned that I am using it incorrectly. Do you have any other suggestions for reducing gene merging?

Thanks,
Ben

--
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
http://www.moreaulab.org/Benjamin_Rubin.html

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776

From jfierst at uoregon.edu Mon Aug 26 12:54:20 2013
From: jfierst at uoregon.edu (Janna Fierst)
Date: Mon, 26 Aug 2013 11:54:20 -0700
Subject: [maker-devel] exon/intron boundaries
Message-ID:

Hi,

I am using MAKER 2.28 to annotate a Caenorhabditid worm genome, and the initial results appear fairly good, but we seem to be annotating too many exons for multiple genes. I was wondering which parameters should be tuned to change the threshold for exon/intron boundaries?

Thanks for your help
-Janna Fierst

From carsonhh at gmail.com Mon Aug 26 13:00:55 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 26 Aug 2013 15:00:55 -0400
Subject: [maker-devel] Unexpected results with correct_est_fusion
In-Reply-To:
Message-ID:

The correct_est_fusion option just clips UTR on overlapping genes.
I suspect the real problem is setting pred_flank too low. If your lead-in sequence to a gene is too short, ab initio predictors won't call it. So you are probably getting empty reports from SNAP/Augustus for the hint-based predictions. Try increasing pred_flank to at least 150. Setting pred_flank too low will also limit how far MAKER will walk out along the edges of the initial alignments during the polishing step (exonerate). So setting it too low may also be causing you to lose some EST and protein alignments.

--Carson

From: Benjamin Rubin
Date: Monday, August 26, 2013 2:20 PM
To:
Subject: [maker-devel] Unexpected results with correct_est_fusion

Hello developers,

I am using MAKER 2.28 to annotate an ant genome. I provide protein sequence evidence from all seven of the other sequenced ant genomes and a de novo assembled transcriptome as EST evidence. I assembled the transcriptome using Trinity with the jaccard_clip option turned on to reduce gene fusions. Despite using this set of hopefully non-fused ESTs, I still have substantial fusion problems with the final annotation. Therefore, I reduced pred_flank to 100 and turned on correct_est_fusion. However, correct_est_fusion leads to the prediction of a much smaller number of genes (~5,000 instead of ~14,000). I am initially training both SNAP and Augustus using CEGMA genes and then retraining based on the first round of annotation. Both rounds of annotation yield the same low number (~5,000) of genes. It may also be worth mentioning that the number of exons is also far lower when using correct_est_fusion (~26,000 instead of ~90,000).

Is this the expected behavior of correct_est_fusion? I was surprised that it reduced the predicted number of genes by such a large margin. I am concerned that I am using it incorrectly. Do you have any other suggestions for reducing gene merging?
Thanks,
Ben

--
_____________________________________________________
Benjamin ER Rubin
PhD Candidate
Committee on Evolutionary Biology
University of Chicago
http://www.moreaulab.org/Benjamin_Rubin.html

Division of Insects
Zoology Department
Field Museum of Natural History
1400 South Lake Shore Drive
Chicago, IL 60605
USA
Office: (312) 665-7776

From carsonhh at gmail.com Mon Aug 26 13:21:27 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 26 Aug 2013 15:21:27 -0400
Subject: [maker-devel] exon/intron boundaries
In-Reply-To:
Message-ID:

Are you getting gene fusions or just more exons? Gene fusions can be reduced by setting correct_est_fusion=1 or by reducing pred_flank, although reducing pred_flank can cause other issues (but those generally only appear when setting the value below 150). Also, if you have the maximum intron size set too high (split_hit option), you may be generating bridging alignments that make evidence align across distant paralogous genes as well (this can result in gene merging).

You should also look at your results manually in a viewer like Apollo. Then see if the extra exons are supported by something such as protein alignments from another species. If this is the case, you may have a poorly annotated protein set being used as evidence that is carrying over its erroneous exons into the species you are annotating. If the extra exons are supported by EST evidence, then perhaps you should try to rebuild the EST assembly (for example, Trinity has an option to use a Jaccard similarity coefficient to avoid fusing transcripts).

Another option is to retrain SNAP or Augustus. MAKER does not actually produce any of the models itself (it is a pipeline, not a predictor).
The models are all generated using these other algorithms; MAKER just feeds them hints based on protein and transcript alignments, so making sure training is sufficient is important for those programs to produce their best models.

Finally, make sure your repeat database is sufficient; you may need to generate a species-specific repeat library using something like RepeatModeler. Repeats can end up being included as extra exons in gene models because they may contain reading frames that do code for proteins (e.g. reverse transcriptases).

If you have any questions on any of the above, just let us know.

Thanks,
Carson

From: Janna Fierst
Date: Monday, August 26, 2013 2:54 PM
To:
Subject: [maker-devel] exon/intron boundaries

Hi,

I am using MAKER 2.28 to annotate a Caenorhabditid worm genome, and the initial results appear fairly good, but we seem to be annotating too many exons for multiple genes. I was wondering which parameters should be tuned to change the threshold for exon/intron boundaries?

Thanks for your help
-Janna Fierst

From brubin at fieldmuseum.org Tue Aug 27 08:59:50 2013
From: brubin at fieldmuseum.org (Benjamin Rubin)
Date: Tue, 27 Aug 2013 09:59:50 -0500
Subject: [maker-devel] Unexpected results with correct_est_fusion
In-Reply-To:
References:
Message-ID:

Hi Carson,

I increased pred_flank to 200 and reran MAKER with correct_est_fusion, but I still only get ~5,000 genes (5,082 instead of the 5,020 with pred_flank at 100). This is using only the first round with SNAP and Augustus trained on the CEGMA genes. Is there anything else that I might be doing wrong? I have attached my control file in case that could be useful.

Thanks for the help!
Ben On Mon, Aug 26, 2013 at 2:00 PM, Carson Holt wrote: > The correct_est_fusion option just clips UTR on overlapping genes. I > suspect the real problem is setting pred_flank too low. If your lead in > sequence to a gene is too short, ab initio predictors won't call it. So > you are probably getting empty reports from SNAP/Augustus for the hint > based predictions. Try increasing pred_flank to at least 150. Setting > pred_flank too low will also limit how far MAKER will walk out along the > edges initial alignments during the polishing step (exonerate). So setting > it too low may also be causing you to lose some EST and protein alignments. > > --Carson > > > From: Benjamin Rubin > Date: Monday, August 26, 2013 2:20 PM > To: > Subject: [maker-devel] Unexpected results with correct_est_fusion > > Hello developers, > > I am using MAKER 2.28 to annotate an ant genome. I provide protein > sequence evidence from all seven of the other sequenced ant genomes and a > *de novo* assembled transcriptome as EST evidence. I assembled the > transcriptome using Trinity with the jaccard_clip option turned on to > reduce gene fusions. Despite using this set of hopefully non-fused ESTs, I > still have substantial fusion problems with the final annotation. > Therefore, I reduced pred_flank to 100 and turned on correct_est_fusion. > However, correct_est_fusion leads to the prediction of a much smaller > number of genes (~5,000 instead of ~14,000). I am initially training both > SNAP and Augustus using CEGMA genes and then retraining based on the first > round of annotation. Both rounds of annotation yield the same low number > (~5,000) of genes. It may also be worth mentioning that the number of exons > is also far lower when using correct_est_fusion (~26,000 instead of > ~90,000). > > Is this the expected behavior of correct_est_fusion? I was surprised that > it reduced the predicted number of genes by such a large margin. I am > concerned that I am using it incorrectly. 
Do you have any other suggestions > for reducing gene merging? > > Thanks, > Ben > > -- > _____________________________________________________ > Benjamin ER Rubin > PhD Candidate > Committee on Evolutionary Biology > University of Chicago > http://www.moreaulab.org/Benjamin_Rubin.html > > Division of Insects > Zoology Department > Field Museum of Natural History > 1400 South Lake Shore Drive > Chicago, IL 60605 > USA > Office: (312) 665-7776 > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- _____________________________________________________ Benjamin ER Rubin PhD Candidate Committee on Evolutionary Biology University of Chicago http://www.moreaulab.org/Benjamin_Rubin.html Division of Insects Zoology Department Field Museum of Natural History 1400 South Lake Shore Drive Chicago, IL 60605 USA Office: (312) 665-7776 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4811 bytes Desc: not available URL: From carsonhh at gmail.com Wed Aug 28 07:09:06 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 28 Aug 2013 09:09:06 -0400 Subject: [maker-devel] Unexpected results with correct_est_fusion In-Reply-To: Message-ID: Could you pick one contig where the number of genes shift dramatically and upload that contig fasta together with your control files and any evidence datasets used to one of our servers (I'm going to send you connection details in a separate e-mail). I can then run with and without correct_est_fusion to see if there is anything unexpected going on. 
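
For pulling a single contig out of a multi-FASTA assembly before uploading it, something like the following Python would probably do. This is only a minimal sketch and not part of MAKER: the function name `extract_contig` is made up for illustration, and it assumes the contig ID is the first whitespace-delimited token on the `>` header line.

```python
def extract_contig(fasta_lines, contig_id):
    """Return the lines (header plus sequence) of one FASTA record.

    fasta_lines: any iterable of lines, e.g. an open file handle.
    contig_id:   the ID to match against the first token after '>'.
    """
    keep = False
    record = []
    for line in fasta_lines:
        if line.startswith(">"):
            # A new record starts here; decide whether it is the one we want.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] == contig_id
        if keep:
            record.append(line.rstrip("\n"))
    return record
```

Passing an open file handle for the genome FASTA as `fasta_lines` and writing the returned lines to a new file gives a single-contig FASTA. Matching on the exact first token avoids picking up scaffold_10 when scaffold_1 was asked for.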
--Carson From: Benjamin Rubin Date: Tuesday, August 27, 2013 10:59 AM To: Carson Holt Cc: Subject: Re: [maker-devel] Unexpected results with correct_est_fusion Hi Carson, I increased pred_flank to 200 and reran MAKER with correct_est_fusion, but I still only get ~5,000 genes (5,082 instead of the 5,020 with pred_flank at 100). This is using only the first round with SNAP and Augustus trained on the CEGMA genes. Is there anything else that I might be doing wrong? I have attached my control file in case that could be useful. Thanks for the help! Ben On Mon, Aug 26, 2013 at 2:00 PM, Carson Holt wrote: > The correct_est_fusion option just clips UTR on overlapping genes. I suspect > the real problem is setting pred_flank too low. If your lead in sequence to a > gene is too short, ab initio predictors won't call it. So you are probably > getting empty reports from SNAP/Augustus for the hint based predictions. Try > increasing pred_flank to at least 150. Setting pred_flank too low will also > limit how far MAKER will walk out along the edges initial alignments during > the polishing step (exonerate). So setting it too low may also be causing you > to lose some EST and protein alignments. > > --Carson > > > From: Benjamin Rubin > Date: Monday, August 26, 2013 2:20 PM > To: > Subject: [maker-devel] Unexpected results with correct_est_fusion > > Hello developers, > > I am using MAKER 2.28 to annotate an ant genome. I provide protein sequence > evidence from all seven of the other sequenced ant genomes and a de novo > assembled transcriptome as EST evidence. I assembled the transcriptome using > Trinity with the jaccard_clip option turned on to reduce gene fusions. Despite > using this set of hopefully non-fused ESTs, I still have substantial fusion > problems with the final annotation. Therefore, I reduced pred_flank to 100 and > turned on correct_est_fusion. However, correct_est_fusion leads to the > prediction of a much smaller number of genes (~5,000 instead of ~14,000). 
I am > initially training both SNAP and Augustus using CEGMA genes and then > retraining based on the first round of annotation. Both rounds of annotation > yield the same low number (~5,000) of genes. It may also be worth mentioning > that the number of exons is also far lower when using correct_est_fusion > (~26,000 instead of ~90,000). > > Is this the expected behavior of correct_est_fusion? I was surprised that it > reduced the predicted number of genes by such a large margin. I am concerned > that I am using it incorrectly. Do you have any other suggestions for reducing > gene merging? > > Thanks, > Ben > > -- > _____________________________________________________ > Benjamin ER Rubin > PhD Candidate > Committee on Evolutionary Biology > University of Chicago > http://www.moreaulab.org/Benjamin_Rubin.html > > Division of Insects > Zoology Department > Field Museum of Natural History > 1400 South Lake Shore Drive > Chicago, IL 60605 > USA > Office: (312) 665-7776 > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -- _____________________________________________________ Benjamin ER Rubin PhD Candidate Committee on Evolutionary Biology University of Chicago http://www.moreaulab.org/Benjamin_Rubin.html Division of Insects Zoology Department Field Museum of Natural History 1400 South Lake Shore Drive Chicago, IL 60605 USA Office: (312) 665-7776 -------------- next part -------------- An HTML attachment was scrubbed... URL:
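
The gene and exon totals compared in this thread (for example ~5,000 versus ~14,000 genes) can be tallied directly from a GFF3 file. The following Python is a minimal sketch under stated assumptions, not a MAKER script: `count_feature_types` is a hypothetical helper name, it relies only on GFF3 feature lines having nine tab-separated columns with the feature type in column 3, and the idea that MAKER's final models carry `maker` in the source column (column 2) is an assumption worth checking against your own output.

```python
from collections import Counter


def count_feature_types(gff3_lines, source=None):
    """Count GFF3 features by type (column 3).

    gff3_lines: any iterable of lines, e.g. an open file handle.
    source:     if given, only count features whose source (column 2) matches.
    """
    counts = Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            # Embedded sequence section follows; no more feature lines.
            break
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        if source is not None and cols[1] != source:
            continue
        counts[cols[2]] += 1
    return counts
```

Running this on the output of two MAKER runs and comparing `counts["gene"]` and `counts["exon"]` gives the same kind of numbers quoted above, without counting evidence alignments when a source filter is used.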