<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; "><div>Hi Janna,</div><div><br></div><div>There are a couple of things to do. First, download maker-2.29p-beta from the lab website. It includes some changes that improve the performance of correct_est_fusion. Note that turning on correct_est_fusion also reduces the influence of ESTs on the annotation. Normally you would then need to increase your protein dataset, but since you have already supplied 3 nematode species and all of UniProt, you should probably be fine. Second, instead of using cufflinks, try using Trinity to assemble the ESTs. Trinity has a Jaccard clip option that reduces transcript merging caused by overlapping UTRs. Between Trinity and correct_est_fusion you should be able to substantially reduce the effect of those ESTs.</div><div><br></div><div>If those changes don't work, there is one last option. Take the MAKER results, filter them for the SNAP and Augustus ab initio results (match/match_part features in the GFF3), and pass those in through the pred_gff option. Then turn snaphmm and augustus_species off in the control file. This turns off MAKER's hint-based prediction and forces it to filter the ab initio results and select models directly from them. Since the merging is caused by bad hints (merged transcripts from the mRNA-seq data), this should reduce that effect.  
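<div>As a minimal sketch of the filtering step just described (not an official MAKER tool), the Python below keeps only the SNAP and Augustus match/match_part rows from a merged MAKER GFF3. It assumes the ab initio rows carry the source names snap_masked and augustus_masked; check the second column of your own file and adjust the set if your run used different names. The function name filter_abinitio and the file names are hypothetical.</div>

```python
import sys
from typing import Iterable, Iterator

# ASSUMPTION: ab initio models are identified by the GFF3 source column
# (column 2). Verify these names against your own MAKER output.
ABINITIO_SOURCES = {"snap_masked", "augustus_masked"}
KEEP_TYPES = {"match", "match_part"}


def filter_abinitio(lines: Iterable[str]) -> Iterator[str]:
    """Yield the GFF3 version header plus SNAP/Augustus match/match_part rows."""
    for line in lines:
        if line.startswith("##FASTA"):
            break  # stop before any embedded FASTA section
        if line.startswith("#"):
            if line.startswith("##gff-version"):
                yield line  # keep the version pragma, drop other comments
            continue
        cols = line.rstrip("\n").split("\t")
        # A valid GFF3 feature row has exactly 9 tab-separated columns.
        if len(cols) == 9 and cols[1] in ABINITIO_SOURCES and cols[2] in KEEP_TYPES:
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python filter_abinitio.py maker_all.gff > abinitio.gff
    with open(sys.argv[1]) as fh:
        sys.stdout.writelines(filter_abinitio(fh))
```

<div>For example, the filtered file could then be given as pred_gff=abinitio.gff in maker_opts.ctl, with snaphmm and augustus_species left blank.</div>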
You will still need correct_est_fusion=1, though, to trim UTRs coming from the merged transcripts: even though MAKER can't use hints to rerun SNAP and Augustus this way, it will still try to add UTRs using the EST evidence.</div><div><br></div><div>--Carson</div><div><br></div><div><br></div><span id="OLK_SRC_BODY_SECTION"><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span> Janna Fierst <<a href="mailto:jfierst@uoregon.edu">jfierst@uoregon.edu</a>><br><span style="font-weight:bold">Date: </span> Friday, October 4, 2013 1:06 PM<br><span style="font-weight:bold">To: </span> <<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>><br><span style="font-weight:bold">Subject: </span> [maker-devel] Fwd:  exon/intron boundaries<br></div><div><br></div><div dir="ltr">Hi,<br><div class="gmail_quote"><div class="gmail_extra"><br>Thanks for your reply. I have been going through our annotations in detail and trying different parameter sets, and I think I have identified what is going on, but I'm not sure how to set the MAKER2 parameters for our situation. We are working with a species of Caenorhabditid worm, and there are long gene-dense blocks that are being incorrectly annotated as large single genes instead of several smaller, closely spaced genes. The protein alignments (tblastx and protein2genome) show very clearly where the exon/intron boundaries are and in most cases agree with the Augustus predictions. The assembled cufflinks output (through blastn, est2genome and est_gff:cufflinks) does not agree in some locations; I think this may be because in some cases the UTR nearly overlaps adjacent genes. 
<br><br>I have included a screenshot of an annotated region viewed in Apollo to try to show this. The large gene in the middle is actually 7 different genes that are extremely close together, and MAKER2 is collapsing them into a single gene. I tried running without any RNA-seq/cufflinks data and MAKER2 annotates the region as two genes instead of one, but I can't get it to recognize the 7 as different genes. I have retrained SNAP, but we have not been able to successfully train Augustus; we are currently using the default caenorhabditis species model. I also included a species-specific repeat library. I tried setting correct_est_fusion=1 and reducing pred_flank, but these changes appear to really alter the annotations and we end up annotating almost nothing. I also tried setting est2genome=0 to decrease the influence of the cufflinks assembly, but it didn't appear to help. There are some very large introns in these genes, so I haven't yet tried decreasing the maximum intron size because I'm concerned this may generate too many split genes instead of our current merged-gene problem. Thanks for your help; any advice is greatly appreciated! -Janna Fierst <br><br><br><div class="gmail_quote">On Mon, Aug 26, 2013 at 12:21 PM, Carson Holt <span dir="ltr"><<a href="mailto:carsonhh@gmail.com" target="_blank">carsonhh@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="font-size:14px;font-family:Calibri,sans-serif;word-wrap:break-word"><div>Are you getting gene fusions or just more exons? Gene fusions can be reduced by setting correct_est_fusion=1 or by reducing pred_flank, although reducing pred_flank can cause other issues (but those generally only appear if the value is set below 150).  
Also, if you have the maximum intron size set too high (split_hit option), you may be generating bridging alignments that make evidence align across distant paralogous genes (this can result in gene merging).</div><div><br></div><div>You should also inspect your results manually in a viewer like Apollo, then check whether the extra exons are supported by something such as protein alignments from another species. If they are, the protein set you are using as evidence may be poorly annotated and may be carrying its erroneous exons over into the species you are annotating. If the extra exons are supported by EST evidence, then perhaps you should try to rebuild the EST assembly (for example, Trinity has an option that uses a Jaccard similarity coefficient to avoid fusing transcripts).</div><div><br></div><div>Another option is to retrain SNAP or Augustus. MAKER does not actually produce any of the models itself (it is a pipeline, not a predictor). The models are all generated by these other algorithms; MAKER just feeds them hints based on protein and transcript alignments, so sufficient training is important for those programs to produce their best models.</div><div><br></div><div>Finally, make sure your repeat database is sufficient; you may need to generate a species-specific repeat library using something like RepeatModeler. Repeats can end up being included as extra exons in gene models because they may contain reading frames that do code for proteins (e.g., reverse transcriptases).</div><div><br></div><div>If you have any questions on any of the above, just let us know.</div><div><br></div><div>Thanks,</div><div>Carson</div><div><br></div><div><br></div><span><div style="border-right:medium none;padding-right:0in;padding-left:0in;padding-top:3pt;text-align:left;font-size:11pt;border-bottom:medium none;font-family:Calibri;border-top:#b5c4df 1pt solid;padding-bottom:0in;border-left:medium none"><span style="font-weight:bold">From: </span> Janna Fierst <<a href="mailto:jfierst@uoregon.edu" target="_blank">jfierst@uoregon.edu</a>><br><span style="font-weight:bold">Date: </span> Monday, August 26, 2013 2:54 PM<br><span style="font-weight:bold">To: </span> <<a href="mailto:maker-devel@yandell-lab.org" target="_blank">maker-devel@yandell-lab.org</a>><br><span style="font-weight:bold">Subject: </span> [maker-devel] exon/intron boundaries<br></div><div><div><br></div><div dir="ltr">Hi,<br><br>I am using MAKER 2.28 to annotate a Caenorhabditid worm genome, and the initial results appear fairly good, but we seem to be annotating too many exons for multiple genes. I was wondering which parameters should be tuned to change the threshold for exon/intron boundaries? Thanks for your help -Janna Fierst<br></div></div>

_______________________________________________

maker-devel mailing list

<a href="mailto:maker-devel@box290.bluehost.com" target="_blank">maker-devel@box290.bluehost.com</a><br><a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org" target="_blank">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a></span></div></blockquote></div><br></div><div class="HOEnZb"><div class="h5"></div></div></div><br></div>


</span></body></html>