<div dir="ltr">Hi,<br><div class="gmail_quote"><div dir="ltr"><div class="gmail_quote"><div class="gmail_extra"><br>Thanks for your reply. I have been going through our annotations in detail and trying different parameter sets, and I think I have identified what is going on, but I'm not sure how to set the MAKER2 parameters for our situation. We are working with a species of Caenorhabditid worm, and there are long gene-dense blocks that are being incorrectly annotated as large single genes instead of several smaller, closely spaced genes. The protein alignments (tblastx and protein2genome) show very clearly where the exon/intron boundaries are and in most cases agree with the Augustus predictions. The assembled Cufflinks output (through blastn, est2genome and est_gff:cufflinks) does not agree in some locations; I think this may be because in some cases the UTR nearly overlaps adjacent genes. <br>
<br>I have included a screenshot of an annotated region viewed in Apollo to try to show this. The large gene in the middle is actually 7 different genes that are extremely close together, and MAKER2 is collapsing them into a single gene. I tried running without any RNA-Seq/Cufflinks data, and MAKER2 annotates the region as two genes instead of one, but I can't get it to recognize the 7 as different genes. I have retrained SNAP, but we have not been able to successfully train Augustus; we are currently using the default caenorhabditis species model. I also included a species-specific repeat library. I tried setting correct_est_fusion=1 and reducing pred_flank, but these changes appear to really alter the annotations and we end up annotating almost nothing. I also tried setting est2genome=0 to decrease the influence of the Cufflinks assembly, but it didn't appear to help. There are some very large introns in these genes, so I haven't yet tried decreasing the maximum intron size because I'm concerned this may generate too many split genes instead of our current merged gene problem. Thanks for your help, any advice is greatly appreciated! -Janna Fierst <br>
<div class="im">
<br><br><div class="gmail_quote">On Mon, Aug 26, 2013 at 12:21 PM, Carson Holt <span dir="ltr">&lt;<a href="mailto:carsonhh@gmail.com" target="_blank">carsonhh@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="font-size:14px;font-family:Calibri,sans-serif;word-wrap:break-word"><div>Are you getting gene fusions or just more exons? Gene fusions can be reduced by setting correct_est_fusion=1 or by reducing pred_flank, although reducing pred_flank can cause other issues (but those generally only appear if setting the value below 150). Also, if you have the maximum intron size set too high (split_hit option), you may be generating bridging alignments that make evidence align across distant paralogous genes (this can result in gene merging).</div>
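<div><br></div><div>As a rough sketch, the three options mentioned above are set in maker_opts.ctl along the following lines. The values shown are illustrative assumptions only, not recommendations for any particular genome, and should be checked against what you see in a viewer:</div>
<pre>
# maker_opts.ctl (fragment; all values are illustrative assumptions)
correct_est_fusion=1  # limit the use of ESTs in annotation to avoid fusion genes
pred_flank=150        # flank added to evidence clusters sent to the predictors (example value, kept at the 150 threshold noted above)
split_hit=10000       # expected maximum intron size for evidence alignments (example value only)
</pre>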
<div><br></div><div>You should also look at your results manually in a viewer like Apollo. Then see if the extra exons are supported by something such as protein alignments from another species. If this is the case, you may have a poorly annotated protein set that is being used as evidence and is carrying its erroneous exons over into the species you are annotating. If the extra exons are supported by EST evidence, then perhaps you should try to rebuild the EST assembly (for example, Trinity has an option to use a Jaccard similarity coefficient to avoid fusing transcripts).</div>
<div><br></div><div>Another option is to retrain SNAP or Augustus. MAKER does not actually produce any of the models itself (it is a pipeline, not a predictor). The models are all generated by these other algorithms; MAKER just feeds them hints based on protein and transcript alignments, so making sure training is sufficient is important for those programs to produce their best models.</div>
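<div><br></div><div>For reference, a minimal sketch of where the trained predictor parameters are given in maker_opts.ctl. The SNAP file name is a hypothetical placeholder, and the Augustus species name is just an example of a stock model; substitute your own trained model if you have one:</div>
<pre>
# maker_opts.ctl (fragment; file name and species name are assumptions)
snaphmm=my_species.hmm           # hypothetical path to a retrained SNAP HMM file
augustus_species=caenorhabditis  # name of the Augustus species model to use
</pre>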
<div><br></div><div>Finally, make sure your repeat database is sufficient; you may need to generate a species-specific repeat library using something like RepeatModeler. Repeats can end up being included as extra exons in gene models because they may contain reading frames that do code for proteins (e.g. reverse transcriptases).</div>
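<div><br></div><div>A custom repeat library is passed to MAKER through the rmlib option in maker_opts.ctl. A minimal sketch, assuming a hypothetical FASTA file name for the library:</div>
<pre>
# maker_opts.ctl (fragment; the file name is a hypothetical placeholder)
rmlib=my_species_repeats.fa  # species-specific repeat library in FASTA format for RepeatMasker
</pre>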
<div><br></div><div>If you have any questions on any of the above, just let us know.</div><div><br></div><div>Thanks,</div><div>Carson</div><div><br></div><div><br></div><span><div style="border-right:medium none;padding-right:0in;padding-left:0in;padding-top:3pt;text-align:left;font-size:11pt;border-bottom:medium none;font-family:Calibri;border-top:#b5c4df 1pt solid;padding-bottom:0in;border-left:medium none">
<span style="font-weight:bold">From: </span> Janna Fierst &lt;<a href="mailto:jfierst@uoregon.edu" target="_blank">jfierst@uoregon.edu</a>&gt;<br><span style="font-weight:bold">Date: </span> Monday, August 26, 2013 2:54 PM<br>
<span style="font-weight:bold">To: </span> &lt;<a href="mailto:maker-devel@yandell-lab.org" target="_blank">maker-devel@yandell-lab.org</a>&gt;<br><span style="font-weight:bold">Subject: </span> [maker-devel] exon/intron boundaries<br>
</div><div><div><br></div><div dir="ltr">Hi,<br><br>I am using MAKER 2.28 to annotate a Caenorhabditid worm genome, and the initial results appear fairly good, but we seem to be annotating too many exons for multiple genes. I was wondering which parameters should be tuned to change the threshold for exon/intron boundaries? Thanks for your help -Janna Fierst<br>
</div></div>
</span></div>
</blockquote></div><br></div></div><div><div>
</div></div></div><br></div>
</div><br></div>