<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div>See below: </div><div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra">I have joined the "Maker-devel" Google group, but I don't know why I can't reply to a topic or create a new topic. Is there some limitation? </div></div></div></blockquote><div><br class=""></div><div>The Google site is just a searchable archive of MAKER-related e-mails. The actual conversations occur through the MAKER mailing list: <a href="http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org" class="">http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org</a></div>E-mails sent to the list are automatically archived on Google.<br class=""><br class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra">I have finished genome annotation with MAKER, using SNAP and Augustus within MAKER. I have some questions; could you help me?</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">When gene finders make predictions at the same location, MAKER chooses the best prediction as the final output, right? But if the prediction doesn't match the evidence very well, how does MAKER synthesize the prediction with the evidence? My understanding of MAKER's behavior is as follows, though I'm not sure whether it is right:</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">Assume there is an exon present in the evidence but not in the prediction. If the exon is located at the end of the prediction, it will be output as UTR, but if the exon is located inside the prediction, it will be ignored and not output, right?</div></div></div></blockquote><div><br class=""></div><div>No. 
MAKER uses the introns and exons in the evidence alignments to provide hints to the gene predictors. Hints increase the probability scores of the HMM models by increasing the likelihood of the exon or intron state wherever it overlaps the evidence alignment. This raises the likelihood values of models that better match the evidence alignments, resulting in better models than SNAP and Augustus produce on their own without hints. Note that models are still governed by the same constraints on what constitutes an open reading frame and a splice site, regardless of evidence alignments. This means that no amount of evidence-based hints can overcome an assembly error.</div><div><br></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra">For example:</div><div class="gmail_extra"><span id="cid:ii_14b3afb6302f9183"><image.png></span><br class=""></div><div class="gmail_extra">The exon pointed to by the red arrow is present in all of the evidence, but it was missed in the final output. </div></div></div></blockquote><div><br class=""></div><div>There are two possibilities. First, the fact that the SNAP and Augustus models differ so much from one another suggests they have not been trained appropriately (for example, if you picked a related organism's parameter file rather than training these programs, that choice rests on several assumptions that can make such an approach almost worse than picking a parameter file at random). Second, and more likely, the evidence-supported exon breaks the reading frame of the model. This usually indicates an assembly error (possibly issues with homopolymers). No amount of evidence support will allow you to call an exon that introduces a frameshift (and the missense sequence that follows it), so the predictors do the next most reasonable thing: they drop the exon if another model is tenable. 
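<div>To make the two points above concrete (evidence acts as a soft bonus, while the reading frame is a hard constraint), here is a toy Python sketch. It is only an illustration under stated assumptions, not MAKER's, SNAP's, or Augustus's actual algorithm: the real predictors apply hints inside HMM decoding, whereas this sketch simply scores a few pre-made candidate models. The scores, the 0.1 per-base bonus, the tiny genome, and the names <code>Model</code>, <code>is_valid_orf</code>, <code>hint_bonus</code> and <code>pick_model</code> are all hypothetical.</div>

```python
# Toy illustration only: this is NOT how MAKER, SNAP, or Augustus are
# implemented. Scores, the per-base bonus, and all names are hypothetical.
from __future__ import annotations

from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class Model:
    name: str
    exons: list[tuple[int, int]]  # coding exons, 0-based half-open [start, end)
    base_score: float             # hypothetical ab initio score


def coding_sequence(genome: str, exons: list[tuple[int, int]]) -> str:
    """Splice the coding exons together in genomic order."""
    return "".join(genome[s:e] for s, e in sorted(exons))


def is_valid_orf(cds: str) -> bool:
    """Hard constraint: start codon, whole codons, single terminal stop."""
    if not cds or len(cds) % 3 != 0:
        return False  # a frameshift leaves a partial codon
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return (codons[0] == "ATG"
            and codons[-1] in STOP_CODONS
            and not any(c in STOP_CODONS for c in codons[:-1]))


def hint_bonus(exons, evidence, per_base: float = 0.1) -> float:
    """Soft bonus proportional to bases where model exons overlap evidence."""
    overlap = 0
    for s, e in exons:
        for hs, he in evidence:
            overlap += max(0, min(e, he) - max(s, hs))
    return per_base * overlap


def pick_model(genome, models, evidence):
    """Best-scoring model among those that pass the reading-frame check."""
    best, best_score = None, float("-inf")
    for m in models:
        if not is_valid_orf(coding_sequence(genome, m.exons)):
            continue  # evidence can never rescue a broken frame
        score = m.base_score + hint_bonus(m.exons, evidence)
        if score > best_score:
            best, best_score = m, score
    return best


# Exon 1, intron, 2 bp evidence-supported exon, intron, last exon.
GENOME = "ATGAAACCC" + "NNNN" + "GG" + "NNNN" + "GGGTAA"
MODELS = [
    Model("A", [(0, 9), (19, 25)], 0.9),
    Model("B", [(0, 9), (13, 15), (19, 25)], 1.2),  # includes the 2 bp exon
    Model("C", [(0, 6), (19, 25)], 1.0),
]
EVIDENCE = [(0, 9), (13, 15), (19, 25)]

print(pick_model(GENOME, MODELS, []).name)        # C: no hints
print(pick_model(GENOME, MODELS, EVIDENCE).name)  # A: hints favor A; B fails the frame check
```

<div>Without hints, the ab initio favorite C wins. With hints, A (which overlaps more of the evidence) wins, while B, the only model containing the evidence-supported 2 bp exon, is never chosen because that exon shifts the frame, which mirrors the missing exon in your screenshot.</div>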
More concerning are the mRNA-seq alignments near the 3’ end of the gene call. Their structure suggests that the mRNA-seq reads captured significant background transcription (long UTRs with odd mini-introns). I would suggest not using Cufflinks in this case. You should probably use an assembly-based approach for the mRNA-seq reads instead; I would suggest Trinity. It will reduce sensitivity but greatly increase evidence specificity, which is where you need the most improvement based on these images. I would also suggest using the jaccard_clip option with Trinity.</div><div><br class=""></div><div>I would further suggest examining the model in question in Apollo and manually adding the exon (click and drag it into the model). After adding the exon, you can examine the reading frame and see whether it is in fact a frameshift caused by an assembly error. If it is a homopolymer-derived frameshift, you can expect many more of these throughout your assembly.</div><div><br class=""></div><div>Also, I do not see any protein alignments here. MAKER cannot work on transcript evidence alone. You need to provide the full proteome of at least two other species (they don’t have to be that closely related, but closer is better). Protein alignments will also help you better interpret the coding status of exons supported by mRNA-seq. 
For example, in the second image you would expect protein evidence to support all the coding exons but not the UTR exons, which would remove any doubt as to whether an exon is really UTR.</div><div><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra">In this example, the long UTR is another issue. Is it non-coding RNA?</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">I have another example:</div><div class="gmail_extra"><span id="cid:ii_14b3af4f370974c6"><image.png></span><br class=""></div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">The yellow is evidence from Cufflinks. The final output chose the prediction from Augustus, but the last two exons were annotated as UTR. I thought a UTR should be continuous and should not contain introns.</div></div></div></blockquote><div><br class=""></div><div>Actually, UTRs are not expected to be continuous and intron-free. In fact, the majority of alternative splicing events occur in the 5’ UTR (not in the CDS), and 5’ UTRs commonly contain introns (just as we see here). This makes evolutionary sense: an alternatively spliced 5’ UTR allows differential and tissue-specific control of the exact same protein by swapping out the upstream regulatory sequence. Alternative splicing of the 3’ UTR, on the other hand, is less common (it is involved in nonsense-mediated decay rather than in regulation of expression), but introns in the 3’ UTR are still not uncommon. The mRNA-seq alignments suggest that those exons are transcribed, so unless an assembly error is causing a frameshift in the CDS and an early stop codon, the 3’ UTR would be correct. If you had protein alignments from another species here, you could see which exons they support as coding exons.</div><div><br class=""></div>Thanks,</div><div>Carson</div></body></html>