[maker-devel] How to improve the result of Maker

Carson Holt carsonhh at gmail.com
Mon Feb 2 09:40:06 MST 2015


When you add a new exon, Apollo always recalculates the reading frame to take the longest ORF, so even though the first exon might not be the same, the other exons don’t allow for a longer ORF either. The ORF you got was therefore the longest possible given any combination of all exons (even if the first exon had been treated as UTR). That confirms my suspicion that this particular exon was ignored because it breaks any possible reading frame. It likely contains an assembly error.
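
A minimal sketch of the longest-ORF reasoning above, for illustration only. It is not Apollo's actual implementation: the function names and the toy exon sequences are made up, only the forward strand is scanned, and an ORF is assumed to run from an ATG to the first in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def splice(exons):
    """Join exon sequences (5' to 3') into one transcript string."""
    return "".join(exons)

def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF in seq, or None.

    All three forward frames are scanned; 'end' is exclusive and
    includes the stop codon.
    """
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # close the ORF, keep scanning this frame
    return best

# Hypothetical toy exons: two exons that splice into a clean ORF ...
without_exon = splice(["ATGGCTGCA", "GGTGCTTAA"])
# ... and the same model with a 5-nt exon (length not a multiple of 3) added.
with_exon = splice(["ATGGCTGCA", "GCTGA", "GGTGCTTAA"])

print(longest_orf(without_exon))
print(longest_orf(with_exon))
```

In this toy case the two-exon model has a complete ORF spanning the whole transcript, while the model with the extra exon has no complete ORF in any frame, which is the situation where a predictor would drop the exon in favor of a tenable model.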

—Carson


> On Jan 31, 2015, at 8:54 AM, 陈文博 <chenwenbo1020 at gmail.com> wrote:
> 
> 
> There are two possibilities. Given how different the SNAP and Augustus models are from one another, they may not have been trained appropriately (for example, if you are picking another related organism's parameter file rather than training these programs, several assumptions are being made that can actually make such an approach almost worse than picking a parameter file at random). But more likely, the evidence-supported exon breaks the reading frame of the model. This usually indicates that you have an assembly error (possibly issues with homopolymers). No amount of evidence support will allow you to call an exon that generates a missense-causing frameshift, so the predictors do the next most reasonable thing: they drop the exon if another model is tenable.
> 
> More concerning would be the mRNA-seq alignments near the 3’ end of the gene call. The structure suggests significant capture of background transcription by the mRNA-seq reads (long UTRs with weird mini-introns). I would suggest not using Cufflinks in this case. You should probably go with an assembly-based approach for the mRNA-seq reads instead. I would suggest using Trinity. It will reduce sensitivity but greatly increase evidence specificity, which is where you need the most improvement based on these images. I would also suggest using the jaccard_clip option with Trinity.
> 
> I would further suggest looking at the model in question using Apollo and manually adding the exon (click and drag it into the model). You can examine the reading frame after adding the exon and see whether it is in fact a frameshift assembly error. If it’s a homopolymer-derived frameshift, then you can expect a lot more of these throughout your assembly.
> 
> I dragged the exon into the model. It contains a stop codon, which causes the region downstream of it to become UTR, as shown here:
> <image.png>
> The exon in question is marked by the red arrow. However, the uppermost evidence track is a complete EST from NCBI, and it contains both a start and a stop codon. I then noticed that the 5' boundary of the 2nd exon in the model is not the same as in the EST, which creates a frameshift and causes the stop codon in the exon marked by the red arrow. The first exon should not be CDS, as there would be a start codon in the 2nd exon if its 5' boundary were predicted correctly. Would "always_complete=1" fix it?
> 
> I will try using Trinity.
> 
> Also, I do not see any protein alignments here. MAKER cannot work on transcript evidence alone. You need to provide the full proteome of at least two other species (they don’t have to be that closely related, but closer is better). Protein alignments will also help you better interpret the coding status of exons supported by mRNA-seq. For example, in the second image you would expect protein evidence to support all the coding exons but not the UTR exons, which would remove any doubt as to whether an exon is really UTR or not.
> 
> I did use 3 sources of protein evidence: the proteome of a related species, the fruit fly proteome, and Swiss-Prot.
> 
> Thank you very much!
> 
> Best regards,
> Wenbo
> 
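
The boundary effect raised in the quoted message (an exon whose 5' boundary is off by one base shifts the frame and exposes a stop codon in a later exon) can be sketched as follows. This is an illustrative toy, not MAKER or Apollo code: the function name and exon sequences are hypothetical, and translation is assumed to start at the first base of the first exon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_inframe_stop(exons):
    """Return (exon_number, transcript_offset) of the first in-frame stop codon.

    Reading starts at base 0 of the spliced transcript. exon_number is
    1-based and refers to the exon holding the first base of the stop
    codon. Returns None if no in-frame stop is found.
    """
    transcript = "".join(exons)
    for i in range(0, len(transcript) - 2, 3):
        if transcript[i:i + 3] in STOP_CODONS:
            end = 0
            for number, exon in enumerate(exons, start=1):
                end += len(exon)
                if i < end:
                    return number, i
    return None

# Hypothetical model with the correct 5' boundary on exon 2 ...
correct = ["ATGGCT", "GCAGGT", "GCTGATGCGTAA"]
# ... and the same model with exon 2 starting one base too far upstream.
shifted = ["ATGGCT", "AGCAGGT", "GCTGATGCGTAA"]

for name, model in (("correct", correct), ("shifted", shifted)):
    exon_number, offset = first_inframe_stop(model)
    total = sum(len(e) for e in model)
    premature = offset + 3 < total  # stop codon is not the terminal codon
    print(name, exon_number, offset, "premature" if premature else "terminal")
```

With these toy sequences, the correct boundary gives a stop only at the very end of the transcript, while the one-base shift produces a premature stop inside exon 3, leaving everything downstream of it as UTR.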
