[maker-devel] How to improve the result of Maker

陈文博 chenwenbo1020 at gmail.com
Sat Jan 31 08:54:28 MST 2015


>
>
> There are two possibilities. Given how different the snap and augustus
> models are from one another, this would suggest they have not been trained
> appropriately (for example if you are picking another related organisms
> parameter file rather than training these programs, there are several
> assumptions that are being made that can actually make such an approach
> almost worse than just picking a parameter file at random). But more likely
> the evidence supported exon breaks the reading frame of the model.  This
> usually indicates that you have an assembly error (possibly issues with
> homopolymers).  No amount of evidence support will allow you to call an
> exon that generates a mis-sense causing frameshift, so the predictors do
> the next most reasonable thing - they drop the exon if another model is
> tenable.  More concerning would be the mRNA-seq alignments near the 3’ end
> of the gene call.  The structure suggests significant capture of background
> transcription with the mRNA-seq reads (long UTRs with weird mini-introns).
> I would suggest not using cufflinks in this case.  You should probably go
> with an assembly based approach of mRNA-seq reads instead.  I would suggest
> using trinity. It will reduce sensitivity but greatly increase evidence
> specificity which is where you need the most improvement based on these
> images.  I would also suggest using the jaccard_clip option with trinity.
>
> I would further suggest looking at the model in question using apollo, and
> manually adding the exon (click and drag it into the model).  You can
> examine the reading frame after adding the exon and see if it is in fact a
> frameshift assembly error.  If it’s a homopolymer derived frameshift, then
> you can expect a lot more of these throughout your assembly.
>

I dragged the exon into the model. It contains a stop codon, which turns
the region downstream of it into UTR, as shown here:
[image: inline image 1]
The exon in question is marked by the red arrow. However, the uppermost
evidence track is a complete EST from NCBI, and it contains both a start
and a stop codon. I then noticed that the 5' boundary of the 2nd exon in the
model is not the same as in the EST. This causes a frameshift, which produces
the stop codon in the exon marked by the red arrow. The first exon should
not be CDS, because there would be a start codon in the 2nd exon if its 5'
boundary were predicted correctly. Would "always_complete=1" fix it?

I will try to use trinity.
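
Going back to the frameshift described above, here is a minimal Python sketch of the effect on a made-up toy sequence (not the real locus). The sequence, the coordinates, and the helper names splice and first_stop are all hypothetical and only illustrate the reasoning: moving the 5' boundary of one exon by a single base shifts the reading frame and places a premature stop codon in the next exon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def splice(seq, exons):
    """Concatenate exons given as 0-based, end-exclusive (start, end) pairs."""
    return "".join(seq[start:end] for start, end in exons)

def first_stop(cds):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None

# Toy genomic sequence (hypothetical): upstream + exon + intron + exon + intron + exon
SEQ = (
    "GGC"          # upstream bases, positions 0-2
    "ATGGCTGCA"    # exon with the start codon, positions 3-11
    "GTCCAG"       # toy intron
    "CTTAGCTGG"    # the exon in question, positions 18-26
    "GTCCAG"       # toy intron
    "AAATAA"       # last exon with the stop codon, positions 33-38
)

# Exon structure supported by the EST (assumed correct in this toy case)
EST_EXONS = [(3, 12), (18, 27), (33, 39)]
# Predicted model: same exons, but the first 5' boundary is one base upstream
MODEL_EXONS = [(2, 12), (18, 27), (33, 39)]

est_cds = splice(SEQ, EST_EXONS)
model_cds = splice(SEQ, MODEL_EXONS)

print("EST   first stop at codon", first_stop(est_cds), "of", len(est_cds) // 3)
print("model first stop at codon", first_stop(model_cds), "of", len(model_cds) // 3)
```

With the EST boundaries the only stop is the final codon. With the boundary moved by one base, the first in-frame stop falls inside the second exon of the toy example, so everything after it would be treated as UTR.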

>
> Also I do not see any protein alignments here?  MAKER cannot work on
> transcript evidence alone.  You need to provide the full proteome of at
> least two other species (they don’t have to be that closely related, but
> closer is better).  Protein alignments will also help you better interpret
> the coding status of exons supported by mRNA-seq. For example in the second
> image, you would expect protein evidence to support all the coding exons
> but not the UTR exons which would remove any doubt as to whether an exon is
> really UTR or not.
>

I did use three sources of protein evidence: a proteome from a related
species, the fruit fly proteome, and Swiss-Prot.

Thank you very much!

Best regards,
Wenbo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 10308 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20150131/510b110e/attachment-0003.png>