[maker-devel] why no prediction
Carson Holt
carsonhh at gmail.com
Tue May 12 20:09:56 MDT 2015
Hi Wenbo,
You will actually get more gene calls from gene predictors than there are genes (often orders of magnitude more) because workable ORFs are common in a genome. So having a single-exon ORF predicted is not really that noteworthy. You can expect those kinds of predictions to outnumber the true gene count by as much as 10 to 1 in some genomes.
The problem with the region you are showing is that it doesn’t look like a gene. Even without a more detailed look at the coordinates and evidence overlap, the image lacks the structure of evidence and prediction concordance that would be expected in a genic region. Without some form of additional evidence like a good protein match, it is just too much like the many spurious overlap regions that you would expect to find randomly throughout a genome. Given this, there is just not enough support to promote the region to a gene. The predictions are still in the output for reference purposes, but they will not be promoted.
Looking at this region, there are no good gene predictions from SNAP, Augustus, or your pred_gff either (poor concordance). The heavy discordance among the different gene predictors suggests they have not been sufficiently trained. One thing that can affect evidence alignment and gene predictor performance is insufficient masking of repeat elements. You may need to spend some time building a species-specific repeat database using tools like RepeatModeler. Stretches of N’s in the sequence will also have an effect. You will get poor evidence alignments and predictions in what appears to be a large contig if there isn’t enough continuous usable sequence. I mention all these factors because the region in question looks spurious and unordered. Lack of concordance in clustering patterns generally means there are other structural issues with the dataset being used.
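If you want a quick way to check the N issue on a contig, something like the following would do. This is only a rough sketch and not part of MAKER: the function name `n_run_summary` and the `min_run` threshold are made up for illustration, and it assumes you already have the contig sequence as a plain string.

```python
import re


def n_run_summary(seq, min_run=10):
    """Summarize N content of one sequence (hypothetical helper, not a MAKER function)."""
    seq = seq.upper()
    # Coordinates (0-based, end-exclusive) of every run of at least min_run N's.
    n_runs = [(m.start(), m.end()) for m in re.finditer(r"N{%d,}" % min_run, seq)]
    # Lengths of the continuous blocks of non-N sequence between any N's.
    usable = [len(block) for block in re.split(r"N+", seq) if block]
    return {
        "length": len(seq),
        "n_fraction": seq.count("N") / len(seq) if seq else 0.0,
        "n_runs": n_runs,
        "longest_usable_block": max(usable, default=0),
    }


if __name__ == "__main__":
    # Toy contig: 4 usable bases, a 20 bp gap, then 8 usable bases.
    print(n_run_summary("ACGT" + "N" * 20 + "ACGTACGT"))
```

A contig whose longest usable block is short compared to its total length is the kind of case where alignments and predictions tend to look poor, even though the contig itself appears large.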
I’ve attached an image below to give an example. Notice how in regions with genes the different evidence types build on each other and have remarkable concordance (SNAP and Augustus choose very similar exon patterns for example). Regions without genes still have aligned evidence from Trinity assembled mRNA-seq and ab initio gene predictors, but they are not concordant, are more spurious in nature, and can be found on both strands. Simple overlap is insufficient to generate a gene call. You have to consider the totality of evidence.
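To put a rough number on what I mean by concordance between two predictors, you could compare their exon coverage base by base. The sketch below is only an illustration under my own assumptions and is not how MAKER itself weighs evidence: the function names are hypothetical, and exons are assumed to be (start, end, strand) tuples with 1-based inclusive coordinates as in GFF3.

```python
def exon_bases(exons):
    """Return the set of (strand, position) pairs covered by a list of exons."""
    covered = set()
    for start, end, strand in exons:  # 1-based inclusive coordinates (assumed)
        covered.update((strand, pos) for pos in range(start, end + 1))
    return covered


def concordance(exons_a, exons_b):
    """Jaccard index of strand-aware base coverage between two exon sets."""
    a, b = exon_bases(exons_a), exon_bases(exons_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy coordinates, invented for illustration only.
    snap = [(100, 200, "+"), (300, 400, "+")]
    augustus = [(100, 200, "+"), (300, 410, "+")]
    print(concordance(snap, augustus))            # near 1: very similar exon patterns
    print(concordance(snap, [(100, 200, "-")]))   # 0.0: same span, opposite strand
```

Values near 1 correspond to the genic regions in the image, where the predictors pick nearly the same exons. Values near 0, or overlap only on opposite strands, correspond to the spurious regions.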
Thanks,
Carson
> On May 12, 2015, at 7:06 PM, 陈文博 <chenwenbo1020 at gmail.com> wrote:
>
> Thank you for the help
>
> I double checked the "EST alignment", and sorry, the darkest pink is the transcript assembled with Trinity, not an EST. The splice sites are GT/AG. The cufflinks and Trinity results suggest that this region could be transcribed. There is an intact ORF in the prediction given by Augustus. Maybe this region is a real gene; however, it was not predicted by MAKER.
>
> Thanks,
> Wenbo
>
>
>
> 2015-05-12 20:31 GMT-04:00 Mark Yandell <myandell at genetics.utah.edu>:
> And finally, check the splice sites for the EST alignment: are they valid GT/AG or AT/AC?
>
>
> On May 12, 2015, at 4:18 PM, Carson Holt <carsonhh at gmail.com> wrote:
>
> > Also, protein evidence will only be considered as support if it is in the same reading frame as the ab initio prediction. A complete mismatch of reading frames usually suggests a repeat-like region.
> >
> > —Carson
> >
> >
> >> On May 12, 2015, at 4:16 PM, Carson Holt <carsonhh at gmail.com> wrote:
> >>
> >> The structure of the evidence appears to suggest a spurious prediction randomly overlapped by a spurious EST alignment. You would need at least protein evidence overlap to make it believable. There is heavy discordance among the gene predictors. Also, the fact that the gene would be more than 90% UTR, if the EST does in fact represent true expression, is a big factor. More likely it’s a pseudogene or semi-repetitive region. Not making this a gene was the right call.
> >>
> >> —Carson
> >>
> >>
> >>
> >>> On May 12, 2015, at 3:56 PM, 陈文博 <chenwenbo1020 at gmail.com> wrote:
> >>>
> >>> Hi guys,
> >>>
> >>> I have a weird case where one region in the genome has evidence, but no gene prediction was generated. Here are the details.
> >>>
> >>>
> >>> <image.png>
> >>> color means:
> >>> pink: Augustus
> >>> light green: SNAP
> >>> dark pink: pred_gff
> >>> light yellow: cufflinks
> >>> darkest pink: EST alignment
> >>> dark yellow: protein alignment
> >>>
> >>> In the region marked by the red frame, there are predictions from Augustus and pred_gff, and also evidence from cufflinks. Why is no gene model generated? I could find the gene model in the "XXXX.all.maker.non_overlapping_ab_initio.transcripts.fasta" file. It is weird because it did have supporting evidence.
> >>>
> >>> Does anyone know the reason?
> >>>
> >>> Thanks very much!
> >>>
> >>> Best,
> >>> Wenbo
> >>> _______________________________________________
> >>> maker-devel mailing list
> >>> maker-devel at box290.bluehost.com
> >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> >>
> >
> >
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-1.png
Type: image/png
Size: 51295 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20150512/a916b3b1/attachment-0003.png>