[maker-devel] SOBA statistics of Maker annotation
Carson Holt
carsonhh at gmail.com
Sun Feb 19 23:05:09 MST 2017
In GFF3, the CDS and UTR lengths of a transcript are really the merged length of all of its CDS or UTR feature lines, but SOBA reports each part individually, which may be causing your confusion. This is because SOBA reports per-feature statistics, not merged-feature statistics.
CDSs do not have to take up entire exons. For example, start/stop codons may cross splice sites and be split across exons (very common). Each part of the split CDS then becomes a separate feature line, and SOBA treats each line separately. So a single-bp CDS here is not abnormal, since the remaining part of the CDS continues on the next exon as a separate line. The exact same is true for UTR.
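To make the difference concrete, here is a minimal Python sketch (not part of SOBA or MAKER) that computes both views from the same GFF3 lines: the length of each CDS line on its own, and the length summed per Parent. The three example lines and the names `feature_lengths` and `parse_attributes` are made up for illustration; they assume a start codon split across a splice site so that the first CDS line is only 1 bp long.

```python
from collections import defaultdict

# Hypothetical GFF3 lines (illustration only): the first CDS part is 1 bp long
# because the start codon is split across a splice site.
GFF3_LINES = [
    "chr1\tmaker\tCDS\t100\t100\t.\t+\t0\tID=mrna1:cds;Parent=mrna1",
    "chr1\tmaker\tCDS\t201\t350\t.\t+\t2\tID=mrna1:cds;Parent=mrna1",
    "chr1\tmaker\tCDS\t401\t549\t.\t+\t2\tID=mrna1:cds;Parent=mrna1",
]


def parse_attributes(column9):
    """Split the GFF3 attributes column into a key -> value dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def feature_lengths(lines, feature_type):
    """Return (per-line lengths, merged length per Parent) for one feature type."""
    per_line = []
    merged = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # GFF3 coordinates are 1-based and inclusive.
        length = int(cols[4]) - int(cols[3]) + 1
        per_line.append(length)
        # A feature may list several parents separated by commas.
        for parent in parse_attributes(cols[8]).get("Parent", "").split(","):
            if parent:
                merged[parent] += length
    return per_line, dict(merged)


per_line, merged = feature_lengths(GFF3_LINES, "CDS")
print(min(per_line))  # 1: the per-line minimum, as a per-feature report would show it
print(merged)         # {'mrna1': 300}: the merged CDS length of the transcript
```

The same function can be called with a UTR feature type (for example "five_prime_UTR") to compare the per-line and merged UTR lengths.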
If you want the merged length of the UTR and CDS, it is best to pull that info out of the _QI= part of the GFF3 attributes for each mRNA.
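As a rough sketch of pulling those values out in Python: the field order below is an assumption based on my recollection of the MAKER documentation (nine pipe-separated values, with the 5' UTR length first, the exon count seventh, the 3' UTR length eighth and the protein length last), so please check it against the documentation for your MAKER version. The example mRNA line, the field names in `QI_FIELDS` and the function `parse_qi` are hypothetical. Under that assumed layout, _QI lists the protein length in amino acids rather than the CDS length in bp, so the merged CDS length is roughly three times that value, plus the stop codon if the model includes one.

```python
# Assumed order of the nine _QI values (verify against your MAKER version).
QI_FIELDS = (
    "five_prime_utr_length",
    "frac_splice_sites_est",
    "frac_exons_est",
    "frac_exons_est_or_protein",
    "frac_splice_sites_snap",
    "frac_exons_snap",
    "exon_count",
    "three_prime_utr_length",
    "protein_length",
)

# Fields that are whole numbers under the assumed layout.
INT_FIELDS = ("five_prime_utr_length", "exon_count",
              "three_prime_utr_length", "protein_length")


def parse_qi(column9):
    """Return the _QI values of a GFF3 attributes column as a dict, or None."""
    for pair in column9.strip().split(";"):
        if pair.startswith("_QI="):
            values = pair[len("_QI="):].split("|")
            if len(values) != len(QI_FIELDS):
                raise ValueError("unexpected number of _QI fields: %d" % len(values))
            qi = dict(zip(QI_FIELDS, values))
            for name in INT_FIELDS:
                qi[name] = int(qi[name])
            return qi
    return None


# Hypothetical mRNA line for illustration only.
MRNA_LINE = ("chr1\tmaker\tmRNA\t50\t700\t.\t+\t.\t"
             "ID=mrna1;Parent=gene1;_AED=0.10;_QI=50|1|1|1|0.5|0.66|3|151|99")

qi = parse_qi(MRNA_LINE.split("\t")[8])
print(qi["five_prime_utr_length"], qi["three_prime_utr_length"], qi["protein_length"])
```

Fractional fields are left as strings on purpose, since how they are formatted may vary.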
What about single-bp exons? Those cannot occur unless you gave an input GFF3 with predictions that have single-bp exons. Predictors like SNAP and Augustus just won’t produce them, with one exception: they can potentially produce them for the first/last exon. This is not because the exon is really 1 bp, but rather because the predictor only reports the CDS part of the exon. So the start/stop codon may have only 1 bp overlapping that exon, but once you add UTR the exon will extend from that point and will no longer be 1 bp in length. If the UTR never gets added, then you can be left with a partial initial/terminal exon.
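If you want to check which case applies to your annotation, a small Python sketch like the following lists very short exon lines and says whether each one is the first or last exon of its transcript by coordinate. It is only an illustration: the function name `short_exons`, the `max_len` parameter and the example lines are made up, and it assumes each exon line carries a Parent attribute.

```python
from collections import defaultdict

# Hypothetical exon lines for illustration: the first exon is only 1 bp long.
EXON_LINES = [
    "chr1\tmaker\texon\t100\t100\t.\t+\t.\tID=mrna1:exon:1;Parent=mrna1",
    "chr1\tmaker\texon\t201\t350\t.\t+\t.\tID=mrna1:exon:2;Parent=mrna1",
    "chr1\tmaker\texon\t401\t549\t.\t+\t.\tID=mrna1:exon:3;Parent=mrna1",
]


def short_exons(lines, max_len=1):
    """Return (parent, start, end, is_terminal) for exons of at most max_len bp."""
    exons = defaultdict(list)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        for pair in cols[8].split(";"):
            if pair.startswith("Parent="):
                # An exon shared by several transcripts lists them comma-separated.
                for parent in pair[len("Parent="):].split(","):
                    exons[parent].append((start, end))
    report = []
    for parent, spans in exons.items():
        spans.sort()
        for i, (start, end) in enumerate(spans):
            if end - start + 1 <= max_len:
                # First or last by coordinate, so this works for either strand.
                is_terminal = i == 0 or i == len(spans) - 1
                report.append((parent, start, end, is_terminal))
    return report


print(short_exons(EXON_LINES))  # [('mrna1', 100, 100, True)]
```

A short exon flagged as terminal fits the CDS-only first/last exon case described above; one that is not terminal would more likely trace back to the input predictions.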
However, more than likely what you are seeing is just related to how SOBA reports stats for individual feature lines as opposed to merged stats for CDS and UTR.
Thanks,
Carson
> On Feb 18, 2017, at 9:43 AM, Qihua Liang <qlian003 at ucr.edu> wrote:
>
> Dear Maker develop team,
>
> I used SOBA website to calculate the statistics of Maker annotation, and I found out the length of some features of Maker, like CDS, exon, 5’ and 3’UTR, the minimal length of such features can be as short as 1bp. These are confusing, with such features length of 1bp. When Maker combines different gene models and makes such predictions, how will it accept such abnormal exon/CDS length? And is there any parameters in the bopt.ctl or evm.ctl to avoid such abnormal gene models?
>
> Thanks
> Qihua