[maker-devel] Unknown (X) amino acids in predicted proteins

Wed Feb 13 10:18:44 MST 2019

If you use GFF3 as input, or use est2genome or protein2genome in your final run, your CDS may contain 'N' characters from the assembly. 'N' is the ambiguity code for DNA, and it will result in an 'X' when translated, since 'X' is the ambiguity code for amino acids. Augustus goes through internal gymnastics, splicing out exons that contain N's entirely, to try never to have this issue, but it may not always be able to. The X's are an indication of genome assembly issues.
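To illustrate why an 'N' in the CDS shows up as an 'X' in the protein, here is a minimal sketch of codon-by-codon translation using the standard genetic code (NCBI translation table 1). This is not MAKER's own code; the function name `translate_cds` is hypothetical, and the sketch simply assumes that any codon that is not an unambiguous A/C/G/T triplet is translated as 'X'.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1).
# Codons are enumerated in TCAG order, first base varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate a CDS codon by codon.

    Any codon that is not in the table (for example one containing 'N')
    becomes 'X'. A trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )


if __name__ == "__main__":
    # The second codon contains an N from the assembly, so it becomes X.
    print(translate_cds("ATGGCNTAA"))  # MX*
```

This simplified version maps every ambiguous codon to 'X'. Some translators resolve an ambiguous codon when all of its possible readings give the same amino acid (for example GCN, which is always alanine), so the exact number of X's can vary between tools.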

--Carson

> On Feb 11, 2019, at 7:12 AM, Lior Glick <liorglic at mail.tau.ac.il> wrote:
> 
> Dear MAKER users,
> 
> After completing a MAKER run, I looked at the protein FASTA files that MAKER outputs and noticed that a small fraction of the sequences include X characters, indicating unknown amino acids. I was wondering how such sequences are obtained; that is, how can a prediction contain unknown amino acids? Is this an indication of low-quality predictions?
> Is there any documentation regarding the procedure that generates the protein sequences?
> 
> Thanks a lot,
> Lior
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org