[maker-devel] gff pass thru problem and unsupported EST nucleotides
Carson Holt
carsonhh at gmail.com
Mon Feb 24 11:18:18 MST 2014
The -fix_nucleotides flag is added to the command line (i.e. maker
-fix_nucleotides). The warning is there so you are aware that there is an
issue with your fasta file that will cause things downstream to fail.
MAKER can fix the errors for you, but first it gives a warning designed to
make you look at the file and validate it. Why would you want to do this?
For example, if you accidentally provided protein sequence to the EST
option, you wouldn’t want MAKER to just proceed. You want a warning
so you can check first. If your file is in fact EST data, then set the
flag and those characters will be changed to N’s in the fixed fasta
sequence. Otherwise those characters will cause errors in downstream tools
like exonerate, and even some downstream GMOD tools, so they can’t be
allowed to remain as is.
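
To make the effect concrete, here is a minimal Python sketch of that kind of substitution. It is an illustration only, not MAKER's own code: the function name mask_ambiguous is hypothetical, and the character set is simply the one quoted in the warning ([RYKMSWBDHV]). It leaves the sequences in place and only swaps the ambiguity codes for N.

```python
import re

# Ambiguity codes quoted in the MAKER warning. Assumption: this is the
# complete set to replace; MAKER's real check may differ.
UNSUPPORTED = re.compile(r"[RYKMSWBDHV]", re.IGNORECASE)


def mask_ambiguous(fasta_text):
    """Return (fixed_text, n_replaced) for a FASTA string.

    Header lines (starting with '>') are left untouched; in sequence
    lines every unsupported character is replaced with an uppercase 'N'.
    """
    out = []
    replaced = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)
            continue
        fixed, n = UNSUPPORTED.subn("N", line)
        replaced += n
        out.append(fixed)
    return "\n".join(out), replaced
```

Running something like this over the EST file first and looking at the replacement count is one way to confirm the file really is nucleotide data before setting the flag.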
For the GFF3 file, there is almost definitely a logic issue in the file
(the modENCODE validator won’t check for those). This can come from prior
manipulation of the GFF3 file. For example, a gene ID that is the
same across two contigs (technically valid, but a logic error). The GFF3
error message will normally give the ID of the feature causing the issue.
I could also take a look for you. You can upload the GFF3 file here:
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
Click on 'new guest account', then e-mail me back your guest ID so I know
which files to review.
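
If you want to check for the duplicate-ID example yourself, a rough Python sketch follows. It only looks for that one kind of logic error (the same ID= value on more than one contig) and is not what MAKER's GFFDB does internally; the function name ids_on_multiple_seqs is hypothetical, and it assumes standard 9-column, tab-separated GFF3.

```python
from collections import defaultdict


def ids_on_multiple_seqs(gff3_text):
    """Return {ID: set of seqids} for every ID used on more than one seqid."""
    seen = defaultdict(set)
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        # Column 9 holds the semicolon-separated key=value attributes.
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            if key.strip() == "ID" and value:
                seen[value].add(cols[0])
    return {fid: seqs for fid, seqs in seen.items() if len(seqs) > 1}
```

An empty result only rules out this particular problem; other logic errors are still possible.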
Thanks,
Carson
On 2/24/14, 12:02 AM, "Megan" <hedgyx at yahoo.com> wrote:
>Maker folks,
>I am re-annotating a single contig and I am having a few problems.
>
>First, I am having trouble passing through a Maker derived gff (from
>Maker 2.09, with some modifications to gene names and functional
>information added). The gff file passes the modencode validator but
>Maker always fails on the first gene in the file, regardless of which
>gene comes first. So it appears to be a systematic error across the
>entire file. The Maker error is "Check your input GFF3 file for errors!
>(from GFFDB)". I have tried Maker 2.10 and 2.31, using both genome_gff
>with model_pass=1 and pred_gff. Attached is a gff with the first 2
>genes.
>
>Second, when I updated to Maker 2.31, Maker now complains that my EST
>fasta file has nucleotides that are not supported [RYKMSWBDHV]. It
>suggests "set -fix_nucleotides on the command line to fix this
>automatically". Is the -fix_nucleotides a Maker flag? What exactly does
>it do? Does it remove the entire sequence or replace ambiguous bases
>with a randomly selected one? Half of my 20k ESTs contain these
>characters, so I don't want to throw them out entirely.
>
>Also, just curious, has Maker never supported these characters but just
>never complained? I used this EST data set with Maker 2.09. I did note
>poor EST coverage, but thought it was an issue with the EST data itself.
>
>I appreciate any suggestions.
>Thanks,
>Megan