[maker-devel] gff pass thru problem and unsupported EST nucleotides
Daniel Ence
dence at genetics.utah.edu
Mon Feb 24 11:31:47 MST 2014
Hi Megan,
One problem with the GFF3 that you attached is that the IDs for the CDS features are generated incorrectly. All of the CDS features for a given mRNA or transcript should share the same ID, but the CDS features in your GFF3 have IDs that use the exon name.
You can fix it with this Perl one-liner:
perl -ne 'if (/\tCDS\t/ && /Parent=([^;\s]+)/) { my $parent = $1; s/\bID=[^;\s]+/ID=$parent-cds/ } print' part_passthru.gff > fixed.gff3
It just fixes the ID attributes in all of the CDS features. Try it on the test gff3 you sent and let me know if it works. I can't test it myself without the fasta file that you are annotating.
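For readers who would rather not use a one-liner, the same fix can be sketched in Python. This is only an illustration of the rewrite described above, not part of MAKER; the function name `fix_cds_ids` is hypothetical, and the sketch assumes each CDS line has nine tab-separated columns and a single Parent value.

```python
import re

# Capture the Parent value up to the next ';' or whitespace.
PARENT_RE = re.compile(r"Parent=([^;\s]+)")
# Match the ID attribute itself (not e.g. Alias_ID=).
ID_RE = re.compile(r"\bID=[^;\s]+")


def fix_cds_ids(lines):
    """Yield GFF3 lines with each CDS ID rewritten to '<Parent>-cds'."""
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        # Only touch well-formed feature lines whose type column is CDS.
        if len(cols) == 9 and cols[2] == "CDS":
            parent = PARENT_RE.search(cols[8])
            if parent:
                new_id = "ID=" + parent.group(1) + "-cds"
                # A function replacement avoids backslash handling in new_id.
                cols[8] = ID_RE.sub(lambda m: new_id, cols[8], count=1)
                line = "\t".join(cols) + "\n"
        yield line
```

Passing an open file handle for part_passthru.gff to `fix_cds_ids` and writing the yielded lines to fixed.gff3 would mirror the command above. Lines that are not CDS features, including comments and directives, pass through unchanged.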
Thanks,
Daniel
Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com]
Sent: Monday, February 24, 2014 11:18 AM
To: Megan; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] gff pass thru problem and unsupported EST nucleotides
-fix_nucleotides is a flag you add to the MAKER command line (i.e. maker
-fix_nucleotides). The warning is there so you are aware that there is an
issue with your fasta file that will cause things downstream to fail.
MAKER can fix the errors for you, but first it gives a warning designed to
make you look at the file and validate it. Why would you want that?
For example, if you accidentally provided protein sequence to the EST
option, you wouldn’t want MAKER to just proceed. You want a warning
so you can check first. If your file is in fact EST data, then set the
flag and those characters will be changed to N’s in the fixed fasta
sequence. Otherwise those characters will cause errors in downstream tools
like exonerate, and even in some downstream GMOD tools, so they can’t be
allowed to remain as is.
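To preview what that replacement would do to an EST file before setting the flag, a small Python sketch along these lines may help. It is not MAKER's implementation; it only mimics the effect described above (the characters RYKMSWBDHV become N) and counts how many bases would change. The function name `mask_ambiguous` is hypothetical, and preserving lowercase as "n" is an assumption of this sketch.

```python
import re

# IUPAC ambiguity codes listed in the warning quoted in this thread.
UNSUPPORTED = re.compile(r"[RYKMSWBDHV]", re.IGNORECASE)


def mask_ambiguous(fasta_lines):
    """Return (masked_lines, n_replaced); header lines are left untouched."""
    masked, replaced = [], 0
    for line in fasta_lines:
        if not line.startswith(">"):
            # Keep the case of the original base (assumption of this sketch).
            line, n = UNSUPPORTED.subn(
                lambda m: "n" if m.group().islower() else "N", line
            )
            replaced += n
        masked.append(line)
    return masked, replaced
```

A large replacement count relative to the total sequence length would be a hint that the file is not nucleotide data at all, which is the situation the warning is meant to catch.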
For the GFF3 file, there is almost certainly a logic issue in the file
(the modENCODE validator won’t check for those). This can come from prior
manipulation of the GFF3 file. For example, the IDs for a gene may be the
same across two contigs (technically valid, but a logic error). The GFF3
error message will normally give the ID of the feature causing the issue.
I could also take a look for you. You can upload the GFF3 file here:
http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
Click on 'new guest account', then e-mail me back your guest ID so I know
which files to review.
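One of the logic errors mentioned above, the same ID appearing on two contigs, can be checked with a short Python sketch like the following. It is an illustration only and does not reproduce MAKER's own GFF3 checks; the function name `ids_on_multiple_contigs` is hypothetical, and the sketch assumes nine tab-separated columns per feature line.

```python
import re
from collections import defaultdict

# ID must be a whole attribute key, at the start or after a ';'.
ID_RE = re.compile(r"(?:^|;)ID=([^;\s]+)")


def ids_on_multiple_contigs(lines):
    """Map each ID seen on more than one seqid to its sorted seqids."""
    seen = defaultdict(set)
    for line in lines:
        if line.startswith("#"):
            continue  # skip comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip FASTA sections and malformed lines
        match = ID_RE.search(cols[8])
        if match:
            seen[match.group(1)].add(cols[0])
    return {fid: sorted(seqids) for fid, seqids in seen.items() if len(seqids) > 1}
```

An empty result rules out only this one kind of error; repeated IDs on the same contig are not flagged, since multi-line features such as CDS legitimately share an ID there.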
Thanks,
Carson
On 2/24/14, 12:02 AM, "Megan" <hedgyx at yahoo.com> wrote:
>Maker folks,
>I am re-annotating a single contig and I am having a few problems.
>
>First, I am having trouble passing through a Maker derived gff (from
>Maker 2.09, with some modifications to gene names and functional
>information added). The gff file passes the modencode validator but
>Maker always fails on the first gene in the file, regardless of which
>gene comes first. So it appears to be a systematic error across the
>entire file. The Maker error is "Check your input GFF3 file for errors!
>(from GFFDB)". I have tried Maker 2.10 and 2.31, using both genome_gff
>with model_pass=1 and pred_gff. Attached is a gff with the first 2
>genes.
>
>Second, when I updated to Maker 2.31, Maker now complains that my EST
>fasta file has nucleotides that are not supported [RYKMSWBDHV]. It
>suggests "set -fix_nucleotides on the command line to fix this
>automatically". Is the -fix_nucleotides a Maker flag? What exactly does
>it do? Does it remove the entire sequence or replace ambiguous bases
>with a randomly selected one? Half of my 20k ESTs contain these
>characters, so I don't want to throw them out entirely.
>
>Also, just curious, has Maker never supported these characters but just
>never complained? I used this EST data set with Maker 2.09. I did note
>poor EST coverage, but thought it was an issue with the EST data itself.
>
>I appreciate any suggestions.
>Thanks,
>Megan
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org