[maker-devel] gff pass thru problem and unsupported EST nucleotides
Carson Holt
carsonhh at gmail.com
Mon Feb 24 13:59:12 MST 2014
I found the issue. You have non-printing control characters at the end of
almost every line. Because they fall at the end of the Parent= tag, they
become part of the Parent ID when the file is read.
So instead of "HERA000031-RA" you get "HERA000031-RA\cM" as the Parent
ID.
"\cM" is Control-M, i.e. a carriage return.
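To illustrate how a trailing \cM (carriage return) ends up inside the ID, here is a minimal Python sketch. It is not MAKER's actual parser; the helper name parent_id, the contig name, and the sample exon line are made up for illustration. It only shows that splitting the attribute column after removing just the newline leaves the carriage return attached to the last attribute.

```python
def parent_id(gff3_line):
    """Return the Parent value from column 9 of one GFF3 line, or None.

    Only the trailing newline is removed, so a carriage return left
    by Windows-style line endings stays attached to the last attribute.
    """
    columns = gff3_line.rstrip("\n").split("\t")
    if len(columns) != 9:
        return None
    for attribute in columns[8].split(";"):
        key, _, value = attribute.partition("=")
        if key == "Parent":
            return value
    return None


# Hypothetical exon line ending in "\r\n" (carriage return + newline).
line = "contig1\tmaker\texon\t100\t200\t.\t+\t.\tID=e1;Parent=HERA000031-RA\r\n"
print(repr(parent_id(line)))  # 'HERA000031-RA\r'
```

The printed value no longer matches the ID of the mRNA feature it is supposed to point to, which is why every gene in the file appears broken.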
I ran the attached script to remove these characters (perl purify
<gff3_file>), and then it works. After fixing the file, make sure to remove the
.../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db file
so that the GFF3 database is rebuilt when you rerun MAKER.
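For anyone reading this without the attachment, roughly the same cleanup can be sketched in Python. This is not the attached purify script, only an assumed equivalent: it drops carriage returns and other control characters (tabs and newlines are kept) and writes the result to a new file. The function names and file names are placeholders.

```python
import re

# Control characters other than tab (\x09) and newline (\x0a),
# plus DEL. This includes the carriage return (\x0d, shown as ^M).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")


def clean_line(line):
    """Strip stray control characters but keep tabs and the newline."""
    return CONTROL_CHARS.sub("", line)


def clean_file(in_path, out_path):
    """Copy in_path to out_path with control characters removed.

    Returns the number of lines that were changed.
    """
    changed = 0
    # newline="" stops Python from translating line endings itself,
    # so any "\r" in the file is visible to clean_line.
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        for line in src:
            fixed = clean_line(line)
            changed += fixed != line
            dst.write(fixed)
    return changed
```

A call such as clean_file("annotations.gff3", "annotations.clean.gff3") (both names hypothetical) leaves the original untouched, so the two files can be compared before the cleaned one is handed back to MAKER.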
Thanks,
Carson
On 2/24/14, 1:32 PM, "Megan" <hedgyx at yahoo.com> wrote:
>Hi Carson and Daniel,
>
>Thanks for your suggestions. I have looked at the gff file, but I do not
>see any obvious errors. I have uploaded the files to your website. The
>reference fasta is there, the full gff, and a single gene gff that also
>causes an error. If I remove that gene from the full gff, then the error
>is on the next gene in the file, so it appears to be a systematic problem
>throughout the gff. The gff was generated by Maker, but I may have
>messed it up when I modified it to rename genes and add functional
>information. I checked with cat -te, but don't see any obvious
>formatting errors.
>
>Thanks!
>Megan
>
>
>--------------------------------------------
>On Mon, 2/24/14, Carson Holt <carsonhh at gmail.com> wrote:
>
> Subject: Re: [maker-devel] gff pass thru problem and unsupported EST nucleotides
> To: "Megan" <hedgyx at yahoo.com>, maker-devel at yandell-lab.org
> Date: Monday, February 24, 2014, 10:18 AM
>
> The -fix_nucleotides flag is added to the command line (i.e. run maker
> with the -fix_nucleotides flag). It is there so you are aware that there
> is an issue with your fasta file that will cause things downstream to fail.
> MAKER can fix the errors for you, but first it gives a warning designed to
> make you look at the file and validate it. Why would you want to do this?
> For example, what if you accidentally provided protein sequence to the EST
> option? You wouldn’t want MAKER to just proceed. You want a warning
> so you can check first. If your file is in fact EST data, then set the
> flag and those characters will be changed to N’s in the fixed fasta
> sequence. Otherwise those characters will cause errors in downstream tools
> like exonerate, and even some downstream GMOD tools, so they can’t be
> allowed to remain as is.
>
> For the GFF3 file, there is almost definitely a logic issue in the file
> (the modENCODE validator won’t check for those). This can come from prior
> manipulation of the GFF3 file. For example, IDs for a gene that are the
> same across two contigs (technically valid but a logic error). The GFF3
> error message will normally give the ID of the feature causing the issue.
>
> I could also take a look for you. You can upload the GFF3 file here:
> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
> Click on 'new guest account' then e-mail me back your guest ID, so I know
> which files to review.
>
> Thanks,
> Carson
>
>
>
> On 2/24/14, 12:02 AM, "Megan" <hedgyx at yahoo.com> wrote:
>
> >Maker folks,
> >I am re-annotating a single contig and I am having a few problems.
> >
> >First, I am having trouble passing through a Maker derived gff (from
> >Maker 2.09, with some modifications to gene names and functional
> >information added). The gff file passes the modencode validator but
> >Maker always fails on the first gene in the file, regardless of which
> >gene comes first. So it appears to be a systematic error across the
> >entire file. The Maker error is "Check your input GFF3 file for errors!
> >(from GFFDB)". I have tried Maker 2.10 and 2.31, using both genome_gff
> >with model_pass=1 and pred_gff. Attached is a gff with the first 2
> >genes.
> >
> >Second, when I updated to Maker 2.31, Maker now complains that my EST
> >fasta file has nucleotides that are not supported [RYKMSWBDHV]. It
> >suggests "set -fix_nucleotides on the command line to fix this
> >automatically". Is the -fix_nucleotides a Maker flag? What exactly does
> >it do? Does it remove the entire sequence or replace ambiguous bases
> >with a randomly selected one? Half of my 20k ESTs contain these
> >characters, so I don't want to throw them out entirely.
> >
> >Also, just curious, has Maker never supported these characters but just
> >never complained? I used this EST data set with Maker 2.09. I did note
> >poor EST coverage, but thought it was an issue with the EST data itself.
> >
> >I appreciate any suggestions.
> >Thanks,
> >Megan
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: purify
Type: application/octet-stream
Size: 1966 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140224/a1582e7d/attachment-0003.obj>