[maker-devel] gff pass thru problem and unsupported EST nucleotides
Megan
hedgyx at yahoo.com
Tue Feb 25 17:26:11 MST 2014
Carson,
Everything ran through smoothly after removing the ^Ms. Thanks for the help.
Megan
--------------------------------------------
On Mon, 2/24/14, Carson Holt <carsonhh at gmail.com> wrote:
Subject: Re: [maker-devel] gff pass thru problem and unsupported EST nucleotides
To: "Megan" <hedgyx at yahoo.com>, "Daniel Ence" <dence at genetics.utah.edu>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Date: Monday, February 24, 2014, 12:59 PM
I found the issue. You have a carriage-return character (^M) at the end of
almost every line. Because it falls within the Parent= tag, it becomes part
of the Parent ID when the file is read. So instead of "HERA000031-RA" you
get "HERA000031-RA\cM" as the Parent ID.
"\cM" is Perl notation for Control-M, the carriage return (\r).
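To make the mechanism concrete, here is a minimal Python sketch. It assumes the feature line ends in a Windows-style CRLF, that Parent is the last tag in column 9, and that the reader strips only the trailing "\n" (as Perl's chomp does with its default settings). The helper name parse_attributes and the sample line are hypothetical; this is not how MAKER's GFFDB parses files, it only shows where the stray \r ends up.

```python
def parse_attributes(column9: str) -> dict:
    """Split a GFF3 column-9 string into a tag -> value dict (no unescaping)."""
    attrs = {}
    for pair in column9.split(";"):
        if "=" in pair:
            tag, value = pair.split("=", 1)
            attrs[tag] = value
    return attrs

# Hypothetical feature line with a CRLF ending, as it looks when only the
# trailing "\n" is removed and the "\r" is left in place.
line = "contig1\tmaker\tCDS\t100\t200\t.\t+\t0\tID=HERA000031-RA:cds;Parent=HERA000031-RA\r\n"
fields = line.rstrip("\n").split("\t")
attrs = parse_attributes(fields[8])
print(repr(attrs["Parent"]))  # prints 'HERA000031-RA\r'
```

The Parent value no longer matches the mRNA's ID "HERA000031-RA", so the parent/child link is broken for every feature on such a line.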
I ran the attached script to remove these characters (perl purify
<gff3_file>), and with that the file works. After fixing the file, make
sure to remove the
.../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db
file before you rerun MAKER, to force the GFF3 database to be rebuilt.
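The purify script itself is not reproduced in the thread, so the following is only a hypothetical Python equivalent: it copies a GFF3 file while dropping every carriage return. The function names are illustrative assumptions. Opening with newline="" turns off Python's universal-newline translation, so the \r characters are actually seen and removed instead of being silently rewritten (a lone \r would otherwise be turned into a line break).

```python
def strip_carriage_returns(text: str) -> str:
    """Remove every carriage return (\\r, displayed as ^M) from the text."""
    return text.replace("\r", "")

def clean_gff3(in_path: str, out_path: str) -> int:
    """Copy in_path to out_path without carriage returns.

    Returns the number of characters removed.
    """
    # newline="" keeps "\r" in the string we read and writes "\n" untranslated.
    with open(in_path, newline="") as src:
        text = src.read()
    cleaned = strip_carriage_returns(text)
    with open(out_path, "w", newline="") as dst:
        dst.write(cleaned)
    return len(text) - len(cleaned)
```

For example, clean_gff3("in.gff3", "fixed.gff3") writes the cleaned copy and reports how many ^M characters were dropped; the old .db file still has to be deleted before rerunning MAKER.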
Thanks,
Carson
On 2/24/14, 1:32 PM, "Megan" <hedgyx at yahoo.com> wrote:
>Hi Carson and Daniel,
>
>Thanks for your suggestions. I have looked at the gff file, but I do not
>see any obvious errors. I have uploaded the files to your website: the
>reference fasta, the full gff, and a single-gene gff that also causes an
>error. If I remove that gene from the full gff, the error moves to the
>next gene in the file, so it appears to be a systematic problem
>throughout the gff. The gff was generated by Maker, but I may have
>messed it up when I modified it to rename genes and add functional
>information. I checked with cat -te, but I don't see any obvious
>formatting errors.
>
>Thanks!
>Megan
>
>
>--------------------------------------------
>On Mon, 2/24/14, Carson Holt <carsonhh at gmail.com> wrote:
>
> Subject: Re: [maker-devel] gff pass thru problem and unsupported EST nucleotides
> To: "Megan" <hedgyx at yahoo.com>, maker-devel at yandell-lab.org
> Date: Monday, February 24, 2014, 10:18 AM
>
> The -fix_nucleotides flag is added to the command line (i.e. maker
> -fix_nucleotides). It is there so you are aware that there is an issue
> with your fasta file that will cause things downstream to fail.
> MAKER can fix the errors for you, but first it gives a warning designed
> to make you look at the file and validate it. Why would you want to do
> this? For example, if you accidentally provided protein sequence to the
> EST option, you wouldn't want MAKER to just proceed; you want a warning
> so you can check first. If your file is in fact EST data, then set the
> flag and those characters will be changed to N's in the fixed fasta
> sequence. Otherwise those characters will cause errors in downstream
> tools like exonerate, and even in some downstream GMOD tools, so they
> can't be allowed to remain as is.
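The behavior described above (ambiguity codes become N; no sequence is dropped and no base is picked at random) can be sketched in Python as follows. This is an illustration under stated assumptions, not MAKER's code: the function name is hypothetical, lines starting with ">" are treated as headers and left alone, and lowercase codes are assumed to become a lowercase n so soft-masking is preserved. Whether MAKER itself preserves case is not stated in this thread.

```python
import re

# IUPAC ambiguity codes listed in MAKER's warning, matched in either case.
AMBIGUOUS = re.compile(r"[RYKMSWBDHV]", re.IGNORECASE)

def _to_n(match) -> str:
    # Assumption: keep the original case so lowercase runs stay lowercase.
    return "n" if match.group().islower() else "N"

def fix_nucleotides(fasta_text: str) -> str:
    """Replace ambiguity codes with N in sequence lines; keep '>' headers as-is."""
    out = []
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">"):
            out.append(line)  # header text may legitimately contain these letters
        else:
            out.append(AMBIGUOUS.sub(_to_n, line))
    return "".join(out)
```

Because only the offending bases change, every EST is kept at its original length, which matters when half of a 20k-EST set contains such codes.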
>
> For the GFF3 file, there is almost certainly a logic issue in the file
> (the modENCODE validator won't check for those). This can come from
> prior manipulation of the GFF3 file; for example, the same gene ID used
> on two different contigs (technically valid, but a logic error). The
> GFF3 error message will normally give the ID of the feature causing the
> issue.
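One logic error mentioned above, a gene ID reused on two contigs, is easy to screen for. The sketch below is a hypothetical Python helper, not part of MAKER or the modENCODE validator; it only looks at ID= values in column 9 and reports any that occur on more than one seqid (column 1), so it will not catch other kinds of logic errors.

```python
from collections import defaultdict

def ids_on_multiple_seqids(gff3_text: str) -> dict:
    """Map each ID= value seen on 2+ seqids to the sorted list of those seqids."""
    seen = defaultdict(set)
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # skip malformed lines in this quick check
        for pair in fields[8].split(";"):
            if pair.startswith("ID="):
                seen[pair[3:]].add(fields[0])
    return {fid: sorted(s) for fid, s in seen.items() if len(s) > 1}
```

An empty result means no ID crosses contigs; it does not mean the file is otherwise free of logic errors.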
>
> I could also take a look for you. You can upload the GFF3 file here:
> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
> Click on 'new guest account', then e-mail me back your guest ID so I
> know which files to review.
>
> Thanks,
> Carson
>
>
>
> On 2/24/14, 12:02 AM, "Megan" <hedgyx at yahoo.com> wrote:
>
> >Maker folks,
> >I am re-annotating a single contig and I am having a few problems.
> >
> >First, I am having trouble passing through a Maker-derived gff (from
> >Maker 2.09, with some modifications to gene names and functional
> >information added). The gff file passes the modENCODE validator, but
> >Maker always fails on the first gene in the file, regardless of which
> >gene comes first. So it appears to be a systematic error across the
> >entire file. The Maker error is "Check your input GFF3 file for errors!
> >(from GFFDB)". I have tried Maker 2.10 and 2.31, using both genome_gff
> >with model_pass=1 and pred_gff. Attached is a gff with the first 2
> >genes.
> >
> >Second, when I updated to Maker 2.31, Maker started complaining that my
> >EST fasta file has nucleotides that are not supported [RYKMSWBDHV]. It
> >suggests "set -fix_nucleotides on the command line to fix this
> >automatically". Is -fix_nucleotides a Maker flag? What exactly does it
> >do? Does it remove the entire sequence or replace ambiguous bases with
> >a randomly selected one? Half of my 20k ESTs contain these characters,
> >so I don't want to throw them out entirely.
> >
> >Also, just curious: has Maker never supported these characters but
> >just never complained? I used this EST data set with Maker 2.09. I did
> >notice poor EST coverage, but thought it was an issue with the EST data
> >itself.
> >
> >I appreciate any suggestions.
> >Thanks,
> >Megan