[maker-devel] gff pass thru problem and unsupported EST nucleotides

Carson Holt carsonhh at gmail.com
Mon Feb 24 13:59:12 MST 2014


I found the issue.  You have non-printing carriage-return characters
(Control-M) at the end of almost every line.  Because they fall at the end
of the Parent= tag, they become part of the Parent ID when the file is
read.

So instead of "HERA000031-RA" you get "HERA000031-RA\cM" as the Parent
ID.

"\cM" is Perl notation for Control-M, i.e. a carriage return (\r),
typically left behind by Windows-style line endings.

I ran the attached script to remove these characters (perl purify
<gff3_file>), and then it works.  After fixing the file, make sure to
remove the
.../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db file
so that the GFF3 database is rebuilt when you rerun MAKER.
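
For anyone who cannot retrieve the attachment, here is a minimal Python
sketch of the same kind of cleanup.  It is not the attached purify script
(its contents are not reproduced here), the function names
strip_trailing_cr and clean_gff3 are made up for illustration, and it
assumes the only problem is a trailing \cM (carriage return, \r) at the
end of each line.

```python
import sys


def strip_trailing_cr(line: bytes) -> bytes:
    """Drop trailing carriage returns/newlines and end the line with one \\n."""
    return line.rstrip(b"\r\n") + b"\n"


def clean_gff3(src_path: str, dst_path: str) -> int:
    """Copy src_path to dst_path without trailing \\r.

    Returns the number of lines that were changed (a final line lacking a
    newline also counts, because one is added).
    """
    changed = 0
    # Binary mode keeps Python from translating line endings on its own.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            fixed = strip_trailing_cr(line)
            if fixed != line:
                changed += 1
            dst.write(fixed)
    return changed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python clean_cr.py input.gff3 output.gff3
    n = clean_gff3(sys.argv[1], sys.argv[2])
    print(f"rewrote {n} line(s)", file=sys.stderr)
```

Writing to a separate output file leaves the original untouched, so the
two can be compared before the cleaned copy is handed to MAKER.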

Thanks,
Carson




On 2/24/14, 1:32 PM, "Megan" <hedgyx at yahoo.com> wrote:

>Hi Carson and Daniel,
>
>Thanks for your suggestions.  I have looked at the gff file, but I do not
>see any obvious errors.  I have uploaded the files to your website.  The
>reference fasta is there, the full gff, and a single gene gff that also
>causes an error.  If I remove that gene from the full gff, then the error
>is on the next gene in the file, so it appears to be a systematic problem
>throughout the gff.  The gff was generated by Maker, but I may have
>messed it up when I modified it to rename genes and add functional
>information.  I checked with cat -te, but don't see any obvious
>formatting errors.
>
>Thanks!
>Megan
>
>
>--------------------------------------------
>On Mon, 2/24/14, Carson Holt <carsonhh at gmail.com> wrote:
>
> Subject: Re: [maker-devel] gff pass thru problem and unsupported EST nucleotides
> To: "Megan" <hedgyx at yahoo.com>, maker-devel at yandell-lab.org
> Date: Monday, February 24, 2014, 10:18 AM
> 
> The -fix_nucleotides flag is added to the command line (i.e. maker
> -fix_nucleotides).  It is there so you are aware that there is an issue
> with your fasta file that will cause things downstream to fail.  MAKER
> can fix the errors for you, but first it gives a warning designed to make
> you look at the file and validate it.  Why would you want to do this?
> For example, what if you accidentally provided protein sequence to the
> EST option?  You wouldn’t want MAKER to just proceed.  You want a warning
> so you can check first.  If your file is in fact EST data, then set the
> flag and those characters will be changed to N’s in the fixed fasta
> sequence.  Otherwise those characters will cause errors in downstream
> tools like exonerate, and even some downstream GMOD tools, so they can’t
> be allowed to remain as is.
> 
> For the GFF3 file, there is almost definitely a logic issue in the file
> (the modENCODE validator won’t check for those).  This can come from
> prior manipulation of the GFF3 file.  For example, IDs for a gene that
> are the same across two contigs (technically valid but a logic error).
> The GFF3 error message will normally give the ID of the feature causing
> the issue.
> 
> I could also take a look for you.  You can upload the GFF3 file here:
> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
> Click on 'new guest account', then e-mail me back your guest ID so I
> know which files to review.
> 
> Thanks,
> Carson
> 
> 
> 
> On 2/24/14, 12:02 AM, "Megan" <hedgyx at yahoo.com> wrote:
> 
> >Maker folks,
> >I am re-annotating a single contig and I am having a few problems.
> >
> >First, I am having trouble passing through a Maker derived gff (from
> >Maker 2.09, with some modifications to gene names and functional
> >information added).  The gff file passes the modencode validator but
> >Maker always fails on the first gene in the file, regardless of which
> >gene comes first.  So it appears to be a systematic error across the
> >entire file.  The Maker error is "Check your input GFF3 file for errors!
> >(from GFFDB)".   I have tried Maker 2.10 and 2.31, using both genome_gff
> >with model_pass=1 and pred_gff.  Attached is a gff with the first 2
> >genes.
> >
> >Second, when I updated to Maker 2.31, Maker now complains that my EST
> >fasta file has nucleotides that are not supported [RYKMSWBDHV].  It
> >suggests "set -fix_nucleotides on the command line to fix this
> >automatically".  Is the -fix_nucleotides a Maker flag?  What exactly does
> >it do?  Does it remove the entire sequence or replace ambiguous bases
> >with a randomly selected one?  Half of my 20k ESTs contain these
> >characters, so I don't want to throw them out entirely.
> >
> >Also, just curious, has Maker never supported these characters but just
> >never complained?  I used this EST data set with Maker 2.09.  I did note
> >poor EST coverage, but thought it was an issue with the EST data itself.
> >
> >I appreciate any suggestions.
> >Thanks,
> >Megan
> >_______________________________________________
> >maker-devel mailing list
> >maker-devel at box290.bluehost.com
> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> 
> 
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: purify
Type: application/octet-stream
Size: 1965 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20140224/a1582e7d/attachment-0001.obj>

