[maker-devel] gff pass thru problem and unsupported EST nucleotides

Megan hedgyx at yahoo.com
Tue Feb 25 17:26:11 MST 2014


Carson,

Everything ran through smoothly after removing the ^Ms.  Thanks for the help.

Megan
--------------------------------------------
On Mon, 2/24/14, Carson Holt <carsonhh at gmail.com> wrote:

 Subject: Re: [maker-devel] gff pass thru problem and unsupported EST nucleotides
 To: "Megan" <hedgyx at yahoo.com>, "Daniel Ence" <dence at genetics.utah.edu>
 Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
 Date: Monday, February 24, 2014, 12:59 PM
 
 I found the issue.  You have hidden carriage-return characters at the
 end of almost every line.  Because they occur within the Parent= tag,
 they become part of the Parent ID when the file is read.
 
 So instead of "HERA000031-RA" you get "HERA000031-RA\cM" as the
 Parent ID.
 
 "\cM" is a carriage return (Control-M, displayed as ^M).
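A minimal Python sketch of the failure mode described above.  The file was presumably saved with Windows-style CRLF line endings, so a reader that strips only the trailing newline leaves the carriage return attached to the last attribute on the line.  The sample exon line, its IDs, and the parse_attributes function are hypothetical and are not MAKER's actual GFF3 reader.  A string literal is used because Python's text-mode file reading would translate CRLF to LF by default and hide the problem.

```python
# Hypothetical exon line from a GFF3 file saved with CRLF line endings.
line = ("contig1\tmaker\texon\t100\t900\t.\t+\t.\t"
        "ID=HERA000031-RA:exon:1;Parent=HERA000031-RA\r\n")

def parse_attributes(gff_line: str) -> dict:
    """Split column 9 into key/value pairs after stripping only the LF,
    much as a reader that removes just the newline would."""
    columns = gff_line.rstrip("\n").split("\t")
    attributes = {}
    for pair in columns[8].split(";"):
        key, sep, value = pair.partition("=")
        if sep:
            attributes[key] = value
    return attributes

parent = parse_attributes(line)["Parent"]
print(repr(parent))               # 'HERA000031-RA\r'
print(parent == "HERA000031-RA")  # False: no feature has this ID
```

Removing the CRs from the file (or stripping "\r\n" rather than "\n") makes the Parent value match the parent feature's ID again.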
 
 I ran the attached script to remove these characters (perl purify
 <gff3_file>), and then it works.  After fixing the file, make sure to
 remove the
 .../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db
 file before you rerun MAKER, to force the GFF3 database to be rebuilt.
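The attached purify script is not reproduced in the thread, so here is a hypothetical Python stand-in that does the same kind of cleanup: it converts CRLF line endings to LF and drops any other stray carriage returns.  The file names and the purify_bytes and purify_file functions are illustrative assumptions.

```python
"""Hypothetical stand-in for the 'purify' script; the real attached
script is not reproduced in the thread."""
import sys

def purify_bytes(data: bytes) -> bytes:
    # Convert CRLF endings to LF, then drop any CR left elsewhere.
    # Assumes lone CRs are stray debris, not old Mac-style line breaks.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"")

def purify_file(in_path: str, out_path: str) -> None:
    # Binary mode: no newline translation and no text decoding, so the
    # functional annotation in column 9 passes through byte-for-byte.
    with open(in_path, "rb") as src:
        cleaned = purify_bytes(src.read())
    with open(out_path, "wb") as dst:
        dst.write(cleaned)

if __name__ == "__main__" and len(sys.argv) == 3:
    purify_file(sys.argv[1], sys.argv[2])
```

Usage would be something like "python purify.py in.gff3 out.gff3".  Writing to a new file keeps the original for comparison.  The stale .db file still has to be deleted before MAKER is rerun.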
 
 Thanks,
 Carson
 
 
 
 
 On 2/24/14, 1:32 PM, "Megan" <hedgyx at yahoo.com> wrote:
 
 >Hi Carson and Daniel,
 >
 >Thanks for your suggestions.  I have looked at the gff file, but I do
 >not see any obvious errors.  I have uploaded the files to your website:
 >the reference fasta, the full gff, and a single-gene gff that also
 >causes an error.  If I remove that gene from the full gff, then the
 >error is on the next gene in the file, so it appears to be a systematic
 >problem throughout the gff.  The gff was generated by Maker, but I may
 >have messed it up when I modified it to rename genes and add functional
 >information.  I checked with cat -te, but don't see any obvious
 >formatting errors.
 >
 >Thanks!
 >Megan
 >
 >
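With cat -te, a carriage return shows up as ^M just before the $ that marks each line end, which is easy to overlook in a long file.  A hypothetical Python helper (not part of MAKER) that lists the line numbers containing carriage returns or non-ASCII bytes could look like this.  Legitimate non-ASCII text in added functional annotation would also be flagged, so treat the output as a list to review rather than a list of errors.

```python
import sys

def find_suspicious_lines(data: bytes) -> list:
    """Return (line_number, reason) pairs for lines containing a carriage
    return or, failing that, any byte outside the ASCII range."""
    problems = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        if b"\r" in line:
            problems.append((number, "carriage return"))
        elif any(byte > 0x7F for byte in line):
            problems.append((number, "non-ASCII byte"))
    return problems

if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as handle:
        for number, reason in find_suspicious_lines(handle.read()):
            print(f"line {number}: {reason}")
```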
 >--------------------------------------------
 >On Mon, 2/24/14, Carson Holt <carsonhh at gmail.com> wrote:
 >
 > Subject: Re: [maker-devel] gff pass thru problem and unsupported EST nucleotides
 > To: "Megan" <hedgyx at yahoo.com>, maker-devel at yandell-lab.org
 > Date: Monday, February 24, 2014, 10:18 AM
 > 
 > The -fix_nucleotides flag is added to the command line (i.e. maker
 > -fix_nucleotides).  It is there so you are aware that there is an
 > issue with your fasta file that will cause things downstream to fail.
 > MAKER can fix the errors for you, but first it gives a warning
 > designed to make you look at the file and validate it.  Why would you
 > want to do this?  For example, if you accidentally provided protein
 > sequence to the EST option, you wouldn't want MAKER to just proceed.
 > You want a warning so you can check first.  If your file is in fact
 > EST data, then set the flag and those characters will be changed to
 > N's in the fixed fasta sequence.  Otherwise those characters will
 > cause errors in downstream tools like exonerate, and even in some
 > downstream GMOD tools, so they can't be allowed to remain as is.
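Based on the description above, the fix keeps every sequence and changes each unsupported character to N, rather than dropping sequences or picking a base at random.  The substitution can be sketched in Python as follows.  The character set comes from MAKER's warning message ([RYKMSWBDHV]).  The lowercase handling and the function names are assumptions, and this is not MAKER's own code.

```python
import re

# Ambiguity codes named in MAKER's warning; lowercase handling is an assumption.
AMBIGUOUS = re.compile(r"[RYKMSWBDHVrykmswbdhv]")

def fix_nucleotides(seq: str) -> str:
    """Replace each IUPAC ambiguity code with N (n for lowercase input)."""
    return AMBIGUOUS.sub(lambda m: "n" if m.group().islower() else "N", seq)

def fix_fasta_lines(lines):
    """Yield FASTA lines, leaving headers untouched and fixing sequence lines."""
    for line in lines:
        yield line if line.startswith(">") else fix_nucleotides(line)
```

Because each character is replaced one-for-one, sequence lengths and coordinates are unchanged.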
 > 
 > For the GFF3 file, there is almost certainly a logic issue in the
 > file (the modENCODE validator won't check for those).  This can come
 > from prior manipulation of the GFF3 file: for example, a gene ID that
 > is reused across two contigs (technically valid, but a logic error).
 > The GFF3 error message will normally give the ID of the feature
 > causing the issue.
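One way to hunt for the cross-contig ID reuse described above is to record which seqids (column 1) each ID= value appears on.  The helper below is hypothetical, not a MAKER or validator feature.  It flags only IDs seen on more than one seqid, because a discontinuous feature may legitimately repeat its ID on several lines, and it ignores GFF3 percent-encoding of attribute values.

```python
from collections import defaultdict

def ids_on_multiple_seqids(lines):
    """Map each ID= value to the seqids it appears on and return only the
    IDs that occur on more than one seqid."""
    seen = defaultdict(set)
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            continue
        for pair in columns[8].split(";"):
            key, _, value = pair.partition("=")
            if key == "ID" and value:
                seen[value].add(columns[0])
    return {fid: seqids for fid, seqids in seen.items() if len(seqids) > 1}
```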
 > 
 > I could also take a look for you.  You can upload the GFF3 file here:
 > http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
 > Click on 'new guest account', then e-mail me back your guest ID so I
 > know which files to review.
 > 
 > Thanks,
 > Carson
 > 
 > 
 > 
 > On 2/24/14, 12:02 AM, "Megan" <hedgyx at yahoo.com> wrote:
 > 
 > >Maker folks,
 > >I am re-annotating a single contig and I am having a few problems.
 > >
 > >First, I am having trouble passing through a Maker-derived gff (from
 > >Maker 2.09, with some modifications to gene names and functional
 > >information added).  The gff file passes the modENCODE validator, but
 > >Maker always fails on the first gene in the file, regardless of which
 > >gene comes first, so it appears to be a systematic error across the
 > >entire file.  The Maker error is "Check your input GFF3 file for
 > >errors! (from GFFDB)".  I have tried Maker 2.10 and 2.31, using both
 > >genome_gff with model_pass=1 and pred_gff.  Attached is a gff with the
 > >first 2 genes.
 > >
 > >Second, when I updated to Maker 2.31, Maker now complains that my EST
 > >fasta file has nucleotides that are not supported [RYKMSWBDHV].  It
 > >suggests "set -fix_nucleotides on the command line to fix this
 > >automatically".  Is -fix_nucleotides a Maker flag?  What exactly does
 > >it do?  Does it remove the entire sequence or replace ambiguous bases
 > >with a randomly selected one?  Half of my 20k ESTs contain these
 > >characters, so I don't want to throw them out entirely.
 > >
 > >Also, just curious: has Maker never supported these characters, but
 > >just never complained?  I used this EST data set with Maker 2.09.  I
 > >did note poor EST coverage, but thought it was an issue with the EST
 > >data itself.
 > >
 > >I appreciate any suggestions.
 > >Thanks,
 > >Megan
 > >_______________________________________________
 > >maker-devel mailing list
 > >maker-devel at box290.bluehost.com
 > >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
 > 
 > 
 >
 
 



