[maker-devel] gff pass thru problem and unsupported EST nucleotides

Carson Holt carsonhh at gmail.com
Mon Feb 24 14:03:00 MST 2014


One more thing.  You must give the file to pred_gff or model_gff.  It is
no longer strictly a MAKER file: many of the source columns read '.',
meaning it has been edited by Apollo or another editor.  So it is not
guaranteed to be recognized by genome_gff, because many of the source tags
have changed.
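To see how far a hand-edited file has drifted from MAKER's own source tags, you can tally column 2 (source). Below is a minimal Python sketch. tally_sources is a hypothetical helper, not part of MAKER. It only reports the counts and does not decide what genome_gff will accept.

```python
from collections import Counter
from typing import Iterable


def tally_sources(lines: Iterable[str]) -> Counter:
    """Count the values in column 2 (source) of GFF3 feature lines.

    Blank lines and '#' comment/directive lines are skipped; an embedded
    ##FASTA section, if present, ends the scan.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts
```

For example, `with open("annotations.gff3") as fh: print(tally_sources(fh).most_common())` (the file name is hypothetical). A large count for '.' indicates the file should go to pred_gff or model_gff rather than genome_gff.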

Thanks,
Carson


On 2/24/14, 1:59 PM, "Carson Holt" <carsonhh at gmail.com> wrote:

>I found the issue.  You have non-printing characters at the end of almost
>every line.  Because they fall inside the Parent= value, they become part
>of the Parent ID when the file is read.
>
>So instead of "HERA000031-RA" you get "HERA000031-RA\cM" as the Parent
>ID.
>
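>"\cM" (Control-M) is a carriage return, the extra character that
>Windows-style (CRLF) line endings leave at the end of each line.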
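The failure is easy to reproduce. "\cM" is Control-M, the carriage return "\r". In this minimal Python sketch, parse_attributes is a hypothetical, deliberately naive parser. The feature line is made up except for the HERA000031-RA ID. Only the trailing "\n" is removed, as Perl's chomp does on Unix, so the carriage return stays attached to the Parent value.

```python
def parse_attributes(column9: str) -> dict:
    """Split GFF3 column 9 into {tag: value}.

    Deliberately naive: nothing is stripped from the values, so a stray
    carriage return at the end of the line stays attached to the last one.
    """
    attrs = {}
    for pair in column9.split(";"):
        if "=" in pair:
            tag, value = pair.split("=", 1)
            attrs[tag] = value
    return attrs


# Hypothetical exon line from a file saved with CRLF line endings.
line = "ctg1\tmaker\texon\t1\t100\t.\t+\t.\tID=exon1;Parent=HERA000031-RA\r\n"

# Remove only the "\n", the way Perl's chomp does on Unix.
column9 = line.rstrip("\n").split("\t")[8]
parent = parse_attributes(column9)["Parent"]
print(repr(parent))  # 'HERA000031-RA\r'
```

Because "HERA000031-RA\r" never matches any ID= value, every child feature appears to point at a missing parent.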
>
>I ran the attached script to remove these characters (perl purify
><gff3_file>), and after that it works.  After fixing the file, and before
>you rerun MAKER, make sure to delete the
>.../Hera_Cr_HmelHybd_Nov2013.maker.output/Hera_Cr_HmelHybd_Nov2013.db file
>to force the GFF3 database to be rebuilt.
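The attached purify script is not preserved in the archive. As a hypothetical stand-in, this Python sketch removes carriage returns only. purify and purify_file are illustrative names, and the code leaves other non-printing characters alone. The stale .db file still has to be deleted before rerunning MAKER.

```python
def purify(data: bytes) -> bytes:
    """Remove every carriage-return byte (0x0D) from raw file contents."""
    return data.replace(b"\r", b"")


def purify_file(src: str, dst: str) -> int:
    """Write a CR-free copy of src to dst; return how many bytes were dropped.

    Both files are opened in binary mode so that no newline translation
    hides the carriage returns.
    """
    with open(src, "rb") as fh:
        raw = fh.read()
    cleaned = purify(raw)
    with open(dst, "wb") as out:
        out.write(cleaned)
    return len(raw) - len(cleaned)
```

For example, `purify_file("in.gff3", "clean.gff3")` (hypothetical file names). A return value of 0 means the file had no carriage returns in the first place.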
>
>Thanks,
>Carson
>
>
>
>
>On 2/24/14, 1:32 PM, "Megan" <hedgyx at yahoo.com> wrote:
>
>>Hi Carson and Daniel,
>>
>>Thanks for your suggestions.  I have looked at the gff file, but I do not
>>see any obvious errors.  I have uploaded the files to your website: the
>>reference fasta, the full gff, and a single-gene gff that also causes an
>>error.  If I remove that gene from the full gff, the error moves to the
>>next gene in the file, so it appears to be a systematic problem
>>throughout the gff.  The gff was generated by Maker, but I may have
>>messed it up when I modified it to rename genes and add functional
>>information.  I checked with cat -te, but don't see any obvious
>>formatting errors.
>>
>>Thanks!
>>Megan
>>
>>
>>--------------------------------------------
>>On Mon, 2/24/14, Carson Holt <carsonhh at gmail.com> wrote:
>>
>> Subject: Re: [maker-devel] gff pass thru problem and unsupported EST nucleotides
>> To: "Megan" <hedgyx at yahoo.com>, maker-devel at yandell-lab.org
>> Date: Monday, February 24, 2014, 10:18 AM
>> 
>> The -fix_nucleotides flag is added to the command line (i.e. maker
>> -fix_nucleotides).  It is there so that you are aware there is an issue
>> with your fasta file that will cause downstream steps to fail.  MAKER can
>> fix the errors for you, but first it gives a warning designed to make you
>> look at the file and validate it.  Why would you want to do this?  For
>> example, if you accidentally provided protein sequence to the EST option,
>> you wouldn't want MAKER to just proceed; you want a warning so you can
>> check first.  If your file is in fact EST data, then set the flag and
>> those characters will be changed to N's in the fixed fasta sequence.
>> Left as they are, those characters will cause errors in downstream tools
>> like exonerate, and even in some GMOD tools, so they can't be allowed to
>> remain.
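As described here, the flag changes unsupported characters to N. It does not discard sequences or pick a random base. The Python sketch below shows that substitution under stated assumptions. It is not MAKER's code, and fix_nucleotides is a hypothetical name borrowed from the flag. Keeping lowercase (soft-masked) letters lowercase is an assumption of this sketch, not documented MAKER behavior.

```python
import re

# The ambiguity codes listed in MAKER's warning: [RYKMSWBDHV].
AMBIGUOUS = re.compile(r"[RYKMSWBDHV]", re.IGNORECASE)


def _to_n(match):
    # Preserve case (assumption: lowercase may mark soft-masked sequence).
    return "N" if match.group().isupper() else "n"


def fix_nucleotides(fasta_text: str) -> str:
    """Replace ambiguity codes with N on sequence lines; headers are untouched.

    No sequence is dropped and no base is guessed at random.
    """
    out = []
    for line in fasta_text.splitlines():
        out.append(line if line.startswith(">") else AMBIGUOUS.sub(_to_n, line))
    return "\n".join(out) + "\n"
```

Only sequence lines are rewritten, so a header that happens to contain letters such as H, B, or V is left intact.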
>> 
>> For the GFF3 file, there is almost certainly a logic issue in the file
>> (the modENCODE validator won't check for those).  This can come from
>> prior manipulation of the GFF3 file: for example, a gene whose ID is the
>> same on two different contigs (technically valid, but a logic error).
>> The GFF3 error message will normally give the ID of the feature causing
>> the issue.
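The example of one gene ID reused on two contigs can be checked mechanically. This is a minimal Python sketch, and ids_on_multiple_contigs is a hypothetical helper. It assumes a plain tab-separated GFF3 file and catches only this one kind of logic error. It is not a substitute for whatever MAKER's GFF3 loader checks.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def ids_on_multiple_contigs(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Return {ID: [seqids]} for every ID= value seen on more than one seqid.

    The same ID may legitimately span several lines on one sequence (e.g.
    a multi-line CDS), so only reuse across different seqids is reported.
    """
    seen = defaultdict(set)
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue
        for pair in fields[8].split(";"):
            if pair.startswith("ID="):
                seen[pair[3:]].add(fields[0])
    return {fid: sorted(s) for fid, s in seen.items() if len(s) > 1}
```

An empty result means no ID is shared across sequences. It does not rule out other logic errors in the file.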
>> 
>> I could also take a look for you.  You can upload the GFF3 file here:
>> http://weatherby.genetics.utah.edu/cgi-bin/mwas/bug.cgi
>> Click on 'new guest account', then e-mail me back your guest ID so I
>> know which files to review.
>> 
>> Thanks,
>> Carson
>> 
>> 
>> 
>> On 2/24/14, 12:02 AM, "Megan" <hedgyx at yahoo.com> wrote:
>> 
>> >Maker folks,
>> >I am re-annotating a single contig and am having a few problems.
>> >
>> >First, I am having trouble passing through a Maker-derived gff (from
>> >Maker 2.09, with some modifications to gene names and functional
>> >information added).  The gff file passes the modENCODE validator, but
>> >Maker always fails on the first gene in the file, regardless of which
>> >gene comes first, so it appears to be a systematic error across the
>> >entire file.  The Maker error is "Check your input GFF3 file for errors!
>> >(from GFFDB)".  I have tried Maker 2.10 and 2.31, using both genome_gff
>> >with model_pass=1 and pred_gff.  Attached is a gff with the first 2
>> >genes.
>> >
>> >Second, since I updated to Maker 2.31, Maker complains that my EST
>> >fasta file has nucleotides that are not supported [RYKMSWBDHV].  It
>> >suggests "set -fix_nucleotides on the command line to fix this
>> >automatically".  Is -fix_nucleotides a Maker flag?  What exactly does
>> >it do?  Does it remove the entire sequence or replace ambiguous bases
>> >with a randomly selected one?  Half of my 20k ESTs contain these
>> >characters, so I don't want to throw them out entirely.
>> >
>> >Also, just curious: has Maker never supported these characters and
>> >just never complained?  I used this EST data set with Maker 2.09.  I did
>> >notice poor EST coverage, but thought it was an issue with the EST data
>> >itself.
>> >
>> >I appreciate any suggestions.
>> >Thanks,
>> >Megan
>> >_______________________________________________
>> >maker-devel mailing list
>> >maker-devel at box290.bluehost.com
>> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>> 
>> 
>>
>





