[maker-devel] Patch for a bug with repeat gff

Carson Holt carsonhh at gmail.com
Sun Jun 16 13:46:51 MDT 2013


Thanks for the detailed report and test files.

The problem starts with your GFF3 giving a repeat structure that is a
spliced repeat.  I don't know if such a thing can really occur, but
regardless MAKER doesn't expect them to occur, and as a result, when
assembled, some of the spliced exons run off the edge of the sequence.  The
script currently checks for repeats where the end of a repeat runs off the
edge and adjusts accordingly, but it does not check for a start that runs off
the edge (because it's not expecting spliced repeats).  The result is the
"substr outside of string" error.
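
To illustrate the failure mode, here is a minimal standalone sketch.  It is
not MAKER code: the sequence, the coordinates and the try_mask helper are
made up for the example.  It only shows that Perl's 4-argument substr raises
an exception when the start offset lies past the end of the string.

```perl
use strict;
use warnings;

# Made-up 10-base sequence, for illustration only.
my $seq = 'ACGTACGTAC';

# Hypothetical helper: returns 1 if the 4-argument substr succeeded,
# 0 if it raised an exception (the message is left in $@).
sub try_mask {
    my ($seq_ref, $begin, $len) = @_;    # $begin is 1-based
    my $ok = eval { substr($$seq_ref, $begin - 1, $len, 'N' x $len); 1 };
    return $ok ? 1 : 0;
}

# Start inside the sequence: the replacement succeeds.
print try_mask(\$seq, 8, 3)  ? "ok\n" : "failed: $@";

# Start past the end of the sequence: substr dies and eval catches it.
print try_mask(\$seq, 15, 3) ? "ok\n" : "failed: $@";
```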

I added 'next if($l <= 0)' to both the _soft_mask_seq and _hard_mask_seq
functions, so a repeat whose computed length is zero or negative is now
skipped.  Hopefully spliced repeats won't cause other hidden errors
downstream, but you should be aware of the possibility.
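
As a rough sketch of how that guard behaves (this is not the actual
_hard_mask_seq code; the function name, the argument layout and the end
clamping shown here are assumptions made for the example):

```perl
use strict;
use warnings;

# Hypothetical stand-in for a hard-masking routine, not MAKER's code.
# $seq_ref  : reference to the sequence string
# $features : array ref of [begin, end] pairs, 1-based and inclusive
sub hard_mask_sketch {
    my ($seq_ref, $features, $replace) = @_;
    $replace = 'N' unless defined $replace;
    my $seq_len = length($$seq_ref);

    for my $f (@$features) {
        my ($begin, $end) = @$f;

        # Assumed form of the existing check: an end that runs off the
        # edge is pulled back to the last base of the sequence.
        $end = $seq_len if $end > $seq_len;

        # Plays the role of $l in the patch.
        my $len = $end - $begin + 1;

        # The added guard: if the start is also past the end, the length
        # is zero or negative after clamping, so the feature is skipped.
        next if $len <= 0;

        substr($$seq_ref, $begin - 1, $len, $replace x $len);
    }
    return;
}

my $seq = 'ACGTACGTAC';
# One feature inside, one with its end off the edge, one entirely outside.
hard_mask_sketch(\$seq, [[3, 5], [8, 20], [15, 30]]);
print "$seq\n";    # prints ACNNNCGNNN
```

Without the 'next' line, the third feature would reach substr with a start
past the end of the string and a negative length.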

Thanks,
Carson



On 13-06-12 9:29 AM, "Anthony Bretaudeau"
<anthony.bretaudeau at rennes.inra.fr> wrote:

>Hi,
>Here is a minimal GFF file that reproduces the bug. It should
>work with any FASTA (my real data is not yet published, so I can't share
>it publicly).
>Tell me if you need more info.
>Anthony
>
>On 11/06/2013 17:06, Carson Holt wrote:
>> Could you send me your repeat_gff and genome fasta so I can take a
>> look?
>>
>> Thanks,
>> Carson
>>
>>
>>
>> On 13-06-11 11:03 AM, "Anthony Bretaudeau"
>> <anthony.bretaudeau at rennes.inra.fr> wrote:
>>
>>> Hello,
>>> I have just tested with 2.28b: the problem is still there, and my fix
>>> works on this version too.
>>> Cheers
>>> Anthony
>>>
>>> On 10/06/2013 18:13, Carson Holt wrote:
>>>> Could you use MAKER version 2.28 instead (launch with maker -a if it
>>>> still fails)?
>>>>
>>>> Thanks,
>>>> Carson
>>>>
>>>>
>>>>
>>>> On 13-06-10 11:48 AM, "Anthony Bretaudeau"
>>>> <anthony.bretaudeau at rennes.inra.fr> wrote:
>>>>
>>>>> Hello,
>>>>> I am running Maker 2.27b on an insect genome, and I use a gff file
>>>>> containing some repeat positions (rm_gff option in maker_opts.ctl).
>>>>>
>>>>> I encountered an error on 10 scaffolds (the genome contains ~40000
>>>>> scaffolds): "substr outside of string" (similar to this post:
>>>>> http://gmod.827538.n3.nabble.com/substr-outside-of-string-td4031889.html
>>>>> ).
>>>>>
>>>>> After a lot of debugging, it turns out the problem comes from the
>>>>> "phathits_on_chunk" function in lib/GFFDB.pm, near line 539: there is
>>>>> a SQL query that fetches features that overlap the border of the
>>>>> sequence chunk.
>>>>> The problem is that it also fetches features that are completely
>>>>> outside of the chunk in the same region. This produces an error when
>>>>> maker tries to mask the sequence, as it does a substr outside the
>>>>> string.
>>>>>
>>>>> I fixed it by patching lib/repeat_mask_seq.pm, near line 138.
>>>>> I replaced:
>>>>>     substr($$seq, $b -1 , $l, "$replace"x$l);
>>>>> with:
>>>>>     if ($b < length($$seq)) {
>>>>>         substr($$seq, $b -1 , $l, "$replace"x$l);
>>>>>     }
>>>>>
>>>>> I don't know if there is a more elegant solution, but this seems to
>>>>> solve the problem.
>>>>>
>>>>> Cheers
>>>>> Anthony
>>
>





