[maker-devel] substr outside of string

Carson Holt carsonhh at gmail.com
Wed May 1 07:17:50 MDT 2013


The length you are printing is not the length of the contig, but rather
the length of the piece of the contig MAKER is working with at that
moment.  The fact that the length is not exactly 100000 tells me that
this is a piece at the end of the contig.  By any chance, are you using
GFF3 pass-through of repeat elements?  If not, there may be a RepeatMasker
parsing bug, as the start and end coordinates are off the edge of the
contig.  If you run MAKER on the command line (not via MPI), which
RepeatMasker report is read immediately before the error?  Could you then
attach it and the FASTA sequence for the contig that fails?
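
To illustrate the failure itself: assuming the dumped pair (98926, 99033)
is the start and end of the repeat, the offset $b - 1 = 98925 lies beyond
the end of the 98686-character piece, and Perl treats a four-argument
substr that starts past the end of the string as a fatal error.  Below is
a minimal sketch of a guard that clamps a 1-based, inclusive range to the
piece before masking.  mask_range is a hypothetical helper, not MAKER
code, and it would only hide the symptom rather than fix whatever
produced the bad coordinates.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MAKER): mask the 1-based, inclusive
# range [$b, $e] in the string referenced by $seq, clamping the range to
# the string. Returns the number of characters actually masked.
sub mask_range {
    my ($seq, $b, $e, $replace) = @_;
    my $len = length $$seq;

    ($b, $e) = ($e, $b) if $b > $e;      # tolerate reversed coordinates
    return 0 if $e < 1 || $b > $len;     # range lies entirely off the piece

    $b = 1    if $b < 1;                 # clamp the start
    $e = $len if $e > $len;              # clamp the end

    my $l = $e - $b + 1;
    substr($$seq, $b - 1, $l, $replace x $l);
    return $l;
}

my $seq     = 'ACGTACGTAC';                     # toy 10-bp piece
my $masked  = mask_range(\$seq, 8, 15, 'N');    # overlaps the end
my $skipped = mask_range(\$seq, 20, 25, 'N');   # entirely outside
print "$seq $masked $skipped\n";
```

Returning the masked count makes it easy to log any repeat that was
clamped or skipped, which is the information needed to track down the
report that produced it.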

Thanks,
Carson
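
The point about piece lengths can be turned into a quick check for which
contig the failing piece came from.  The sketch below assumes that pieces
are cut every 100000 bp with no overlap, so only the last piece of a
contig is shorter; the message above implies this but does not state it.
The contig names and lengths are made up for illustration and should be
replaced with the real list.

```perl
use strict;
use warnings;

# Assumption: pieces are cut every $chunk bp without overlap, so the last
# piece of a contig has length (contig length mod $chunk).
my $chunk    = 100_000;   # piece size implied by the message above
my $observed = 98_686;    # piece length printed in the error report

# Made-up contig lengths for illustration only.
my %contig_len = (
    example_a => 298_686,
    example_b => 98_776,
    example_c => 1_098_686,
);

# Keep the contigs whose final piece would have the observed length.
my @candidates = sort { $a cmp $b }
                 grep { $contig_len{$_} % $chunk == $observed }
                 keys %contig_len;

print "candidates: @candidates\n";
```

If the assumption holds, the real list should yield only a few
candidates, which narrows down which FASTA entry to attach.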


On 13-05-01 7:38 AM, "Michael Nuhn" <mnuhn at ebi.ac.uk> wrote:

>Hello!
>
>I have run MAKER with EST and RNA-seq data to create a training set for
>SNAP. Then I trained SNAP, added the HMM to the snaphmm option, and
>reran MAKER.
>
>Maker is giving me error messages like this:
>
>"
>setting up GFF3 output and fasta chunks
>doing repeat masking
>re reading repeat masker report.
><Name of file>
>substr outside of string at <Path to maker>/maker/2.27/maker/bin/../lib/repeat_mask_seq.pm line 140.
>--> rank=NA, hostname=ebi-209.ebi.ac.uk
>"
>
>The line from which this error message originates is:
>
>	substr($$seq, $b -1 , $l, "$replace"x$l);
>
>After getting these error messages I replaced it with
>
>       eval {
>	substr($$seq, $b -1 , $l, "$replace"x$l);
>       };
>       if ($@) {
>	use Carp;
>	use Data::Dumper;
>	confess(
>	    $@
>	    . "\n\n"
>	    . Dumper($p)
>	    . "\n\n"
>	    . "Length of sequence: " . (length $$seq)
>	);
>       }
>
>After that I got this:
>
>$VAR1 = [
>           98926,
>           99033
>         ];
>
>
>Length of sequence: 98686 at <Path to maker>/maker/2.27/maker/bin/../lib/repeat_mask_seq.pm line 145
>
>I have not changed the genome file.
>
>I'm also concerned with the reported length of 98686, because I have a
>list of all sequences in the file and their lengths, and none of them
>has a length of 98686 bp. The sequences with the closest lengths are
>these:
>
>98367 LSalAtl2s1200
>98438 LSalAtl2s1473
>98776 LSalAtl2s1613
>98876 LSalAtl2s1199
>
>so they are not even close.
>
>$$seq is a sequence as a string, when I print it.
>
>Sometimes maker prints a message like this:
>
>"
>--Next Contig--
>
>Processing run.log file...
>#---------------------------------------------------------------------
>Now retrying the contig!!
>SeqID: LSalAtl2s63
>Length: 3997709
>Tries: 5!!
>#---------------------------------------------------------------------
>"
>
>But according to my list, which I generated from the exact same file
>that MAKER has in the genome_file option, the length of that sequence is
>1169407.
>
>Any idea why I am getting these problems and what to do about them?
>
>Cheers,
>Michael.





