[maker-devel] Format of repeat_gff gff3 file
Michael Thon
mike.thon at gmail.com
Wed Oct 17 04:48:59 MDT 2012
Hi - 2 ideas:
1)
There is a utility included with RepeatMasker that lets you extract sequences from its repeat database. It's in the utils directory. You can run it like this:
./queryRepeatDatabase.pl -species Fungi -clade
That will extract all repeat sequences belonging to Fungi or its descendant clades. Maybe the simplest thing for you to do is to extract the sequences you want and cat them together with your in-house sequences, so that MAKER gets a single file of repeat reference sequences.
2)
Some of the MAKER parameters take a comma-separated list of files. I don't know whether this applies to rmlib, though.
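For idea 1, here is a minimal Python sketch of the "cat them together" step. It assumes the queryRepeatDatabase.pl output was redirected to a FASTA file; the file names in the usage line are hypothetical. A plain `cat` does the same job. The script's only additions are two checks: it refuses duplicate sequence IDs (assuming duplicates across the two libraries are unwanted), and it supplies a newline when a file lacks one at the end, so two records cannot be glued together.

```python
import sys


def merge_fasta(paths, out):
    """Write the FASTA files in `paths` to `out` in order, like `cat`.

    Unlike plain `cat`, refuse duplicate sequence IDs (the first word of each
    '>' header) and add a newline if a file does not end with one.
    Returns the number of sequences written.
    """
    seen = {}  # sequence ID -> file it first appeared in
    for path in paths:
        with open(path) as fh:
            for line in fh:
                if line.startswith(">"):
                    fields = line[1:].split()
                    if not fields:
                        raise ValueError(f"{path}: FASTA header without an ID")
                    seq_id = fields[0]
                    if seq_id in seen:
                        raise ValueError(
                            f"duplicate ID {seq_id!r} in {path} (first seen in {seen[seq_id]})")
                    seen[seq_id] = path
                if not line.endswith("\n"):
                    line += "\n"  # last line of a file without a trailing newline
                out.write(line)
    return len(seen)


if __name__ == "__main__":
    # Usage (hypothetical file names):
    #   python merge_fasta.py combined.fa fungi_repeats.fa inhouse_repeats.fa
    with open(sys.argv[1], "w") as combined:
        n = merge_fasta(sys.argv[2:], combined)
    print(f"wrote {n} sequences to {sys.argv[1]}", file=sys.stderr)
```

The combined file could then be given to MAKER as rmlib.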
Mike
On Oct 16, 2012, at 6:58 PM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:
> Hi,
>
> I would like to mask my fungal genome using two different sources (i.e. Repbase and an in-house repeat library). However, I suppose that if I supply a library as rmlib in maker_opts, it will be mutually exclusive with the model_org option, in the same way as the -spec and -lib options to RepeatMasker (I hope I am wrong here...). To get around this, I set the model_org option to fungi and would like to provide MAKER with additional masking as a GFF file. I ran RepeatMasker with my in-house library and then used rmOutToGFF3.pl from the RepeatMasker package to obtain a GFF3 file. This file was supplied to MAKER as rm_gff (see below for a sample from the file). The run fails with backtraces like the one pasted below. How should this GFF file be formatted for MAKER to understand it? I see that MAKER-produced GFF files carry additional information in the ID of the hits. Is this required?
>
> Maybe it's easier to modify maker to make two rounds of RepeatMasker calls if both model_org and rmlib are specified?
>
> Thanks for any input,
> Mikael
>
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Must have defined a valid name for Hit
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /opt/sw/bioperl/2.1.8/lib/site_perl/5.16.1/Bio/Root/Root.pm:472
> STACK: Bio::Search::Hit::GenericHit::new /opt/sw/bioperl/2.1.8/lib/site_perl/5.16.1/Bio/Search/Hit/GenericHit.pm:149
> STACK: Bio::Search::Hit::PhatHit::Base::new /net/gridnas4/volume4/proj1/mykopat-gbrowse/software/maker/2.26/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm:127
> STACK: Bio::Search::Hit::PhatHit::gff3::new /net/gridnas4/volume4/proj1/mykopat-gbrowse/software/maker/2.26/bin/../lib/Bio/Search/Hit/PhatHit/gff3.pm:23
> STACK: GFFDB::_load_hits /net/gridnas4/volume4/proj1/mykopat-gbrowse/software/maker/2.26/bin/../lib/GFFDB.pm:1026
> STACK: GFFDB::phathits_on_chunk /net/gridnas4/volume4/proj1/mykopat-gbrowse/software/maker/2.26/bin/../lib/GFFDB.pm:651
> STACK: Process::MpiChunk::_go /net/gridnas4/volume4/proj1/mykopat-gbrowse/software/maker/2.26/bin/../lib/Process/MpiChunk.pm:752
> STACK: Process::MpiChunk::run /net/gridnas4/volume4/proj1/mykopat-gbrowse/software/maker/2.26/bin/../lib/Process/MpiChunk.pm:331
> STACK: main::node_thread /proj/mykopat-gbrowse/software/maker/2.26/bin//maker:1307
> STACK: threads::new /proj/mykopat-gbrowse/software/maker/2.26/perl_modules/lib/site_perl/5.16.1/x86_64-linux/forks.pm:799
> STACK: /proj/mykopat-gbrowse/software/maker/2.26/bin//maker:803
> -----------------------------------------------------------
> Cannot restore overloading on HASH(0x2580810) (package Bio::Root::Exception) (even after a "require Bio::Root::Exception;") at /opt/sw/bioperl/2.1.8/lib/5.16.1/x86_64-linux/Storable.pm line 417, at /proj/mykopat-gbrowse/software/maker/2.26/perl_modules/lib/site_perl/5.16.1/x86_64-linux/forks.pm line 2256.
> Compilation failed in require at /proj/mykopat-gbrowse/software/maker/2.26/bin//maker line 11.
> BEGIN failed--compilation aborted at /proj/mykopat-gbrowse/software/maker/2.26/bin//maker line 11.
> Perl exited with active threads:
> 1 running and unjoined
> 0 finished and unjoined
> 0 running and detached
> ERROR: Could not open '/net/gridnas4/volume4/proj1/mykopat-gbrowse/genomes/CrosV1/CrosV1.maker.output/CrosV1_datastore/05/2B/scf_55419//theVoid.scf_55419/scf_55419.0.fungi.rb.out'
> ERROR: Failed while doing repeat masking
> ERROR: Chunk failed at level:0, tier_type:1
> FAILED CONTIG:scf_55419
>
>
> The gff3-file looks like this:
> ##gff-version 3
> ##sequence-region scf_42697 1 3949
> scf_42697 RepeatMasker dispersed_repeat 186 256 22 + . Target=AT_rich 1 71
> scf_42697 RepeatMasker dispersed_repeat 351 378 28 + . Target=AT_rich 1 28
> scf_42697 RepeatMasker dispersed_repeat 560 602 22 + . Target=AT_rich 1 43
> ##sequence-region scf_82496 1 2757
> scf_82496 RepeatMasker dispersed_repeat 1 2385 13046 + . Target=rnd-4_family-1046 2478 4915
> ##sequence-region scf_82727 1 4159
> scf_82727 RepeatMasker dispersed_repeat 212 240 29 + . Target=AT_rich 1 29
> scf_82727 RepeatMasker dispersed_repeat 3974 3996 23 + . Target=AT_rich 1 23
> scf_82727 RepeatMasker dispersed_repeat 4124 4159 264 - . Target=rnd-4_family-64 15 50
> ##sequence-region scf_82785 1 4084
> scf_82785 RepeatMasker dispersed_repeat 2166 2189 24 + . Target=AT_rich 1 24
> scf_82785 RepeatMasker dispersed_repeat 3498 3865 660 + . Target=rnd-4_family-690 419 786
> ##sequence-region scf_86740 1 4293
> scf_86740 RepeatMasker dispersed_repeat 290 313 369 + . Target=rnd-4_family-262 1 25
> scf_86740 RepeatMasker dispersed_repeat 314 371 270 + . Target=rnd-4_family-262 2 60
> scf_86740 RepeatMasker dispersed_repeat 359 406 309 - . Target=rnd-4_family-262 13 60
> ##sequence-region scf_86782 1 8564
> scf_86782 RepeatMasker dispersed_repeat 6987 7085 326 - . Target=rnd-4_family-480 1027 1129
> ##sequence-region scf_86808 1 4495
> scf_86808 RepeatMasker dispersed_repeat 6 974 4027 - . Target=rnd-4_family-690 1 969
> scf_86808 RepeatMasker dispersed_repeat 4224 4294 216 + . Target=T-rich 5 74
> ##sequence-region scf_86815 1 4139
> scf_86815 RepeatMasker dispersed_repeat 1 94 645 + . Target=rnd-4_family-262 825 918
> scf_86815 RepeatMasker dispersed_repeat 137 4139 27862 + . Target=rnd-4_family-262 526 4459
> ##sequence-region scf_86823 1 2528
> scf_86823 RepeatMasker dispersed_repeat 82 266 205 + . Target=A-rich 1 173
> scf_86823 RepeatMasker dispersed_repeat 564 641 29 + . Target=AT_rich 1 78
> scf_86823 RepeatMasker dispersed_repeat 1168 1347 218 + . Target=A-rich 2 178
> scf_86823 RepeatMasker dispersed_repeat 1352 1386 28 + . Target=AT_rich 1 35
> scf_86823 RepeatMasker dispersed_repeat 1698 1742 38 + . Target=AT_rich 1 45
> scf_86823 RepeatMasker dispersed_repeat 2087 2127 20 + . Target=AT_rich 1 41
> scf_86823 RepeatMasker dispersed_repeat 2301 2396 26 + . Target=AT_rich 1 96
> scf_86823 RepeatMasker dispersed_repeat 2433 2472 26 + . Target=AT_rich 1 40
> scf_86823 RepeatMasker dispersed_repeat 2489 2528 225 - . Target=rnd-4_family-262 881 920
> ##sequence-region scf_86857 1 2778
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
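One possible workaround, sketched in Python, rests on an assumption. The error "Must have defined a valid name for Hit" suggests that MAKER builds each hit's name from a feature attribute that rmOutToGFF3.pl does not write. The sketch assumes that attribute is Name, kept alongside a unique ID, and fills Name with the first token of Target (the repeat family, e.g. AT_rich). The ID scheme `<seqid>:repeatN` is hypothetical. The script also assumes the file is tab-delimited as GFF3 requires; the sample above looks space-separated, probably only because of the mail archive. If MAKER still rejects the result, compare it with the attributes in a repeat GFF3 file that MAKER itself produced.

```python
import sys


def add_id_and_name(lines):
    """Yield GFF3 lines, adding ID and Name attributes to feature lines.

    ASSUMPTION: MAKER takes the hit name from a Name attribute. Name is
    copied from the first token of Target (the repeat family, e.g. AT_rich).
    """
    counter = 0
    for raw in lines:
        line = raw.rstrip("\n")
        # Directives (##...), comments and blank lines pass through unchanged.
        if not line or line.startswith("#"):
            yield line
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        # Column 9 holds key=value pairs separated by ';' ('.' means empty).
        attrs = [a for a in cols[8].split(";") if a and a != "."]
        keys = {a.split("=", 1)[0] for a in attrs}
        target = next((a.split("=", 1)[1] for a in attrs if a.startswith("Target=")), "")
        target_fields = target.split()
        counter += 1
        extra = []
        if "ID" not in keys:
            # Hypothetical ID scheme; IDs only need to be unique within the file.
            extra.append(f"ID={cols[0]}:repeat{counter}")
        if "Name" not in keys and target_fields:
            extra.append(f"Name={target_fields[0]}")
        cols[8] = ";".join(extra + attrs) if (extra or attrs) else "."
        yield "\t".join(cols)


if __name__ == "__main__":
    # Usage (hypothetical file names): python add_repeat_names.py < in.gff3 > out.gff3
    for out_line in add_id_and_name(sys.stdin):
        print(out_line)
```

The rewritten file would then be supplied as rm_gff in place of the original rmOutToGFF3.pl output.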