[maker-devel] Format of repeat_gff gff3 file

Carson Holt carsonhh at gmail.com
Fri Oct 19 08:00:12 MDT 2012


This command line should append an ID attribute to the end of each feature
line in the GFF3 file, which lets it pass through without the error message.

perl -ne 'if (!/^#/ && /\S/) { chomp; $_ .= ";ID=" . ++$id . "\n" } print' \
    file.gff
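
For anyone who would rather not use a one-liner, here is a minimal Python
sketch of the same idea. It is not part of MAKER. The function name add_ids
and the "repeat" ID prefix are my own assumptions for illustration; the
one-liner above uses a bare counter instead. It leaves comments, directives,
blank lines, and features that already carry an ID untouched.

```python
def add_ids(lines, prefix="repeat"):
    """Yield GFF3 lines, appending a unique ID to column 9 of feature lines.

    `prefix` is a hypothetical naming choice, not something MAKER requires.
    """
    count = 0
    for line in lines:
        stripped = line.rstrip("\r\n")
        # Pass comments, directives (##...), and blank lines through unchanged.
        if not stripped or stripped.startswith("#"):
            yield line
            continue
        fields = stripped.split("\t")
        # Only touch standard 9-column feature lines.
        if len(fields) != 9:
            yield line
            continue
        attrs = fields[8]
        # Leave features that already have an ID attribute as they are.
        if any(a.strip().startswith("ID=") for a in attrs.split(";")):
            yield line
            continue
        count += 1
        new_id = f"ID={prefix}{count}"
        if attrs in ("", "."):
            fields[8] = new_id
        else:
            fields[8] = attrs.rstrip(";") + ";" + new_id
        yield "\t".join(fields) + "\n"
```

To use it on a file, something like
sys.stdout.writelines(add_ids(open("file.gff"))) should be enough, with the
output redirected to a new GFF3 file that is then given to rm_gff.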


MAKER will use the GFF3 file (rm_gff) first if provided, then run
RepeatMasker with the species-specific library (rmlib), and then with any
model organism specified (model_org), so rmlib and model_org are not
mutually exclusive. If hits from these sources overlap and one must be
excluded, they are kept in that same order of precedence.
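
As a rough sketch, the three sources would sit side by side in maker_opts.ctl
something like this. The file paths are hypothetical placeholders, and fungi
is the model_org value from the original question:

```
model_org=fungi
rmlib=/path/to/inhouse_repeats.fasta
rm_gff=/path/to/repeats_with_ids.gff
```

With all three set, the rm_gff features would take precedence, followed by
the rmlib hits, and then the model_org hits.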

Thanks,
Carson



On 12-10-16 12:58 PM, "Mikael Brandström Durling" <mikael.durling at slu.se>
wrote:

>Hi,
>
>I would like to mask my fungal genome from two different sources (i.e.
>Repbase and an in-house repeat library). However, I suppose that if I
>supply a library as rmlib in maker_opts, it will be mutually exclusive to
>the model_org option, in the same way as the -spec and -lib options to
>RepeatMasker (I hope I am wrong here...). To circumvent this, I give the
>model_org option as fungi, and would like to provide maker with
>additional masking as a gff file. I tried running RepeatMasker with my
>in-house library, and then used rmOutToGFF3.pl from the RepeatMasker
>package to obtain a gff3 file. This file was supplied to maker as rm_gff
>(see below for a sample from the file). The run fails with backtraces like
>the one pasted below. How should this gff file be formatted for maker to
>understand it? I see that in maker-produced gff files, there is
>additional information in the ID of the hits. Is this required?
>
>Maybe it's easier to modify maker to make two rounds of RepeatMasker
>calls if both model_org and rmlib are specified?
>
>Thanks for any input,
>Mikael
>
>
>------------- EXCEPTION: Bio::Root::Exception -------------
>MSG: Must have defined a valid name for Hit
>STACK: Error::throw
>STACK: Bio::Root::Root::throw /opt/sw/bioperl/2.1.8/lib/site_perl/5.16.1/Bio/Root/Root.pm:472
>STACK: Bio::Search::Hit::GenericHit::new /opt/sw/bioperl/2.1.8/lib/site_perl/5.16.1/Bio/Search/Hit/GenericHit.pm:149
>STACK: Bio::Search::Hit::PhatHit::Base::new /net/gridnas4/volume4/proj1/mykopat-gbrowse/software/maker/2.26/bin/../lib/Bio/Search/Hit/PhatHit/Base.pm:127
>STACK: Bio::Search::Hit::PhatHit::gff3::new /net/gridnas4/volume4/proj1/mykopat-gbrowse/software/maker/2.26/bin/../lib/Bio/Search/Hit/PhatHit/gff3.pm:23
>STACK: GFFDB::_load_hits /net/gridnas4/volume4/proj1/mykopat-gbrowse/software/maker/2.26/bin/../lib/GFFDB.pm:1026
>STACK: GFFDB::phathits_on_chunk /net/gridnas4/volume4/proj1/mykopat-gbrowse/software/maker/2.26/bin/../lib/GFFDB.pm:651
>STACK: Process::MpiChunk::_go /net/gridnas4/volume4/proj1/mykopat-gbrowse/software/maker/2.26/bin/../lib/Process/MpiChunk.pm:752
>STACK: Process::MpiChunk::run /net/gridnas4/volume4/proj1/mykopat-gbrowse/software/maker/2.26/bin/../lib/Process/MpiChunk.pm:331
>STACK: main::node_thread /proj/mykopat-gbrowse/software/maker/2.26/bin//maker:1307
>STACK: threads::new /proj/mykopat-gbrowse/software/maker/2.26/perl_modules/lib/site_perl/5.16.1/x86_64-linux/forks.pm:799
>STACK: /proj/mykopat-gbrowse/software/maker/2.26/bin//maker:803
>-----------------------------------------------------------
>Cannot restore overloading on HASH(0x2580810) (package Bio::Root::Exception) (even after a "require Bio::Root::Exception;") at /opt/sw/bioperl/2.1.8/lib/5.16.1/x86_64-linux/Storable.pm line 417, at /proj/mykopat-gbrowse/software/maker/2.26/perl_modules/lib/site_perl/5.16.1/x86_64-linux/forks.pm line 2256.
>Compilation failed in require at /proj/mykopat-gbrowse/software/maker/2.26/bin//maker line 11.
>BEGIN failed--compilation aborted at /proj/mykopat-gbrowse/software/maker/2.26/bin//maker line 11.
>Perl exited with active threads:
>	1 running and unjoined
>	0 finished and unjoined
>	0 running and detached
>ERROR: Could not open '/net/gridnas4/volume4/proj1/mykopat-gbrowse/genomes/CrosV1/CrosV1.maker.output/CrosV1_datastore/05/2B/scf_55419//theVoid.scf_55419/scf_55419.0.fungi.rb.out'
>ERROR: Failed while doing repeat masking
>ERROR: Chunk failed at level:0, tier_type:1
>FAILED CONTIG:scf_55419
>
>
>The gff3-file looks like this:
>##gff-version 3
>##sequence-region scf_42697 1 3949
>scf_42697	RepeatMasker	dispersed_repeat	186	256	22	+	.	Target=AT_rich 1 71
>scf_42697	RepeatMasker	dispersed_repeat	351	378	28	+	.	Target=AT_rich 1 28
>scf_42697	RepeatMasker	dispersed_repeat	560	602	22	+	.	Target=AT_rich 1 43
>##sequence-region scf_82496 1 2757
>scf_82496	RepeatMasker	dispersed_repeat	1	2385	13046	+	.	Target=rnd-4_family-1046 2478 4915
>##sequence-region scf_82727 1 4159
>scf_82727	RepeatMasker	dispersed_repeat	212	240	29	+	.	Target=AT_rich 1 29
>scf_82727	RepeatMasker	dispersed_repeat	3974	3996	23	+	.	Target=AT_rich 1 23
>scf_82727	RepeatMasker	dispersed_repeat	4124	4159	264	-	.	Target=rnd-4_family-64 15 50
>##sequence-region scf_82785 1 4084
>scf_82785	RepeatMasker	dispersed_repeat	2166	2189	24	+	.	Target=AT_rich 1 24
>scf_82785	RepeatMasker	dispersed_repeat	3498	3865	660	+	.	Target=rnd-4_family-690 419 786
>##sequence-region scf_86740 1 4293
>scf_86740	RepeatMasker	dispersed_repeat	290	313	369	+	.	Target=rnd-4_family-262 1 25
>scf_86740	RepeatMasker	dispersed_repeat	314	371	270	+	.	Target=rnd-4_family-262 2 60
>scf_86740	RepeatMasker	dispersed_repeat	359	406	309	-	.	Target=rnd-4_family-262 13 60
>##sequence-region scf_86782 1 8564
>scf_86782	RepeatMasker	dispersed_repeat	6987	7085	326	-	.	Target=rnd-4_family-480 1027 1129
>##sequence-region scf_86808 1 4495
>scf_86808	RepeatMasker	dispersed_repeat	6	974	4027	-	.	Target=rnd-4_family-690 1 969
>scf_86808	RepeatMasker	dispersed_repeat	4224	4294	216	+	.	Target=T-rich 5 74
>##sequence-region scf_86815 1 4139
>scf_86815	RepeatMasker	dispersed_repeat	1	94	645	+	.	Target=rnd-4_family-262 825 918
>scf_86815	RepeatMasker	dispersed_repeat	137	4139	27862	+	.	Target=rnd-4_family-262 526 4459
>##sequence-region scf_86823 1 2528
>scf_86823	RepeatMasker	dispersed_repeat	82	266	205	+	.	Target=A-rich 1 173
>scf_86823	RepeatMasker	dispersed_repeat	564	641	29	+	.	Target=AT_rich 1 78
>scf_86823	RepeatMasker	dispersed_repeat	1168	1347	218	+	.	Target=A-rich 2 178
>scf_86823	RepeatMasker	dispersed_repeat	1352	1386	28	+	.	Target=AT_rich 1 35
>scf_86823	RepeatMasker	dispersed_repeat	1698	1742	38	+	.	Target=AT_rich 1 45
>scf_86823	RepeatMasker	dispersed_repeat	2087	2127	20	+	.	Target=AT_rich 1 41
>scf_86823	RepeatMasker	dispersed_repeat	2301	2396	26	+	.	Target=AT_rich 1 96
>scf_86823	RepeatMasker	dispersed_repeat	2433	2472	26	+	.	Target=AT_rich 1 40
>scf_86823	RepeatMasker	dispersed_repeat	2489	2528	225	-	.	Target=rnd-4_family-262 881 920
>##sequence-region scf_86857 1 2778
>
>