[maker-devel] gene models overlapping with TEs

Dario Copetti dcopetti at cals.arizona.edu
Mon May 6 15:19:42 MDT 2013


Carson,

Analyzing the output of a MAKER run on a rice-sized genome, I noticed
that some gene models (~10%) overlap with TE coding regions. As a QC
step, I used BEDtools to intersect the "CDS" features with the
"repeatmasker" and "repeatrunner" features, and about 2,400 genes overlap
a repeat over at least 30% of their length. I am wondering how these gene
models still end up in the final output, since I thought the masking step
guaranteed that TE coding regions would not be included in our endogenous
gene list. Below is an example of such a gene (a picture is attached too):

ObracChr10	maker	mRNA	355056	358075	.	-	.	ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eAED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788
ObracChr10	maker	exon	355056	356874	.	-	.	ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1
ObracChr10	maker	exon	356965	357081	.	-	.	ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1
ObracChr10	maker	exon	357209	357319	.	-	.	ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1
ObracChr10	maker	exon	357756	358075	.	-	.	ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1
ObracChr10	maker	CDS	357756	358075	.	-	2	ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10	maker	CDS	357209	357319	.	-	2	ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10	maker	CDS	356965	357081	.	-	2	ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10	maker	CDS	355056	356874	.	-	0	ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1

ObracChr10	repeatrunner	match_part	357755	358084	566	-	.	ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
ObracChr10	repeatrunner	protein_match	357755	358084	566	-	.	ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
ObracChr10	repeatrunner	match_part	357202	357294	142	-	.	ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
ObracChr10	repeatrunner	protein_match	357202	357294	142	-	.	ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
ObracChr10	repeatrunner	match_part	355059	357092	3367	-	.	ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816
ObracChr10	repeatrunner	protein_match	355059	357092	3367	-	.	ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816
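
The overlap check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the BEDtools workflow itself: it assumes GFF3 input with plain integer coordinates, treats any feature whose source is "repeatmasker" or "repeatrunner" as a repeat, ignores strand (as BEDtools intersect does without -s), and uses the 30% cutoff from the message. The helper names merge, covered_bases, flag_te_overlaps and row are hypothetical.

```python
from collections import defaultdict

# Assumed MAKER source tags for repeat evidence.
REPEAT_SOURCES = {"repeatmasker", "repeatrunner"}


def merge(intervals):
    """Merge overlapping or adjacent 1-based closed intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_bases(cds, repeats):
    """Count CDS bases lying inside any repeat, without double counting."""
    rep = merge(repeats)
    total = 0
    for c_start, c_end in merge(cds):
        for r_start, r_end in rep:
            lo, hi = max(c_start, r_start), min(c_end, r_end)
            if lo <= hi:
                total += hi - lo + 1
    return total


def flag_te_overlaps(gff_lines, threshold=0.30):
    """Return {transcript: repeat-covered CDS fraction} for fractions >= threshold."""
    cds_by_tx = defaultdict(list)       # (seqid, transcript) -> CDS intervals
    repeats_by_seq = defaultdict(list)  # seqid -> repeat intervals (strand ignored)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        seqid, source, ftype = cols[0], cols[1], cols[2]
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if ftype == "CDS":
            cds_by_tx[(seqid, attrs["Parent"])].append((start, end))
        elif source in REPEAT_SOURCES:
            repeats_by_seq[seqid].append((start, end))
    flagged = {}
    for (seqid, tx), cds in cds_by_tx.items():
        cds_len = sum(e - s + 1 for s, e in merge(cds))
        frac = covered_bases(cds, repeats_by_seq[seqid]) / cds_len
        if frac >= threshold:
            flagged[tx] = round(frac, 3)
    return flagged


def row(seqid, source, ftype, start, end, attrs):
    """Build a GFF3 line; score and phase are '.' placeholders, strand is '-'."""
    return "\t".join([seqid, source, ftype, str(start), str(end), ".", "-", ".", attrs])


# Coordinates taken from the Obrac10g00240.1 example above.
tx = "Parent=Obrac10g00240.1"
example = [
    row("ObracChr10", "maker", "CDS", 357756, 358075, tx),
    row("ObracChr10", "maker", "CDS", 357209, 357319, tx),
    row("ObracChr10", "maker", "CDS", 356965, 357081, tx),
    row("ObracChr10", "maker", "CDS", 355056, 356874, tx),
    row("ObracChr10", "repeatrunner", "match_part", 357755, 358084, "ID=ObracChr10:hsp:75:1.3.0.3"),
    row("ObracChr10", "repeatrunner", "match_part", 357202, 357294, "ID=ObracChr10:hsp:74:1.3.0.3"),
    row("ObracChr10", "repeatrunner", "match_part", 355059, 357092, "ID=ObracChr10:hsp:73:1.3.0.3"),
]

print(flag_te_overlaps(example))  # {'Obrac10g00240.1': 0.988}
```

For this gene, 2,339 of its 2,367 CDS bases (about 98.8%) fall inside the repeatrunner hits, far above the 30% cutoff. Merging intervals before intersecting keeps a base covered by several hits (e.g. a match_part and its protein_match parent) from being counted twice.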



The same holds for repeatmasker and repeatrunner hits alike, and the
affected gene models come from either FGENESH or SNAP predictions.
How can I explain this?
Thanks,

Dario




-- 
Dario Copetti, PhD
Research Associate
Arizona Genomics Institute
University of Arizona - BIO5

1657 E. Helen St.
Tucson, AZ  85721
www.genome.arizona.edu

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gene_TE.jpg
Type: image/jpeg
Size: 177299 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20130506/0f009a92/attachment-0002.jpg>

