[maker-devel] gene models overlapping with TEs
Dario Copetti
dcopetti at cals.arizona.edu
Mon May 6 15:19:42 MDT 2013
Carson,
Analyzing the output of a MAKER run on a rice-sized genome, I noticed
that some gene models (~10%) overlap with TE coding regions. As a QC
step, I used BEDtools to intersect "CDS" features with "repeatmasker" or
"repeatrunner" features, and about 2400 genes overlap a repeat for at
least 30% of their length. I am wondering how these gene models
still appear in the final output, since I thought the masking step
guaranteed that our endogenous gene list would not include TE coding
regions. Below is an example gene (see also the attached picture):
ObracChr10	maker	mRNA	355056	358075	.	-	.	ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eAED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788
ObracChr10	maker	exon	355056	356874	.	-	.	ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1
ObracChr10	maker	exon	356965	357081	.	-	.	ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1
ObracChr10	maker	exon	357209	357319	.	-	.	ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1
ObracChr10	maker	exon	357756	358075	.	-	.	ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1
ObracChr10	maker	CDS	357756	358075	.	-	2	ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10	maker	CDS	357209	357319	.	-	2	ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10	maker	CDS	356965	357081	.	-	2	ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10	maker	CDS	355056	356874	.	-	0	ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10	repeatrunner	match_part	357755	358084	566	-	.	ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +	320
ObracChr10	repeatrunner	protein_match	357755	358084	566	-	.	ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +	320
ObracChr10	repeatrunner	match_part	357202	357294	142	-	.	ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +	86
ObracChr10	repeatrunner	protein_match	357202	357294	142	-	.	ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +	86
ObracChr10	repeatrunner	match_part	355059	357092	3367	-	.	ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +	1816
ObracChr10	repeatrunner	protein_match	355059	357092	3367	-	.	ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +	1816
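The trailing numbers on the repeatrunner lines (320, 86, 1816) appear to be base-pair overlaps with the CDS features, as appended by something like `bedtools intersect -wo`. The minimal Python sketch below, which is an illustration and not the BEDtools implementation, recomputes the pairwise overlaps from the coordinates above, assuming 1-based closed GFF3 intervals on the same sequence. It also turns up a fourth overlap (117 bp, CDS 356965-357081 inside the large repeat hit) that is not listed above. The names `CDS`, `REPEATS`, `overlap_bp` and `pairwise_overlaps` are hypothetical.

```python
from typing import List, Tuple

# 1-based, inclusive (GFF3 convention); all intervals assumed on ObracChr10
Interval = Tuple[int, int]

CDS: List[Interval] = [
    (357756, 358075),
    (357209, 357319),
    (356965, 357081),
    (355056, 356874),
]

# repeatrunner match_part spans (the protein_match lines cover the same coordinates)
REPEATS: List[Interval] = [
    (357755, 358084),
    (357202, 357294),
    (355059, 357092),
]


def overlap_bp(a: Interval, b: Interval) -> int:
    """Bases shared by two 1-based closed intervals (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def pairwise_overlaps(cds: List[Interval], repeats: List[Interval]):
    """One (cds, repeat, bp) tuple per overlapping pair, similar to -wo rows."""
    result = []
    for c in cds:
        for r in repeats:
            bp = overlap_bp(c, r)
            if bp > 0:
                result.append((c, r, bp))
    return result


if __name__ == "__main__":
    for c, r, bp in pairwise_overlaps(CDS, REPEATS):
        print(c, r, bp)
```

Strand is ignored here, as in a default `bedtools intersect` run.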
This holds for overlaps with both repeatmasker and repeatrunner
features, and the affected gene models come from either FGENESH or SNAP
predictions.
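Since the overlaps come from both repeat sources, one way to express the 30% QC criterion per gene is to pool repeatmasker and repeatrunner intervals, merge them so no base is counted twice, and divide the covered CDS bases by the total CDS length. The Python sketch below does this under that assumption. It is a hypothetical per-gene definition, not necessarily what `bedtools intersect -f 0.30` reports (that threshold applies per feature in A). The names `merge`, `covered_bp`, `repeat_fraction` and `flag_genes` are illustrative, and all intervals are assumed to be 1-based, inclusive and on one sequence.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive; same sequence assumed


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into disjoint ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(target: List[Interval], mask: List[Interval]) -> int:
    """Target bases inside any mask interval; merging both sides avoids double counting."""
    total = 0
    merged_mask = merge(mask)
    for ts, te in merge(target):
        for ms, me in merged_mask:
            total += max(0, min(te, me) - max(ts, ms) + 1)
    return total


def repeat_fraction(cds: List[Interval], repeats: List[Interval]) -> float:
    """Fraction of a gene's CDS bases covered by pooled repeat features."""
    cds_len = sum(e - s + 1 for s, e in merge(cds))
    return covered_bp(cds, repeats) / cds_len if cds_len else 0.0


def flag_genes(genes: Dict[str, List[Interval]], repeats: List[Interval],
               min_frac: float = 0.30) -> Dict[str, float]:
    """Genes whose CDS repeat coverage is at least min_frac (rounded to 3 places)."""
    flagged: Dict[str, float] = {}
    for gene, cds in genes.items():
        frac = repeat_fraction(cds, repeats)
        if frac >= min_frac:
            flagged[gene] = round(frac, 3)
    return flagged


if __name__ == "__main__":
    genes = {"Obrac10g00240.1": [(357756, 358075), (357209, 357319),
                                 (356965, 357081), (355056, 356874)]}
    repeatrunner = [(357755, 358084), (357202, 357294), (355059, 357092)]
    repeatmasker: List[Interval] = []  # no repeatmasker hits are shown for this gene
    print(flag_genes(genes, repeatrunner + repeatmasker))
```

For the example gene this gives 2339 of 2367 CDS bases covered, so it would be flagged at roughly 0.988.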
How can this be explained?
Thanks,
Dario
--
Dario Copetti, PhD
Research Associate
Arizona Genomics Institute
University of Arizona - BIO5
1657 E. Helen St.
Tucson, AZ 85721
www.genome.arizona.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gene_TE.jpg
Type: image/jpeg
Size: 177299 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20130506/0f009a92/attachment-0002.jpg>