[maker-devel] gene models overlapping with TEs
Mark Yandell
myandell at genetics.utah.edu
Mon May 6 21:49:51 MDT 2013
Hmm, eyeballing it, it doesn't look like it's the UTRs...
Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707
________________________________________
From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of Dario Copetti [dcopetti at cals.arizona.edu]
Sent: Monday, May 06, 2013 3:19 PM
To: maker-devel at yandell-lab.org
Cc: Stein, Joshua; Rod Wing; kapeel at cals.arizona.edu
Subject: [maker-devel] gene models overlapping with TEs
Carson,
Analyzing the output of a MAKER run on a rice-sized genome, I noticed that some gene models (~10%) overlap with TE coding regions. As a QC step, I used BEDtools to intersect the "CDS" features with the "repeatmasker" and "repeatrunner" features, and some 2,400 genes overlap repeats over at least 30% of their length. I am wondering how these gene models still end up in the final output, since I thought the masking step guaranteed that our endogenous gene list would not include TE coding regions. Below is an example of such a gene (picture also attached):
ObracChr10 maker mRNA 355,056 358,075 . - . ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eAED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788
ObracChr10 maker exon 355,056 356,874 . - . ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1
ObracChr10 maker exon 356,965 357,081 . - . ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1
ObracChr10 maker exon 357,209 357,319 . - . ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1
ObracChr10 maker exon 357,756 358,075 . - . ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1
ObracChr10 maker CDS 357,756 358,075 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 357,209 357,319 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 356,965 357,081 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 355,056 356,874 . - 0 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 repeatrunner match_part 357,755 358,084 566 - . ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
ObracChr10 repeatrunner protein_match 357,755 358,084 566 - . ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
ObracChr10 repeatrunner match_part 357,202 357,294 142 - . ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
ObracChr10 repeatrunner protein_match 357,202 357,294 142 - . ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
ObracChr10 repeatrunner match_part 355,059 357,092 3367 - . ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816
ObracChr10 repeatrunner protein_match 355,059 357,092 3367 - . ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816
This holds for overlaps with both RepeatMasker and RepeatRunner output, and the affected gene models come from either FGENESH or SNAP predictions.
How can this be explained?
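The QC check described above (CDS features intersected with repeatmasker/repeatrunner features, flagging genes with at least 30% overlap) can be sketched in Python without BEDtools. This is a minimal sketch, not the command that was actually run, which the message does not give. It assumes MAKER-style GFF3 with repeat hits stored as match_part rows under the sources "repeatmasker" and "repeatrunner", and it measures the fraction of each mRNA's total CDS length covered by merged repeat intervals. The function names (parse_gff, merge, overlap_len, repeat_overlap) are hypothetical.

```python
from collections import defaultdict

# Assumption: repeat hits use these GFF3 source names (as in the example
# above). Only match_part rows (one per HSP) are used.
REPEAT_SOURCES = {"repeatmasker", "repeatrunner"}


def parse_gff(lines):
    """Yield (seqid, source, type, start, end, attributes) per feature row.

    Splits on whitespace with maxsplit=8 so the attribute column may contain
    spaces (e.g. Target=... 117 226 +); commas in coordinates, as in the
    example above, are stripped before conversion to int.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split(None, 8)
        if len(cols) < 9:
            continue
        seqid, source, ftype, start, end = cols[:5]
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        yield (seqid, source, ftype,
               int(start.replace(",", "")), int(end.replace(",", "")), attrs)


def merge(intervals):
    """Merge overlapping or adjacent closed (1-based) intervals."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return merged


def overlap_len(start, end, merged):
    """Number of bases of [start, end] covered by the merged intervals."""
    total = 0
    for ms, me in merged:
        lo, hi = max(start, ms), min(end, me)
        if lo <= hi:
            total += hi - lo + 1
    return total


def repeat_overlap(lines, min_fraction=0.30):
    """Return {mRNA ID: fraction of CDS covered by repeats} above threshold."""
    cds = defaultdict(list)       # mRNA ID -> [(seqid, start, end), ...]
    repeats = defaultdict(list)   # seqid -> [(start, end), ...]
    for seqid, source, ftype, start, end, attrs in parse_gff(lines):
        if ftype == "CDS":
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    cds[parent].append((seqid, start, end))
        elif source.lower() in REPEAT_SOURCES and ftype == "match_part":
            repeats[seqid].append((start, end))
    # Merging keeps overlapping RepeatMasker/RepeatRunner hits from being
    # counted twice.
    merged = {seqid: merge(iv) for seqid, iv in repeats.items()}
    flagged = {}
    for mrna, parts in cds.items():
        cds_len = sum(e - s + 1 for _, s, e in parts)
        covered = sum(overlap_len(s, e, merged.get(seqid, []))
                      for seqid, s, e in parts)
        frac = covered / cds_len
        if frac >= min_fraction:
            flagged[mrna] = round(frac, 3)
    return flagged
```

Calling repeat_overlap on the lines of a MAKER GFF3 file (for example, an open file handle) returns the flagged mRNA IDs. On the example gene above it reports Obrac10g00240.1 with about 0.988 of its 2,367 bp of CDS covered by repeat hits, which is consistent with the overlap being in the coding sequence rather than in the UTRs.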
Thanks,
Dario
--
Dario Copetti, PhD
Research Associate
Arizona Genomics Institute
University of Arizona - BIO5
1657 E. Helen St.
Tucson, AZ 85721
www.genome.arizona.edu