[maker-devel] gene models overlapping with TEs

Mark Yandell myandell at genetics.utah.edu
Mon May 6 21:47:49 MDT 2013


Could the TEs be in the UTRs? Also, maybe some of these are low-complexity regions?

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

________________________________________
From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of Dario Copetti [dcopetti at cals.arizona.edu]
Sent: Monday, May 06, 2013 3:19 PM
To: maker-devel at yandell-lab.org
Cc: Stein, Joshua; Rod Wing; kapeel at cals.arizona.edu
Subject: [maker-devel] gene models overlapping with TEs

Carson,

Analyzing the output of a MAKER run on a rice-sized genome, I noticed that some gene models (~10%) overlap with TE coding regions. As a QC step, I used BEDtools to intersect the "CDS" features with the "repeatmasker" and "repeatrunner" features, and some 2400 genes overlap a repeat for at least 30% of their respective length. I am wondering why these gene models still appear in the final output, since I thought the masking step gave us absolute confirmation that our endogenous gene list does not include TE coding regions. Below is an example of such a gene (picture attached as well):

ObracChr10      maker   mRNA    355,056 358,075 .       -       .       ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eAED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788
ObracChr10      maker   exon    355,056 356,874 .       -       .       ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1
ObracChr10      maker   exon    356,965 357,081 .       -       .       ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1
ObracChr10      maker   exon    357,209 357,319 .       -       .       ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1
ObracChr10      maker   exon    357,756 358,075 .       -       .       ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1
ObracChr10      maker   CDS     357,756 358,075 .       -       2       ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10      maker   CDS     357,209 357,319 .       -       2       ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10      maker   CDS     356,965 357,081 .       -       2       ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10      maker   CDS     355,056 356,874 .       -       0       ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1

ObracChr10      repeatrunner    match_part      357,755 358,084 566     -       .       ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
ObracChr10      repeatrunner    protein_match   357,755 358,084 566     -       .       ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
ObracChr10      repeatrunner    match_part      357,202 357,294 142     -       .       ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
ObracChr10      repeatrunner    protein_match   357,202 357,294 142     -       .       ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
ObracChr10      repeatrunner    match_part      355,059 357,092 3367    -       .       ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816
ObracChr10      repeatrunner    protein_match   355,059 357,092 3367    -       .       ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816


This holds for features from both repeatmasker and repeatrunner, and the affected gene models come from either FGENESH or SNAP predictions.
How can I explain this problem?
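
For reference, the QC check described above can be sketched in plain Python. This is a minimal sketch, not the BEDtools command actually used: the function names and the 0.30 default are illustrative, strand is ignored, and it assumes that repeat features are identified by "repeatmasker" or "repeatrunner" in column 2 and that coordinates may carry thousands separators, as in the lines pasted above.

```python
from collections import defaultdict

REPEAT_SOURCES = {"repeatmasker", "repeatrunner"}  # column 2 values seen in this thread


def parse_gff(lines):
    """Yield (seqid, source, type, start, end, attributes) for each feature line."""
    for line in lines:
        line = line.strip()
        if line.startswith("##FASTA"):
            break  # the sequence section follows; no more features
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 8)  # columns 1-8 hold no spaces; column 9 may
        if len(fields) < 9:
            continue
        # Assumption: tolerate coordinates written like "355,056".
        start = int(fields[3].replace(",", ""))
        end = int(fields[4].replace(",", ""))
        yield fields[0], fields[1], fields[2], start, end, fields[8]


def parent_of(attributes):
    """Return the value of the Parent attribute, or None if absent."""
    for item in attributes.split(";"):
        key, _, value = item.partition("=")
        if key.strip() == "Parent":
            return value.strip()
    return None


def merge(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def te_overlap_by_transcript(lines, min_fraction=0.30):
    """Map transcript ID -> (covered_bp, cds_bp, fraction) for transcripts whose
    CDS is covered by repeat features for at least min_fraction of its length."""
    cds = defaultdict(list)      # (seqid, transcript) -> CDS intervals
    repeats = defaultdict(list)  # seqid -> repeat intervals
    for seqid, source, ftype, start, end, attrs in parse_gff(lines):
        if ftype == "CDS":
            cds[(seqid, parent_of(attrs))].append((start, end))
        elif source in REPEAT_SOURCES:
            repeats[seqid].append((start, end))

    # Merging first means nested hit/HSP features are not counted twice.
    merged_repeats = {seqid: merge(iv) for seqid, iv in repeats.items()}

    flagged = {}
    for (seqid, transcript), intervals in cds.items():
        segments = merge(intervals)
        total = sum(end - start + 1 for start, end in segments)
        covered = 0
        for c_start, c_end in segments:
            for r_start, r_end in merged_repeats.get(seqid, []):
                lo, hi = max(c_start, r_start), min(c_end, r_end)
                if lo <= hi:
                    covered += hi - lo + 1
        fraction = covered / total
        if fraction >= min_fraction:
            flagged[transcript] = (covered, total, fraction)
    return flagged
```

Run on the CDS and repeatrunner lines pasted above, this should flag Obrac10g00240.1 with 2339 of its 2367 CDS bases covered by repeat hits. For a whole annotation, the function can be passed an open file handle for the MAKER GFF3 output.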
Thanks,

Dario

--
Dario Copetti, PhD
Research Associate
Arizona Genomics Institute
University of Arizona - BIO5

1657 E. Helen St.
Tucson, AZ  85721
www.genome.arizona.edu