[maker-devel] gene models overlapping with TEs
Carson Holt
carsonhh at gmail.com
Tue May 7 05:39:17 MDT 2013
If I had to guess, I'd imagine the EST evidence includes assembled mRNA-seq
reads. Is that correct?
--Carson
On 13-05-06 11:49 PM, "Mark Yandell" <myandell at genetics.utah.edu> wrote:
>Hmm, eyeballing it then, it doesn't look like it's the UTRs.
>
>Mark Yandell
>Professor of Human Genetics
>H.A. & Edna Benning Presidential Endowed Chair
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>ph:801-587-7707
>
>________________________________________
>From: maker-devel-bounces at yandell-lab.org
>[maker-devel-bounces at yandell-lab.org] on behalf of Dario Copetti
>[dcopetti at cals.arizona.edu]
>Sent: Monday, May 06, 2013 3:19 PM
>To: maker-devel at yandell-lab.org
>Cc: Stein, Joshua; Rod Wing; kapeel at cals.arizona.edu
>Subject: [maker-devel] gene models overlapping with TEs
>
>Carson,
>
>Analyzing the output of a MAKER run on a rice-sized genome, I noticed that
>some gene models (~10%) overlap with TE coding regions. As a QC step, I
>used BEDtools to determine the intersection of "CDS" features with
>"repeatmasker" or "repeatrunner" features, and some 2400 genes overlap for
>at least 30% of their respective length. I am wondering why these gene
>models still appear in the final output, since I thought the masking step
>guaranteed that our endogenous gene list would not include TE coding
>regions. Below is an example of such a gene (picture attached too):
>
>ObracChr10 maker mRNA 355,056 358,075 . - . ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eAED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788
>ObracChr10 maker exon 355,056 356,874 . - . ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1
>ObracChr10 maker exon 356,965 357,081 . - . ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1
>ObracChr10 maker exon 357,209 357,319 . - . ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1
>ObracChr10 maker exon 357,756 358,075 . - . ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1
>ObracChr10 maker CDS 357,756 358,075 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
>ObracChr10 maker CDS 357,209 357,319 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
>ObracChr10 maker CDS 356,965 357,081 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
>ObracChr10 maker CDS 355,056 356,874 . - 0 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
>
>ObracChr10 repeatrunner match_part 357,755 358,084 566 - . ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
>ObracChr10 repeatrunner protein_match 357,755 358,084 566 - . ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
>ObracChr10 repeatrunner match_part 357,202 357,294 142 - . ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
>ObracChr10 repeatrunner protein_match 357,202 357,294 142 - . ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
>ObracChr10 repeatrunner match_part 355,059 357,092 3367 - . ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816
>ObracChr10 repeatrunner protein_match 355,059 357,092 3367 - . ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816
>
>
>This result holds for output lines from both repeatmasker and
>repeatrunner, and the affected gene models come from either FGENESH or
>SNAP predictions.
>How can I explain this?
>Thanks,
>
>Dario
>
>
>
>
>
>--
>Dario Copetti, PhD
>Research Associate
>Arizona Genomics Institute
>University of Arizona - BIO5
>
>1657 E. Helen St.
>Tucson, AZ 85721
>www.genome.arizona.edu<http://www.genome.arizona.edu>
>
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
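
The QC step described in the quoted message (intersecting CDS features with repeatmasker/repeatrunner features and flagging genes whose CDS is at least 30% covered) can be sketched roughly as below. This is only an illustration in plain Python, not the BEDtools command that was actually run. The function names `merge` and `overlap_fraction` are made up for this sketch, and the coordinates are copied from the example gene above with the thousands separators removed. Coordinates are assumed to be 1-based and inclusive, as in GFF.

```python
def merge(intervals):
    """Merge overlapping or abutting 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_fraction(cds, repeats):
    """Return the fraction of CDS bases covered by any repeat interval."""
    cds = merge(cds)
    repeats = merge(repeats)
    total = sum(end - start + 1 for start, end in cds)
    covered = 0
    # Both lists are merged, so no base is counted twice.
    for c_start, c_end in cds:
        for r_start, r_end in repeats:
            lo, hi = max(c_start, r_start), min(c_end, r_end)
            if lo <= hi:
                covered += hi - lo + 1
    return covered / total if total else 0.0


# CDS and repeatrunner coordinates for Obrac10g00240.1 from the post
cds = [(357756, 358075), (357209, 357319), (356965, 357081), (355056, 356874)]
repeats = [(357755, 358084), (357202, 357294), (355059, 357092)]

frac = overlap_fraction(cds, repeats)
flagged = frac >= 0.30  # hypothetical threshold matching the 30% cutoff in the post
print(f"CDS covered by repeat hits: {frac:.1%}; flagged: {flagged}")
```

For this gene nearly the whole CDS falls inside the repeatrunner hits, so it lands well above the 30% cutoff.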