[maker-devel] gene models overlapping with TEs

Carson Holt carsonhh at gmail.com
Tue May 7 05:39:17 MDT 2013


If I had to guess, I'd say the EST evidence includes assembled mRNA-seq
reads. Is that correct?

--Carson



On 13-05-06 11:49 PM, "Mark Yandell" <myandell at genetics.utah.edu> wrote:

>Hmm, eyeballing it, it doesn't look like it's the UTRs.
>
>Mark Yandell
>Professor of Human Genetics
>H.A. & Edna Benning Presidential Endowed Chair
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>ph:801-587-7707
>
>________________________________________
>From: maker-devel-bounces at yandell-lab.org
>[maker-devel-bounces at yandell-lab.org] on behalf of Dario Copetti
>[dcopetti at cals.arizona.edu]
>Sent: Monday, May 06, 2013 3:19 PM
>To: maker-devel at yandell-lab.org
>Cc: Stein, Joshua; Rod Wing; kapeel at cals.arizona.edu
>Subject: [maker-devel] gene models overlapping with TEs
>
>Carson,
>
>Analyzing the output of a MAKER run on a rice-sized genome, I noticed that
>some gene models (~10%) overlap with TE coding regions. As a QC step, I
>used BEDtools to intersect "CDS" features with "repeatmasker" or
>"repeatrunner" features, and some 2400 genes overlap for at least 30% of
>their respective length. I am wondering why these gene models still appear
>in the final output, since I thought the masking step guaranteed that our
>endogenous gene list would not include TE coding regions. Below is an
>example of such a gene (picture attached too):
>
>ObracChr10      maker   mRNA    355,056 358,075 .       -       .       ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eAED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788
>ObracChr10      maker   exon    355,056 356,874 .       -       .       ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1
>ObracChr10      maker   exon    356,965 357,081 .       -       .       ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1
>ObracChr10      maker   exon    357,209 357,319 .       -       .       ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1
>ObracChr10      maker   exon    357,756 358,075 .       -       .       ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1
>ObracChr10      maker   CDS     357,756 358,075 .       -       2       ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
>ObracChr10      maker   CDS     357,209 357,319 .       -       2       ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
>ObracChr10      maker   CDS     356,965 357,081 .       -       2       ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
>ObracChr10      maker   CDS     355,056 356,874 .       -       0       ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
>
>ObracChr10      repeatrunner    match_part      357,755 358,084 566     -       .       ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
>ObracChr10      repeatrunner    protein_match   357,755 358,084 566     -       .       ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
>ObracChr10      repeatrunner    match_part      357,202 357,294 142     -       .       ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
>ObracChr10      repeatrunner    protein_match   357,202 357,294 142     -       .       ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
>ObracChr10      repeatrunner    match_part      355,059 357,092 3367    -       .       ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816
>ObracChr10      repeatrunner    protein_match   355,059 357,092 3367    -       .       ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816
>
>
>This holds for both repeatmasker and repeatrunner features, and the
>affected gene models come from either FGENESH or SNAP predictions.
>How can I explain this problem?
>Thanks,
>
>Dario
>
>
>
>
>
>--
>Dario Copetti, PhD
>Research Associate
>Arizona Genomics Institute
>University of Arizona - BIO5
>
>1657 E. Helen St.
>Tucson, AZ  85721
>www.genome.arizona.edu<http://www.genome.arizona.edu>
>
>
>
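The QC step described in the quoted message (intersecting "CDS" features with "repeatmasker"/"repeatrunner" features and flagging genes above a 30% overlap) could be sketched roughly as follows. This is not the BEDtools command that was actually run, which is not shown in the thread; it is a minimal pure-Python illustration under several assumptions: input is tab-separated 9-column GFF3, only "match_part" repeat features are counted, strand is ignored, and thousands separators in coordinates (as pasted in the example) are tolerated. All function names here are hypothetical.

```python
from collections import defaultdict

REPEAT_SOURCES = {"repeatmasker", "repeatrunner"}  # assumed source labels


def parse_gff_line(line):
    """Parse one tab-separated 9-column GFF3 line; return None otherwise."""
    if line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return None
    attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
    return {
        "seqid": cols[0],
        "source": cols[1],
        "type": cols[2],
        # Tolerate coordinates pasted with thousands separators.
        "start": int(cols[3].replace(",", "")),
        "end": int(cols[4].replace(",", "")),
        "attrs": attrs,
    }


def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bases(start, end, merged):
    """Count bases of [start, end] covered by disjoint merged intervals."""
    return sum(
        max(0, min(end, m_end) - max(start, m_start) + 1)
        for m_start, m_end in merged
    )


def cds_repeat_fraction(lines):
    """Map each CDS Parent ID to the fraction of its CDS bases in repeats."""
    cds = defaultdict(list)      # (seqid, parent) -> CDS segments
    repeats = defaultdict(list)  # seqid -> repeat segments
    for line in lines:
        rec = parse_gff_line(line)
        if rec is None:
            continue
        if rec["type"] == "CDS":
            key = (rec["seqid"], rec["attrs"].get("Parent"))
            cds[key].append((rec["start"], rec["end"]))
        elif rec["source"] in REPEAT_SOURCES and rec["type"] == "match_part":
            repeats[rec["seqid"]].append((rec["start"], rec["end"]))
    merged = {seqid: merge_intervals(iv) for seqid, iv in repeats.items()}
    fractions = {}
    for (seqid, parent), segments in cds.items():
        total = sum(e - s + 1 for s, e in segments)
        hit = sum(covered_bases(s, e, merged.get(seqid, [])) for s, e in segments)
        fractions[parent] = hit / total
    return fractions


def flag_te_overlaps(lines, threshold=0.30):
    """Return Parent IDs whose CDS repeat fraction meets the threshold."""
    return sorted(p for p, f in cds_repeat_fraction(lines).items() if f >= threshold)
```

Calling flag_te_overlaps on the lines of a MAKER GFF3 file (for example, an open file handle) would list the transcripts to inspect. Merging the repeat intervals first keeps a base from being counted twice when repeatmasker and repeatrunner hits overlap each other.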