[maker-devel] gene models overlapping with TEs
Carson Holt
Carson.Holt at oicr.on.ca
Mon May 6 20:22:23 MDT 2013
Repeats can still occur within genes. So an outright block actually causes more errors than it avoids, and a mixed approach of hard and soft masking becomes more appropriate. The masking step stops alignments from seeding in repeat regions, but if alignments seed in non-repeat regions then they can still extend through repeat regions during the polishing steps (i.e., the EST evidence supports extension through the repeat and inclusion of the TE).
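To illustrate the distinction, here is a minimal Python sketch. It assumes the common convention that hard masking replaces bases with N and soft masking lowercases them; the function names and the toy sequence are hypothetical and are not part of MAKER.

```python
def _apply_mask(seq, intervals, transform):
    """Apply transform to every base inside 1-based, inclusive intervals."""
    chars = list(seq)
    for start, end in intervals:
        # Convert GFF-style coordinates to 0-based indices, clipped to the sequence.
        for i in range(max(start, 1) - 1, min(end, len(chars))):
            chars[i] = transform(chars[i])
    return "".join(chars)


def soft_mask(seq, intervals):
    """Lowercase the repeat: the bases remain readable by downstream tools."""
    return _apply_mask(seq, intervals, str.lower)


def hard_mask(seq, intervals):
    """Replace the repeat with N: the sequence content is lost."""
    return _apply_mask(seq, intervals, lambda _base: "N")


toy_seq = "ACGTACGTAC"      # hypothetical sequence
toy_repeat = [(3, 6)]       # hypothetical repeat annotation

print(soft_mask(toy_seq, toy_repeat))  # ACgtacGTAC
print(hard_mask(toy_seq, toy_repeat))  # ACNNNNGTAC
```

With the soft-masked form an aligner can decline to seed in the lowercase stretch yet still extend an alignment through it, whereas the hard-masked form leaves nothing to align against.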
--Carson
From: Dario Copetti <dcopetti at cals.arizona.edu>
Organization: AGI
Date: Monday, 6 May, 2013 5:19 PM
To: <maker-devel at yandell-lab.org>
Cc: <kapeel at cals.arizona.edu>, "Stein, Joshua" <steinj at cshl.edu>, Rod Wing <rwing at Ag.arizona.edu>
Subject: gene models overlapping with TEs
Carson,
While analyzing the output of a MAKER run on a rice-sized genome, I noticed that some gene models (~10%) overlap with TE coding regions. As a QC step, I used BEDtools to intersect the "CDS" features with the "repeatmasker" or "repeatrunner" features, and some 2400 genes overlap for at least 30% of their length. I am wondering how these gene models still appear in the final output, since I thought the masking step gave us absolute confirmation that our endogenous gene list does not include TE coding regions. Below is an example of such a gene (picture attached too):
ObracChr10 maker mRNA 355,056 358,075 . - . ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eAED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788
ObracChr10 maker exon 355,056 356,874 . - . ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1
ObracChr10 maker exon 356,965 357,081 . - . ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1
ObracChr10 maker exon 357,209 357,319 . - . ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1
ObracChr10 maker exon 357,756 358,075 . - . ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1
ObracChr10 maker CDS 357,756 358,075 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 357,209 357,319 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 356,965 357,081 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 355,056 356,874 . - 0 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 repeatrunner match_part 357,755 358,084 566 - . ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
ObracChr10 repeatrunner protein_match 357,755 358,084 566 - . ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
ObracChr10 repeatrunner match_part 357,202 357,294 142 - . ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
ObracChr10 repeatrunner protein_match 357,202 357,294 142 - . ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
ObracChr10 repeatrunner match_part 355,059 357,092 3367 - . ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816
ObracChr10 repeatrunner protein_match 355,059 357,092 3367 - . ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816
This result holds for both repeatmasker and repeatrunner features, and the gene models come from either FGENESH or SNAP predictions.
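For reference, the overlap for the example gene can be reproduced with a short Python sketch. It is only an approximation of the BEDtools intersection step, not the actual command used: the coordinates are copied from the CDS and repeatrunner protein_match lines above (1-based, inclusive), and the function and variable names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based, inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bases(cds, repeats):
    """Count CDS bases that fall inside any repeat interval."""
    merged_repeats = merge_intervals(repeats)
    total = 0
    for c_start, c_end in merge_intervals(cds):
        for r_start, r_end in merged_repeats:
            overlap = min(c_end, r_end) - max(c_start, r_start) + 1
            if overlap > 0:
                total += overlap
    return total


# CDS segments of Obrac10g00240.1
cds = [(357756, 358075), (357209, 357319), (356965, 357081), (355056, 356874)]
# repeatrunner protein_match features on the same locus
repeats = [(357755, 358084), (357202, 357294), (355059, 357092)]

cds_length = sum(end - start + 1 for start, end in cds)
covered = covered_bases(cds, repeats)
fraction = covered / cds_length

print(f"{covered} of {cds_length} CDS bases overlap repeats ({fraction:.1%})")
```

For this gene, 2339 of the 2367 CDS bases (about 99%) fall inside the repeatrunner hits, well above the 30% cutoff used in the QC step.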
How can I explain this problem?
Thanks,
Dario
--
Dario Copetti, PhD
Research Associate
Arizona Genomics Institute
University of Arizona - BIO5
1657 E. Helen St.
Tucson, AZ 85721
www.genome.arizona.edu