<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Carson,<br>
    <br>
    Analyzing the output of a MAKER run on a rice-sized genome, I noticed
    that some gene models (~10%) overlap with TE coding regions. As a QC
    step, I used BEDtools to intersect the "CDS" features with the
    "repeatmasker" and "repeatrunner" features, and some 2400 genes
    overlap a repeat for at least 30% of their respective length. I am
    wondering why these gene models still appear in the final output,
    since I thought the masking step guaranteed that TE coding regions
    would be excluded from our endogenous gene list. Below is an example
    of one such gene (picture attached too):<br>
    <br>
    <table height="551" width="1167" border="0" cellspacing="0" cols="9">
      <colgroup span="5" width="85"></colgroup> <colgroup width="35"></colgroup>
      <colgroup span="2" width="31"></colgroup> <colgroup width="85"></colgroup>
      <tbody>
        <tr>
          <td align="LEFT" height="16">ObracChr10</td>
          <td align="LEFT">maker</td>
          <td align="LEFT">mRNA</td>
          <td sdval="355056" sdnum="1033;0;#,##0" align="RIGHT">355,056</td>
          <td sdval="358075" sdnum="1033;0;#,##0" align="RIGHT">358,075</td>
          <td align="LEFT">.</td>
          <td align="LEFT">-</td>
          <td align="LEFT">.</td>
          <td align="LEFT">ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eAED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788</td>
        </tr>
        <tr>
          <td align="LEFT" height="16">ObracChr10</td>
          <td align="LEFT">maker</td>
          <td align="LEFT">exon</td>
          <td sdval="355056" sdnum="1033;0;#,##0" align="RIGHT">355,056</td>
          <td sdval="356874" sdnum="1033;0;#,##0" align="RIGHT">356,874</td>
          <td align="LEFT">.</td>
          <td align="LEFT">-</td>
          <td align="LEFT">.</td>
          <td align="LEFT">ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1</td>
        </tr>
        <tr>
          <td align="LEFT" height="16">ObracChr10</td>
          <td align="LEFT">maker</td>
          <td align="LEFT">exon</td>
          <td sdval="356965" sdnum="1033;0;#,##0" align="RIGHT">356,965</td>
          <td sdval="357081" sdnum="1033;0;#,##0" align="RIGHT">357,081</td>
          <td align="LEFT">.</td>
          <td align="LEFT">-</td>
          <td align="LEFT">.</td>
          <td align="LEFT">ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1</td>
        </tr>
        <tr>
          <td align="LEFT" height="16">ObracChr10</td>
          <td align="LEFT">maker</td>
          <td align="LEFT">exon</td>
          <td sdval="357209" sdnum="1033;0;#,##0" align="RIGHT">357,209</td>
          <td sdval="357319" sdnum="1033;0;#,##0" align="RIGHT">357,319</td>
          <td align="LEFT">.</td>
          <td align="LEFT">-</td>
          <td align="LEFT">.</td>
          <td align="LEFT">ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1</td>
        </tr>
        <tr>
          <td align="LEFT" height="16">ObracChr10</td>
          <td align="LEFT">maker</td>
          <td align="LEFT">exon</td>
          <td sdval="357756" sdnum="1033;0;#,##0" align="RIGHT">357,756</td>
          <td sdval="358075" sdnum="1033;0;#,##0" align="RIGHT">358,075</td>
          <td align="LEFT">.</td>
          <td align="LEFT">-</td>
          <td align="LEFT">.</td>
          <td align="LEFT">ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1</td>
        </tr>
        <tr>
          <td align="LEFT" height="16">ObracChr10</td>
          <td align="LEFT">maker</td>
          <td align="LEFT">CDS</td>
          <td sdval="357756" sdnum="1033;0;#,##0" align="RIGHT">357,756</td>
          <td sdval="358075" sdnum="1033;0;#,##0" align="RIGHT">358,075</td>
          <td align="LEFT">.</td>
          <td align="LEFT">-</td>
          <td sdval="2" sdnum="1033;" align="RIGHT">2</td>
          <td align="LEFT">ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1</td>
        </tr>
        <tr>
          <td align="LEFT" height="16">ObracChr10</td>
          <td align="LEFT">maker</td>
          <td align="LEFT">CDS</td>
          <td sdval="357209" sdnum="1033;0;#,##0" align="RIGHT">357,209</td>
          <td sdval="357319" sdnum="1033;0;#,##0" align="RIGHT">357,319</td>
          <td align="LEFT">.</td>
          <td align="LEFT">-</td>
          <td sdval="2" sdnum="1033;" align="RIGHT">2</td>
          <td align="LEFT">ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1</td>
        </tr>
        <tr>
          <td align="LEFT" height="16">ObracChr10</td>
          <td align="LEFT">maker</td>
          <td align="LEFT">CDS</td>
          <td sdval="356965" sdnum="1033;0;#,##0" align="RIGHT">356,965</td>
          <td sdval="357081" sdnum="1033;0;#,##0" align="RIGHT">357,081</td>
          <td align="LEFT">.</td>
          <td align="LEFT">-</td>
          <td sdval="2" sdnum="1033;" align="RIGHT">2</td>
          <td align="LEFT">ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1</td>
        </tr>
        <tr>
          <td align="LEFT" height="16">ObracChr10</td>
          <td align="LEFT">maker</td>
          <td align="LEFT">CDS</td>
          <td sdval="355056" sdnum="1033;0;#,##0" align="RIGHT">355,056</td>
          <td sdval="356874" sdnum="1033;0;#,##0" align="RIGHT">356,874</td>
          <td align="LEFT">.</td>
          <td align="LEFT">-</td>
          <td sdval="0" sdnum="1033;" align="RIGHT">0</td>
          <td align="LEFT">ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1</td>
        </tr>
        <tr>
          <td align="LEFT" height="16"><br>
          </td>
          <td align="LEFT"><br>
          </td>
          <td align="LEFT"><br>
          </td>
          <td sdnum="1033;0;#,##0" align="LEFT"><br>
          </td>
          <td sdnum="1033;0;#,##0" align="LEFT"><br>
          </td>
          <td align="LEFT"><br>
          </td>
          <td align="LEFT"><br>
          </td>
          <td align="LEFT"><br>
          </td>
          <td align="LEFT"><br>
          </td>
        </tr>
        <tr>
          <td align="LEFT" height="17">ObracChr10</td>
          <td align="LEFT">repeatrunner</td>
          <td align="LEFT">match_part</td>
          <td sdval="357755" sdnum="1033;0;#,##0" align="RIGHT">357,755</td>
          <td sdval="358084" sdnum="1033;0;#,##0" align="RIGHT">358,084</td>
          <td sdval="566" sdnum="1033;" align="RIGHT">566</td>
          <td align="LEFT">-</td>
          <td align="LEFT">.</td>
          <td align="LEFT">ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical
            117 226 +320</td>
        </tr>
        <tr>
          <td align="LEFT" height="17">ObracChr10</td>
          <td align="LEFT">repeatrunner</td>
          <td align="LEFT">protein_match</td>
          <td sdval="357755" sdnum="1033;0;#,##0" align="RIGHT">357,755</td>
          <td sdval="358084" sdnum="1033;0;#,##0" align="RIGHT">358,084</td>
          <td sdval="566" sdnum="1033;" align="RIGHT">566</td>
          <td align="LEFT">-</td>
          <td align="LEFT">.</td>
          <td align="LEFT">ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical
            117 226 +320</td>
        </tr>
        <tr>
          <td align="LEFT" height="17">ObracChr10</td>
          <td align="LEFT">repeatrunner</td>
          <td align="LEFT">match_part</td>
          <td sdval="357202" sdnum="1033;0;#,##0" align="RIGHT">357,202</td>
          <td sdval="357294" sdnum="1033;0;#,##0" align="RIGHT">357,294</td>
          <td sdval="142" sdnum="1033;" align="RIGHT">142</td>
          <td align="LEFT">-</td>
          <td align="LEFT">.</td>
          <td align="LEFT">ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical
            264 294 +86</td>
        </tr>
        <tr>
          <td align="LEFT" height="17">ObracChr10</td>
          <td align="LEFT">repeatrunner</td>
          <td align="LEFT">protein_match</td>
          <td sdval="357202" sdnum="1033;0;#,##0" align="RIGHT">357,202</td>
          <td sdval="357294" sdnum="1033;0;#,##0" align="RIGHT">357,294</td>
          <td sdval="142" sdnum="1033;" align="RIGHT">142</td>
          <td align="LEFT">-</td>
          <td align="LEFT">.</td>
          <td align="LEFT">ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical
            264 294 +86</td>
        </tr>
        <tr>
          <td align="LEFT" height="17">ObracChr10</td>
          <td align="LEFT">repeatrunner</td>
          <td align="LEFT">match_part</td>
          <td sdval="355059" sdnum="1033;0;#,##0" align="RIGHT">355,059</td>
          <td sdval="357092" sdnum="1033;0;#,##0" align="RIGHT">357,092</td>
          <td sdval="3367" sdnum="1033;" align="RIGHT">3367</td>
          <td align="LEFT">-</td>
          <td align="LEFT">.</td>
          <td align="LEFT">ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical
            289 937 +1816</td>
        </tr>
        <tr>
          <td align="LEFT" height="17">ObracChr10</td>
          <td align="LEFT">repeatrunner</td>
          <td align="LEFT">protein_match</td>
          <td sdval="355059" sdnum="1033;0;#,##0" align="RIGHT">355,059</td>
          <td sdval="357092" sdnum="1033;0;#,##0" align="RIGHT">357,092</td>
          <td sdval="3367" sdnum="1033;" align="RIGHT">3367</td>
          <td align="LEFT">-</td>
          <td align="LEFT">.</td>
          <td align="LEFT">ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical
            289 937 +1816</td>
        </tr>
      </tbody>
    </table>
    <br>
    <br>
    The same holds for both repeatmasker and repeatrunner hits, and the
    affected gene models come from both FGENESH and SNAP
    predictions.<br>
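    <br>
    To make the overlap criterion concrete, here is a minimal Python
    sketch of the calculation. It is not the actual BEDtools command
    line: the function names are made up for illustration, and the
    set-based approach only suits a single gene, not a whole genome. It
    uses the CDS and repeatrunner coordinates from the table above:<br>
<pre>
```python
def positions(intervals):
    """Return the set of 1-based positions covered by inclusive (start, end) intervals."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return covered


def cds_repeat_fraction(cds_intervals, repeat_intervals):
    """Fraction of CDS bases that fall inside any repeat interval."""
    cds = positions(cds_intervals)
    if not cds:
        return 0.0
    # Using sets means overlapping repeat hits are not counted twice.
    shared = cds.intersection(positions(repeat_intervals))
    return len(shared) / len(cds)


# Coordinates copied from the Obrac10g00240.1 example (GFF3, 1-based inclusive).
cds = [(357756, 358075), (357209, 357319), (356965, 357081), (355056, 356874)]
repeats = [(357755, 358084), (357202, 357294), (355059, 357092)]

fraction = cds_repeat_fraction(cds, repeats)
print(f"{fraction:.1%} of CDS bases overlap repeatrunner hits")
```
</pre>
    A gene is counted as overlapping when this fraction reaches the 30%
    cutoff mentioned above.<br>
    <br>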
    How can I explain this problem?<br>
    Thanks,<br>
    <br>
    Dario<br>
    <br>
    <pre class="moz-signature" cols="72">-- 
Dario Copetti, PhD
Research Associate
Arizona Genomics Institute
University of Arizona - BIO5

1657 E. Helen St.
Tucson, AZ  85721
<a class="moz-txt-link-abbreviated" href="http://www.genome.arizona.edu">www.genome.arizona.edu</a>
</pre>
  </body>
</html>