<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif;"><div>The problem in the example you sent comes from the geneseqer entries in the GFF3 you are passing in.  They are causing gene clusters to merge.  The result is that UTRs are being overextended and overlap neighboring models (and some models probably get merged).  As you noticed, you can't have overlapping models on the same strand. If you set score_preds=1 in the maker_opts.ctl file, it will give you AED scores for the rejected ab initio models.  You will notice that none of them score better than 0.23.</div><div><br></div><div>One thing you can do is set correct_est_fusion=1.  This tries to correct for erroneous EST/transcript evidence that leads to overextended UTRs and false gene merging.  You will see in the attached image that it trims back the overlapping 3' and 5' UTR for the overlapping gene models, given that MAKER believes the evidence leading to the overlap is likely low confidence and is a false merge of regions.  I think much of your geneseqer input is more of a problem than a help for the annotation. 
Many seem to be spurious alignments.</div><div><br></div><div>--Carson</div><div><br></div><div><br></div><div><img src="cid:6246A771-E6A9-4875-9362-DC8A7A5BC9C4" type="image/png"></div><div><br></div><span id="OLK_SRC_BODY_SECTION"><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span> Daniel Standage <<a href="mailto:daniel.standage@gmail.com">daniel.standage@gmail.com</a>><br><span style="font-weight:bold">Date: </span> Friday, June 6, 2014 at 5:58 PM<br><span style="font-weight:bold">To: </span> Volker Brendel <<a href="mailto:vbrendel@indiana.edu">vbrendel@indiana.edu</a>><br><span style="font-weight:bold">Cc: </span> Carson Holt <<a href="mailto:carsonhh@gmail.com">carsonhh@gmail.com</a>>, Maker Mailing List <<a href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>><br><span style="font-weight:bold">Subject: </span> Re: [maker-devel] Filtering of ab initio gene models<br></div><div><br></div><div dir="ltr"><div><div><div><div>In the example sent previously, transcript TSA024184 overlaps with the 3' end of our gene model's CDS by 3 nucleotides. If I manually change the transcript's end coordinate (6400 to 6100) so that there are two separate non-overlapping evidence clusters, two models are reported as expected. But I can even get both models reported with a much smaller change (6400 to 6395), where the UTRs still overlap but the CDS does not overlap with the UTR. The 5' end of our gene model's CDS also overlaps with another transcript. 
Maker has no problem reporting both of these gene models though, probably since they're on different strands?<br></div><br></div>So correct me if I'm wrong, but it appears that Maker will report overlapping gene models if they are on opposite strands or if no CDS is involved in the overlap. Is there any way this behavior can be configured?<br></div><br></div><div>On another note, we're considering your suggestion to integrate EVM with Maker. One possibility discussed is to run Maker 4 separate times (once for each of Augustus, GeneMark, SNAP, and our model_gff models), each time with all our transcript/protein evidence, prior to consensus modeling with EVM. Would that provide any benefit over running Maker a single time with all prediction sources simultaneously?<br></div><div><br></div><div>
Thanks,<br></div>Daniel<br><div class="gmail_extra"><br clear="all"><div><div dir="ltr"><br>--<br>Daniel S. Standage<br>Ph.D. Candidate<br>Computational Genome Science Laboratory<br>Indiana University<br></div></div><br><br><div class="gmail_quote">On Fri, Jun 6, 2014 at 5:52 PM, Volker Brendel <span dir="ltr"><<a href="mailto:vbrendel@indiana.edu" target="_blank">vbrendel@indiana.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">



  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    Hi Carson,<br>
    is there a way of allowing MAKER to add UTRs to our external models
    (supplied by the pred_gff or model_gff tag)?  This seems to be one
    problem we are running into.  Our external models are high quality,
    but CDS only.  Thus their score gets knocked down relative to ab
    initio predictions with added UTRs.<br>
    <br>
    Daniel will have more questions/observations later with regard to
    overlapping gene models (we definitely need to allow gene models to
    overlap in the UTRs, because transcript evidence clearly shows such
    negative intergenic spaces).<br>
    <br>
    Thanks for all your help!<br>
    Volker<div><div><br>
    <br>
    <div>On 6/6/2014 11:39 AM, Carson Holt
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div>snap_masked-$seqid-processed-gene was produced by SNAP on the
        repeat masked sequence without hints (i.e. the ab initio call).</div>
      <div>maker-$seqid-snap-gene was produced by SNAP after receiving
        hints from MAKER.</div>
      <div><br>
      </div>
      <div>In both cases MAKER is allowed to add UTR to the model (hence
        the 'processed' tag).</div>
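      <div>To make the distinction concrete, here is a minimal Python sketch that sorts gene IDs into the two cases described above. The function name classify_gene_id and the example IDs are hypothetical; only the two ID patterns (snap_masked-$seqid-processed-gene and maker-$seqid-snap-gene) come from this thread, and the sketch covers SNAP only.</div>
<pre>
```python
def classify_gene_id(gene_id):
    """Classify a MAKER gene ID by the naming patterns described in this thread.

    Only the two SNAP patterns discussed here are covered; anything else
    is reported as 'unknown' rather than guessed.
    """
    if gene_id.startswith("snap_masked-") and "-processed-gene" in gene_id:
        # SNAP run on the repeat-masked sequence without hints (ab initio call)
        return "ab initio"
    if gene_id.startswith("maker-") and "-snap-gene" in gene_id:
        # SNAP run again after receiving hints from MAKER
        return "hint-based"
    return "unknown"


# Hypothetical IDs; the scaffold name and trailing numbers are made up.
examples = [
    "snap_masked-scaffold1-processed-gene-0.3",
    "maker-scaffold1-snap-gene-0.7",
    "augustus-scaffold1-gene-0.1",
]
labels = [classify_gene_id(g) for g in examples]
print(labels)
```
</pre>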
      <div><br>
      </div>
      <div>--Carson</div>
      <div><br>
      </div>
      <div><br>
      </div>
      <span>
        <div style="font-family:Calibri;font-size:11pt;text-align:left;color:black;BORDER-BOTTOM:medium none;BORDER-LEFT:medium none;PADDING-BOTTOM:0in;PADDING-LEFT:0in;PADDING-RIGHT:0in;BORDER-TOP:#b5c4df 1pt solid;BORDER-RIGHT:medium none;PADDING-TOP:3pt"><span style="font-weight:bold">From: </span> Daniel Standage <<a href="mailto:daniel.standage@gmail.com" target="_blank">daniel.standage@gmail.com</a>><br>
          <span style="font-weight:bold">Date: </span> Friday, June 6,
          2014 at 10:33 AM<br>
          <span style="font-weight:bold">To: </span> Carson Holt <<a href="mailto:carsonhh@gmail.com" target="_blank">carsonhh@gmail.com</a>><br>
          <span style="font-weight:bold">Cc: </span> Maker Mailing List
          <<a href="mailto:maker-devel@yandell-lab.org" target="_blank">maker-devel@yandell-lab.org</a>>,
          Volker Brendel <<a href="mailto:vbrendel@indiana.edu" target="_blank">vbrendel@indiana.edu</a>><br>
          <span style="font-weight:bold">Subject: </span> Re:
          [maker-devel] Filtering of ab initio gene models<br>
        </div>
        <div><br>
        </div>
        <div dir="ltr">
          <div>
            <div>Another question: is there documentation anywhere for
              the naming conventions of the genes annotated by Maker? Of
              course it's easy to spot genes based on a particular <i>ab
                initio</i> gene predictor, as the names are in the IDs.
              But what is the significance of, say,
              "snap_masked-$seqid-processed-gene" in a gene ID vs
              "maker-$seqid-snap-gene"?<br>
              <br>
            </div>
            Thanks,<br>
          </div>
          Daniel<br>
        </div>
        <div class="gmail_extra"><br clear="all">
          <div>
            <div dir="ltr"><br>
              --<br>
              Daniel S. Standage<br>
              Ph.D. Candidate<br>
              Computational Genome Science Laboratory<br>
              Indiana University<br>
            </div>
          </div>
          <br>
          <br>
          <div class="gmail_quote">On Thu, Jun 5, 2014 at 2:05 PM,
            Daniel Standage <span dir="ltr"><<a href="mailto:daniel.standage@gmail.com" target="_blank">daniel.standage@gmail.com</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div dir="ltr">
                <div>
                  <div>I have attached data for a small 18kb region with
                    a handful of genes, as well as the corresponding
                    maker_opts.ctl file. (This is a smaller and
                    different data set than what I was looking at
                    yesterday, with a more well-defined problem).<br>
                    <br>
                    With the data files as is, Maker 2.31.3 reports a
                    model from 4125 to 6400 with an AED of 0.23. If you
                    exclude transcript TSA024184, Maker reports a
                    different gene from 6111 to 8345 with an AED of
                    0.01. Both of these genes have transcript support:
                    will Maker report overlapping genes under any
                    conditions? And even if Maker is forced to choose
                    only a single gene to report, why would the model
                    from 4125 to 6400 ever be reported in place of the
                    one from 6111 to 8345, especially since this is
                    provided in the model_gff file?<br>
                    <br>
                  </div>
                  Even when transcript TSA024184 is included, Maker 2.10
                  reports the high-confidence gene from 6111 to 8345.<br>
                  <br>
                </div>
                Any light you could shed would be helpful. Thanks!<br>
              </div>
              <div class="gmail_extra">
                <div><br clear="all">
                  <div>
                    <div dir="ltr"><br>
                      --<br>
                      Daniel S. Standage<br>
                      Ph.D. Candidate<br>
                      Computational Genome Science Laboratory<br>
                      Indiana University<br>
                    </div>
                  </div>
                  <br>
                  <br>
                </div>
                <div>
                  <div>
                    <div class="gmail_quote">On Wed, Jun 4, 2014 at 3:17
                      PM, Carson Holt <span dir="ltr"><<a href="mailto:carsonhh@gmail.com" target="_blank">carsonhh@gmail.com</a>></span>
                      wrote:<br>
                      <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                        <div style="word-wrap:break-word;color:rgb(0,0,0);font-size:14px;font-family:Calibri,sans-serif">
                          <div>Just eAED, but eAED can affect the selection
                            of ab initio results.  For example, it takes
                            the reading frame match of protein evidence
                            into account, which also affects whether
                            evidence from single_exon=1 and genes with
                            single-exon protein evidence get kept.  There
                            is also the assumption that your alignments
                            in GFF3 are correctly spliced (as BLAT's
                            are).  So giving blastn results as
                            precomputed est_gff would create a lot of
                            noise, since MAKER ignores blastn results and
                            uses them only to seed the polished
                            exonerate alignments.</div>
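                          <div>For reference, plain AED depends only on coordinates, which is why missing Gap or Target attributes do not change it. The Python sketch below is an illustration under a stated assumption, not MAKER's actual implementation: it assumes the published definition AED = 1 - (SN + SP) / 2 computed at the nucleotide level, and the function names and toy coordinates are hypothetical.</div>
<pre>
```python
def bases(intervals):
    """Expand inclusive (start, end) intervals into a set of base positions."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return covered


def aed(model_exons, evidence_intervals):
    """Annotation edit distance from base-level overlap only.

    Assumes AED = 1 - (SN + SP) / 2, where SN is the fraction of evidence
    bases covered by the model and SP is the fraction of model bases
    covered by evidence. 0 means perfect agreement, 1 means no support.
    """
    model = bases(model_exons)
    evidence = bases(evidence_intervals)
    if not model or not evidence:
        return 1.0
    shared = len(model.intersection(evidence))
    sensitivity = shared / len(evidence)
    specificity = shared / len(model)
    return 1.0 - (sensitivity + specificity) / 2.0


# Toy coordinates, invented for illustration.
identical = aed([(1, 100)], [(1, 100)])
half_overlap = aed([(1, 100)], [(51, 150)])
no_overlap = aed([(1, 100)], [(201, 300)])
print(identical, half_overlap, no_overlap)
```
</pre>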
                          <div><br>
                          </div>
                          <div>--Carson</div>
                          <div><br>
                          </div>
                          <div><br>
                          </div>
                          <span>
                            <div style="font-family:Calibri;font-size:11pt;text-align:left;color:black;BORDER-BOTTOM:medium none;BORDER-LEFT:medium none;PADDING-BOTTOM:0in;PADDING-LEFT:0in;PADDING-RIGHT:0in;BORDER-TOP:#b5c4df 1pt solid;BORDER-RIGHT:medium none;PADDING-TOP:3pt"><span style="font-weight:bold">From: </span>
                              Daniel Standage <<a href="mailto:daniel.standage@gmail.com" target="_blank">daniel.standage@gmail.com</a>><br>
                              <span style="font-weight:bold">Date: </span>
                              Wednesday, June 4, 2014 at 1:11 PM<br>
                              <span style="font-weight:bold">To: </span>
                              Carson Holt <<a href="mailto:carsonhh@gmail.com" target="_blank">carsonhh@gmail.com</a>><br>
                              <span style="font-weight:bold">Cc: </span>
                              Maker Mailing List <<a href="mailto:maker-devel@yandell-lab.org" target="_blank">maker-devel@yandell-lab.org</a>><br>
                              <span style="font-weight:bold">Subject: </span>
                              Re: [maker-devel] Filtering of ab initio
                              gene models<br>
                            </div>
                            <div>
                              <div>
                                <div><br>
                                </div>
                                <div dir="ltr">I do not provide Gap or
                                  Target attributes in the GFF3. Will
                                  this affect the AED as well, or just
                                  the eAED?<br>
                                </div>
                                <div class="gmail_extra"><br clear="all">
                                  <div>
                                    <div dir="ltr"><br>
                                      --<br>
                                      Daniel S. Standage<br>
                                      Ph.D. Candidate<br>
                                      Computational Genome Science
                                      Laboratory<br>
                                      Indiana University<br>
                                    </div>
                                  </div>
                                  <br>
                                  <br>
                                  <div class="gmail_quote">On Wed, Jun
                                    4, 2014 at 3:09 PM, Carson Holt <span dir="ltr"><<a href="mailto:carsonhh@gmail.com" target="_blank">carsonhh@gmail.com</a>></span>
                                    wrote:<br>
                                    <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                      <div style="word-wrap:break-word;color:rgb(0,0,0);font-size:14px;font-family:Calibri,sans-serif">
                                        <div>Sure, that would be
                                          helpful.  One question: do you
                                          provide the Gap attribute in
                                          your precomputed alignments?
                                           Having or not having that
                                          attribute affects the eAED
                                          score which takes reading
                                          frame into account, and may
                                          cause some things to be kept
                                          that normally would be
                                          dropped, because MAKER won't
                                          be able to take the points of
                                          mismatch of the alignment into
                                          account (it just assumes match
                                          everywhere).</div>
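                                        <div>One quick way to answer that question for a given file is to count feature lines with and without a Gap attribute. The Python sketch below is only an illustration: the function name count_gap_attributes and the sample lines are hypothetical, and it assumes standard nine-column, tab-separated GFF3 with semicolon-separated key=value pairs in column 9.</div>
<pre>
```python
def count_gap_attributes(gff3_lines):
    """Count feature lines with and without a Gap attribute in column 9."""
    with_gap = 0
    without_gap = 0
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed GFF3 feature line
        # Column 9 holds semicolon-separated key=value attributes
        keys = set()
        for pair in columns[8].split(";"):
            if "=" in pair:
                keys.add(pair.split("=", 1)[0].strip())
        if "Gap" in keys:
            with_gap += 1
        else:
            without_gap += 1
    return with_gap, without_gap


# Made-up example lines, not taken from the attached data.
sample = [
    "##gff-version 3",
    "scaffold1\taligner\tmatch_part\t100\t200\t.\t+\t.\tID=m1;Target=EST1 1 101;Gap=M101",
    "scaffold1\taligner\tmatch_part\t300\t400\t.\t+\t.\tID=m2;Target=EST2 1 101",
]
result = count_gap_attributes(sample)
print(result)
```
</pre>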
                                        <div><br>
                                        </div>
                                        <div>--Carson</div>
                                        <div><br>
                                        </div>
                                        <div><br>
                                        </div>
                                        <span>
                                          <div style="font-family:Calibri;font-size:11pt;text-align:left;color:black;BORDER-BOTTOM:medium none;BORDER-LEFT:medium none;PADDING-BOTTOM:0in;PADDING-LEFT:0in;PADDING-RIGHT:0in;BORDER-TOP:#b5c4df 1pt solid;BORDER-RIGHT:medium none;PADDING-TOP:3pt"><span style="font-weight:bold">From:
                                            </span> Daniel Standage <<a href="mailto:daniel.standage@gmail.com" target="_blank">daniel.standage@gmail.com</a>><br>
                                            <span style="font-weight:bold">Date:
                                            </span> Wednesday, June 4,
                                            2014 at 1:03 PM<br>
                                            <span style="font-weight:bold">To:
                                            </span> Maker Mailing List
                                            <<a href="mailto:maker-devel@yandell-lab.org" target="_blank">maker-devel@yandell-lab.org</a>><br>
                                            <span style="font-weight:bold">Subject:
                                            </span> [maker-devel]
                                            Filtering of ab initio gene
                                            models<br>
                                          </div>
                                          <div>
                                            <div>
                                              <div><br>
                                              </div>
                                              <div dir="ltr">
                                                <div>
                                                  <div>
                                                    <div>Thanks everyone
                                                      for your responses
                                                      recently!<br>
                                                      <br>
                                                    </div>
                                                    The reason for my
                                                    recent flurry of
                                                    email activity is
                                                    that I'm seeing some
                                                    unexpected trends
                                                    when running the new
                                                    version of Maker
                                                    with precomputed
                                                    alignments. Compared
                                                    with an annotation I
                                                    did a while ago
                                                    (Maker 2.10,
                                                    Maker-computed
                                                    alignments), this
                                                    new annotation has a
                                                    substantial number
                                                    of new genes
                                                    annotated. If I
                                                    compare
                                                    distributions of AED
                                                    scores between the
                                                    old and new
                                                    annotation, it's
                                                    clear that the new
                                                    annotation has a lot
                                                    more low-quality
                                                    models. If I look at
                                                    new gene models that
                                                    do not overlap with
                                                    any gene model from
                                                    the old annotation,
                                                    the likelihood that
                                                    it's a low-quality
                                                    model is much
                                                    higher.<br>
                                                    <br>
                                                  </div>
                                                  I decided to run a
                                                  little experiment. I
                                                  annotated a scaffold
                                                  first using Maker 2.10
                                                  and then using Maker
                                                  2.31.3. In both cases,
                                                  I used the same
                                                  pre-computed
                                                  transcript and protein
                                                  alignments and the
                                                  same (latest) version
                                                  of SNAP as the only <i>ab
                                                    initio</i>
                                                  predictor. Maker 2.10
                                                  predicted 44 genes
                                                  while Maker 2.31.3
                                                  predicted 63. If we
                                                  group gene models into
                                                  loci by overlap, there
                                                  are 33 loci with gene
                                                  models from both 2.10
                                                  and 2.31.3, 1 locus
                                                  with only models from
                                                  2.10, and 28 loci with
                                                  only models from
                                                  2.31.3.<br>
                                                  <br>
                                                </div>
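                                                <div>The locus comparison described above can be sketched roughly as follows. This is an illustrative Python reconstruction, not the script actually used: the function names group_into_loci and summarize, the tuple layout, and the toy coordinates are all assumptions, and strand is ignored for simplicity.</div>
<pre>
```python
def group_into_loci(models):
    """Group gene models into loci by coordinate overlap.

    models: iterable of (start, end, label) tuples with inclusive coordinates.
    Returns a list of loci, each a list of the models it contains.
    """
    loci = []
    locus_end = None
    for model in sorted(models):
        start, end, _label = model
        if locus_end is not None and start <= locus_end:
            # Overlaps the current locus: add the model and extend the locus
            loci[-1].append(model)
            locus_end = max(locus_end, end)
        else:
            # No overlap: start a new locus
            loci.append([model])
            locus_end = end
    return loci


def summarize(loci, label_a, label_b):
    """Count loci with models from both sources, only A, or only B."""
    counts = {"both": 0, label_a: 0, label_b: 0}
    for locus in loci:
        labels = {label for _start, _end, label in locus}
        if label_a in labels and label_b in labels:
            counts["both"] += 1
        elif label_a in labels:
            counts[label_a] += 1
        else:
            counts[label_b] += 1
    return counts


# Toy coordinates, invented for illustration only.
models = [
    (100, 500, "2.10"), (120, 480, "2.31.3"),   # shared locus
    (900, 1500, "2.31.3"),                      # only in 2.31.3
    (2000, 2600, "2.10"),                       # only in 2.10
]
summary = summarize(group_into_loci(models), "2.10", "2.31.3")
print(summary)
```
</pre>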
                                                Before this experiment,
                                                I assumed the issue was
                                                related to providing
                                                pre-computed alignments
                                                in GFF3 format and
                                                perhaps violating some
                                                important assumption.
                                                However, this experiment
                                                makes me wonder whether
                                                there have been changes
                                                to how Maker filters <i>ab
                                                  initio</i> gene models
                                                between version 2.10 and
                                                version 2.31.3? Do you
                                                have any ideas? If it
                                                would help, I could put
                                                together a small data
                                                set that reproduces the
                                                behavior I just
                                                described.<br>
                                                <br>
                                                Thanks!<br clear="all">
                                                <div>
                                                  <div>
                                                    <div>
                                                      <div>
                                                        <div>
                                                          <div dir="ltr"><br>
                                                          --<br>
                                                          Daniel S.
                                                          Standage<br>
                                                          Ph.D.
                                                          Candidate<br>
                                                          Computational
                                                          Genome Science
                                                          Laboratory<br>
                                                          Indiana
                                                          University<br>
                                                          </div>
                                                        </div>
                                                      </div>
                                                    </div>
                                                  </div>
                                                </div>
                                              </div>
                                            </div>
                                          </div>
                                          _______________________________________________<br>
                                          maker-devel mailing list<br>
                                          <a href="mailto:maker-devel@box290.bluehost.com" target="_blank">maker-devel@box290.bluehost.com</a><br>
                                          <a href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org" target="_blank">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a></span></div>



                                    </blockquote>
                                  </div>
                                  <br>
                                </div>
                              </div>
                            </div>
                          </span></div>
                      </blockquote>
                    </div>
                    <br>
                  </div>
                </div>
              </div>
            </blockquote>
          </div>
          <br>
        </div>
      </span>
    </blockquote>
    <br>
    </div></div><span><font color="#888888"><pre cols="72">-- 
Volker Brendel
Professor of Biology and Computer Science
Indiana University
Department of Biology & School of Informatics and Computing
Simon Hall 205C
212 South Hawthorne Drive
Bloomington, IN 47405-7003

Tel.: <a href="tel:%28812%29%20855-7074" value="+18128557074" target="_blank">(812) 855-7074</a>
<a href="http://brendelgroup.org/" target="_blank">http://brendelgroup.org/</a></pre>
  </font></span></div></blockquote></div><br></div></div></span></body></html>