<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">Hello Barry and Carson,<br>

      <br>

      Thank you very much for the extensive replies!! Very helpful!!<br>

      <br>

      <blockquote type="cite">

        <div><br>

          2. I tried to use EST data from another organism in
          alt_est= (#EST/cDNA sequence file in fasta format from an
          alternate organism). The organism is quite distantly related,
          but it's the closest I have, so I thought I'd give it a shot. I
          ran MAKER twice with identical settings except for alt_est and
          est2genome=0/1. The number of predicted genes is identical
          with both approaches, so I am not sure whether the EST
          data was actually used or whether it's just too distantly
          related. Is there an easy way to assess this?<br>

        </div>

      </blockquote>

      <div><br>

      </div>

      <blockquote type="cite"><div>Typically EST evidence from another organism with
        alt_est will add little in the way of additional support
        (compared to just using protein evidence from, say, Swiss-Prot),
        and this would be especially true if your alt_est is
        distantly related.  I'm not sure I understand your
        alt_est/est2genome combinations well enough to comment in more
        detail.  I could see four possible combinations there: which two
        gave identical results?</div></blockquote>

      <br>

      What I meant was that I ran MAKER once without any alt_est
      evidence and with est2genome=0, and a second time with
      alt_est=some.fasta and est2genome=1. The result was the same.
      Sorry for not making myself clear. I thought that the
      est2genome=1 switch was simply what enables physical EST evidence
      to be used. Therefore, I thought that neither alt_est=some.fasta
      with est2genome=0 nor an empty alt_est with est2genome=1 would
      make any sense. I had misunderstood this.  <br>

      <br>

      I will follow Carson's advice and try to use more protein
      evidence from related species (in addition to UniProt). It is
      running right now; let's see where that leaves me. The IPRScan
      approach Barry suggested for assessing gene models without
      physical evidence sounds very interesting. I will definitely look
      into that. <br>

      <br>

      A question concerning an issue I just discovered:<br>
      I ran MAKER twice with the same physical evidence: the first time
      using SNAP and GeneMark, the second time using SNAP, GeneMark and
      AUGUSTUS (set to the closest related species available: same
      phylum, different class). The second run gives fewer gene models.
      In another context, I found that a second pass of MAKER using SNAP
      and GeneMark (after training SNAP on the predictions of the first
      pass) with the same physical evidence yields fewer gene
      annotations. How can that be, given the same physical evidence?<br>

      <br>

      Thanks again for your help! It is much appreciated!<br>

      <br>

      cheers,<br>

      Christoph<br>

      <br>

      On 31.08.2012 21:03, Carson Holt wrote:<br>

    </div>

    <blockquote cite="mid:CC667EFB.117DD%25carsonhh@gmail.com"

      type="cite">

      <div>I concur with everything Barry said.  Also let me add that
        alt-ESTs do not get polished around splice sites (exonerate
        won't handle them), whereas ESTs and proteins do.  This means
        that alt-ESTs contribute nothing in the way of UTR or
        splice-site information.  They just function as an anchor to
        maintain gene models that might otherwise have been rejected.
         This also means that alt_est=some.fasta together with
        est2genome=1 will produce virtually no additional results,
        because est2genome requires that the splice sites make sense.
         You are better off using proteins with protein2genome=1 if you
        don't have ESTs and want partial models for training.  Once you
        have a trained ab initio gene predictor, turn the est2genome and
        protein2genome options off.  Otherwise they will give weird
        partial models that decrease the quality of your final
        annotations (partial models are OK for training, but that's
        about it).</div>
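      <div><br>
        The staged use of the evidence-to-model switches described above
        can be written down as a small Python helper. This is a minimal
        sketch under stated assumptions: the function names are
        hypothetical, and the choice is reduced to two inputs (whether
        same-species ESTs exist and whether an ab initio predictor has
        been trained). Only est2genome and protein2genome are MAKER
        options named in this thread; everything else is illustrative.</div>
      <pre>
```python
from __future__ import annotations


def evidence_options(have_species_ests: bool, predictor_trained: bool) -> dict[str, int]:
    """Return hypothetical est2genome/protein2genome settings for one pass.

    Before any ab initio predictor is trained, evidence-derived (partial)
    models are useful as training data; afterwards both switches go off
    so they cannot add weird partial models to the final annotation.
    """
    if predictor_trained:
        return {"est2genome": 0, "protein2genome": 0}
    # alt_est alignments are not polished at splice sites, so only
    # same-species ESTs make est2genome worthwhile; proteins always help.
    return {"est2genome": int(have_species_ests), "protein2genome": 1}


def render_ctl_lines(options: dict[str, int]) -> list[str]:
    """Format options as key=value lines, as used in maker_opts.ctl."""
    return [f"{key}={value}" for key, value in options.items()]


# First pass for a genome with no same-species ESTs, then a later pass.
print("\n".join(render_ctl_lines(evidence_options(False, False))))
print("\n".join(render_ctl_lines(evidence_options(False, True))))
```
      </pre>
      <div>The helper only encodes the rule of thumb; copying the values
        into maker_opts.ctl for each pass is still up to the user.</div>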

      <div><br>

      </div>

      <div>If you are getting too low a gene count with keep_preds=0,
        then you probably need to add more evidence.  Try adding all
        proteins from a couple of related species (the protein= option
        accepts comma-separated lists of files). If your genome is from
        a fungus, an oomycete, or a prokaryote, then setting
        keep_preds=1 is usually safe.  These are genomes with high gene
        density and simple gene structure, so ab initio predictors do
        really well and don't need as much help from the evidence.  For
        other organisms, leave it set to 0 or you will get a lot of
        false positives (on some genomes with complex gene structure,
        false positives can outnumber the genes by 10 to 1 if you turn
        this on).</div>
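      <div><br>
        The two pieces of advice above (a comma-separated protein= list,
        and keep_preds=1 only for compact genomes) can be encoded in a
        few lines. This is a sketch with hypothetical helper and file
        names. The lineage set is a simplification of the advice, not a
        rule that MAKER enforces.</div>
      <pre>
```python
from __future__ import annotations

# Groups for which keep_preds=1 is described above as usually safe
# (high gene density, simple gene structure). Labels are illustrative.
COMPACT_GENOME_GROUPS = {"fungus", "oomycete", "prokaryote"}


def protein_option(paths: list[str]) -> str:
    """Build a protein= line; MAKER accepts a comma-separated list of files."""
    cleaned = [p.strip() for p in paths if p.strip()]
    if not cleaned:
        raise ValueError("at least one protein FASTA file is required")
    return "protein=" + ",".join(cleaned)


def keep_preds_option(group: str) -> str:
    """keep_preds=1 only for compact genomes; 0 everywhere else."""
    value = 1 if group.strip().lower() in COMPACT_GENOME_GROUPS else 0
    return f"keep_preds={value}"


# Hypothetical file names: Swiss-Prot plus proteomes of two relatives.
print(protein_option(["uniprot_sprot.fasta", "relative_a.fasta", "relative_b.fasta"]))
print(keep_preds_option("fungus"))
print(keep_preds_option("vertebrate"))
```
      </pre>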

      <div><br>

      </div>

      <div>Thanks,</div>

      <div>Carson</div>

      <div><br>

      </div>


      <span id="OLK_SRC_BODY_SECTION">

        <div style="font-family:Calibri; font-size:11pt;

          text-align:left; color:black; BORDER-BOTTOM: medium none;

          BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT:

          0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid;

          BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span

            style="font-weight:bold">From: </span> Barry Moore <<a

            moz-do-not-send="true"

            href="mailto:barry.moore@genetics.utah.edu">barry.moore@genetics.utah.edu</a>><br>

          <span style="font-weight:bold">Date: </span> Friday, 31

          August, 2012 12:52 PM<br>

          <span style="font-weight:bold">To: </span> Christoph Hahn

          <<a moz-do-not-send="true"

            href="mailto:chrisi.hahni@gmail.com">chrisi.hahni@gmail.com</a>><br>

          <span style="font-weight:bold">Cc: </span> <<a

            moz-do-not-send="true"

            href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>><br>

          <span style="font-weight:bold">Subject: </span> Re:

          [maker-devel] keep_preds option?<br>

        </div>

        <div><br>

        </div>

        <div>

          <div style="word-wrap: break-word; -webkit-nbsp-mode: space;

            -webkit-line-break: after-white-space; ">Hi Christoph,

            <div><br>

            </div>

            <div>Comments below:</div>

            <div><br>

              <div>

                <div>On Aug 31, 2012, at 6:43 AM, Christoph Hahn wrote:</div>

                <br class="Apple-interchange-newline">

                <blockquote type="cite">

                  <div>Hello maker users and developers,<br>

                    <br>

                    I am new to gene prediction and I am trying to use
                    MAKER 2.25 on the newly assembled draft genome of a
                    non-model organism. Within MAKER I use GeneMark,
                    SNAP and AUGUSTUS. I have a few questions:<br>

                    <br>

                  </div>

                </blockquote>

                <div><br>

                </div>

                <div>Welcome!</div>

                <br>

                <blockquote type="cite">

                  <div>1. I was wondering what the keep_preds option

                    means exactly.<br>

                    <br>

                    I found two slightly different explanations on the

                    option<br>

                    #Add unsupported gene prediction to final annotation

                    set (maker2.25)<br>

                    #Add non-overlapping ab-inito gene prediction to

                    final annotation set (found on the net - probably

                    older maker version)<br>

                    <br>

                  </div>

                </blockquote>

                <div><br>

                </div>

                <div>It means to keep ab initio gene predictions for
                  which there is no physical evidence.  By default,
                  every MAKER annotation requires two pieces of
                  information: an ab initio gene prediction and physical
                  evidence that supports it.  MAKER runs the ab initio
                  gene predictors and aligns transcript
                  (EST/cDNA/mRNA-Seq) and protein sequences to the
                  genome.  For each locus where one or more gene
                  predictions exist, MAKER checks whether there is any
                  physical evidence for gene expression at that locus
                  (RNA/protein sequence alignments).  If there is
                  evidence (spliced EST or protein alignments)
                  overlapping the predictions, MAKER decides which
                  prediction best matches the evidence and promotes that
                  prediction to an annotation.  If there is no evidence
                  overlapping any of the predictions, those predictions
                  are not included in the output annotation file
                  (although they are saved).  If you set keep_preds=1,
                  then for each locus where prediction(s) exist, MAKER
                  keeps one and promotes it to an annotation even though
                  there is no physical evidence.  The description
                  'non-overlapping ab-initio' means that MAKER has
                  clustered all ab initio predictions at a locus and
                  chosen one representative transcript to output.</div>

                <br>

                <blockquote type="cite">

                  <div>As far as I understood, keep_preds=0 only retains
                    gene models for which the ab initio predictions
                    agree. But how many have to agree: all three? Two of
                    three?<br>
                    keep_preds=1 instead keeps all gene models
                    regardless of whether the different programs agree,
                    right?<br>

                    <br>

                  </div>

                </blockquote>

                <div><br>

                </div>

                <div>MAKER does not take the presence of multiple ab

                  initio predictions as evidence and thus in the absence

                  of aligned physical evidence MAKER will not output an

                  annotation even if all three ab initio tools predict a

                  gene at that locus.</div>
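                <div><br>
                  The promotion logic described above can be made
                  concrete with a toy Python model. It is a
                  simplification under stated assumptions: predictions
                  and evidence are plain intervals on one strand, a
                  locus is a cluster of overlapping predictions, and
                  "best matches the evidence" is approximated by the
                  number of overlapping bases. MAKER's actual selection
                  is more involved, and all names here are hypothetical.
                  Agreement between predictors adds nothing to the
                  score, mirroring the point above.</div>
                <pre>
```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    source: str  # which ab initio tool produced it, e.g. "snap"
    start: int   # 1-based, inclusive
    end: int


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Bases shared by two closed intervals (0 if they are disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def cluster_loci(preds: list[Prediction]) -> list[list[Prediction]]:
    """Group overlapping predictions into loci (single strand, toy model)."""
    loci: list[list[Prediction]] = []
    for pred in sorted(preds, key=lambda q: q.start):
        if loci and pred.start <= max(q.end for q in loci[-1]):
            loci[-1].append(pred)
        else:
            loci.append([pred])
    return loci


def promote(preds: list[Prediction], evidence: list[tuple[int, int]],
            keep_preds: int = 0) -> list[Prediction]:
    """Pick at most one prediction per locus to become an annotation."""
    annotations = []
    for locus in cluster_loci(preds):
        # Score = bases of aligned evidence overlapping the prediction.
        # How many tools agree on the locus plays no role at all.
        score = {p: sum(overlap(p.start, p.end, s, e) for s, e in evidence)
                 for p in locus}
        best = max(locus, key=lambda p: score[p])
        if score[best] > 0:
            annotations.append(best)
        elif keep_preds == 1:
            # No evidence: keep one representative (longest, arbitrarily).
            annotations.append(max(locus, key=lambda p: p.end - p.start))
    return annotations
```
                </pre>
                <div>With no evidence, three agreeing predictors still
                  yield no annotation unless keep_preds=1. Keeping the
                  longest prediction in that case is an arbitrary choice
                  made for the sketch; the thread does not say how MAKER
                  picks the representative.</div>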

                <br>

                <blockquote type="cite">

                  <div>In my case I get substantial differences in the
                    number of gene models found between the two
                    settings, and with =1 I get a number that is close
                    to what we would expect. How would you interpret
                    that? What would you recommend? Obviously =0 is the
                    safer option.<br>

                  </div>

                </blockquote>

                <div><br>

                </div>

                <div>If you think that the number of genes you are
                  getting from a MAKER run is too low, you could run
                  MAKER with keep_preds=1.  After the run is finished,
                  use a tool like IPRScan to search all MAKER
                  predictions for protein domain content and push that
                  IPRScan output back into the MAKER GFF file with the
                  ipr_update_gff script.  Then, as a final step, you can
                  go through the GFF file and remove any gene model that
                  has neither physical evidence (AED < 1) nor protein
                  domain content (Dbxref=PFAM:XXX, etc.).  Sorry,
                  there's not a script prepackaged with MAKER for that
                  yet.</div>
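                <div><br>
                  Because no script ships for that final filtering step,
                  here is a minimal Python sketch of one. It assumes,
                  following the description above, that each mRNA line
                  in the MAKER GFF3 carries an _AED attribute and that
                  ipr_update_gff has added a Dbxref attribute containing
                  PFAM: entries. It also assumes that gene, mRNA, exon
                  and CDS lines are linked through GFF3 ID/Parent
                  attributes. A gene is kept if any of its mRNAs has an
                  AED below 1 or a Pfam cross-reference. Every other line,
                  including evidence alignments and any ##FASTA section,
                  passes through unchanged. The script name is
                  hypothetical; check the attribute names against your
                  own GFF before relying on it.</div>
                <pre>
```python
from __future__ import annotations

import sys


def parse_attributes(column9: str) -> dict[str, str]:
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs: dict[str, str] = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def split_ids(value: str) -> list[str]:
    """GFF3 allows several comma-separated parents, e.g. Parent=a,b."""
    return [item for item in value.split(",") if item]


def is_supported(attrs: dict[str, str]) -> bool:
    """Assumed test: evidence (AED below 1) or a Pfam cross-reference."""
    try:
        if float(attrs.get("_AED", "1")) < 1.0:
            return True
    except ValueError:
        pass  # malformed AED value: fall back to the domain check
    return "pfam:" in attrs.get("Dbxref", "").lower()


def filter_gff(lines: list[str]) -> list[str]:
    """Drop gene models whose mRNAs have neither evidence nor a Pfam hit."""
    rows = []  # (line, attrs); attrs is None for comments and FASTA
    parents: dict[str, list[str]] = {}
    gene_ids: set[str] = set()
    mrnas: list[dict[str, str]] = []
    in_fasta = False
    for line in lines:
        in_fasta = in_fasta or line.startswith("##FASTA")
        cols = line.rstrip("\n").split("\t")
        if in_fasta or line.startswith("#") or len(cols) != 9:
            rows.append((line, None))
            continue
        attrs = parse_attributes(cols[8])
        rows.append((line, attrs))
        if "ID" in attrs:
            parents[attrs["ID"]] = split_ids(attrs.get("Parent", ""))
            if cols[2] == "gene":
                gene_ids.add(attrs["ID"])
        if cols[2] == "mRNA":
            mrnas.append(attrs)

    def gene_roots(start_ids: list[str]) -> set[str]:
        """Walk Parent links upwards and return the gene IDs reached."""
        roots: set[str] = set()
        stack, seen = list(start_ids), set()
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                if current in gene_ids:
                    roots.add(current)
                stack.extend(parents.get(current, []))
        return roots

    supported: set[str] = set()
    for attrs in mrnas:
        if is_supported(attrs):
            supported |= gene_roots(split_ids(attrs.get("Parent", "")))

    kept = []
    for line, attrs in rows:
        if attrs is not None:
            ids = ([attrs["ID"]] if "ID" in attrs else []) + split_ids(attrs.get("Parent", ""))
            roots = gene_roots(ids)
            if roots and roots.isdisjoint(supported):
                continue  # belongs only to unsupported genes: drop it
        kept.append(line)
    return kept


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python filter_models.py in.gff > out.gff
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(filter_gff(handle.readlines()))
```
                </pre>
                <div>For example: python filter_models.py maker.all.gff
                  &gt; filtered.gff (file names hypothetical). Note that
                  mRNAs lacking an _AED attribute are treated as
                  unsupported unless they carry a Pfam cross-reference.</div>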

                <div><br>

                </div>

                <blockquote type="cite">

                  <div><br>

                    2. I tried to use EST data from another organism in
                    alt_est= (#EST/cDNA sequence file in fasta format
                    from an alternate organism). The organism is quite
                    distantly related, but it's the closest I have, so I
                    thought I'd give it a shot. I ran MAKER twice with
                    identical settings except for alt_est and
                    est2genome=0/1. The number of predicted genes is
                    identical with both approaches, so I am not sure
                    whether the EST data was actually used or whether
                    it's just too distantly related. Is there an easy way
                    to assess this?<br>

                  </div>

                </blockquote>

                <div><br>

                </div>

                <div>Typically EST evidence from another organism with
                  alt_est will add little in the way of additional
                  support (compared to just using protein evidence from,
                  say, Swiss-Prot), and this would be especially true if
                  your alt_est is distantly related.  I'm not sure I
                  understand your alt_est/est2genome combinations well
                  enough to comment in more detail.  I could see four
                  possible combinations there: which two gave identical
                  results?</div>

                <br>

                <blockquote type="cite">

                  <div><br>

                    3. I am running MAKER in several passes, and after
                    each pass I train SNAP on the result of the previous
                    pass. Then for every pass I run MAKER from scratch.
                    Would you recommend supplying the GFF of the
                    previous pass in "#-----Re-annotation Using
                    MAKER Derived GFF3<br>
                    maker_gff= #re-annotate genome based on this gff3
                    file" instead?<br>

                    <br>

                  </div>

                </blockquote>

                <div><br>

                </div>

                <div>No. 'Re-annotation using MAKER Derived GFF3' is
                  used for re-annotating a genome when you want certain
                  parts of the previous run to be passed through
                  unchanged, but when retraining SNAP you want MAKER to
                  re-evaluate each annotation in light of the new
                  predictions made by the retrained SNAP.  MAKER should
                  run really fast in all of the runs after the first one
                  because, as long as you haven't changed the evidence
                  files, it won't have to redo any of the
                  alignments.</div>
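                <div><br>
                  Between retraining passes only the SNAP model should
                  change, so the control-file edits are small. This is a
                  sketch under stated assumptions: it assumes the SNAP
                  model is set through the snaphmm= line of
                  maker_opts.ctl and that lines follow a
                  key=value #comment layout. The helper names are
                  hypothetical. It leaves maker_gff empty so that every
                  locus is re-evaluated, and it does not touch the
                  evidence options, which per the note above is what lets
                  later runs skip the alignments.</div>
                <pre>
```python
from __future__ import annotations

import re


def set_ctl_option(ctl_text: str, key: str, value: str) -> str:
    """Set key=value in maker_opts.ctl-style text, keeping any #comment.

    Raises KeyError if the option is missing, so a misspelled option name
    is not silently ignored.
    """
    pattern = re.compile(rf"^({re.escape(key)}=)([^#\n]*)(#.*)?$", re.MULTILINE)
    if not pattern.search(ctl_text):
        raise KeyError(key)

    def replace(match: re.Match) -> str:
        comment = match.group(3)
        return match.group(1) + value + (f" {comment}" if comment else "")

    return pattern.sub(replace, ctl_text, count=1)


def next_pass_ctl(ctl_text: str, snap_hmm: str) -> str:
    """Prepare the control text for the next pass after retraining SNAP.

    Only the SNAP model changes; maker_gff stays empty so every locus is
    re-evaluated, and evidence options are left untouched.
    """
    ctl_text = set_ctl_option(ctl_text, "snaphmm", snap_hmm)
    return set_ctl_option(ctl_text, "maker_gff", "")
```
                </pre>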

                <div><br>

                </div>

                <div><br>

                </div>

                B</div>

              <div><br>

                <blockquote type="cite">

                  <div>Thanks in advance for any thoughts/advice on

                    these things! I appreciate your help!<br>

                    <br>

                    much obliged,<br>

                    Christoph<br>

                    <br>

                    _______________________________________________<br>

                    maker-devel mailing list<br>

                    <a moz-do-not-send="true"

                      href="mailto:maker-devel@box290.bluehost.com">maker-devel@box290.bluehost.com</a><br>

                    <a moz-do-not-send="true"

href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a><br>

                  </div>

                </blockquote>

              </div>

              <br>

              <div><span class="Apple-style-span"

                  style="border-collapse: separate; color: rgb(0, 0, 0);

                  font-family: Helvetica; font-size: medium; font-style:

                  normal; font-variant: normal; font-weight: normal;

                  letter-spacing: normal; line-height: normal; orphans:

                  2; text-align: auto; text-indent: 0px; text-transform:

                  none; white-space: normal; widows: 2; word-spacing:

                  0px; -webkit-border-horizontal-spacing: 0px;

                  -webkit-border-vertical-spacing: 0px;

                  -webkit-text-decorations-in-effect: none;

                  -webkit-text-size-adjust: auto;

                  -webkit-text-stroke-width: 0px; ">

                  <div><span class="Apple-style-span"

                      style="font-family: Arial; font-size: 12px; ">

                      <div>Barry Moore</div>

                      <div>Research Scientist</div>

                      <div>Dept. of Human Genetics</div>

                      <div>University of Utah</div>

                      <div>Salt Lake City, UT 84112</div>

                      <div>--------------------------------------------</div>

                      <div>(801) 585-3543</div>

                      <div><br class="khtml-block-placeholder">

                      </div>

                    </span></div>

                  <div><br>

                  </div>

                </span><br class="Apple-interchange-newline">

              </div>

              <br>

            </div>

          </div>

        </div>


      </span> </blockquote>

    <br>

    <br>

  </body>

</html>