<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Hello Barry and Carson,<br>
<br>
Thank you very much for the extensive replies!! Very helpful!!<br>
<br>
<blockquote type="cite">
<div><br>
2. I tried to use EST data from an alternative organism in
altest= (#EST/cDNA sequence file in fasta format from an
alternate organism). The organism is quite distantly related,
but it's the closest I have, so I thought I'd give it a shot. I
ran maker twice with identical settings except for altest and
est2genome=0/1. The number of genes predicted is identical
with both approaches, so I am not sure whether the EST
data was actually used or whether it is just too distantly related. Any
easy way to assess this?<br>
</div>
</blockquote>
<div><br>
</div>
<div>> Typically EST evidence from another organism with
alt_est will add little in the way of additional support
(compared to just using protein evidence from, say, Swiss-Prot),
and this would be especially true if your alt_est is
distantly related. I'm not sure I understand your
alt_est/est2genome combinations well enough to comment in more detail. I could
see four possible combinations there: which two gave identical
results?</div>
<br>
What I meant was that I ran maker once without any alt_est
evidence and with est2genome=0, and a second time with
alt_est=some.fasta and est2genome=1. The result was the same.
Sorry for not making myself clear enough. I thought that the
est2genome=1 switch just enables physical EST evidence to
be used. Therefore, I thought that neither alt_est=some.fasta with
est2genome=0 nor alt_est=nothing with est2genome=1 would make
sense. I had misunderstood this. <br>
<br>
I will follow Carson's advice and try to use more protein
evidence from related species (in addition to UniProt). It is running
right now; let's see where that leaves me. The IPRScan approach
suggested by Barry to assess gene models without physical evidence
sounds very interesting. I will definitely look into that. <br>
<br>
A question concerning an issue I just discovered:<br>
I ran maker twice with the same physical evidence: the first time using
SNAP and GeneMark, the second time using SNAP, GeneMark and AUGUSTUS
(set to the closest related species available: same phylum,
different class). The second run gives fewer gene models. In another
context I found that the second pass of MAKER using SNAP and
GeneMark (after training SNAP on the predictions of the first
pass) with the same physical evidence yields fewer gene annotations.
How can that be, given the same physical evidence?<br>
<br>
Thanks again for your help! It is much appreciated!<br>
<br>
cheers,<br>
Christoph<br>
<br>
On 31.08.2012 21:03, Carson Holt wrote:<br>
</div>
<blockquote cite="mid:CC667EFB.117DD%25carsonhh@gmail.com"
type="cite">
<div>I concur with everything Barry said. Let me also add that
alt-ESTs do not get polished around splice sites (exonerate
won't handle them), whereas ESTs and proteins do. This means
that alt-ESTs add nothing in the way of UTR or splice-site
information. They just function as an anchor to
maintain gene models that might otherwise have been rejected.
This also means that alt_est=some.fasta together with est2genome=1
will produce virtually no additional results, because est2genome
requires that the splice sites make sense. You are better off
using proteins with protein2genome=1 if you don’t have ESTs and
want partial models for training. Once you have a trained ab
initio gene predictor, turn the est2genome and protein2genome
options off; otherwise they will give weird partial models that
decrease the quality of your final annotations. (Partial models
are OK for training, but that's about it.)</div>
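<div>A minimal Python sketch of the two-phase workflow described above, assuming the maker_opts.ctl option names est2genome, protein2genome and snaphmm. The helper functions and the HMM file name are hypothetical, and the code only prints ctl lines; it does not run MAKER.</div>
<pre>
```python
# Hypothetical helpers: render maker_opts.ctl overrides for the two
# phases described above. Only the option names come from MAKER;
# nothing here runs MAKER itself.


def pass_overrides(phase, have_ests, snap_hmm=""):
    """Return ctl overrides for a 'training' or 'final' MAKER pass."""
    if phase == "training":
        # Partial evidence-based models are acceptable for training.
        # est2genome only helps with real ESTs: alt-ESTs are not polished
        # around splice sites, so est2genome=1 gains almost nothing there.
        return {"est2genome": 1 if have_ests else 0, "protein2genome": 1}
    if phase == "final":
        # With a trained ab initio predictor, switch both off so partial
        # models do not end up in the final annotation set.
        overrides = {"est2genome": 0, "protein2genome": 0}
        if snap_hmm:
            overrides["snaphmm"] = snap_hmm
        return overrides
    raise ValueError("phase must be 'training' or 'final'")


def render(overrides):
    """Format overrides as key=value lines, as written in maker_opts.ctl."""
    return "\n".join(f"{key}={value}" for key, value in overrides.items())


print(render(pass_overrides("training", have_ests=False)))
print(render(pass_overrides("final", have_ests=False, snap_hmm="round1.hmm")))
```
</pre>
<div>In this sketch the training pass leaves est2genome at 0 when only alt-ESTs are available, because est2genome needs splice sites that make sense and alt-ESTs are not polished around them.</div>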
<div><br>
</div>
<div>If you are getting too low a gene count with keep_preds=0,
then you probably need to add more evidence. Try adding all
proteins from a couple of related species (the protein= option
accepts comma-separated lists of files). If your genome is from a
fungus, oomycete, or prokaryote, then setting keep_preds=1 is
usually safe. These are genomes with high gene density and
simple gene structure, so ab initio predictors do really well
and don't need as much help from the evidence. For other
organisms, leave it set to 0 or you will get a lot of false
positives (on some genomes with complex gene structure, false
positives can outnumber the genes by 10 to 1 if you turn this
on).</div>
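<div>A small sketch of these two suggestions as maker_opts.ctl lines: several protein files joined into one comma-separated protein= value, and keep_preds chosen by genome type. The FASTA file names and the genome-type labels are placeholders chosen for illustration, not values MAKER recognizes.</div>
<pre>
```python
# Illustrative helper (hypothetical): build the protein= line from
# several FASTA files and choose keep_preds following the advice above.

# Genome types for which keep_preds=1 is described as usually safe
# (high gene density, simple gene structure).
COMPACT_GENOMES = {"fungus", "oomycete", "prokaryote"}


def evidence_settings(protein_files, genome_type):
    """Return protein= and keep_preds= lines for maker_opts.ctl."""
    # protein= accepts a comma-separated list of FASTA files.
    protein_line = "protein=" + ",".join(protein_files)
    # Elsewhere keep_preds=1 can let false positives outnumber real genes.
    keep = 1 if genome_type in COMPACT_GENOMES else 0
    return [protein_line, f"keep_preds={keep}"]


for line in evidence_settings(
    ["uniprot_sprot.fasta", "relative_a.fasta", "relative_b.fasta"], "metazoan"
):
    print(line)
```
</pre>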
<div><br>
</div>
<div>Thanks,</div>
<div>Carson</div>
<div><br>
</div>
<span id="OLK_SRC_BODY_SECTION">
<div style="font-family:Calibri; font-size:11pt;
text-align:left; color:black; BORDER-BOTTOM: medium none;
BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT:
0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid;
BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span
style="font-weight:bold">From: </span> Barry Moore <<a
moz-do-not-send="true"
href="mailto:barry.moore@genetics.utah.edu">barry.moore@genetics.utah.edu</a>><br>
<span style="font-weight:bold">Date: </span> Friday, 31
August, 2012 12:52 PM<br>
<span style="font-weight:bold">To: </span> Christoph Hahn
<<a moz-do-not-send="true"
href="mailto:chrisi.hahni@gmail.com">chrisi.hahni@gmail.com</a>><br>
<span style="font-weight:bold">Cc: </span> <<a
moz-do-not-send="true"
href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>><br>
<span style="font-weight:bold">Subject: </span> Re:
[maker-devel] keep_preds option?<br>
</div>
<div><br>
</div>
<div>
<div style="word-wrap: break-word; -webkit-nbsp-mode: space;
-webkit-line-break: after-white-space; ">Hi Christoph,
<div><br>
</div>
<div>Comments below:</div>
<div><br>
<div>
<div>On Aug 31, 2012, at 6:43 AM, Christoph Hahn wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<div>Hello maker users and developers,<br>
<br>
I am new to gene prediction and I am trying to use
maker 2.25 on the newly assembled draft genome of a
non-model organism. Within maker I use GeneMark, SNAP and
Augustus. I have a few questions:<br>
<br>
</div>
</blockquote>
<div><br>
</div>
<div>Welcome!</div>
<br>
<blockquote type="cite">
<div>1. I was wondering what the keep_preds option
means exactly.<br>
<br>
I found two slightly different explanations of the
option:<br>
#Add unsupported gene prediction to final annotation
set (maker2.25)<br>
#Add non-overlapping ab-inito gene prediction to
final annotation set (found on the net - probably
older maker version)<br>
<br>
</div>
</blockquote>
<div><br>
</div>
<div>It means to keep ab initio gene predictions for
which there is no physical evidence. By default, every
MAKER annotation requires two pieces of information: an ab
initio gene prediction and physical evidence supporting it.
MAKER runs the ab initio gene predictors and aligns transcript
(EST/cDNA/mRNASeq transcripts) and protein sequences
to the genome. For each locus where one or more gene
predictions exist, MAKER checks whether there is any
physical evidence for gene expression at that locus
(RNA/protein sequence alignments). If there is
evidence (spliced EST or protein alignments)
overlapping the predictions, MAKER decides which
prediction best matches the evidence and promotes that
prediction to an annotation. If no evidence overlaps
any of the predictions, then those
predictions are not included in the output annotation
file (although they are saved). If you set
keep_preds=1, then for each locus where predictions
exist, MAKER keeps one and promotes it to an annotation
even though there is no physical evidence. The
description 'non-overlapping ab-initio' means that
MAKER has clustered all ab initio predictions at one
locus and chosen one representative transcript to
output.</div>
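<div>To make the per-locus logic concrete, here is a toy Python model of the decision described above. It is an illustration under simplifying assumptions, not MAKER's code: predictions and evidence are single intervals, "best match" is a simple coverage score standing in for MAKER's own comparison, and the representative kept under keep_preds=1 is simply the longest prediction.</div>
<pre>
```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Toy model of the per-locus decision. Coordinates are 1-based and
# inclusive. The coverage score is a stand-in chosen for this sketch;
# it is not MAKER's own evidence comparison.


@dataclass
class Prediction:
    source: str  # ab initio tool, e.g. "snap", "augustus", "genemark"
    start: int
    end: int


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bases shared by two inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def evidence_score(pred: Prediction, evidence: List[Tuple[int, int]]) -> float:
    """Share of the prediction covered by evidence, capped at 1.0.

    Overlapping evidence intervals are counted more than once.
    """
    covered = sum(overlap(pred.start, pred.end, s, e) for s, e in evidence)
    return min(1.0, covered / (pred.end - pred.start + 1))


def annotate_locus(predictions: List[Prediction],
                   evidence: List[Tuple[int, int]],
                   keep_preds: int = 0) -> Optional[Prediction]:
    """Return the prediction promoted to an annotation, or None."""
    if not predictions:
        return None
    supported = [p for p in predictions if evidence_score(p, evidence) > 0]
    if supported:
        # Evidence overlaps the locus: promote the best-matching prediction.
        return max(supported, key=lambda p: evidence_score(p, evidence))
    if keep_preds:
        # keep_preds=1: keep one representative even without evidence
        # (the longest one here, which is an assumption of this sketch).
        return max(predictions, key=lambda p: p.end - p.start)
    # keep_preds=0 and no evidence: no annotation, even if every
    # predictor agrees, because agreement is not treated as evidence.
    return None


locus = [Prediction("snap", 100, 900),
         Prediction("augustus", 120, 880),
         Prediction("genemark", 100, 950)]
print(annotate_locus(locus, evidence=[]))                       # None
print(annotate_locus(locus, evidence=[], keep_preds=1).source)  # genemark
print(annotate_locus(locus, evidence=[(150, 700)]).source)      # augustus
```
</pre>
<div>The first call shows the point made further down in this thread: three agreeing ab initio predictions still yield no annotation when keep_preds=0 and no evidence overlaps the locus.</div>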
<br>
<blockquote type="cite">
<div>As far as I understood, keep_preds=0 only retains
gene models for which the ab initio predictions
agree. But how many, all three? Two of three?<br>
keep_preds=1 instead keeps all gene models
regardless of whether the different programs agree, right?<br>
<br>
</div>
</blockquote>
<div><br>
</div>
<div>MAKER does not take the presence of multiple ab
initio predictions as evidence and thus in the absence
of aligned physical evidence MAKER will not output an
annotation even if all three ab initio tools predict a
gene at that locus.</div>
<br>
<blockquote type="cite">
<div>In my case I get substantial differences in the
number of gene models found between the two
settings, while with =1 I get a number that is close
to what we would expect. How would you interpret
that? What would you recommend I do? Obviously =0
is the safer option.<br>
</div>
</blockquote>
<div><br>
</div>
<div>If you think that the number of genes you are
getting from a MAKER run is too low, you could run
MAKER with keep_preds=1. After the run is finished,
use a tool like IPRScan to search all MAKER
predictions for protein domain content and push that
IPRScan output back into the MAKER GFF file with the
ipr_update_gff script. Then, as a final step, you can
go through the GFF file and remove any gene model that
has neither physical evidence (AED < 1) nor
protein domain content (Dbxref=PFAM:XXX etc…). Sorry,
there's no script prepackaged with MAKER for that
yet.</div>
<div><br>
</div>
<blockquote type="cite">
<div><br>
2. I tried to use EST data from an alternative
organism in altest= (#EST/cDNA sequence file in
fasta format from an alternate organism). The
organism is quite distantly related, but it's the
closest I have, so I thought I'd give it a shot. I
ran maker twice with identical settings except for
altest and est2genome=0/1. The number of genes
predicted is identical with both approaches, so I am
not sure whether the EST data was actually
used or whether it is just too distantly related. Any easy way
to assess this?<br>
</div>
</blockquote>
<div><br>
</div>
<div>Typically EST evidence from another organism with
alt_est will add little in the way of additional
support (compared to just using protein evidence from,
say, Swiss-Prot), and this would be especially true if
your alt_est is distantly related. I'm not sure I
understand your alt_est/est2genome combinations well enough to
comment in more detail. I could see four possible
combinations there: which two gave identical results?</div>
<br>
<blockquote type="cite">
<div><br>
3. I am running maker in several passes, and after
each pass I train SNAP using the result of the
previous pass. Then for every pass I run maker from
scratch. Would you recommend supplying the GFF of
the previous pass in "#-----Re-annotation Using
MAKER Derived GFF3<br>
maker_gff= #re-annotate genome based on this gff3
file" instead?<br>
<br>
</div>
</blockquote>
<div><br>
</div>
<div>No. 'Re-annotation using MAKER Derived GFF3' is
used for re-annotating a genome when you want
certain parts of the previous run to be passed through
unchanged, but when retraining SNAP you want MAKER to
re-evaluate each annotation in light of the new
predictions made by the retrained SNAP. MAKER should
run really fast in all of the runs after the first one
because, as long as you haven't changed the evidence
files, it won't have to redo any of the alignments.</div>
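<div>For the retraining loop described in question 3, a hedged Python sketch of one SNAP retraining round between MAKER passes. It assumes the previous pass's output has already been merged into a single GFF3 file (for example with gff3_merge) and follows the SNAP training steps shown in the MAKER tutorial (maker2zff, fathom, forge, hmm-assembler.pl); treat the exact commands as an assumption and check them against your installed versions. Function and file names are hypothetical.</div>
<pre>
```python
import subprocess
from pathlib import Path

# Sketch of one SNAP retraining round between MAKER passes. The command
# sequence is assumed from the MAKER tutorial; verify it locally.
# Function and file names are hypothetical.


def snap_retraining_steps(merged_gff, hmm_name, workdir="."):
    """Return (argv, cwd) pairs for one retraining round."""
    params = str(Path(workdir) / "params")
    return [
        # MAKER gene models -> SNAP ZFF files (genome.ann, genome.dna).
        (["maker2zff", merged_gff], workdir),
        # Sort models into categories, then export the unique ones
        # (uni.*) with 1000 bp of flanking sequence.
        (["fathom", "-categorize", "1000", "genome.ann", "genome.dna"], workdir),
        (["fathom", "-export", "1000", "-plus", "uni.ann", "uni.dna"], workdir),
        # forge writes many parameter files, so it runs inside params/.
        (["forge", "../export.ann", "../export.dna"], params),
        # hmm-assembler.pl prints the assembled HMM to standard output.
        (["hmm-assembler.pl", hmm_name, "params"], workdir),
    ]


def run_steps(steps, hmm_path):
    """Run the steps in order, writing the final HMM to hmm_path."""
    for argv, cwd in steps[:-1]:
        Path(cwd).mkdir(parents=True, exist_ok=True)
        subprocess.run(argv, cwd=cwd, check=True)
    argv, cwd = steps[-1]
    with open(hmm_path, "w") as out:
        subprocess.run(argv, cwd=cwd, check=True, stdout=out)


# Print the plan without running anything.
for argv, cwd in snap_retraining_steps("pass1.all.gff", "pass2"):
    print(cwd, " ".join(argv))
```
</pre>
<div>Because the evidence files stay the same between passes, only the snaphmm= path in maker_opts.ctl needs to change to the new HMM before the next pass, which lets MAKER reuse its existing alignments.</div>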
<div><br>
</div>
<div><br>
</div>
B</div>
<div><br>
<blockquote type="cite">
<div>Thanks in advance for any thoughts/advice on
these things! I appreciate your help!<br>
<br>
much obliged,<br>
Christoph<br>
<br>
_______________________________________________<br>
maker-devel mailing list<br>
<a moz-do-not-send="true"
href="mailto:maker-devel@box290.bluehost.com">maker-devel@box290.bluehost.com</a><br>
<a moz-do-not-send="true"
href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a><br>
</div>
</blockquote>
</div>
<br>
<div><span class="Apple-style-span"
style="border-collapse: separate; color: rgb(0, 0, 0);
font-family: Helvetica; font-size: medium; font-style:
normal; font-variant: normal; font-weight: normal;
letter-spacing: normal; line-height: normal; orphans:
2; text-align: auto; text-indent: 0px; text-transform:
none; white-space: normal; widows: 2; word-spacing:
0px; -webkit-border-horizontal-spacing: 0px;
-webkit-border-vertical-spacing: 0px;
-webkit-text-decorations-in-effect: none;
-webkit-text-size-adjust: auto;
-webkit-text-stroke-width: 0px; ">
<div><span class="Apple-style-span"
style="font-family: Arial; font-size: 12px; ">
<div>Barry Moore</div>
<div>Research Scientist</div>
<div>Dept. of Human Genetics</div>
<div>University of Utah</div>
<div>Salt Lake City, UT 84112</div>
<div>--------------------------------------------</div>
<div>(801) 585-3543</div>
<div><br class="khtml-block-placeholder">
</div>
</span></div>
<div><br>
</div>
</span><br class="Apple-interchange-newline">
</div>
<br>
</div>
</div>
</div>
</span> </blockquote>
<br>
<br>
</body>
</html>