<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi Carson,<br>
Is there a way to allow MAKER to add UTRs to our external models
(supplied via the pred_gff or model_gff tag)? This seems to be one
of the problems we are running into. Our external models are high
quality but CDS-only, so their scores get knocked down relative to
ab initio predictions with added UTRs.<br>
<br>
Daniel will have more questions/observations later with regard to
overlapping gene models (we definitely need to allow gene models to
overlap in the UTRs, because transcript evidence clearly shows such
negative intergenic spaces).<br>
<br>
Thanks for all your help!<br>
Volker<br>
<br>
<div class="moz-cite-prefix">On 6/6/2014 11:39 AM, Carson Holt
wrote:<br>
</div>
<blockquote cite="mid:CFB749AE.CE82%25carsonhh@gmail.com"
type="cite">
<div>snap_masked-$seqid-processed-gene was produced by SNAP on the
repeat-masked sequence without hints (i.e., the ab initio call).</div>
<div>maker-$seqid-snap-gene was produced by SNAP after receiving
hints from MAKER.</div>
<div><br>
</div>
<div>In both cases, MAKER is allowed to add UTRs to the model (hence
the 'processed' tag).</div>
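<div>A minimal Python sketch of how the two naming patterns above could be told apart when post-processing MAKER output. It relies only on the two patterns described in this message; the function name classify_maker_gene_id is hypothetical, it assumes the predictor or "processed" token sits immediately before "-gene", and anything after "-gene" (a hypothetical "-0.3" in the examples) is ignored. IDs matching neither pattern are rejected rather than guessed at.</div>

```python
def classify_maker_gene_id(gene_id):
    """Classify a gene ID using only the two patterns described above.

    Hypothetical helper: the sequence ID in the middle may contain hyphens,
    so only the leading prefix and the token just before "-gene" are used.
    """
    head, sep, _suffix = gene_id.rpartition("-gene")
    if not sep:
        raise ValueError("not a MAKER-style gene ID: " + gene_id)
    prefix = head.split("-", 1)[0]
    last = head.rsplit("-", 1)[-1]
    if prefix == "maker":
        # maker-SEQID-snap-gene: the predictor was run with hints from MAKER.
        return {"predictor": last, "hinted": True}
    if last == "processed":
        # snap_masked-SEQID-processed-gene: ab initio call on the
        # repeat-masked sequence, without hints.
        return {"predictor": prefix.split("_", 1)[0], "hinted": False}
    raise ValueError("unrecognized gene ID pattern: " + gene_id)


print(classify_maker_gene_id("maker-scaffold1-snap-gene-0.3"))
print(classify_maker_gene_id("snap_masked-scaffold1-processed-gene-0.5"))
```

<div>Only the two forms discussed in this thread are handled; other predictors or naming variants would need their own rules.</div>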
<div><br>
</div>
<div>--Carson</div>
<div><br>
</div>
<div><br>
</div>
<span id="OLK_SRC_BODY_SECTION">
<div style="font-family:Calibri; font-size:11pt;
text-align:left; color:black; BORDER-BOTTOM: medium none;
BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT:
0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid;
BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span
style="font-weight:bold">From: </span> Daniel Standage <<a
moz-do-not-send="true"
href="mailto:daniel.standage@gmail.com">daniel.standage@gmail.com</a>><br>
<span style="font-weight:bold">Date: </span> Friday, June 6,
2014 at 10:33 AM<br>
<span style="font-weight:bold">To: </span> Carson Holt <<a
moz-do-not-send="true" href="mailto:carsonhh@gmail.com">carsonhh@gmail.com</a>><br>
<span style="font-weight:bold">Cc: </span> Maker Mailing List
<<a moz-do-not-send="true"
href="mailto:maker-devel@yandell-lab.org">maker-devel@yandell-lab.org</a>>,
Volker Brendel <<a moz-do-not-send="true"
href="mailto:vbrendel@indiana.edu">vbrendel@indiana.edu</a>><br>
<span style="font-weight:bold">Subject: </span> Re:
[maker-devel] Filtering of ab initio gene models<br>
</div>
<div><br>
</div>
<div dir="ltr">
<div>
<div>Another question: is there documentation anywhere on
the naming conventions for genes annotated by Maker? Of
course, it's easy to spot genes based on a particular <i>ab
initio</i> gene predictor, since the predictor name appears in the IDs.
But what is the significance of, say,
"snap_masked-$seqid-processed-gene" in a gene ID versus
"maker-$seqid-snap-gene"?<br>
<br>
</div>
Thanks,<br>
</div>
Daniel<br>
</div>
<div class="gmail_extra"><br clear="all">
<div>
<div dir="ltr"><br>
--<br>
Daniel S. Standage<br>
Ph.D. Candidate<br>
Computational Genome Science Laboratory<br>
Indiana University<br>
</div>
</div>
<br>
<br>
<div class="gmail_quote">On Thu, Jun 5, 2014 at 2:05 PM,
Daniel Standage <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:daniel.standage@gmail.com" target="_blank">daniel.standage@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div>
<div>I have attached data for a small 18-kb region with
a handful of genes, as well as the corresponding
maker_opts.ctl file. (This is a smaller, different data
set from the one I was looking at yesterday, with a
more well-defined problem.)<br>
<br>
With the data files as-is, Maker 2.31.3 reports a
model from 4125 to 6400 with an AED of 0.23. If you
exclude transcript TSA024184, Maker reports a
different gene from 6111 to 8345 with an AED of
0.01. Both of these genes have transcript support:
will Maker report overlapping genes under any
conditions? And even if Maker is forced to choose
only a single gene to report, why would the model
from 4125 to 6400 ever be reported in place of the
one from 6111 to 8345, especially since the latter is
provided in the model_gff file?<br>
<br>
</div>
Even when transcript TSA024184 is included, Maker 2.10
reports the high-confidence gene from 6111 to 8345.<br>
<br>
</div>
Any light you could shed would be helpful. Thanks!<br>
</div>
<div class="gmail_extra">
<div class=""><br clear="all">
<div>
</div>
<br>
<br>
</div>
<div>
<div class="h5">
<div class="gmail_quote">On Wed, Jun 4, 2014 at 3:17
PM, Carson Holt <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:carsonhh@gmail.com"
target="_blank">carsonhh@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0
0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div
style="word-wrap:break-word;color:rgb(0,0,0);font-size:14px;font-family:Calibri,sans-serif">
<div>Just eAED, but eAED can affect the selection
of ab initio results. For example, eAED accounts for the
reading-frame match of protein evidence, which also
affects whether single-exon evidence (single_exon=1)
and genes with single-exon protein evidence
are kept. MAKER also assumes that
your alignments in GFF3 are correctly
spliced (as BLAT alignments are). So supplying blastn
results as precomputed est_gff would create
a lot of noise, since MAKER does not use raw blastn hits directly;
it uses them only to seed the polished
exonerate alignments.</div>
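<div>For reference, Annotation Edit Distance is commonly defined as AED = 1 - (SN + SP) / 2, where sensitivity SN is the fraction of evidence bases covered by the model and specificity SP is the fraction of model bases covered by evidence; 0 means perfect agreement with the evidence and 1 means none. The Python sketch below computes that base-level quantity for one model against one merged evidence set. It is a simplification, not MAKER's implementation: it ignores strand, treats all evidence as a single set of exons, and does not check reading frame, which is what distinguishes eAED according to the message above. The names aed and covered_bases are hypothetical.</div>

```python
def covered_bases(intervals):
    """Expand 1-based, inclusive (start, end) intervals into a set of positions."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(model_exons, evidence_exons):
    """Base-level AED = 1 - (SN + SP) / 2 for one model vs. merged evidence."""
    model = covered_bases(model_exons)
    evidence = covered_bases(evidence_exons)
    if not model or not evidence:
        raise ValueError("model and evidence must each cover at least one base")
    shared = len(model.intersection(evidence))
    sensitivity = shared / len(evidence)  # fraction of evidence explained by the model
    specificity = shared / len(model)     # fraction of the model supported by evidence
    return 1 - (sensitivity + specificity) / 2


# A model at 1..100 and evidence at 51..150 share 50 bases: SN = SP = 0.5.
print(aed([(1, 100)], [(51, 150)]))
```

<div>Because both terms are fractions of shared bases, trimming or extending UTRs changes SN and SP directly, which is one way a CDS-only model can score worse than an otherwise similar model with UTRs added.</div>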
<div><br>
</div>
<div>--Carson</div>
<div><br>
</div>
<div><br>
</div>
<span>
<div
style="font-family:Calibri;font-size:11pt;text-align:left;color:black;BORDER-BOTTOM:medium
none;BORDER-LEFT:medium
none;PADDING-BOTTOM:0in;PADDING-LEFT:0in;PADDING-RIGHT:0in;BORDER-TOP:#b5c4df
1pt solid;BORDER-RIGHT:medium
none;PADDING-TOP:3pt"><span
style="font-weight:bold">From: </span>
Daniel Standage <<a
moz-do-not-send="true"
href="mailto:daniel.standage@gmail.com"
target="_blank">daniel.standage@gmail.com</a>><br>
<span style="font-weight:bold">Date: </span>
Wednesday, June 4, 2014 at 1:11 PM<br>
<span style="font-weight:bold">To: </span>
Carson Holt <<a moz-do-not-send="true"
href="mailto:carsonhh@gmail.com"
target="_blank">carsonhh@gmail.com</a>><br>
<span style="font-weight:bold">Cc: </span>
Maker Mailing List <<a
moz-do-not-send="true"
href="mailto:maker-devel@yandell-lab.org"
target="_blank">maker-devel@yandell-lab.org</a>><br>
<span style="font-weight:bold">Subject: </span>
Re: [maker-devel] Filtering of ab initio
gene models<br>
</div>
<div>
<div>
<div><br>
</div>
<div dir="ltr">I do not provide Gap or
Target attributes in the GFF3. Will
this affect the AED as well, or just
the eAED?<br>
</div>
<div class="gmail_extra"><br clear="all">
<div>
</div>
<br>
<br>
<div class="gmail_quote">On Wed, Jun
4, 2014 at 3:09 PM, Carson Holt <span
dir="ltr"><<a
moz-do-not-send="true"
href="mailto:carsonhh@gmail.com"
target="_blank">carsonhh@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div
style="word-wrap:break-word;color:rgb(0,0,0);font-size:14px;font-family:Calibri,sans-serif">
<div>Sure, that would be
helpful. One question: do you
provide the Gap attribute in
your precomputed alignments?
Whether or not that attribute is
present affects the eAED
score, which takes reading
frame into account. Without it,
some things may be kept
that would normally be
dropped, because MAKER won't
be able to take the points of
mismatch in the alignment into
account (it just assumes a match
everywhere).</div>
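<div>To make the Gap point concrete: in GFF3, the Gap attribute encodes an alignment as space-separated operation/length pairs such as "M8 D3 M6 I1 M6", where M is a match, I inserts a gap into the reference, and D inserts a gap into the target (i.e., deletes from the reference). The hypothetical Python helpers below expand such a string into the genomic positions that are actually aligned. They assume a nucleotide-to-nucleotide alignment on the forward strand and reject the F/R frameshift operations that protein alignments may use. Without a Gap attribute, every base between start and end would count as matched, which is the "assumes a match everywhere" behavior described here.</div>

```python
def parse_gap(gap):
    """Split a GFF3 Gap string such as "M8 D3 M6 I1 M6" into (op, length) pairs."""
    ops = []
    for token in gap.split():
        op, length = token[0], int(token[1:])
        if op not in ("M", "I", "D"):
            # F and R (frameshifts) are valid GFF3 but out of scope for this sketch.
            raise ValueError("unsupported Gap operation in this sketch: " + token)
        ops.append((op, length))
    return ops


def aligned_reference_positions(start, gap):
    """Return reference coordinates covered by M operations, beginning at start.

    M advances both sequences, D advances only the reference (bases with no
    aligned target), and I advances only the target, so it adds nothing here.
    """
    pos = start
    aligned = []
    for op, length in parse_gap(gap):
        if op == "M":
            aligned.extend(range(pos, pos + length))
            pos += length
        elif op == "D":
            pos += length
    return aligned


print(aligned_reference_positions(100, "M3 D2 M2"))
```

<div>Here the two reference bases covered by D2 (103 and 104) are excluded; with no Gap attribute they would silently be treated as matches.</div>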
<div><br>
</div>
<div>--Carson</div>
<div><br>
</div>
<div><br>
</div>
<span>
<div
style="font-family:Calibri;font-size:11pt;text-align:left;color:black;BORDER-BOTTOM:medium
none;BORDER-LEFT:medium
none;PADDING-BOTTOM:0in;PADDING-LEFT:0in;PADDING-RIGHT:0in;BORDER-TOP:#b5c4df
1pt
solid;BORDER-RIGHT:medium
none;PADDING-TOP:3pt"><span
style="font-weight:bold">From:
</span> Daniel Standage <<a
moz-do-not-send="true"
href="mailto:daniel.standage@gmail.com"
target="_blank">daniel.standage@gmail.com</a>><br>
<span
style="font-weight:bold">Date:
</span> Wednesday, June 4,
2014 at 1:03 PM<br>
<span
style="font-weight:bold">To:
</span> Maker Mailing List
<<a
moz-do-not-send="true"
href="mailto:maker-devel@yandell-lab.org"
target="_blank">maker-devel@yandell-lab.org</a>><br>
<span
style="font-weight:bold">Subject:
</span> [maker-devel]
Filtering of ab initio gene
models<br>
</div>
<div>
<div>
<div><br>
</div>
<div dir="ltr">
<div>
<div>
<div>Thanks, everyone,
for your recent
responses!<br>
<br>
</div>
The reason for my
recent flurry of
email activity is
that I'm seeing some
unexpected trends
when running the new
version of Maker
with precomputed
alignments. Compared
with an annotation I
did a while ago
(Maker 2.10,
Maker-computed
alignments), this
new annotation has a
substantial number
of newly annotated
genes. If I compare
distributions of AED
scores between the
old and new
annotations, it's
clear that the new
annotation has many
more low-quality
models. New gene
models that do not
overlap any gene
model from the old
annotation are much
more likely to be
low-quality.<br>
<br>
</div>
I decided to run a
little experiment. I
annotated a scaffold
first using Maker 2.10
and then using Maker
2.31.3. In both cases,
I used the same
precomputed
transcript and protein
alignments and the
same (latest) version
of SNAP as the only <i>ab
initio</i>
predictor. Maker 2.10
predicted 44 genes,
while Maker 2.31.3
predicted 63. If we
group gene models into
loci by overlap, there
are 33 loci with gene
models from both 2.10
and 2.31.3, 1 locus
with only models from
2.10, and 28 loci with
only models from
2.31.3.<br>
<br>
</div>
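<div>One plausible way to do the grouping described above, sketched in Python: sort models by start, merge any model that shares at least one base with the current locus (so overlap is transitive), and tally which annotation versions each locus contains. This is an illustration, not necessarily the procedure used for the counts above; the helper names, the (start, end, label) tuple layout, and the example coordinates are hypothetical, and all models are assumed to lie on one sequence with strand ignored.</div>

```python
from collections import Counter


def group_into_loci(models):
    """Merge (start, end, label) models into loci by transitive overlap.

    Coordinates are 1-based and inclusive, so two models overlap when they
    share at least one base. All models are assumed to be on one sequence.
    """
    loci = []
    for start, end, label in sorted(models):
        if loci and loci[-1]["end"] >= start:
            # Overlaps the current locus: extend it and record the label.
            loci[-1]["end"] = max(loci[-1]["end"], end)
            loci[-1]["labels"].add(label)
        else:
            loci.append({"start": start, "end": end, "labels": {label}})
    return loci


def tally_loci(loci):
    """Count loci by the combination of annotation labels they contain."""
    return Counter("+".join(sorted(locus["labels"])) for locus in loci)


# Hypothetical coordinates, not taken from the scaffold discussed above.
models = [
    (100, 500, "2.10"),
    (450, 900, "2.31.3"),
    (2000, 2500, "2.31.3"),
    (3000, 3400, "2.10"),
]
print(tally_loci(group_into_loci(models)))
```

<div>With this toy input there is one shared locus, one 2.31.3-only locus, and one 2.10-only locus, mirroring the three categories reported in the experiment.</div>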
Before this experiment,
I assumed the issue was
related to providing
pre-computed alignments
in GFF3 format and
perhaps violating some
important assumption.
However, this experiment
makes me wonder: have there
been changes to how Maker
filters <i>ab
initio</i> gene models
between versions 2.10 and
2.31.3? Do you
have any ideas? If it
would help, I could put
together a small data
set that reproduces the
behavior I just
described.<br>
<br>
Thanks!<br clear="all">
<div>
<div>
<div>
<div>
<div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
_______________________________________________
maker-devel
mailing list
<a moz-do-not-send="true"
href="mailto:maker-devel@box290.bluehost.com"
target="_blank">maker-devel@box290.bluehost.com</a><br><a
moz-do-not-send="true"
href="http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org"
target="_blank">http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org</a></span></div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</span></div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</span>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Volker Brendel
Professor of Biology and Computer Science
Indiana University
Department of Biology & School of Informatics and Computing
Simon Hall 205C
212 South Hawthorne Drive
Bloomington, IN 47405-7003
Tel.: (812) 855-7074
<a class="moz-txt-link-freetext" href="http://brendelgroup.org/">http://brendelgroup.org/</a>
</pre>
</body>
</html>