[maker-devel] Filtering of ab initio gene models

Fri Jun 6 10:46:41 MDT 2014

Good to know, thanks. If multiple *ab initio* predictors inform a single
annotation, how does Maker decide which one will be included in the gene's
ID?

Given your quick response just now, I wanted to confirm that you got the
message and data set I sent yesterday. I received an email saying the size
of my message required list admin approval to be distributed, but since you
were also a direct recipient of the email I didn't worry about it too much.

Thanks again!

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

On Fri, Jun 6, 2014 at 12:39 PM, Carson Holt <carsonhh at gmail.com> wrote:

> snap_masked-$seqid-processed-gene was produced by SNAP on the repeat
> masked sequence without hints (i.e. the ab initio call).
> maker-$seqid-snap-gene was produced by SNAP after receiving hints from
> MAKER.
>
> In both cases MAKER is allowed to add UTR to the model (hence the
> 'processed' tag).
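Carson's explanation implies that a gene ID alone tells you whether a model came from the unhinted ab initio run or from a hint-driven run. The sketch below classifies IDs using only the two patterns named in this thread. The function name and regexes are hypothetical, and MAKER may emit other ID forms (other predictors, numeric suffixes, etc.) that these patterns treat as unrecognised or match only by prefix.

```python
import re

# Hypothetical patterns built only from the two ID forms described in this
# thread; MAKER may emit other forms that these do not recognise.
AB_INITIO = re.compile(r"^(?P<predictor>[A-Za-z0-9]+)_masked-(?P<seqid>.+)-processed-gene")
HINT_BASED = re.compile(r"^maker-(?P<seqid>.+)-(?P<predictor>[A-Za-z0-9]+)-gene")


def classify_gene_id(gene_id):
    """Return (origin, predictor, seqid) for a MAKER-style gene ID."""
    match = AB_INITIO.match(gene_id)
    if match:
        return ("ab initio on masked sequence, no hints",
                match.group("predictor"), match.group("seqid"))
    match = HINT_BASED.match(gene_id)
    if match:
        return ("predictor run with MAKER hints",
                match.group("predictor"), match.group("seqid"))
    return ("unrecognised", None, None)


print(classify_gene_id("snap_masked-scaffold_1-processed-gene"))
print(classify_gene_id("maker-scaffold_1-snap-gene"))
```

The greedy `seqid` group lets sequence IDs that themselves contain hyphens (e.g. a hypothetical `contig-7`) still parse, because the regex backtracks to the last `-processed-gene` or `-<predictor>-gene` it can find.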
>
> --Carson
>
>
> From: Daniel Standage <daniel.standage at gmail.com>
> Date: Friday, June 6, 2014 at 10:33 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: Maker Mailing List <maker-devel at yandell-lab.org>, Volker Brendel <vbrendel at indiana.edu>
>
> Subject: Re: [maker-devel] Filtering of ab initio gene models
>
> Another question: is there documentation anywhere for the naming
> conventions of the genes annotated by Maker? Of course it's easy to spot
> genes based on a particular *ab initio* gene predictor, as the names are
> in the IDs. But what is the significance of, say,
> "snap_masked-$seqid-processed-gene" in a gene ID vs
> "maker-$seqid-snap-gene"?
>
> Thanks,
> Daniel
>
>
>
> On Thu, Jun 5, 2014 at 2:05 PM, Daniel Standage <daniel.standage at gmail.com> wrote:
>
>> I have attached data for a small 18kb region with a handful of genes, as
>> well as the corresponding maker_opts.ctl file. (This is a smaller data set,
>> different from the one I was looking at yesterday, with a better-defined
>> problem.)
>>
>> With the data files as is, Maker 2.31.3 reports a model from 4125 to 6400
>> with an AED of 0.23. If you exclude transcript TSA024184, Maker reports a
>> different gene from 6111 to 8345 with an AED of 0.01. Both of these genes
>> have transcript support: will Maker report overlapping genes under any
>> conditions? And even if Maker is forced to choose only a single gene to
>> report, why would the model from 4125 to 6400 ever be reported in place of
>> the one from 6111 to 8345, especially since the latter is provided in the
>> model_gff file?
>>
>> Even when transcript TSA024184 is included, Maker 2.10 reports the
>> high-confidence gene from 6111 to 8345.
>>
>> Any light you could shed would be helpful. Thanks!
>>
>>
>>
>> On Wed, Jun 4, 2014 at 3:17 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>
>>> Just eAED, but eAED can affect selection of ab initio results: for
>>> example, reading-frame match of protein evidence, which also affects
>>> whether evidence from single_exon=1 and genes with single-exon protein
>>> evidence get kept. There is also the assumption that your alignments in
>>> GFF3 are correctly spliced (as BLAT alignments are). So giving blastn
>>> results as precomputed est_gff would create a lot of noise, since MAKER
>>> does not use blastn hits directly; it uses them only to seed the polished
>>> exonerate alignments.
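Carson's answer ("Just eAED") suggests that base-level AED does not depend on Gap or Target attributes at all: it only needs to know which bases the model and the evidence cover. The sketch below assumes the commonly cited definition AED = 1 - (sensitivity + specificity) / 2 over a single evidence track. MAKER's own implementation pools several evidence types and may differ in detail; the function names are hypothetical.

```python
def covered_bases(exons):
    """Return the set of 1-based positions covered by (start, end) intervals."""
    bases = set()
    for start, end in exons:
        bases.update(range(start, end + 1))
    return bases


def aed(model_exons, evidence_exons):
    """Base-level AED: 1 - (sensitivity + specificity) / 2.

    Simplified sketch: one evidence track and no reading-frame check
    (per this thread, reading frame is what eAED adds).
    """
    model = covered_bases(model_exons)
    evidence = covered_bases(evidence_exons)
    if not model or not evidence:
        return 1.0  # nothing to compare; treat as the worst score
    overlap = len(model & evidence)
    sensitivity = overlap / len(evidence)
    specificity = overlap / len(model)
    return 1.0 - (sensitivity + specificity) / 2.0


print(aed([(1, 100)], [(1, 100)]))   # 0.0: identical coverage
print(aed([(1, 100)], [(51, 150)]))  # 0.5: half of each overlaps
```

Because only interval coverage enters the calculation, omitting Gap or Target from precomputed alignments leaves this score unchanged, which is consistent with Carson's reply.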
>>>
>>> --Carson
>>>
>>>
>>> From: Daniel Standage <daniel.standage at gmail.com>
>>> Date: Wednesday, June 4, 2014 at 1:11 PM
>>> To: Carson Holt <carsonhh at gmail.com>
>>> Cc: Maker Mailing List <maker-devel at yandell-lab.org>
>>> Subject: Re: [maker-devel] Filtering of ab initio gene models
>>>
>>> I do not provide Gap or Target attributes in the GFF3. Will this affect
>>> the AED as well, or just the eAED?
>>>
>>>
>>>
>>> On Wed, Jun 4, 2014 at 3:09 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>>
>>>> Sure, that would be helpful. One question: do you provide the Gap
>>>> attribute in your precomputed alignments? Having or not having that
>>>> attribute affects the eAED score, which takes reading frame into account,
>>>> and may cause some things to be kept that would normally be dropped,
>>>> because MAKER won't be able to take the points of mismatch in the
>>>> alignment into account (it just assumes a match everywhere).
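To see what "assumes match everywhere" means in practice, the sketch below counts the reference bases an alignment line treats as matching, with and without a GFF3 Gap attribute. It assumes a nucleotide-to-nucleotide alignment (for protein alignments the GFF3 spec counts M/I/D in amino acids), does no percent-decoding of attributes, and uses a hypothetical alignment line whose Gap string is modeled on the example in the GFF3 specification. This is not MAKER's code.

```python
import re

# GFF3 Gap operations: M (match), I (gap inserted into the reference),
# D (gap inserted into the target), F/R (forward/reverse frameshift).
GAP_OP = re.compile(r"([MIDFR])(\d+)")


def parse_attributes(column9):
    """Split GFF3 column 9 into a dict (no percent-decoding; a simplification)."""
    attrs = {}
    for field in column9.strip().split(";"):
        if field:
            key, _, value = field.partition("=")
            attrs[key] = value
    return attrs


def matched_reference_bases(gff3_line):
    """Reference bases treated as matching for one nucleotide alignment line.

    Without a Gap attribute every base in the feature span counts as a match,
    mirroring the "assumes match everywhere" behaviour described above.
    """
    cols = gff3_line.rstrip("\n").split("\t")
    start, end = int(cols[3]), int(cols[4])
    gap = parse_attributes(cols[8]).get("Gap")
    if gap is None:
        return end - start + 1
    return sum(int(length) for op, length in GAP_OP.findall(gap) if op == "M")


# Hypothetical alignment: reference span 1001-1023 = M8 + D3 + M6 + M6 = 23 bp.
with_gap = "\t".join([
    "scaffold_1", "precomputed", "expressed_sequence_match", "1001", "1023",
    ".", "+", ".", "ID=aln1;Target=tx1 1 21;Gap=M8 D3 M6 I1 M6",
])
without_gap = with_gap.replace(";Gap=M8 D3 M6 I1 M6", "")

print(matched_reference_bases(with_gap))     # 20
print(matched_reference_bases(without_gap))  # 23
```

The three bases in the D3 block are counted as matches only when Gap is absent, which is the kind of over-credit Carson warns can keep models that would otherwise be dropped.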
>>>>
>>>> --Carson
>>>>
>>>>
>>>> From: Daniel Standage <daniel.standage at gmail.com>
>>>> Date: Wednesday, June 4, 2014 at 1:03 PM
>>>> To: Maker Mailing List <maker-devel at yandell-lab.org>
>>>> Subject: [maker-devel] Filtering of ab initio gene models
>>>>
>>>> Thanks everyone for your responses recently!
>>>>
>>>> The reason for my recent flurry of email activity is that I'm seeing
>>>> some unexpected trends when running the new version of Maker with
>>>> precomputed alignments. Compared with an annotation I did a while ago
>>>> (Maker 2.10, Maker-computed alignments), this new annotation has a
>>>> substantial number of new genes annotated. If I compare distributions of
>>>> AED scores between the old and new annotation, it's clear that the new
>>>> annotation has a lot more low-quality models. New gene models that do
>>>> not overlap any gene model from the old annotation are much more likely
>>>> to be low quality.
>>>>
>>>> I decided to run a little experiment. I annotated a scaffold first
>>>> using Maker 2.10 and then using Maker 2.31.3. In both cases, I used the same
>>>> pre-computed transcript and protein alignments and the same (latest)
>>>> version of SNAP as the only *ab initio* predictor. Maker 2.10
>>>> predicted 44 genes while Maker 2.31.3 predicted 63. If we group gene models
>>>> into loci by overlap, there are 33 loci with gene models from both 2.10 and
>>>> 2.31.3, 1 locus with only models from 2.10, and 28 loci with only models
>>>> from 2.31.3.
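Grouping gene models into loci "by overlap" can be done by sorting models by start coordinate and merging any model that overlaps the running locus. The sketch below assumes 1-based inclusive coordinates on a single sequence and ignores strand; the function names and toy coordinates are hypothetical, not the data from this experiment.

```python
from collections import Counter


def group_into_loci(models):
    """Merge (start, end, label) gene models into loci by transitive overlap.

    Assumes 1-based inclusive coordinates on one sequence; strand is ignored.
    """
    loci = []
    for start, end, label in sorted(models):
        if loci and start <= loci[-1]["end"]:
            # Overlaps the current locus: extend it and record the label.
            loci[-1]["end"] = max(loci[-1]["end"], end)
            loci[-1]["labels"].add(label)
        else:
            loci.append({"start": start, "end": end, "labels": {label}})
    return loci


def tally(loci):
    """Count loci by the set of annotation versions contributing models."""
    return Counter(frozenset(locus["labels"]) for locus in loci)


# Hypothetical toy models labelled by MAKER version.
models = [
    (100, 900, "2.10"), (150, 950, "2.31.3"),
    (2000, 2500, "2.31.3"), (3000, 3400, "2.10"),
]
for locus in group_into_loci(models):
    print(locus["start"], locus["end"], sorted(locus["labels"]))
```

For the toy input this prints one shared locus (100 to 950) and one locus unique to each version; `tally` turns that into the "both / only 2.10 / only 2.31.3" counts used above.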
>>>>
>>>> Before this experiment, I assumed the issue was related to providing
>>>> pre-computed alignments in GFF3 format and perhaps violating some important
>>>> assumption. However, this experiment makes me wonder whether there have
>>>> been changes to how Maker filters *ab initio* gene models between
>>>> versions 2.10 and 2.31.3. Do you have any ideas? If it would help, I
>>>> could put together a small data set that reproduces the behavior I just
>>>> described.
>>>>
>>>> Thanks!
>>>>
>>>> _______________________________________________
>>>> maker-devel mailing list
>>>> maker-devel at box290.bluehost.com
>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>
>>>
>>>
>>
>