[maker-devel] Filtering of ab initio gene models

Daniel Standage daniel.standage at gmail.com
Fri Jun 6 10:59:16 MDT 2014


This helps, thanks.


--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University


On Fri, Jun 6, 2014 at 12:56 PM, Carson Holt <carsonhh at gmail.com> wrote:

> I got the e-mail.  Thanks for the test set.
>
> Multiple *ab initio* predictors don't inform a single annotation; rather,
> one must be chosen from the pool of available models (i.e., it has to be
> SNAP, Augustus, or GeneMark).  They each supply their own *ab initio* as
> well as hint-based predictions, and then the one with the best evidence
> match (measured by AED) is kept (it's like a competition that only one
> predictor can win).
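To make the "competition" concrete, here is a minimal Python sketch. It assumes the usual base-level definition of annotation edit distance, AED = 1 - (SN + SP) / 2, and picks the candidate with the lowest score. The function name `aed`, the predictor labels, and the coordinates are made up for illustration; MAKER's real selection logic involves more than this.

```python
def aed(model_exons, evidence_exons):
    """Base-level AED: 1 - (sensitivity + specificity) / 2.

    Exons are (start, end) pairs with inclusive coordinates.
    """
    model = {p for s, e in model_exons for p in range(s, e + 1)}
    evidence = {p for s, e in evidence_exons for p in range(s, e + 1)}
    if not model or not evidence:
        return 1.0  # no overlap possible, so treat as the worst score
    overlap = len(model & evidence)
    sensitivity = overlap / len(evidence)  # fraction of evidence bases covered
    specificity = overlap / len(model)     # fraction of model bases supported
    return 1 - (sensitivity + specificity) / 2


# Hypothetical competing models at one locus, plus the aligned evidence.
candidates = {
    "snap": [(100, 200), (300, 400)],
    "augustus": [(100, 200), (300, 450)],
}
evidence = [(100, 200), (300, 450)]

# Only one model wins: the one with the lowest AED.
best = min(candidates, key=lambda name: aed(candidates[name], evidence))
print(best, aed(candidates[best], evidence))
```

Lower is better here: an AED of 0 means the model and the evidence agree at every base, and 1 means no agreement at all.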
>
> If you want a consensus model instead, you can take MAKER results in GFF3
> format and give them to EVidenceModeler (EVM).  The upcoming MAKER 3.0 is
> a collaboration with the EVM group and will have this option, but for now
> users can just split the MAKER GFF3 by evidence type and give it to EVM.
> EVM then produces consensus models based on the GFF3 content.
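One way to split a MAKER GFF3 by evidence type is to group features by the source column (column 2). The Python sketch below does only that. The function name `split_by_source` and the example records are hypothetical, and it does not try to convert anything into the input formats EVM expects, so check the EVM documentation for that step.

```python
from collections import defaultdict


def split_by_source(lines):
    """Group GFF3 feature lines by their source column (column 2)."""
    groups = defaultdict(list)
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        groups[cols[1]].append(line.rstrip("\n"))
    return dict(groups)


# Made-up records; real input would come from open("maker_output.gff").
example = [
    "##gff-version 3",
    "scaf1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1",
    "scaf1\test2genome\texpressed_sequence_match\t100\t900\t.\t+\t.\tID=e1",
    "scaf1\tsnap_masked\tmatch\t120\t880\t.\t+\t.\tID=s1",
    "##FASTA",
    ">scaf1",
]
by_source = split_by_source(example)
for source, features in sorted(by_source.items()):
    print(source, len(features))
```

Each group can then be written to its own file, for example one for gene predictions and one each for transcript and protein alignments.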
>
> --Carson
>
> From: Daniel Standage <daniel.standage at gmail.com>
> Date: Friday, June 6, 2014 at 10:46 AM
>
> To: Carson Holt <carsonhh at gmail.com>
> Cc: Maker Mailing List <maker-devel at yandell-lab.org>, Volker Brendel <
> vbrendel at indiana.edu>
> Subject: Re: [maker-devel] Filtering of ab initio gene models
>
> Good to know, thanks. If multiple *ab initio* predictors inform a single
> annotation, how does Maker decide which one will be included in the gene's
> ID?
>
> Given your quick response just now, I wanted to confirm that you got the
> message and data set I sent yesterday. I received an email saying the size
> of my message required list admin approval to be distributed, but since you
> were also a direct recipient of the email I didn't worry about it too much.
>
> Thanks again!
>
>
> --
> Daniel S. Standage
> Ph.D. Candidate
> Computational Genome Science Laboratory
> Indiana University
>
>
> On Fri, Jun 6, 2014 at 12:39 PM, Carson Holt <carsonhh at gmail.com> wrote:
>
>> snap_masked-$seqid-processed-gene was produced by SNAP on the repeat
>> masked sequence without hints (i.e. the ab initio call).
>> maker-$seqid-snap-gene was produced by SNAP after receiving hints from
>> MAKER.
>>
>> In both cases MAKER is allowed to add UTR to the model (hence the
>> 'processed' tag).
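The two ID patterns above can be told apart with a pair of regular expressions. This Python sketch covers only the two forms mentioned in this thread; the function name `classify_gene_id` and the sequence IDs in the example are hypothetical, and other ID forms MAKER may emit are simply reported as unknown.

```python
import re

# <predictor>_masked-<seqid>-processed-gene : ab initio call on masked sequence
AB_INITIO = re.compile(r"^(?P<predictor>\w+?)_masked-(?P<seqid>.+)-processed-gene")
# maker-<seqid>-<predictor>-gene : prediction made with hints from MAKER
HINTED = re.compile(r"^maker-(?P<seqid>.+)-(?P<predictor>[^-]+)-gene")


def classify_gene_id(gene_id):
    """Return (kind, predictor, seqid), or None if the ID matches neither form."""
    m = AB_INITIO.match(gene_id)
    if m:
        return ("ab initio", m.group("predictor"), m.group("seqid"))
    m = HINTED.match(gene_id)
    if m:
        return ("hint-based", m.group("predictor"), m.group("seqid"))
    return None


print(classify_gene_id("snap_masked-scaffold1-processed-gene"))
print(classify_gene_id("maker-scaffold1-snap-gene"))
```

Because both patterns are anchored only at the start, any suffix that follows "gene" in a real ID is ignored.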
>>
>> --Carson
>>
>>
>> From: Daniel Standage <daniel.standage at gmail.com>
>> Date: Friday, June 6, 2014 at 10:33 AM
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: Maker Mailing List <maker-devel at yandell-lab.org>, Volker Brendel <
>> vbrendel at indiana.edu>
>>
>> Subject: Re: [maker-devel] Filtering of ab initio gene models
>>
>> Another question: is there documentation anywhere for the naming
>> conventions of the genes annotated by Maker? Of course it's easy to spot
>> genes based on a particular *ab initio* gene predictor, as the names are
>> in the IDs. But what is the significance of, say,
>> "snap_masked-$seqid-processed-gene" in a gene ID vs
>> "maker-$seqid-snap-gene"?
>>
>> Thanks,
>> Daniel
>>
>>
>> --
>> Daniel S. Standage
>> Ph.D. Candidate
>> Computational Genome Science Laboratory
>> Indiana University
>>
>>
>> On Thu, Jun 5, 2014 at 2:05 PM, Daniel Standage <
>> daniel.standage at gmail.com> wrote:
>>
>>> I have attached data for a small 18kb region with a handful of genes, as
>>> well as the corresponding maker_opts.ctl file. (This is a smaller and
>>> different data set than what I was looking at yesterday, with a more
>>> well-defined problem).
>>>
>>> With the data files as is, Maker 2.31.3 reports a model from 4125 to
>>> 6400 with an AED of 0.23. If you exclude transcript TSA024184, Maker
>>> reports a different gene from 6111 to 8345 with an AED of 0.01. Both of
>>> these genes have transcript support: will Maker report overlapping genes
>>> under any conditions? And even if Maker is forced to choose only a single
>>> gene to report, why would the model from 4125 to 6400 ever be reported in
>>> place of the one from 6111 to 8345, especially since this is provided in
>>> the model_gff file?
>>>
>>> Even when transcript TSA024184 is included, Maker 2.10 reports the
>>> high-confidence gene from 6111 to 8345.
>>>
>>> Any light you could shed would be helpful. Thanks!
>>>
>>>
>>> --
>>> Daniel S. Standage
>>> Ph.D. Candidate
>>> Computational Genome Science Laboratory
>>> Indiana University
>>>
>>>
>>> On Wed, Jun 4, 2014 at 3:17 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>>
>>>> Just eAED, but eAED can affect selection of ab initio results.  For
>>>> example, it reflects the reading frame match of protein evidence, which
>>>> also affects whether evidence from single_exon=1 and genes with
>>>> single-exon protein evidence get kept.  There is also the assumption
>>>> that the alignments in your GFF3 are correctly spliced (like BLAT's
>>>> are).  So giving blastn results as precomputed est_gff would create a
>>>> lot of noise, since MAKER ignores blastn and uses it only to seed the
>>>> polished exonerate alignments.
>>>>
>>>> --Carson
>>>>
>>>>
>>>> From: Daniel Standage <daniel.standage at gmail.com>
>>>> Date: Wednesday, June 4, 2014 at 1:11 PM
>>>> To: Carson Holt <carsonhh at gmail.com>
>>>> Cc: Maker Mailing List <maker-devel at yandell-lab.org>
>>>> Subject: Re: [maker-devel] Filtering of ab initio gene models
>>>>
>>>> I do not provide Gap or Target attributes in the GFF3. Will this affect
>>>> the AED as well, or just the eAED?
>>>>
>>>>
>>>> --
>>>> Daniel S. Standage
>>>> Ph.D. Candidate
>>>> Computational Genome Science Laboratory
>>>> Indiana University
>>>>
>>>>
>>>> On Wed, Jun 4, 2014 at 3:09 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>>>
>>>>> Sure.  That would be helpful.  One question. Do you provide the Gap
>>>>> attribute in your precomputed alignments?  Having or not having that
>>>>> attribute affects the eAED score which takes reading frame into account,
>>>>> and may cause some things to be kept that normally would be dropped,
>>>>> because MAKER won't be able to take the points of mismatch of the alignment
>>>>> into account (it just assumes match everywhere).
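A quick way to see whether precomputed alignments carry the Gap attribute is to count it in column 9 of the GFF3. The Python sketch below does that; the function names and the two example records are hypothetical, and the Gap value shown just follows the GFF3 convention of space-separated operations such as M10 D2 M5.

```python
def parse_attributes(column9):
    """Parse a GFF3 attribute column into a dict of key -> value."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def count_gap_usage(lines):
    """Return (features with a Gap attribute, features without one)."""
    with_gap = without_gap = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        if "Gap" in parse_attributes(cols[8]):
            with_gap += 1
        else:
            without_gap += 1
    return with_gap, without_gap


# Made-up alignment records: one with Target and Gap, one with neither.
example = [
    "scaf1\test_gff\tmatch_part\t100\t116\t.\t+\t.\tID=a1;Target=T1 1 15;Gap=M10 D2 M5",
    "scaf1\test_gff\tmatch_part\t300\t400\t.\t+\t.\tID=a2",
]
print(count_gap_usage(example))
```

If the second number dominates, the alignments are being treated as matching everywhere, as described above.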
>>>>>
>>>>> --Carson
>>>>>
>>>>>
>>>>> From: Daniel Standage <daniel.standage at gmail.com>
>>>>> Date: Wednesday, June 4, 2014 at 1:03 PM
>>>>> To: Maker Mailing List <maker-devel at yandell-lab.org>
>>>>> Subject: [maker-devel] Filtering of ab initio gene models
>>>>>
>>>>> Thanks everyone for your responses recently!
>>>>>
>>>>> The reason for my recent flurry of email activity is that I'm seeing
>>>>> some unexpected trends when running the new version of Maker with
>>>>> precomputed alignments. Compared with an annotation I did a while ago
>>>>> (Maker 2.10, Maker-computed alignments), this new annotation has a
>>>>> substantial number of new genes annotated. If I compare distributions of
>>>>> AED scores between the old and new annotation, it's clear that the new
>>>>> annotation has a lot more low-quality models. If I look at new gene models
>>>>> that do not overlap with any gene model from the old annotation, the
>>>>> likelihood that it's a low-quality model is much higher.
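A comparison like this can be sketched in a few lines of Python, assuming the AED is stored as an `_AED=` attribute on the mRNA features of the MAKER GFF3. The function names, the example records, and the 0.5 cutoff for "low quality" are all illustrative choices, not anything MAKER defines.

```python
def aed_values(lines):
    """Collect the _AED value from every mRNA feature line."""
    values = []
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        for field in cols[8].split(";"):
            if field.startswith("_AED="):
                values.append(float(field[len("_AED="):]))
    return values


def fraction_above(values, threshold):
    """Fraction of values strictly greater than the threshold."""
    return sum(v > threshold for v in values) / len(values) if values else 0.0


def mrna(ident, score):
    # Helper that builds a made-up mRNA record with the given AED.
    return f"scaf1\tmaker\tmRNA\t1\t100\t.\t+\t.\tID={ident};_AED={score}"


old = [mrna("m1", 0.01), mrna("m2", 0.23)]
new = [mrna("m1", 0.01), mrna("m2", 0.23), mrna("m3", 0.80), mrna("m4", 0.95)]

print(fraction_above(aed_values(old), 0.5))
print(fraction_above(aed_values(new), 0.5))
```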
>>>>>
>>>>> I decided to run a little experiment. I annotated a scaffold first
>>>>> using Maker 2.10 and then using Maker 2.31.3. In both cases, I used the same
>>>>> pre-computed transcript and protein alignments and the same (latest)
>>>>> version of SNAP as the only *ab initio* predictor. Maker 2.10
>>>>> predicted 44 genes while Maker 2.31.3 predicted 63. If we group gene models
>>>>> into loci by overlap, there are 33 loci with gene models from both 2.10 and
>>>>> 2.31.3, 1 locus with only models from 2.10, and 28 loci with only models
>>>>> from 2.31.3.
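Grouping gene models into loci by overlap can be done with a single sorted sweep. The Python sketch below is a simplified version: it assumes all models are on one sequence, ignores strand, and uses made-up coordinates. The function name `group_into_loci` is hypothetical.

```python
from collections import Counter


def group_into_loci(models):
    """Merge overlapping models into loci.

    models: list of (version, start, end) on a single sequence.
    Returns a list of loci, each a list of version labels.
    """
    loci = []
    current, current_end = [], None
    for version, start, end in sorted(models, key=lambda m: (m[1], m[2])):
        if current and start <= current_end:
            # Overlaps the locus being built: extend it.
            current.append(version)
            current_end = max(current_end, end)
        else:
            if current:
                loci.append(current)
            current, current_end = [version], end
    if current:
        loci.append(current)
    return loci


# Made-up gene models from the two runs.
models = [
    ("2.10", 100, 500),
    ("2.31.3", 400, 900),
    ("2.31.3", 1200, 1500),
    ("2.10", 2000, 2400),
]
loci = group_into_loci(models)

# Tally loci by which versions contributed models to them.
summary = Counter(frozenset(locus) for locus in loci)
for versions, count in summary.items():
    print(sorted(versions), count)
```

The three tallies correspond to loci shared by both versions, loci seen only in 2.10, and loci seen only in 2.31.3.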
>>>>>
>>>>> Before this experiment, I assumed the issue was related to providing
>>>>> pre-computed alignments in GFF3 format and perhaps violating some important
>>>>> assumption. However, this experiment makes me wonder whether there have
>>>>> been changes to how Maker filters *ab initio* gene models between
>>>>> version 2.10 and version 2.31.3? Do you have any ideas? If it would help, I
>>>>> could put together a small data set that reproduces the behavior I just
>>>>> described.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> --
>>>>> Daniel S. Standage
>>>>> Ph.D. Candidate
>>>>> Computational Genome Science Laboratory
>>>>> Indiana University