[maker-devel] Filtering of ab initio gene models

Fri Jun 6 10:33:06 MDT 2014

Another question: is there documentation anywhere for the naming
conventions of the genes annotated by Maker? Of course it's easy to spot
genes based on a particular *ab initio* gene predictor, as the names are in
the IDs. But what is the significance of, say,
"snap_masked-$seqid-processed-gene" in a gene ID vs
"maker-$seqid-snap-gene"?

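For anyone sorting their own gene IDs while waiting on an answer, here is a
minimal sketch that buckets IDs by the two shapes quoted above. It says
nothing about what MAKER means by either shape. The regular expressions, the
shape labels, the function name classify_gene_id, and the numeric suffixes in
the example IDs are all assumptions made for illustration.

```python
import re

# Hypothetical patterns built only from the two ID shapes quoted in the
# message. Anything after "-gene" (e.g. a numeric suffix) is ignored.
PATTERNS = {
    "predictor_masked_processed": re.compile(
        r"^(?P<predictor>[A-Za-z0-9]+)_masked-(?P<seqid>.+)-processed-gene"
    ),
    "maker_predictor": re.compile(
        r"^maker-(?P<seqid>.+)-(?P<predictor>[A-Za-z0-9]+)-gene"
    ),
}


def classify_gene_id(gene_id):
    """Return (shape, predictor, seqid), or None if neither shape matches."""
    for shape, pattern in PATTERNS.items():
        match = pattern.match(gene_id)
        if match:
            return shape, match.group("predictor"), match.group("seqid")
    return None


# Made-up IDs; the "-0.5" and "-0.3" suffixes are assumptions.
for example in ("snap_masked-scaffold_1-processed-gene-0.5",
                "maker-scaffold_1-snap-gene-0.3"):
    print(example, classify_gene_id(example))
```
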
Thanks,
Daniel

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

On Thu, Jun 5, 2014 at 2:05 PM, Daniel Standage <daniel.standage at gmail.com>
wrote:

> I have attached data for a small 18kb region with a handful of genes, as
> well as the corresponding maker_opts.ctl file. (This is a smaller and
> different data set than what I was looking at yesterday, with a more
> well-defined problem).
>
> With the data files as is, Maker 2.31.3 reports a model from 4125 to 6400
> with an AED of 0.23. If you exclude transcript TSA024184, Maker reports a
> different gene from 6111 to 8345 with an AED of 0.01. Both of these genes
> have transcript support: will Maker report overlapping genes under any
> conditions? And even if Maker is forced to choose only a single gene to
> report, why would the model from 4125 to 6400 ever be reported in place of
> the one from 6111 to 8345, especially since this is provided in the
> model_gff file?
>
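
For context on the AED values quoted above: annotation edit distance is
usually defined as AED = 1 - (SN + SP) / 2, where SN is the fraction of
evidence bases overlapped by the annotation and SP is the fraction of
annotation bases overlapped by the evidence. The sketch below computes that
quantity at the nucleotide level from 1-based inclusive intervals. It is an
illustration of the formula only, not MAKER's implementation; the function
names and the toy coordinates are assumptions.

```python
def merge(intervals):
    """Merge 1-based inclusive intervals into a sorted, non-overlapping list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered(intervals):
    """Number of distinct bases covered by the intervals."""
    return sum(end - start + 1 for start, end in merge(intervals))


def shared_bases(a, b):
    """Number of bases covered by both interval sets."""
    total = 0
    for s1, e1 in merge(a):
        for s2, e2 in merge(b):
            lo, hi = max(s1, s2), min(e1, e2)
            if lo <= hi:
                total += hi - lo + 1
    return total


def aed(annotation_exons, evidence_blocks):
    """AED = 1 - (SN + SP) / 2; returns 1.0 if either side is empty."""
    n_annot = covered(annotation_exons)
    n_evid = covered(evidence_blocks)
    if n_annot == 0 or n_evid == 0:
        return 1.0
    shared = shared_bases(annotation_exons, evidence_blocks)
    sensitivity = shared / n_evid
    specificity = shared / n_annot
    return 1.0 - (sensitivity + specificity) / 2.0


# Toy coordinates, not taken from the attached data set.
print(aed([(1, 100)], [(51, 150)]))
```

An AED of 0 means the annotation and the evidence cover exactly the same
bases; an AED of 1 means they share none.
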
> Even when transcript TSA024184 is included, Maker 2.10 reports the
> high-confidence gene from 6111 to 8345.
>
> Any light you could shed would be helpful. Thanks!
>
>
> --
> Daniel S. Standage
> Ph.D. Candidate
> Computational Genome Science Laboratory
> Indiana University
>
>
> On Wed, Jun 4, 2014 at 3:17 PM, Carson Holt <carsonhh at gmail.com> wrote:
>
>> Just eAED, but eAED can affect the selection of ab initio results. For
>> example, it takes the reading frame match of protein evidence into account,
>> which also affects whether evidence from single_exon=1 and genes with
>> single_exon protein evidence get kept. There is also the assumption that
>> the alignments in your GFF3 are correctly spliced (as BLAT's are). So
>> giving blastn results as precomputed est_gff would create a lot of noise,
>> since MAKER ignores blastn results and uses them only to seed the polished
>> exonerate alignments.
>>
>> --Carson
>>
>>
>> From: Daniel Standage <daniel.standage at gmail.com>
>> Date: Wednesday, June 4, 2014 at 1:11 PM
>> To: Carson Holt <carsonhh at gmail.com>
>> Cc: Maker Mailing List <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] Filtering of ab initio gene models
>>
>> I do not provide Gap or Target attributes in the GFF3. Will this affect
>> the AED as well, or just the eAED?
>>
>>
>> --
>> Daniel S. Standage
>> Ph.D. Candidate
>> Computational Genome Science Laboratory
>> Indiana University
>>
>>
>> On Wed, Jun 4, 2014 at 3:09 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>
>>> Sure, that would be helpful. One question: do you provide the Gap
>>> attribute in your precomputed alignments?  Having or not having that
>>> attribute affects the eAED score which takes reading frame into account,
>>> and may cause some things to be kept that normally would be dropped,
>>> because MAKER won't be able to take the points of mismatch of the alignment
>>> into account (it just assumes match everywhere).
>>>
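
For readers unfamiliar with the Gap attribute mentioned above: the GFF3
specification defines it as a space-separated list of operations, each a
letter (M match, I insertion relative to the reference, D deletion relative
to the reference, F and R frameshifts) followed by a length. The sketch below
parses such a value and sums the aligned spans for a nucleotide alignment. It
does not show how MAKER uses the attribute; the function names are
assumptions, frameshift operations are parsed but not counted, and the
example value is the one used in the GFF3 specification.

```python
import re

# One Gap operation: a single letter from the GFF3 spec plus a length.
GAP_OP = re.compile(r"^([MIDFR])(\d+)$")


def parse_gap(gap_value):
    """Parse a GFF3 Gap value such as 'M8 D3 M6 I1 M6' into (op, length) pairs."""
    ops = []
    for token in gap_value.split():
        match = GAP_OP.match(token)
        if match is None:
            raise ValueError("unrecognized Gap operation: %r" % token)
        ops.append((match.group(1), int(match.group(2))))
    return ops


def aligned_lengths(ops):
    """Return (matched columns, reference span, target span).

    Nucleotide-to-nucleotide case only; F and R operations are ignored here.
    """
    matched = sum(n for op, n in ops if op == "M")
    reference_span = sum(n for op, n in ops if op in ("M", "D"))
    target_span = sum(n for op, n in ops if op in ("M", "I"))
    return matched, reference_span, target_span


ops = parse_gap("M8 D3 M6 I1 M6")
print(ops)
print(aligned_lengths(ops))
```

Without a Gap attribute there is nothing to parse, which matches the point
above that every position in the alignment block is then treated as a match.
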
>>> --Carson
>>>
>>>
>>> From: Daniel Standage <daniel.standage at gmail.com>
>>> Date: Wednesday, June 4, 2014 at 1:03 PM
>>> To: Maker Mailing List <maker-devel at yandell-lab.org>
>>> Subject: [maker-devel] Filtering of ab initio gene models
>>>
>>> Thanks everyone for your responses recently!
>>>
>>> The reason for my recent flurry of email activity is that I'm seeing
>>> some unexpected trends when running the new version of Maker with
>>> precomputed alignments. Compared with an annotation I did a while ago
>>> (Maker 2.10, Maker-computed alignments), this new annotation has a
>>> substantial number of new genes annotated. If I compare distributions of
>>> AED scores between the old and new annotation, it's clear that the new
>>> annotation has a lot more low-quality models. New gene models that do
>>> not overlap any gene model from the old annotation are much more likely
>>> to be low-quality.
>>>
>>> I decided to run a little experiment. I annotated a scaffold first using
>>> Maker 2.10 and then using Maker 2.31.3. In both cases, I used the same
>>> pre-computed transcript and protein alignments and the same (latest)
>>> version of SNAP as the only *ab initio* predictor. Maker 2.10 predicted
>>> 44 genes while Maker 2.31.3 predicted 63. If we group gene models into loci
>>> by overlap, there are 33 loci with gene models from both 2.10 and 2.31.3, 1
>>> locus with only models from 2.10, and 28 loci with only models from 2.31.3.
>>>
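
The locus grouping described above can be sketched as follows. This is one
possible reading, assuming a single scaffold, no strand distinction, and
single-linkage overlap (a model joins a locus if it overlaps anything already
in it); the message does not say exactly how overlap was defined. The function
names and the toy coordinates are assumptions.

```python
from collections import Counter


def group_into_loci(models):
    """Group (source, start, end) gene models into loci by coordinate overlap.

    Coordinates are 1-based inclusive. Models are swept in start order and a
    model joins the current locus if it starts at or before the locus end.
    """
    loci = []
    locus_end = None
    for model in sorted(models, key=lambda m: (m[1], m[2])):
        _source, start, end = model
        if loci and start <= locus_end:
            loci[-1].append(model)
            locus_end = max(locus_end, end)
        else:
            loci.append([model])
            locus_end = end
    return loci


def tally_by_source(loci):
    """Count loci by the set of sources (e.g. Maker versions) they contain."""
    counts = Counter()
    for locus in loci:
        sources = sorted({model[0] for model in locus})
        counts["+".join(sources)] += 1
    return counts


# Toy models, not taken from the scaffold discussed in this thread.
models = [
    ("2.10", 100, 500),
    ("2.31.3", 400, 900),
    ("2.31.3", 1000, 1500),
    ("2.10", 2000, 2500),
]
print(tally_by_source(group_into_loci(models)))
```
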
>>> Before this experiment, I assumed the issue was related to providing
>>> pre-computed alignments in GFF3 format and perhaps violating some important
>>> assumption. However, this experiment makes me wonder whether there have
>>> been changes to how Maker filters *ab initio* gene models between
>>> version 2.10 and version 2.31.3? Do you have any ideas? If it would help, I
>>> could put together a small data set that reproduces the behavior I just
>>> described.
>>>
>>> Thanks!
>>>
>>> --
>>> Daniel S. Standage
>>> Ph.D. Candidate
>>> Computational Genome Science Laboratory
>>> Indiana University
>>>
>>
>>
>