[maker-devel] Filtering of ab initio gene models

Wed Jun 4 13:17:34 MDT 2014

Just eAED, but eAED can affect the selection of ab initio results, for example
through the reading-frame match to protein evidence, which also affects whether
evidence from single_exon=1 and genes with single-exon protein evidence get kept.
There is also the assumption that your alignments in GFF3 are correctly
spliced (as BLAT's are). So giving blastn results as precomputed est_gff
would create a lot of noise, since MAKER does not use raw blastn hits as
evidence; it uses them only to seed the polished exonerate alignments.
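
To make the AED/eAED distinction concrete, here is a minimal sketch of nucleotide-level AED using the published definition AED = 1 - (SN + SP)/2, where SN is the fraction of evidence bases covered by the gene model and SP is the fraction of model bases supported by evidence. It assumes exons and evidence are plain (start, end) intervals on one strand of one sequence; the function names are hypothetical and this is not MAKER's implementation. It deliberately ignores reading frame and alignment gaps, which is the extra information eAED takes into account.

```python
from typing import Iterable, List, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end), as in GFF3


def covered_bases(intervals: Iterable[Interval]) -> Set[int]:
    """Hypothetical helper: set of base positions covered by any interval."""
    bases: Set[int] = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(model_exons: List[Interval], evidence_exons: List[Interval]) -> float:
    """Nucleotide-level AED = 1 - (SN + SP) / 2 (coordinates only, no eAED logic)."""
    model = covered_bases(model_exons)
    evidence = covered_bases(evidence_exons)
    if not model or not evidence:
        return 1.0  # nothing to compare: maximal disagreement
    overlap = len(model & evidence)
    sensitivity = overlap / len(evidence)  # share of evidence covered by the model
    specificity = overlap / len(model)     # share of the model supported by evidence
    return 1.0 - (sensitivity + specificity) / 2.0
```

For example, a model with exons (1, 100) and (201, 300) against evidence covering (1, 100) and (201, 250) gives SN = 1.0 and SP = 0.75, so AED = 0.125; identical coordinates give 0.0 and disjoint ones give 1.0.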

--Carson

From:  Daniel Standage <daniel.standage at gmail.com>
Date:  Wednesday, June 4, 2014 at 1:11 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Maker Mailing List <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Filtering of ab initio gene models

I do not provide Gap or Target attributes in the GFF3. Will this affect the
AED as well, or just the eAED?
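
For reference, a GFF3 Gap attribute describes an alignment as a run of CIGAR-like operations, which records where the alignment has gaps relative to the genome. The sketch below assumes a nucleotide-to-nucleotide alignment on the plus strand and handles only the M, I and D operations (not the F/R frameshift codes or protein alignments); the function names are hypothetical. Without a Gap value it falls back to treating the whole feature span as aligned, which mirrors the "assumes match everywhere" behavior described in this thread.

```python
import re
from typing import List, Optional, Tuple

# Supported GFF3 Gap operations (nucleotide-to-nucleotide alignments only):
#   M = aligned block (consumes reference and target)
#   I = gap in the reference (consumes target only)
#   D = gap in the target (consumes reference only)
# Frameshift codes (F, R) and protein alignments are deliberately out of scope.
GAP_TOKEN = re.compile(r"([MID])(\d+)")


def parse_gap(gap: str) -> List[Tuple[str, int]]:
    """Parse a Gap attribute value such as 'M8 D3 M6 I1 M6'."""
    ops = []
    for token in gap.split():
        match = GAP_TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unsupported Gap token: {token!r}")
        ops.append((match.group(1), int(match.group(2))))
    return ops


def aligned_reference_blocks(
    start: int, end: int, gap: Optional[str] = None
) -> List[Tuple[int, int]]:
    """Return the 1-based, inclusive reference intervals that are aligned.

    Hypothetical helper. With no Gap value the whole feature span is treated
    as aligned, i.e. the alignment is assumed to match everywhere.
    """
    if gap is None:
        return [(start, end)]
    blocks = []
    pos = start  # next reference position to consume (plus strand assumed)
    for op, length in parse_gap(gap):
        if op == "M":
            blocks.append((pos, pos + length - 1))
            pos += length
        elif op == "D":
            pos += length  # reference bases with no counterpart in the target
        # "I" consumes only target bases, so the reference position is unchanged
    if pos - 1 != end:
        raise ValueError("Gap operations do not cover the feature's start..end span")
    return blocks
```

For example, `aligned_reference_blocks(1, 23, "M8 D3 M6 I1 M6")` returns `[(1, 8), (12, 17), (18, 23)]`, leaving the three reference bases under D3 unaligned, whereas the same call without a Gap value returns the whole span `[(1, 23)]`.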

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

On Wed, Jun 4, 2014 at 3:09 PM, Carson Holt <carsonhh at gmail.com> wrote:
> Sure, that would be helpful. One question: do you provide the Gap attribute
> in your precomputed alignments? Having or not having that attribute affects
> the eAED score, which takes reading frame into account, and may cause some
> things to be kept that would normally be dropped, because MAKER won't be able
> to take the points of mismatch in the alignment into account (it just assumes
> a match everywhere).
> 
> --Carson
> 
> 
> From:  Daniel Standage <daniel.standage at gmail.com>
> Date:  Wednesday, June 4, 2014 at 1:03 PM
> To:  Maker Mailing List <maker-devel at yandell-lab.org>
> Subject:  [maker-devel] Filtering of ab initio gene models
> 
> Thanks everyone for your responses recently!
> 
> The reason for my recent flurry of email activity is that I'm seeing some
> unexpected trends when running the new version of Maker with precomputed
> alignments. Compared with an annotation I did a while ago (Maker 2.10,
> Maker-computed alignments), this new annotation has a substantial number of
> new genes annotated. If I compare distributions of AED scores between the old
> and new annotation, it's clear that the new annotation has a lot more
> low-quality models. Among new gene models that do not overlap any gene model
> from the old annotation, the proportion of low-quality models is much
> higher.
> 
> I decided to run a little experiment. I annotated a scaffold first using Maker
> 2.10 and then using Maker 2.31.3. In both cases, I used the same pre-computed
> transcript and protein alignments and the same (latest) version of SNAP as the
> only ab initio predictor. Maker 2.10 predicted 44 genes while Maker 2.31.3
> predicted 63. If we group gene models into loci by overlap, there are 33 loci
> with gene models from both 2.10 and 2.31.3, 1 locus with only models from
> 2.10, and 28 loci with only models from 2.31.3.
> 
> Before this experiment, I assumed the issue was related to providing
> pre-computed alignments in GFF3 format and perhaps violating some important
> assumption. However, this experiment makes me wonder whether there have been
> changes to how Maker filters ab initio gene models between version 2.10 and
> version 2.31.3. Do you have any ideas? If it would help, I could put together
> a small data set that reproduces the behavior I just described.
> 
> Thanks!
> 
> --
> Daniel S. Standage
> Ph.D. Candidate
> Computational Genome Science Laboratory
> Indiana University
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
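
The overlap-based locus comparison described in the quoted message can be sketched as follows. This is one possible approach, not necessarily the one used in the experiment: it assumes each gene model is reduced to a hypothetical (seqid, start, end, label) tuple, treats any overlap of gene spans as the same locus regardless of strand, and merges overlaps transitively. Counting loci by the set of labels gives the "both / only 2.10 / only 2.31.3" breakdown.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical record: (seqid, start, end, label), 1-based inclusive gene span;
# label names the annotation run, e.g. "2.10" or "2.31.3".
GeneModel = Tuple[str, int, int, str]


def group_into_loci(models: List[GeneModel]) -> List[List[GeneModel]]:
    """Sweep models sorted by position, merging overlapping spans into loci."""
    loci: List[List[GeneModel]] = []
    current_seqid: Optional[str] = None
    current_end = -1
    for model in sorted(models, key=lambda m: (m[0], m[1], m[2])):
        seqid, start, end, _ = model
        if seqid == current_seqid and start <= current_end:
            loci[-1].append(model)  # overlaps the open locus (strand ignored)
            current_end = max(current_end, end)
        else:
            loci.append([model])    # start a new locus
            current_seqid, current_end = seqid, end
    return loci


def tally_loci(models: List[GeneModel]) -> Dict[FrozenSet[str], int]:
    """Count loci by the set of annotation runs contributing a model to them."""
    counts: Dict[FrozenSet[str], int] = defaultdict(int)
    for locus in group_into_loci(models):
        counts[frozenset(m[3] for m in locus)] += 1
    return dict(counts)
```

Sorting once and sweeping keeps the grouping at O(n log n); note that a single long model can chain several otherwise separate models into one locus, which is a property of transitive overlap grouping rather than of either MAKER version.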
