[maker-devel] Filtering of ab initio gene models

Carson Holt carsonhh at gmail.com
Fri Jun 6 10:39:43 MDT 2014


snap_masked-$seqid-processed-gene was produced by SNAP on the repeat-masked
sequence without hints (i.e., the ab initio call).
maker-$seqid-snap-gene was produced by SNAP after receiving hints from
MAKER.

In both cases MAKER is allowed to add UTR to the model (hence the
'processed' tag).

--Carson
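
[Editorial sketch, not part of the original message.] The two naming forms
described above can be told apart mechanically. The following is a minimal
sketch, assuming that IDs follow exactly the two templates quoted in this
thread; the function name classify_gene_id, the example sequence ID
"scaffold_12", and the trailing "-0.5" suffix are hypothetical, and other
predictors or MAKER versions may name genes differently.

```python
import re

# Patterns inferred only from the two templates quoted in this thread:
#   <predictor>_masked-$seqid-processed-gene   (ab initio call, no hints)
#   maker-$seqid-<predictor>-gene              (prediction made with MAKER hints)
AB_INITIO = re.compile(
    r"^(?P<predictor>[A-Za-z0-9]+)_masked-(?P<seqid>.+)-processed-gene"
)
HINTED = re.compile(
    r"^maker-(?P<seqid>.+)-(?P<predictor>[A-Za-z0-9]+)-gene"
)


def classify_gene_id(gene_id):
    """Return (kind, predictor, seqid) for a gene ID (hypothetical helper).

    kind is "ab_initio", "hinted", or "unknown".
    """
    m = AB_INITIO.match(gene_id)
    if m:
        return ("ab_initio", m.group("predictor"), m.group("seqid"))
    m = HINTED.match(gene_id)
    if m:
        return ("hinted", m.group("predictor"), m.group("seqid"))
    return ("unknown", None, None)


if __name__ == "__main__":
    for gid in ("snap_masked-scaffold_12-processed-gene",
                "maker-scaffold_12-snap-gene-0.5"):
        print(gid, classify_gene_id(gid))
```

IDs that match neither template are reported as "unknown" rather than
guessed at.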


From:  Daniel Standage <daniel.standage at gmail.com>
Date:  Friday, June 6, 2014 at 10:33 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  Maker Mailing List <maker-devel at yandell-lab.org>, Volker Brendel
<vbrendel at indiana.edu>
Subject:  Re: [maker-devel] Filtering of ab initio gene models

Another question: is there documentation anywhere for the naming conventions
of the genes annotated by Maker? Of course it's easy to spot genes based on
a particular ab initio gene predictor, as the names are in the IDs. But what
is the significance of, say, "snap_masked-$seqid-processed-gene" in a gene
ID vs "maker-$seqid-snap-gene"?

Thanks,
Daniel


--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University


On Thu, Jun 5, 2014 at 2:05 PM, Daniel Standage <daniel.standage at gmail.com>
wrote:
> I have attached data for a small 18kb region with a handful of genes, as well
> as the corresponding maker_opts.ctl file. (This is a smaller and different
> data set than what I was looking at yesterday, with a more well-defined
> problem).
> 
> With the data files as is, Maker 2.31.3 reports a model from 4125 to 6400 with
> an AED of 0.23. If you exclude transcript TSA024184, Maker reports a different
> gene from 6111 to 8345 with an AED of 0.01. Both of these genes have
> transcript support: will Maker report overlapping genes under any conditions?
> And even if Maker is forced to choose only a single gene to report, why would
> the model from 4125 to 6400 ever be reported in place of the one from 6111 to
> 8345, especially since this is provided in the model_gff file?
> 
> Even when transcript TSA024184 is included, Maker 2.10 reports the
> high-confidence gene from 6111 to 8345.
> 
> Any light you could shed would be helpful. Thanks!
> 
> 
> --
> Daniel S. Standage
> Ph.D. Candidate
> Computational Genome Science Laboratory
> Indiana University
> 
> 
> On Wed, Jun 4, 2014 at 3:17 PM, Carson Holt <carsonhh at gmail.com> wrote:
>> Just eAED, but eAED can affect the selection of ab initio results. For
>> example, it considers the reading-frame match of protein evidence, which
>> also affects whether evidence from single_exon=1 and genes with single-exon
>> protein evidence get kept. There is also the assumption that your
>> alignments in GFF3 are correctly spliced (as BLAT's are). So giving blastn
>> results as precomputed est_gff would create a lot of noise, since MAKER
>> ignores blastn and uses it only to seed the polished exonerate alignments.
>> 
>> --Carson
>> 
>> 
>> From:  Daniel Standage <daniel.standage at gmail.com>
>> Date:  Wednesday, June 4, 2014 at 1:11 PM
>> To:  Carson Holt <carsonhh at gmail.com>
>> Cc:  Maker Mailing List <maker-devel at yandell-lab.org>
>> Subject:  Re: [maker-devel] Filtering of ab initio gene models
>> 
>> I do not provide Gap or Target attributes in the GFF3. Will this affect the
>> AED as well, or just the eAED?
>> 
>> 
>> --
>> Daniel S. Standage
>> Ph.D. Candidate
>> Computational Genome Science Laboratory
>> Indiana University
>> 
>> 
>> On Wed, Jun 4, 2014 at 3:09 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>> Sure. That would be helpful. One question: do you provide the Gap
>>> attribute in your precomputed alignments?  Having or not having that
>>> attribute affects the eAED score which takes reading frame into account, and
>>> may cause some things to be kept that normally would be dropped, because
>>> MAKER won't be able to take the points of mismatch of the alignment into
>>> account (it just assumes match everywhere).
>>> 
>>> --Carson
>>> 
>>> 
>>> From:  Daniel Standage <daniel.standage at gmail.com>
>>> Date:  Wednesday, June 4, 2014 at 1:03 PM
>>> To:  Maker Mailing List <maker-devel at yandell-lab.org>
>>> Subject:  [maker-devel] Filtering of ab initio gene models
>>> 
>>> Thanks everyone for your responses recently!
>>> 
>>> The reason for my recent flurry of email activity is that I'm seeing some
>>> unexpected trends when running the new version of Maker with precomputed
>>> alignments. Compared with an annotation I did a while ago (Maker 2.10,
>>> Maker-computed alignments), this new annotation has a substantial number of
>>> new genes annotated. If I compare distributions of AED scores between the
>>> old and new annotation, it's clear that the new annotation has a lot more
>>> low-quality models. If I look at new gene models that do not overlap with
>>> any gene model from the old annotation, the likelihood that it's a
>>> low-quality model is much higher.
>>> 
>>> I decided to run a little experiment. I annotated a scaffold first using
>>> Maker 2.10 and then using Maker 2.31.3. In both cases, I used the same
>>> pre-computed transcript and protein alignments and the same (latest) version
>>> of SNAP as the only ab initio predictor. Maker 2.10 predicted 44 genes while
>>> Maker 2.31.3 predicted 63. If we group gene models into loci by overlap,
>>> there are 33 loci with gene models from both 2.10 and 2.31.3, 1 locus with
>>> only models from 2.10, and 28 loci with only models from 2.31.3.
>>> 
>>> Before this experiment, I assumed the issue was related to providing
>>> pre-computed alignments in GFF3 format and perhaps violating some important
>>> assumption. However, this experiment makes me wonder whether there have been
>>> changes to how Maker filters ab initio gene models between version 2.10 and
>>> version 2.31.3? Do you have any ideas? If it would help, I could put
>>> together a small data set that reproduces the behavior I just described.
>>> 
>>> Thanks!
>>> 
>>> --
>>> Daniel S. Standage
>>> Ph.D. Candidate
>>> Computational Genome Science Laboratory
>>> Indiana University
>> 
> 
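
[Editorial sketch, not part of the original messages.] The first message in
this thread compares the two MAKER versions by grouping gene models into
loci by overlap and counting which versions contribute to each locus. A
minimal sketch of that bookkeeping is shown here, assuming 1-based inclusive
coordinates on a single sequence and ignoring strand. The function names are
hypothetical. The first two intervals are the ones quoted in the thread
(their version labels are chosen only for illustration); the third is made
up for the example.

```python
from collections import Counter


def group_into_loci(models):
    """Group (source, start, end) gene models into loci by overlap.

    Two models belong to the same locus if they overlap directly or
    through a chain of overlapping models. Strand is ignored (assumption).
    """
    loci = []
    locus_end = None
    for model in sorted(models, key=lambda m: (m[1], m[2])):
        _, start, end = model
        if loci and start <= locus_end:
            # Overlaps the current locus: extend it.
            loci[-1].append(model)
            locus_end = max(locus_end, end)
        else:
            # No overlap: start a new locus.
            loci.append([model])
            locus_end = end
    return loci


def count_by_sources(loci):
    """Count loci by the set of sources (e.g. MAKER versions) they contain."""
    return Counter(frozenset(src for src, _, _ in locus) for locus in loci)


if __name__ == "__main__":
    models = [
        ("2.31.3", 4125, 6400),    # interval quoted in the thread
        ("2.10", 6111, 8345),      # interval quoted in the thread
        ("2.31.3", 10000, 12000),  # made-up interval for illustration
    ]
    for sources, n in count_by_sources(group_into_loci(models)).items():
        print(sorted(sources), n)
```

Whether strand should separate loci depends on the comparison being made;
this sketch does not attempt it.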

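
[Editorial sketch, not part of the original messages.] On the Gap attribute
discussed earlier in the thread: in GFF3, Gap is an optional attribute whose
value is a space-separated list of operations such as "M8 D3 M6 I1 M6". The
sketch below reads that value from an attribute column and, when it is
absent, falls back to a single match run, mirroring the "assumes match
everywhere" behaviour described in the thread. It is not MAKER's own code;
the function name parse_gap and the example attribute strings are
hypothetical.

```python
import re

# GFF3 Gap operations: M (match), I (insert), D (delete),
# F (forward frameshift), R (reverse frameshift), each followed by a length.
GAP_OP = re.compile(r"^([MIDFR])(\d+)$")


def parse_gap(attributes, aligned_length):
    """Return the alignment as a list of (op, length) tuples.

    attributes     -- column 9 of a GFF3 line, e.g. "ID=m1;Gap=M8 D3 M6"
    aligned_length -- feature length, used when no Gap attribute is present
    """
    for field in attributes.split(";"):
        key, _, value = field.strip().partition("=")
        if key == "Gap":
            ops = []
            for token in value.split():
                m = GAP_OP.match(token)
                if m is None:
                    raise ValueError("bad Gap operation: %r" % token)
                ops.append((m.group(1), int(m.group(2))))
            return ops
    # No Gap attribute: treat the whole feature as one ungapped match run.
    return [("M", aligned_length)]


if __name__ == "__main__":
    print(parse_gap("ID=m1;Target=est1 1 21;Gap=M8 D3 M6 I1 M6", 23))
    print(parse_gap("ID=m1;Target=est1 1 23", 23))
```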

