[maker-devel] Filtering of ab initio gene models

Volker Brendel vbrendel at indiana.edu
Fri Jun 6 15:52:08 MDT 2014


Hi Carson,
is there a way of allowing MAKER to add UTRs to our external models 
(supplied by the pred_gff or model_gff tag)?  This seems to be one 
problem we are running into.  Our external models are high quality, but 
CDS only.  Thus their score gets knocked down relative to ab initio 
predictions with added UTRs.
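
To make the scoring effect concrete, here is a minimal sketch. It assumes the published base-level definition of AED, i.e. AED = 1 - (sensitivity + specificity)/2; MAKER's own implementation may differ in detail. The function names and all coordinates are invented for illustration only.

```python
def covered_bases(intervals):
    """Return the set of positions covered by inclusive (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(model_exons, evidence_exons):
    """Base-level annotation edit distance: 1 - (sensitivity + specificity) / 2."""
    model = covered_bases(model_exons)
    evidence = covered_bases(evidence_exons)
    if not model or not evidence:
        return 1.0  # no overlap is possible, so treat as maximally distant
    shared = len(model & evidence)
    sensitivity = shared / len(evidence)  # fraction of evidence bases the model covers
    specificity = shared / len(model)     # fraction of model bases backed by evidence
    return 1.0 - (sensitivity + specificity) / 2.0


# Hypothetical coordinates, not taken from any real data set.
transcript = [(100, 399), (500, 899)]  # transcript evidence, UTR bases included
cds_only = [(200, 399), (500, 699)]    # external model supplied without UTRs
with_utrs = [(100, 399), (500, 899)]   # the same model after UTR extension

aed_cds_only = aed(cds_only, transcript)
aed_with_utrs = aed(with_utrs, transcript)
```

In this toy case the CDS-only model scores about 0.21 while the UTR-extended model scores 0.0, even though both have identical coding exons. The only difference is that the CDS-only model leaves evidence bases uncovered, which lowers sensitivity.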

Daniel will have more questions/observations later with regard to 
overlapping gene models (we definitely need to allow gene models to 
overlap in the UTRs, because transcript evidence clearly shows such 
negative intergenic spaces).

Thanks for all your help!
Volker

On 6/6/2014 11:39 AM, Carson Holt wrote:
> snap_masked-$seqid-processed-gene was produced by SNAP on the repeat 
> masked sequence without hints (i.e. the ab initio call).
> maker-$seqid-snap-gene was produced by SNAP after receiving hints from 
> MAKER.
>
> In both cases MAKER is allowed to add UTR to the model (hence the 
> 'processed' tag).
>
> --Carson
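
For anyone sorting MAKER output by origin, here is a minimal sketch that separates the two ID shapes described above. The regular expressions are an assumption based only on the two examples in this thread, and the function name and labels are hypothetical. Anything that follows "gene" in a real ID is deliberately left unconstrained.

```python
import re

# <predictor>_masked-<seqid>-processed-gene: ab initio call on masked sequence
AB_INITIO = re.compile(r"^(?P<predictor>\w+?)_masked-(?P<seqid>.+)-processed-gene")
# maker-<seqid>-<predictor>-gene: predictor run after receiving hints from MAKER
HINTED = re.compile(r"^maker-(?P<seqid>.+)-(?P<predictor>\w+)-gene")


def classify_gene_id(gene_id):
    """Return (origin, predictor, seqid) for a gene ID, or ('unknown', None, None)."""
    match = AB_INITIO.match(gene_id)
    if match:
        return ("ab_initio", match.group("predictor"), match.group("seqid"))
    match = HINTED.match(gene_id)
    if match:
        return ("hinted", match.group("predictor"), match.group("seqid"))
    return ("unknown", None, None)


# Hypothetical sequence ID used only for illustration.
ab_initio_example = classify_gene_id("snap_masked-scaffold_1-processed-gene")
hinted_example = classify_gene_id("maker-scaffold_1-snap-gene")
```

The first call is labelled "ab_initio" and the second "hinted"; both report the predictor as "snap" and the sequence as "scaffold_1".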
>
>
> From: Daniel Standage <daniel.standage at gmail.com>
> Date: Friday, June 6, 2014 at 10:33 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: Maker Mailing List <maker-devel at yandell-lab.org>, Volker Brendel
> <vbrendel at indiana.edu>
> Subject: Re: [maker-devel] Filtering of ab initio gene models
>
> Another question: is there documentation anywhere for the naming 
> conventions of the genes annotated by Maker? Of course it's easy to 
> spot genes based on a particular /ab initio/ gene predictor, as the 
> names are in the IDs. But what is the significance of, say, 
> "snap_masked-$seqid-processed-gene" in a gene ID vs 
> "maker-$seqid-snap-gene"?
>
> Thanks,
> Daniel
>
>
> --
> Daniel S. Standage
> Ph.D. Candidate
> Computational Genome Science Laboratory
> Indiana University
>
>
> On Thu, Jun 5, 2014 at 2:05 PM, Daniel Standage
> <daniel.standage at gmail.com> wrote:
>
>     I have attached data for a small 18kb region with a handful of
>     genes, as well as the corresponding maker_opts.ctl file. (This is
>     a smaller and different data set than what I was looking at
>     yesterday, with a more well-defined problem).
>
>     With the data files as is, Maker 2.31.3 reports a model from 4125
>     to 6400 with an AED of 0.23. If you exclude transcript TSA024184,
>     Maker reports a different gene from 6111 to 8345 with an AED of
>     0.01. Both of these genes have transcript support: will Maker
>     report overlapping genes under any conditions? And even if Maker
>     is forced to choose only a single gene to report, why would the
>     model from 4125 to 6400 ever be reported in place of the one from
>     6111 to 8345, especially since this is provided in the model_gff file?
>
>     Even when transcript TSA024184 is included, Maker 2.10 reports the
>     high-confidence gene from 6111 to 8345.
>
>     Any light you could shed would be helpful. Thanks!
>
>
>     --
>     Daniel S. Standage
>     Ph.D. Candidate
>     Computational Genome Science Laboratory
>     Indiana University
>
>
>     On Wed, Jun 4, 2014 at 3:17 PM, Carson Holt
>     <carsonhh at gmail.com> wrote:
>
>         Just eAED, but eAED can affect the selection of ab initio
>         results.  For example, it takes the reading frame match of
>         protein evidence into account, which also affects whether
>         evidence from single_exon=1 and genes with single-exon
>         protein evidence get kept.  There is also the assumption
>         that your alignments in GFF3 are correctly spliced (as BLAT
>         produces).  So giving blastn results as precomputed est_gff
>         would create a lot of noise, since MAKER ignores blastn and
>         uses it only to seed the polished exonerate alignments.
>
>         --Carson
>
>
>         From: Daniel Standage <daniel.standage at gmail.com>
>         Date: Wednesday, June 4, 2014 at 1:11 PM
>         To: Carson Holt <carsonhh at gmail.com>
>         Cc: Maker Mailing List <maker-devel at yandell-lab.org>
>         Subject: Re: [maker-devel] Filtering of ab initio gene models
>
>         I do not provide Gap or Target attributes in the GFF3. Will
>         this affect the AED as well, or just the eAED?
>
>
>         --
>         Daniel S. Standage
>         Ph.D. Candidate
>         Computational Genome Science Laboratory
>         Indiana University
>
>
>         On Wed, Jun 4, 2014 at 3:09 PM, Carson Holt
>         <carsonhh at gmail.com> wrote:
>
>             Sure, that would be helpful.  One question: do you
>             provide the Gap attribute in your precomputed alignments?
>             That attribute affects the eAED score, which takes
>             reading frame into account.  Without it, some things may
>             be kept that would normally be dropped, because MAKER
>             won't be able to take the points of mismatch in the
>             alignment into account (it just assumes a match
>             everywhere).
>
>             --Carson
>
>
>             From: Daniel Standage <daniel.standage at gmail.com>
>             Date: Wednesday, June 4, 2014 at 1:03 PM
>             To: Maker Mailing List <maker-devel at yandell-lab.org>
>             Subject: [maker-devel] Filtering of ab initio gene models
>
>             Thanks everyone for your responses recently!
>
>             The reason for my recent flurry of email activity is that
>             I'm seeing some unexpected trends when running the new
>             version of Maker with precomputed alignments. Compared
>             with an annotation I did a while ago (Maker 2.10,
>             Maker-computed alignments), this new annotation has a
>             substantial number of new genes annotated. If I compare
>             distributions of AED scores between the old and new
>             annotation, it's clear that the new annotation has a lot
>             more low-quality models. If I look at new gene models that
>             do not overlap with any gene model from the old
>             annotation, the likelihood that they are low-quality
>             models is much higher.
>
>             I decided to run a little experiment. I annotated a
>             scaffold first using Maker 2.10 and then using Maker
>             2.31.3. In both cases, I used the same pre-computed
>             transcript and protein alignments and the same (latest)
>             version of SNAP as the only /ab initio/ predictor. Maker
>             2.10 predicted 44 genes while Maker 2.31.3 predicted 63.
>             If we group gene models into loci by overlap, there are 33
>             loci with gene models from both 2.10 and 2.31.3, 1 locus
>             with only models from 2.10, and 28 loci with only models
>             from 2.31.3.
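
The locus grouping described in the paragraph above can be sketched roughly as follows. This is one plausible reading of "group gene models into loci by overlap" (single-linkage merging of overlapping intervals), not necessarily the exact procedure used for the counts quoted. It assumes all models sit on one sequence and ignores strand, and the coordinates and function names are invented for illustration.

```python
from collections import Counter


def group_into_loci(models):
    """Group (start, end, source) models into loci of transitively overlapping models."""
    loci = []
    current = []
    current_end = None
    for model in sorted(models, key=lambda m: (m[0], m[1])):
        start, end, _source = model
        if current and start <= current_end:
            # Overlaps the running locus: extend it.
            current.append(model)
            current_end = max(current_end, end)
        else:
            # No overlap: close the previous locus and start a new one.
            if current:
                loci.append(current)
            current = [model]
            current_end = end
    if current:
        loci.append(current)
    return loci


def tally_by_source(loci):
    """Count loci by the set of annotation sources contributing models to them."""
    return Counter(frozenset(source for _, _, source in locus) for locus in loci)


# Hypothetical models on one scaffold; coordinates are not real data.
models = [
    (100, 500, "2.10"), (450, 900, "2.31.3"),
    (1200, 1500, "2.31.3"),
    (2000, 2600, "2.10"), (2100, 2500, "2.31.3"),
    (3000, 3200, "2.10"),
]
loci = group_into_loci(models)
counts = tally_by_source(loci)
```

With these toy inputs there are four loci: two with models from both versions, one with only a 2.31.3 model, and one with only a 2.10 model.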
>
>             Before this experiment, I assumed the issue was related to
>             providing pre-computed alignments in GFF3 format and
>             perhaps violating some important assumption. However, this
>             experiment makes me wonder whether there have been changes
>             to how Maker filters /ab initio/ gene models between
>             version 2.10 and version 2.31.3? Do you have any ideas? If
>             it would help, I could put together a small data set that
>             reproduces the behavior I just described.
>
>             Thanks!
>
>             --
>             Daniel S. Standage
>             Ph.D. Candidate
>             Computational Genome Science Laboratory
>             Indiana University
>
>
>
>

-- 
Volker Brendel
Professor of Biology and Computer Science
Indiana University
Department of Biology & School of Informatics and Computing
Simon Hall 205C
212 South Hawthorne Drive
Bloomington, IN 47405-7003

Tel.: (812) 855-7074
http://brendelgroup.org/
