[maker-devel] FW: maker-control file

Thu Mar 6 00:26:29 MST 2014

Hi,

I think this is an interesting comment that I would like a bit more information on:

correct_est_fusion should not be used together with est2genome.  It won’t
fail, you just get odd results.  Actually est2genome should not ever be
used to generate the final annotation set.  It is a convenience method
that allows you to generate rough models for training gene predictors like
SNAP and Augustus.  But once they are trained it should be turned off,
because the models it produces will be partial (ESTs rarely cover the
whole transcript) and the results will have many false positives from
background transcription events from your EST data.  These models are good
enough to train with, but make very poor final annotations. So in the end
you should be using correct_est_fusion=1 with the SNAP or Augustus set and
not est2genome (which should already have been turned off by then).
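
For concreteness, the two-stage setup described in the quote might look roughly like this in maker_opts.ctl. This is only a sketch: the option names are the ones named in the quote plus the standard predictor settings, while the HMM file name and the Augustus species label are placeholders that do not come from this thread.

```
# --- Pass 1: training only (rough models straight from EST alignments) ---
est2genome=1            # infer gene models directly from ESTs
correct_est_fusion=0    # not combined with est2genome

# --- Pass 2: final annotation (after SNAP/Augustus have been trained) ---
est2genome=0                   # turned off once the predictors are trained
snaphmm=my_species.hmm         # placeholder: SNAP HMM from the training pass
augustus_species=my_species    # placeholder: trained Augustus species name
correct_est_fusion=1           # used with the SNAP or Augustus models
```

The two passes are separate MAKER runs with separate control files; they are shown together here only to make the difference between them visible.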

My experience has been that training gene finders, especially for complex genomes like those of vertebrates, is a very slow and painful process. And ultimately, the results are far from accurate, even with a sizeable, manually curated training set. Wouldn’t it be more sensible to rely on the evidence rather than on probabilistic models? The annotation would be partial, but on the other hand the chance of incorporating false signals would be smaller (assuming I can generate a clean set of transcripts from RNA-seq data). And I’d rather underestimate the exon inventory slightly than put out an annotation with ~10% false exon calls.

As an example, running SNAP and Augustus on a bird genome, with Augustus achieving nucleotide and exon sensitivities in the 70-90% range, gave a host of false exons that were simply not supported by the RNA-seq data, yet made it into the final gene build. Not sure what to think about that, to be honest. Is it possible to get some more details on how MAKER uses ab initio predictions and reconciles them with evidence alignments? At the moment it seems to me that MAKER gives higher weight to the ab initio predictions, which seems problematic.

/Marc