[maker-devel] evidence for MAKER vs evidence to train gene finders

Steven Sullivan sullis02 at nyu.edu
Mon Sep 19 22:21:18 MDT 2016


I'm confused about the use(s) of gene sequence evidence in the MAKER de
novo annotation pipeline.

As I understand it, MAKER combines 1) its own BLAST alignments of
user-supplied RNA ('EST evidence') and protein ('protein homology
evidence') sequences to the genome assembly, with 2) models suggested by
trained ab initio gene finders that run in parallel.

The gene finders require a prior training step, and the training
sub-protocol in Campbell et al 2014 (Curr. Prot. Bioinf.) assumes that no
'gold standard' gene annotation exists for a newly-sequenced genome.
Therefore it describes an iterative/bootstrap process whereby initial
MAKER output becomes the gene finder training input for e.g. SNAP, whose
output is then used in the next MAKER round.
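To make the bootstrap concrete, here is a minimal Python sketch of how the evidence-related settings might change between rounds. It assumes the maker_opts.ctl keys `est`, `protein`, `est2genome`, `protein2genome` and `snaphmm`; the helper names `round_options` and `to_ctl` and the file names are hypothetical, and the SNAP training step between rounds is not shown.

```python
def round_options(round_number, est_fasta, protein_fasta, snap_hmm=None):
    """Return evidence settings for one bootstrap round (hypothetical helper)."""
    if round_number < 1:
        raise ValueError("rounds are numbered from 1")
    first = round_number == 1
    if not first and snap_hmm is None:
        raise ValueError("rounds after the first need a trained SNAP HMM")
    return {
        # The same user-supplied evidence is passed to MAKER in every round.
        "est": est_fasta,
        "protein": protein_fasta,
        # Round 1: no trained gene finder yet, so infer models from alignments.
        "est2genome": "1" if first else "0",
        "protein2genome": "1" if first else "0",
        # Later rounds: use the SNAP model trained on the previous round's output.
        "snaphmm": "" if first else snap_hmm,
    }


def to_ctl(options):
    """Render settings as key=value lines in the style of maker_opts.ctl."""
    return "\n".join(f"{key}={value}" for key, value in options.items())


print(to_ctl(round_options(1, "mrna.fasta", "proteins.fasta")))
print(to_ctl(round_options(2, "mrna.fasta", "proteins.fasta", "round1.hmm")))
```

The only thing that changes between rounds here is where gene models come from: alignments alone in round 1, then the retrained gene finder in later rounds.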

But in my case, even before the genome was sequenced, a few hundred
individual high-quality DNA/protein gene sequences for my species had
already been deposited in public databases (Genbank, Swissprot) by various
labs over the years, to accompany various publications.

Should these be used to train gene finders prior to a MAKER run, and *also*
as user-supplied 'protein homology evidence' to MAKER itself?

Or am I misunderstanding the workflow?