[maker-devel] evidence for MAKER vs evidence to train gene finders
Steven Sullivan
sullis02 at nyu.edu
Mon Sep 19 22:21:18 MDT 2016
I'm confused about the use(s) of gene sequence evidence in the MAKER de
novo annotation pipeline.
As I understand it, MAKER combines 1) its own BLAST alignments of
user-supplied RNA ('EST evidence') and protein ('protein homology
evidence') sequences to the genome assembly, with 2) models suggested by
trained ab initio gene finders that run in parallel.
The gene finders require a prior training step, and the training
sub-protocol in Campbell et al. 2014 (Curr. Prot. Bioinf.) assumes that no
'gold standard' gene annotation exists for a newly-sequenced genome.
Therefore it describes an iterative/bootstrap process whereby initial
MAKER output becomes the gene finder training input for e.g. SNAP, whose
output is then used in the next MAKER round.
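To make that loop concrete, here is a rough sketch of how I understand the
two rounds would differ in maker_opts.ctl. The file names are made up for
illustration, and I am assuming the option names (est2genome,
protein2genome, snaphmm) as they appear in the MAKER control file; this is
my reading of the protocol, not a tested configuration.

```
# Round 1: no trained gene finder yet, so gene models are inferred
# directly from the EST and protein alignments.
genome=my_assembly.fasta      # hypothetical file name
est=my_transcripts.fasta      # hypothetical file name
protein=my_proteins.fasta     # hypothetical file name
est2genome=1
protein2genome=1

# Round 2: SNAP has been trained on the round 1 gene models, so its
# HMM is supplied and the direct evidence-to-model inference is off.
snaphmm=round1_snap.hmm       # hypothetical file name
est2genome=0
protein2genome=0
```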
But in my case, even before the genome was sequenced, a few hundred
individual high-quality DNA/protein gene sequences for my species have
already been deposited in public databases (GenBank, Swiss-Prot) by various
labs over the years, to accompany various publications.
Should these be used to train gene finders prior to a MAKER run, and *also*
as user-supplied 'protein homology evidence' to MAKER itself?
Or am I misunderstanding the workflow?