[maker-devel] Size of initial EST training set for SNAP
Felipe Barreto
fbarreto at ucsd.edu
Tue Mar 18 10:08:47 MDT 2014
Hi, all,
I've been learning a lot from reading posts from this group, and finally
started doing actual runs of Maker on our current genome assembly
(arthropod, genome size ~230Mb). I started by training SNAP, but would
like to check my approach before continuing with longer runs.
From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I
deemed of very high quality based on blast alignments to Swiss-Prot
(query-subject coverage, bit score, etc.). I then used only these 2000
ESTs in a first Maker run with est2genome=1. The output returned 1500
models (the 500 "missing" models are probably a result of single-exon
issues; not a concern at this point).
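For illustration only, the EST selection step described above could look roughly like the sketch below. It assumes a custom tab-separated BLAST output with the columns listed in COLUMNS. The thresholds and the function name are hypothetical placeholders, not the values actually used for the ~2000 ESTs.

```python
from typing import Iterable, Set

# Assumed column layout of a custom tabular BLAST report (hypothetical;
# the default tabular output does not include qlen/slen).
COLUMNS = ["qseqid", "sseqid", "qlen", "slen",
           "qstart", "qend", "sstart", "send", "bitscore"]


def select_high_quality(rows: Iterable[str],
                        min_qcov: float = 0.9,
                        min_scov: float = 0.9,
                        min_bits: float = 200.0) -> Set[str]:
    """Return IDs of ESTs with at least one hit passing all thresholds.

    The threshold defaults are illustrative assumptions only.
    """
    keep: Set[str] = set()
    for line in rows:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = dict(zip(COLUMNS, line.split("\t")))
        # Fraction of the query (EST) covered by the alignment
        qcov = (abs(int(f["qend"]) - int(f["qstart"])) + 1) / int(f["qlen"])
        # Fraction of the subject (Swiss-Prot protein) covered
        scov = (abs(int(f["send"]) - int(f["sstart"])) + 1) / int(f["slen"])
        if (qcov >= min_qcov and scov >= min_scov
                and float(f["bitscore"]) >= min_bits):
            keep.add(f["qseqid"])
    return keep
```

Note that a strict query-coverage cutoff would tend to exclude ESTs with long UTRs, so the cutoffs would need tuning for a real dataset.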
I now plan to train SNAP on this first output, and then do another
Maker run using: 1) all ESTs (but est2genome=0), 2) my chosen protein
evidence, and 3) SNAP with the first HMM file. The output of this second
run will be used to re-train SNAP, and this second HMM file will be used in
a final "official" run (while continuing to provide the EST and protein
evidence, of course).
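As a rough summary of the plan above, the settings that change between the three runs can be written out as key=value lines of the kind found in maker_opts.ctl. The option names (est, protein, est2genome, snaphmm) are Maker's. All file names, round labels, and the render helper are hypothetical placeholders.

```python
# Per-round settings for the three planned Maker runs.
# File names are hypothetical; only the option names come from Maker.
ROUNDS = {
    "round1_est2genome": {
        "est": "high_quality_ests.fasta",  # the ~2000 selected ESTs
        "protein": "",
        "est2genome": 1,                   # build models directly from ESTs
        "snaphmm": "",
    },
    "round2_snap1": {
        "est": "all_ests.fasta",           # full set of ~40,000 ESTs
        "protein": "proteins.fasta",
        "est2genome": 0,
        "snaphmm": "snap_round1.hmm",      # trained on round 1 output
    },
    "round3_final": {
        "est": "all_ests.fasta",
        "protein": "proteins.fasta",
        "est2genome": 0,
        "snaphmm": "snap_round2.hmm",      # re-trained on round 2 output
    },
}


def render(settings: dict) -> str:
    """Format one round's settings as key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    for name, settings in ROUNDS.items():
        print(f"# {name}")
        print(render(settings))
        print()
```

The SNAP training step between rounds is not shown here; this only tracks which evidence and HMM file each run would receive.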
Does this sound like a reasonable approach? Simply put, my main concern is
whether I'm using too few ESTs in my first est2genome step.
Thanks for any insight!
--
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093