[maker-devel] Size of initial EST training set for SNAP

Tue Mar 18 10:08:47 MDT 2014

Hi, all,

I've been learning a lot from reading posts from this group, and finally
started doing actual runs of Maker on our current genome assembly
(arthropod, genome size ~230Mb).  I started by training SNAP, but would
like to check my approach before continuing with longer runs.

From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I
deemed of very high quality based on BLAST alignments to Swiss-Prot (judged
by query-subject coverage, bit score, etc.).  I then used only these 2000
ESTs in a first Maker run with est2genome=1.  The output returned 1500
models (the 500 "missing" models are probably a result of single-exon
issues; not a concern at this point).
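
A minimal sketch of that kind of selection step, for illustration only.  It
assumes tabular BLAST output with a custom column order (shown in the
comment), and the three cutoffs are hypothetical placeholders rather than
the criteria actually used here.

```python
import csv
import io

# Hypothetical cutoffs, chosen only for illustration.
MIN_QUERY_COV = 0.80
MIN_SUBJECT_COV = 0.80
MIN_BITSCORE = 200.0

# Assumed column order, e.g. from:
#   -outfmt "6 qseqid sseqid qlen slen qstart qend sstart send bitscore"
FIELDS = ["qseqid", "sseqid", "qlen", "slen",
          "qstart", "qend", "sstart", "send", "bitscore"]


def coverage(start, end, length):
    """Fraction of a sequence spanned by an alignment (either strand)."""
    return (abs(end - start) + 1) / length


def select_high_quality(handle):
    """Return the set of query IDs with at least one hit passing all cutoffs."""
    keep = set()
    reader = csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        qcov = coverage(int(row["qstart"]), int(row["qend"]), int(row["qlen"]))
        scov = coverage(int(row["sstart"]), int(row["send"]), int(row["slen"]))
        if (qcov >= MIN_QUERY_COV and scov >= MIN_SUBJECT_COV
                and float(row["bitscore"]) >= MIN_BITSCORE):
            keep.add(row["qseqid"])
    return keep


if __name__ == "__main__":
    # One made-up hit line in the assumed column order.
    demo = io.StringIO("est1\tP1\t900\t300\t1\t900\t1\t300\t450.0\n")
    print(sorted(select_high_quality(demo)))
```

The returned IDs would then be used to pull the matching sequences out of
the full EST FASTA file for the est2genome run.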

I now plan to train SNAP on this first output, and then do another Maker
run using: 1) all ESTs (but est2genome=0), 2) my chosen protein evidence,
and 3) SNAP with the first HMM file.  The output of this second run will be
used to re-train SNAP, and this second HMM file will be used in a final
"official" run (while continuing to provide the EST and protein evidence,
of course).
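
The three runs described above can be summarized as per-run overrides to
the Maker control file.  This is only a sketch: est, est2genome, protein
and snaphmm are Maker control-file options, but every file name below is a
hypothetical placeholder, and the script just prints the settings (it does
not run Maker or SNAP).

```python
# Per-run overrides for the Maker control file.
# All file names are hypothetical placeholders.
ROUNDS = {
    "run1_est2genome": {
        "est": "high_quality_ests.fasta",   # the ~2000 selected ESTs
        "est2genome": 1,                    # build models directly from ESTs
        "protein": "",
        "snaphmm": "",
    },
    "run2_snap": {
        "est": "all_ests.fasta",            # full EST set as evidence
        "est2genome": 0,
        "protein": "proteins.fasta",
        "snaphmm": "snap_run1.hmm",         # SNAP trained on run 1 output
    },
    "run3_final": {
        "est": "all_ests.fasta",
        "est2genome": 0,
        "protein": "proteins.fasta",
        "snaphmm": "snap_run2.hmm",         # SNAP re-trained on run 2 output
    },
}


def render_overrides(settings):
    """Format one run's settings as key=value control-file lines."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    for name, settings in ROUNDS.items():
        print(f"# {name}")
        print(render_overrides(settings))
        print()
```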

Does this sound like a reasonable approach?  Simply put, my main concern is
whether I'm using too few ESTs in my first est2genome step.

Thanks for any insight!

-- 
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093