[maker-devel] Size of initial EST training set for SNAP

Daniel Ence dence at genetics.utah.edu
Tue Mar 18 10:16:20 MDT 2014


Hi Felipe,

I think 1500 models sounds like a good-sized set for training SNAP. As far as I know, SNAP expects ~1000 models for training.

My only other comment on the approach is that relying on a single ab initio predictor is a bit risky. With multiple predictors, MAKER can choose, from among their different models, the one that best fits the evidence.

Good luck and let us know if there's anything we can help with!

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Felipe Barreto [fbarreto at ucsd.edu]
Sent: Tuesday, March 18, 2014 10:08 AM
To: MAKER group
Subject: [maker-devel] Size of initial EST training set for SNAP

Hi, all,

I've been learning a lot from reading posts from this group, and have finally started doing actual runs of MAKER on our current genome assembly (arthropod, genome size ~230 Mb).  I started by training SNAP, but would like to check my approach before continuing with longer runs.

From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I judged to be of very high quality from their BLAST alignments to Swiss-Prot (query-subject coverage, bit score, etc.).  I then used only these 2000 ESTs in a first MAKER run with est2genome=1.  The run returned 1500 models (the 500 "missing" models are probably a result of single-exon issues; not a concern at this point).
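
A minimal sketch of the kind of EST selection described above, assuming the BLAST hits have already been parsed into records. The `Hit` fields, the function name, and the 0.8 coverage and 100 bit-score cutoffs are hypothetical placeholders for illustration, not the values actually used in this thread.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One EST-to-Swiss-Prot alignment (hypothetical record layout)."""
    est_id: str
    est_len: int          # EST length, in nucleotides
    est_aligned: int      # aligned span on the EST, in nucleotides
    subject_len: int      # protein length, in residues
    subject_aligned: int  # aligned span on the protein, in residues
    bit_score: float


def select_high_quality(hits: Iterable[Hit],
                        min_cov: float = 0.8,
                        min_bit_score: float = 100.0) -> List[str]:
    """Return sorted IDs of ESTs with at least one hit passing every cutoff.

    The cutoffs are illustrative assumptions only.
    """
    keep = set()
    for h in hits:
        est_cov = h.est_aligned / h.est_len
        subject_cov = h.subject_aligned / h.subject_len
        if (est_cov >= min_cov
                and subject_cov >= min_cov
                and h.bit_score >= min_bit_score):
            keep.add(h.est_id)
    return sorted(keep)
```

An EST is kept only if a single hit covers both the EST and the protein well and also scores highly, which is one plausible reading of "query-subject coverage, bit score, etc."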

I now plan to train SNAP on this first output, and then do a second MAKER run using: 1) all ESTs (but est2genome=0), 2) my chosen protein evidence, and 3) SNAP with the first HMM file.  The output of this second run will be used to re-train SNAP, and the resulting second HMM file will be used in a final "official" run (while continuing to provide the EST and protein evidence, of course).
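
The three runs in this plan can be sketched as per-run settings. This is only an illustration: `est`, `protein`, `snaphmm`, and `est2genome` are, to my understanding, the corresponding option names in MAKER's maker_opts.ctl, while every file name below and the `render_opts` helper are hypothetical.

```python
# Per-run settings for the plan above. File names are hypothetical placeholders.
RUNS = [
    # Run 1: high-quality ESTs only, gene models built directly from ESTs.
    {"est": "high_quality_ests.fasta", "protein": "", "snaphmm": "",
     "est2genome": 1},
    # Run 2: all evidence plus SNAP trained on the run 1 models.
    {"est": "all_ests.fasta", "protein": "proteins.fasta",
     "snaphmm": "snap_round1.hmm", "est2genome": 0},
    # Run 3 ("official"): same evidence, SNAP re-trained on the run 2 models.
    {"est": "all_ests.fasta", "protein": "proteins.fasta",
     "snaphmm": "snap_round2.hmm", "est2genome": 0},
]


def render_opts(settings: dict) -> str:
    """Render settings as key=value lines, the style used by maker_opts.ctl."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    for number, settings in enumerate(RUNS, start=1):
        print(f"# run {number}")
        print(render_opts(settings))
        print()
```

Only `snaphmm` and `est2genome` (plus the EST and protein inputs) change between runs; SNAP is re-trained outside MAKER between run 1 and run 2, and again between run 2 and run 3.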

Does this sound like a reasonable approach?  Simply put, my main concern is whether I'm using too few ESTs in my first est2genome step.

Thanks for any insight!

--
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093