<html dir="ltr">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<style type="text/css" id="owaParaStyle"></style>

</head>

<body fpstyle="1" ocsi="0">

<div style="direction: ltr;font-family: Tahoma;color: #000000;font-size: 10pt;">Hi Felipe, 

<div><br>

</div>

<div>I think 1500 models is a good-sized set for training SNAP. As far as I know, SNAP expects roughly 1000 models for training. </div>

<div><br>

</div>

<div>My only other comment on the approach is that relying on a single ab initio predictor is a little risky. Using multiple predictors would let MAKER choose, from among their competing models, the one that best fits the evidence. </div>

<div><br>

</div>

<div>Good luck and let us know if there's anything we can help with!</div>

<div><br>

Thanks,</div>

<div>Daniel<br>

<div><br>

<div class="BodyFragment"><font size="2">

<div class="PlainText">Daniel Ence<br>

Graduate Student<br>

Eccles Institute of Human Genetics<br>

University of Utah<br>

15 North 2030 East, Room 2100<br>

Salt Lake City, UT 84112-5330</div>

</font></div>

</div>

<div style="font-family: Times New Roman; color: #000000; font-size: 16px">

<hr tabindex="-1">

<div id="divRpF54511" style="direction: ltr;"><font face="Tahoma" size="2" color="#000000"><b>From:</b> maker-devel [maker-devel-bounces@yandell-lab.org] on behalf of Felipe Barreto [fbarreto@ucsd.edu]<br>

<b>Sent:</b> Tuesday, March 18, 2014 10:08 AM<br>

<b>To:</b> MAKER group<br>

<b>Subject:</b> [maker-devel] Size of initial EST training set for SNAP<br>

</font><br>

</div>

<div></div>

<div>

<div dir="ltr"><span style="font-family:arial,sans-serif; font-size:12.727272033691406px">Hi, all,</span>

<div style="font-family:arial,sans-serif; font-size:12.727272033691406px"><br>

</div>

<div style="font-family:arial,sans-serif; font-size:12.727272033691406px">I've been learning a lot from reading posts from this group, and have finally started doing actual MAKER runs on our current genome assembly (arthropod, genome size ~230 Mb). I started by training SNAP, but would like to check my approach before continuing with longer runs. </div>

<div style="font-family:arial,sans-serif; font-size:12.727272033691406px"><br>

</div>

<div style="font-family:arial,sans-serif; font-size:12.727272033691406px">From our full set of ~40,000 ESTs (RNA-seq assembly), I chose ~2000 that I judged to be of very high quality from their BLAST alignments to Swiss-Prot (query-subject coverage, bit score, etc.). I then used only these 2000 ESTs in a first MAKER run with est2genome=1. That run returned 1500 models (the 500 "missing" models are probably a result of single-exon issues; not a concern at this point).</div>

<div style="font-family:arial,sans-serif; font-size:12.727272033691406px"><br>

</div>

<div style="font-family:arial,sans-serif; font-size:12.727272033691406px">I now plan to train SNAP on this first output, and then do another MAKER run using: 1) all ESTs (but with est2genome=0), 2) my chosen protein evidence, and 3) SNAP with the first HMM file. The output of this second run will be used to re-train SNAP, and the resulting second HMM file will be used in a final "official" run (while continuing to provide the EST and protein evidence, of course).</div>
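<div style="font-family:arial,sans-serif; font-size:12.727272033691406px"><br>
</div>
<div style="font-family:arial,sans-serif; font-size:12.727272033691406px">To make that second run concrete, here is roughly what I expect the relevant lines of maker_opts.ctl to look like. This is only a sketch: the file names are placeholders rather than my actual paths, and every other option is assumed to stay at its default.</div>
<pre>
#-----second MAKER run (sketch; all file names below are hypothetical placeholders)
genome=assembly.fasta        #genome assembly to annotate
est=all_ests.fasta           #all ~40,000 ESTs, used as evidence only
protein=proteins.fasta       #chosen protein evidence
snaphmm=snap_round1.hmm      #HMM from training SNAP on the first run's models
est2genome=0                 #do not infer gene models directly from ESTs in this run
</pre>
<div style="font-family:arial,sans-serif; font-size:12.727272033691406px">For the final run I would expect to change only snaphmm, pointing it at the re-trained HMM file.</div>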

<div style="font-family:arial,sans-serif; font-size:12.727272033691406px"><br>

</div>

<div style="font-family:arial,sans-serif; font-size:12.727272033691406px">Does this sound like a reasonable approach?  Simply put, my main concern is whether I'm using too few ESTs in my first est2genome step.</div>

<div style="font-family:arial,sans-serif; font-size:12.727272033691406px"><br>

</div>

<div style="font-family:arial,sans-serif; font-size:12.727272033691406px">Thanks for any insight!</div>

<div><br>

</div>

-- <br>

Felipe Barreto<br>

Post-doctoral Scholar<br>

Scripps Institution of Oceanography<br>

University of California, San Diego<br>

La Jolla, CA 92093 </div>

</div>

</div>

</div>

</div>

</body>

</html>