<div dir="ltr">Hi Everyone,<div><br></div><div>I’ve been experimenting with running the HMM training of Augustus faster on Amazon, based on a procedure that Kevin Childs has written for “speedy” Augustus training.  The procedure essentially builds the training set from a subset of the genes predicted by SNAP rather than from the whole genome, a good idea that undoubtedly saves a lot of time.  I’ve written some modifications to the Augustus scripts and dependencies to try to speed this process up on Amazon, and I’d be happy to share my notes with anyone who is interested.  I’ve gotten it to the point where the whole AutoAug procedure can be completed in a day on a small cluster.  </div><div><br></div><div>I think that, working with the Augustus authors, more improvements could be made, but the whole experience with Augustus has led me to some more general questions...</div><div><br></div><div>1) One thing I noticed while monkeying around with this reduced gene set procedure is that you are unable to do UTR training with Augustus: the AutoAug script complains that there aren’t enough genes left to make an adequate training set.  Has anyone else noticed this? I haven’t seen much discussion of how important it is that the Augustus HMM be trained for UTRs when used in the Maker2 pipeline.<br></div><div><br></div><div>2) I’ve been trying to evaluate how good my AUGUSTUS HMM is based on the training set.  Running the newly trained species file, I see that the performance on the “exon level” is low (around 5-6%), but sensitivity on the nucleotide level is in the 89-95% range, while the specificity is in the 50-60% range, which seems consistent with what other users have reported on this list and the Augustus listserv. This is assessed on a training set of approximately 200 genes selected from the output of multiple iterative runs of SNAP, as documented in the MAKER tutorial.  
This is all based on data and genes selected from a “to be published” genome of an electric fish I’m working on.  </div><div><br></div><div>3) Just for laughs, I tried the HMM trained for zebrafish on the same training set and found that its performance was slightly better than that of the species-specific one I’ve been working so hard on (a few percentage points on both nucleotide-level sensitivity and specificity).</div><div><br></div><div>I’ve reasoned that it might be best in terms of reproducibility to run Maker one last time with my multiple-round SNAP HMM together with the Augustus zebrafish species file, rather than using my own custom species training.  Can anyone think of a good reason not to do this?  Are there qualities or benefits, not captured by these sensitivity/specificity measures, that I would gain by using my own custom-trained species file?</div><div><br></div><div>What are folks’ experiences with AUGUSTUS in this regard?  Many thanks in advance for any advice!</div><div><br></div><div>Jason Gallant</div></div><div dir="ltr">-- <br></div><div dir="ltr"><div><span>----</span></div><span>Dr. Jason R. Gallant</span><div>Assistant Professor</div><div>Room 38 Natural Sciences<br><div>Department of Integrative Biology</div><div>Michigan State University</div><div>East Lansing, MI 48824</div><div><a href="mailto:jgallant@msu.edu">jgallant@msu.edu</a></div></div><div>office: 517-884-7756</div></div>