[maker-devel] Couple quick questions about Maker
Nathaniel Jue
n.jue at uconn.edu
Mon Jul 7 09:26:05 MDT 2014
Hi,
I'm running MAKER on a couple of genomes right now and was wondering if
folks had any thoughts on ways to speed it up a bit. I'm running it on a
48-processor supercomputer (lots of RAM, usually used for genome
assembly). Both of these genomes are a little fragmented, so there are lots
of contigs, which slows down the whole process. I have a few specific
questions:
1) I'm currently at the first round of MAKER predictions, using EST and
protein evidence to build models. I had already run RepeatMasker, so I
thought I'd supply the resulting GFF to speed things up. MAKER didn't seem
to like the GFF, so two questions: i) any thoughts on why that GFF wasn't
acceptable? It's the one that RepeatMasker outputs if you ask it to; and
ii) providing this GFF should generally allow the program to bypass the
RepeatMasking step, correct? Does it also bypass the repeat ORF searching
step?
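
One possible explanation for the rejected file, offered as an assumption rather than a confirmed diagnosis: RepeatMasker's GFF output is GFF version 2, while MAKER's GFF inputs are expected to be GFF3. A minimal Python sketch of a converter is below. The function name, the regular expression for the attribute column, the choice of `match` as the feature type, and the ID/Name/Target attribute layout are all illustrative assumptions and should be checked against what MAKER actually accepts.

```python
import re
from urllib.parse import quote

# Assumed shape of the GFF2 attribute column written by RepeatMasker, e.g.
#   Target "Motif:AluY" 1 300
TARGET_RE = re.compile(r'Target\s+"Motif:([^"]+)"\s+(\d+)\s+(\d+)')


def rm_gff2_to_gff3(lines):
    """Yield GFF3 lines for an iterable of RepeatMasker GFF2 lines.

    Comment lines and lines that do not match the assumed layout are skipped.
    """
    yield "##gff-version 3"
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        m = TARGET_RE.search(cols[8])
        if m is None:
            continue
        count += 1
        # Percent-encode characters that are reserved in GFF3 attributes.
        name = quote(m.group(1), safe="()/:+")
        cols[2] = "match"  # assumed feature type
        cols[8] = f"ID=rm{count};Name={name};Target={name} {m.group(2)} {m.group(3)}"
        yield "\t".join(cols)
```

Passing an open file handle on the RepeatMasker GFF to `rm_gff2_to_gff3` and writing the yielded lines to a new file would give a GFF3 candidate to try in place of the original.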
2) I plan to run both SNAP and Augustus on these genomes as well. The
two-step SNAP training from the tutorials seems straightforward, but I was
wondering about the Augustus step. From what I can tell, simply providing
the name of a trained Augustus species should turn on Augustus, and the
BLAST/BLAT-like hints generated within MAKER are then used in gene
prediction. Any thoughts on whether it's more accurate or faster to run
the Augustus predictions outside of the MAKER pipeline and then import them
using the pred_gff parameter in the maker_opts file?
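
The two setups being compared can be sketched as two variants of the same control file. The Python helper below is a hypothetical convenience, not part of MAKER; it assumes each option sits on its own line as `key=value #comment`. The species name `my_species` and the file name `augustus_external.gff3` are placeholders, and `augustus_species` is assumed to be the option that takes the trained species name.

```python
import re


def set_maker_opts(text, updates):
    """Return control-file text with the given keys set to new values.

    Lines whose key is not in `updates` are left untouched, and any
    trailing `#comment` on an updated line is preserved.
    """
    out = []
    for line in text.splitlines():
        m = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if m and m.group(1) in updates:
            comment = m.group(3) or ""
            line = f"{m.group(1)}={updates[m.group(1)]} {comment}".rstrip()
        out.append(line)
    return "\n".join(out) + "\n"


# Two-line stand-in for the relevant part of a maker_opts file.
template = (
    "augustus_species= #Augustus gene prediction species model\n"
    "pred_gff= #ab-initio predictions from an external GFF3 file\n"
)

# Variant A: Augustus runs inside the pipeline.
internal = set_maker_opts(template, {"augustus_species": "my_species"})

# Variant B: Augustus was run separately and its predictions are imported.
external = set_maker_opts(template, {"pred_gff": "augustus_external.gff3"})
```

Running both variants on the same small subset of contigs would be one way to compare run time and the resulting gene models directly.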
3) Finally, I noticed that you have a script for converting CEGMA GFF files
to ZFF files for SNAP training. Currently, I am using predicted transcripts
for this species and protein sequences from related species for training.
Does anyone have any insight into using CEGMA results as well? Do you work
iteratively with them? For instance, do you start by using hints from more
distant taxa (i.e. CEGMA) and then work your way closer, or just throw
everything in at once and retrain after that?
Thanks in advance for any advice and insight.
Cheers,
Nate
*Nathaniel Jue, Ph.D.*
Department of Molecular and Cell Biology
University of Connecticut
Storrs, CT 06269