[maker-devel] Couple quick questions about Maker

Carson Holt carsonhh at gmail.com
Thu Jul 10 15:44:48 MDT 2014


Also, you can use repeat_gff in the control files, but I prefer just to rerun
in the same directory as the previous job.

--Carson


From:  Carson Holt <carsonhh at gmail.com>
Date:  Thursday, July 10, 2014 at 3:38 PM
To:  Nathaniel Jue <n.jue at uconn.edu>
Cc:  "<maker-devel at yandell-lab.org>" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Couple quick questions about Maker

Just rerun in the same directory and it will reuse the old reports, so it
won't have to rerun RepeatMasker etc.
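
To see what a previous run left behind before rerunning in the same directory, the datastore index from that run can be summarized. The following is only a sketch: it assumes the output directory holds a tab-separated index log (contig name, datastore path, status), which is not described in this thread, and the function name and sample lines are hypothetical.

```python
from collections import Counter


def summarize_datastore_index(log_text):
    """Count contigs by their most recent status in a datastore index log.

    Assumption: each line is tab-separated as contig name, path, status.
    """
    last_status = {}
    for line in log_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        contig, _path, status = fields
        last_status[contig] = status  # later lines override earlier ones
    return Counter(last_status.values())


# Hypothetical log contents, for illustration only.
example_log = (
    "scaffold_1\tgenome_datastore/AA/BB/scaffold_1/\tSTARTED\n"
    "scaffold_1\tgenome_datastore/AA/BB/scaffold_1/\tFINISHED\n"
    "scaffold_2\tgenome_datastore/CC/DD/scaffold_2/\tSTARTED\n"
)
summary = summarize_datastore_index(example_log)
print(dict(summary))
```

A contig whose last recorded status is not a finished one would be expected to need more work on the rerun; the exact status names should be checked against a real log.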

--Carson


From:  Nathaniel Jue <n.jue at uconn.edu>
Date:  Thursday, July 10, 2014 at 3:36 PM
To:  Carson Holt <carsonhh at gmail.com>
Subject:  Re: [maker-devel] Couple quick questions about Maker

Is there a way to by-pass the repeat prediction after it's done the first
time? Seems like when I went to re-do the snap training, it's decided to
re-do the repeatmasking as well. Does it always do that? Maybe I'm
misinterpreting something? If there is a way to give it a maker generated
gff to by-pass that step could you tell me where to find it?

Thanks,
Nate


On Tue, Jul 8, 2014 at 12:31 PM, Carson Holt <carsonhh at gmail.com> wrote:
> Convert them both to ZFF, then concatenate the ZFF and sequence files.
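
The concatenation step can be sketched as below, assuming both gene sets have already been converted to ZFF. The file names are hypothetical, and the duplicate-ID check is an added precaution that is not discussed in this thread: if the same scaffold appears in both sets, the combined files would contain two records with the same name.

```python
import tempfile
from collections import Counter
from pathlib import Path


def concatenate_files(inputs, output):
    """Write the inputs to output in order, ensuring each ends with a newline."""
    with open(output, "w") as out:
        for path in inputs:
            text = Path(path).read_text()
            if text and not text.endswith("\n"):
                text += "\n"  # avoid gluing a record onto the next header
            out.write(text)


def header_ids(path):
    """Return sequence IDs from '>' header lines (works for ZFF and FASTA)."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    ids.append(parts[0])
    return ids


def duplicated_ids(ids):
    """Return IDs that occur more than once, sorted."""
    return sorted(i for i, n in Counter(ids).items() if n > 1)


# Demonstration on small made-up ZFF files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "cegma.ann").write_text(
        ">scaffold_1\nEinit\t201\t300\tcegma_1\nEterm\t401\t500\tcegma_1"
    )
    (tmp / "maker.ann").write_text(">scaffold_2\nEsngl\t101\t400\tmaker_1\n")
    combined = tmp / "combined.ann"
    concatenate_files([tmp / "cegma.ann", tmp / "maker.ann"], combined)
    combined_ids = header_ids(combined)
    repeated = duplicated_ids(combined_ids)

print(combined_ids, repeated)
```

The same `concatenate_files` call would be applied to the matching sequence files, keeping the annotation and sequence inputs in the same order.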
> 
> --Carson
> 
> 
> From:  Nathaniel Jue <n.jue at uconn.edu>
> Date:  Tuesday, July 8, 2014 at 9:56 AM
> To:  Carson Holt <carsonhh at gmail.com>
> Cc:  Daniel Ence <dence at genetics.utah.edu>, "<maker-devel at yandell-lab.org>"
> <maker-devel at yandell-lab.org>
> 
> Subject:  Re: [maker-devel] Couple quick questions about Maker
> 
> Carson, one more question: Any suggestions on how to combine the cegma and
> maker est2genome/protein2genome results? Can I just concatenate and sort the
> gff files or are there specific formatting issues I need to consider? No
> overlapping regions or something like that?
> 
> Thanks,
> Nate
> 
> 
> Nathaniel Jue, Ph.D.
> 
> Department of Molecular and Cell Biology
> 
> University of Connecticut
> 
> Storrs, CT 06269
> 
> 
> 
> 
> On Mon, Jul 7, 2014 at 12:26 PM, Carson Holt <carsonhh at gmail.com> wrote:
>> I don't think RepeatMasker produces GFF3.  I believe it is GFF2 with the -gff
>> option (which is pretty different). Also, if you provide GFF3 files for
>> repeats, you will still need to turn off repeat masking in the control files
>> by blanking out the options.  Also, MAKER uses a step called RepeatRunner
>> against an internal transposable element protein database, which is probably
>> still running (and is slow because it's a search in translated protein
>> space).
>> 
>> For performance, you may want to give a larger max_dna_len for the MAKER run
>> given that you have a large RAM machine. Also set all the depth_blast*
>> options in maker_bopts.ctl to 15 or 20.
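
The control-file changes described above could be scripted roughly as follows. This is a sketch under assumptions: the option names other than max_dna_len (model_org, repeat_protein, rm_gff, depth_blastn, depth_blastx, depth_tblastx) are taken to match the standard control-file templates and should be checked against your own files; the sample file contents, the GFF3 path, and the max_dna_len value of 300000 are made up for illustration.

```python
import re


def set_ctl_options(ctl_text, updates):
    """Return ctl_text with matching `key=value` lines rewritten.

    Trailing `#comment` text is preserved. Keys missing from the file raise
    KeyError instead of being silently appended.
    """
    remaining = dict(updates)
    out_lines = []
    for line in ctl_text.splitlines():
        match = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            comment = match.group(3) or ""
            value = str(remaining.pop(key))
            line = f"{key}={value} {comment}".rstrip()
        out_lines.append(line)
    if remaining:
        raise KeyError(f"options not found: {sorted(remaining)}")
    return "\n".join(out_lines) + "\n"


# Simplified stand-ins for maker_opts.ctl and maker_bopts.ctl (assumed layout).
opts = (
    "#-----Repeat Masking\n"
    "model_org=all #organism for RepeatMasker\n"
    "repeat_protein=te_proteins.fasta #TE proteins for RepeatRunner\n"
    "rm_gff= #repeats from an external GFF3 file\n"
    "max_dna_len=100000 #chunk length\n"
)
bopts = (
    "depth_blastn=0 #blastn depth cutoff\n"
    "depth_blastx=0 #blastx depth cutoff\n"
    "depth_tblastx=0 #tblastx depth cutoff\n"
)

# Blank the masking options, point at a hypothetical GFF3, raise the chunk size.
new_opts = set_ctl_options(
    opts,
    {"model_org": "", "repeat_protein": "", "rm_gff": "repeats.gff3",
     "max_dna_len": 300000},
)
new_bopts = set_ctl_options(
    bopts, {"depth_blastn": 15, "depth_blastx": 15, "depth_tblastx": 15}
)
print(new_opts)
print(new_bopts)
```

Editing the files by hand works just as well; the helper only makes the changes repeatable across several genomes.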
>> 
>> CEGMA is convenient for training predictors because it finds genes that will
>> always be in every eukaryote (i.e. high confidence).  You can combine these
>> with est2genome/protein2genome results from MAKER if you want.  You can then
>> use the resulting HMM for a larger MAKER run with experimental evidence, and
>> then train again on those results.  But beware that there is rarely any
>> benefit from training beyond that second round.  More training actually tends
>> to make things worse (the overtraining paradox).
>> 
>> --Carson
>> 
>> 
>> 
>> From:  Daniel Ence <dence at genetics.utah.edu>
>> Date:  Monday, July 7, 2014 at 10:00 AM
>> To:  Nathaniel Jue <n.jue at uconn.edu>
>> Cc:  "<maker-devel at yandell-lab.org>" <maker-devel at yandell-lab.org>
>> Subject:  Re: [maker-devel] Couple quick questions about Maker
>> 
>> Hi Nathaniel, 
>> 
>> 1) We'll need to see the error messages that MAKER was giving to understand
>> what might have gone wrong with the Repeat Masker gff3 file. If you could run
>> maker on one of your scaffolds with your current settings and send us the
>> complete output, we can start to figure out what happened.
>> 
>> 2) MAKER interacts with its gene predictors (augustus, snap, and the other
>> ones listed in the control files) in a way that improves their performance
>> (with the hints and such). When you supply predictions through the pred_gff
>> parameter, MAKER can't give that performance improvement, so there's
>> something of a tradeoff there. I think the performance improvement is a key
>> part of MAKER's success, so I would definitely recommend running the
>> ab-initio tools internally.
>> 
>> MAKER tries to save you time by saving results from run to run and only
>> rerunning tools (usually blast tools) that had their parameters changed in
>> the control files. Taking advantage of that will probably be the biggest time
>> saver for you. Something else that could save you almost as much time would
>> be to set a reasonable lower bound on the size of contigs that maker will try
>> to annotate (usually skipping contigs <5kbp or <10kbp, depending on your
>> genome). This is set with the min_contig parameter.
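
Before picking a min_contig value, it can help to see how much of the assembly a given cutoff would exclude. The sketch below is not part of MAKER: the function names and the toy sequences are hypothetical, and it assumes contigs strictly shorter than the threshold are the ones skipped.

```python
def fasta_lengths(lines):
    """Yield (sequence_id, length) for each record in FASTA-formatted lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def min_contig_report(lengths, threshold):
    """Summarize how many contigs and bases fall below the threshold."""
    skipped = [n for n in lengths if n < threshold]
    total = sum(lengths)
    return {
        "threshold": threshold,
        "contigs_skipped": len(skipped),
        "contigs_kept": len(lengths) - len(skipped),
        "fraction_bases_skipped": sum(skipped) / total if total else 0.0,
    }


# Toy assembly: three contigs of 12000, 4000 and 800 bases.
toy_fasta = [
    ">contig_1", "A" * 12000,
    ">contig_2", "C" * 4000,
    ">contig_3", "G" * 800,
]
lengths = [n for _name, n in fasta_lengths(toy_fasta)]
report = min_contig_report(lengths, threshold=5000)
print(report)
```

For a real genome, `fasta_lengths` can be given an open file handle, and the report can be repeated for a few candidate thresholds (for example 5000 and 10000) to compare how many contigs and bases each would drop.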
>> 
>> I'll have to check with my lab mates about the Repeat ORF searching and how
>> they use CEGMA results. I think you can probably just put them all in there
>> at once though. 
>> 
>> ~Daniel
>> 
>> 
>> 
>> 
>> Daniel Ence
>> Graduate Student
>> dence at genetics.utah.edu
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>> 
>> On Jul 7, 2014, at 9:26 AM, Nathaniel Jue <n.jue at uconn.edu>
>>  wrote:
>> 
>>> Hi, 
>>> 
>>> I'm trying to run maker on a couple genomes right now and was wondering if
>>> folks had any thoughts on ways to speed it up a bit. I'm running it on a
>>> 48-processor supercomputer (lots of RAM, usually use it for genome
>>> assembly). Both these genomes are a little fragmented, so there are lots of
>>> contigs, which slows down the whole process. I am looking for ways to speed
>>> things up and was wondering about a couple things:
>>> 
>>> 1) I'm currently just at the first round of maker predictions using EST and
>>> protein evidence to build models. Had already done RepeatMasking so thought
>>> I'd just input subsequent GFF to speed it up. Didn't seem to like the GFF,
>>> so two questions: i) any thoughts on why that GFF wasn't acceptable? It's
>>> the one that repeatmasker outputs if you ask it to; and ii) Providing this
>>> GFF, should generally allow the program to bypass the RepeatMasking step,
>>> correct? Does it also make it bypass the Repeat ORF searching step?
>>> 
>>> 2) I plan to run both SNAP and Augustus on these genomes as well. The
>>> two-step SNAP training from the tutorials seems straightforward, but I was
>>> wondering about the Augustus step. From what I can tell, simply providing an
>>> Augustus "trained" species name should turn on Augustus and blast/blat-like
>>> hints generated within Maker are then used in gene prediction. Any thoughts
>>> on if it's either more accurate or faster to do the Augustus predictions
>>> outside of the Maker pipeline and then import them using the pred_gff
>>> parameter in the maker_opts file?
>>> 
>>> 3) Finally, I noticed that you had a script for converting cegma gff files
>>> to zff files for snap training? Currently, I am using predicted transcripts
>>> for this species and protein sequences from related species for training.
>>> Does anyone have any insight into using CEGMA results as well? Do you work
>>> iteratively with them? For instance, start with using hints from more
>>> distant taxa (i.e. CEGMA) and then work your way closer? Or just throw
>>> everything in at once and retrain after that?
>>> 
>>> Thanks in advance for any advice and insight.
>>> 
>>> Cheers,
>>> Nate
>>> 
>>> 
>>> Nathaniel Jue, Ph.D.
>>> Department of Molecular and Cell Biology
>>> University of Connecticut
>>> Storrs, CT 06269
>>> 
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>> 
> 


