[maker-devel] Couple quick questions about Maker
Carson Holt
carsonhh at gmail.com
Thu Jul 10 15:44:48 MDT 2014
Also, you can use repeat_gff in the control files, but I prefer just to rerun
in the same directory as the previous job.
--Carson
From: Carson Holt <carsonhh at gmail.com>
Date: Thursday, July 10, 2014 at 3:38 PM
To: Nathaniel Jue <n.jue at uconn.edu>
Cc: "<maker-devel at yandell-lab.org>" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] Couple quick questions about Maker
Just rerun in the same directory and it will reuse the old reports, so it
won't have to rerun RepeatMasker etc.
--Carson
From: Nathaniel Jue <n.jue at uconn.edu>
Date: Thursday, July 10, 2014 at 3:36 PM
To: Carson Holt <carsonhh at gmail.com>
Subject: Re: [maker-devel] Couple quick questions about Maker
Is there a way to bypass the repeat prediction after it has been done the first
time? It seems that when I went to redo the SNAP training, it decided to
redo the repeat masking as well. Does it always do that? Maybe I'm
misinterpreting something? If there is a way to give it a MAKER-generated
GFF to bypass that step, could you tell me where to find it?
Thanks,
Nate
On Tue, Jul 8, 2014 at 12:31 PM, Carson Holt <carsonhh at gmail.com> wrote:
> Convert them both to ZFF, then concatenate the ZFF and sequence files.
>
> --Carson
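Carson's suggestion above (convert both to ZFF, then concatenate the ZFF and sequence files) can be sketched in Python. This is a minimal sketch, not a MAKER or SNAP tool: it assumes you already have one ZFF file plus a matching FASTA file for each source (e.g. CEGMA and MAKER), and all function names are hypothetical. Rather than a blind concatenation, it keeps one header per contig in the ZFF and writes each contig sequence once, so a contig present in both sets is not duplicated. It does not filter gene models that overlap between the two sources, and it assumes the group (gene) names in the last ZFF column are already distinct.

```python
# Sketch: merge two SNAP training sets (ZFF + FASTA) into one.
# Assumes each source was already converted to ZFF; names are hypothetical.
from collections import OrderedDict


def _header_name(line):
    """First word after '>' (empty string if the header is blank)."""
    words = line[1:].split()
    return words[0] if words else ""


def read_zff(path):
    """Map sequence name -> list of ZFF feature lines, in file order."""
    records = OrderedDict()
    current = None
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                current = _header_name(line)
                records.setdefault(current, [])
            elif line.strip() and current is not None:
                records[current].append(line)
    return records


def read_fasta(path):
    """Map sequence name -> sequence string, in file order."""
    seqs = OrderedDict()
    name, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(chunks)
                name, chunks = _header_name(line), []
            elif line:
                chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def merge_training_sets(pairs, out_zff, out_fasta):
    """Merge (zff_path, fasta_path) pairs; return contigs lacking a sequence."""
    zff, fasta = OrderedDict(), OrderedDict()
    for zff_path, fasta_path in pairs:
        for name, feats in read_zff(zff_path).items():
            # One header per contig: features from every source go under it.
            zff.setdefault(name, []).extend(feats)
        for name, seq in read_fasta(fasta_path).items():
            fasta.setdefault(name, seq)  # keep the first copy of a contig
    missing = [name for name in zff if name not in fasta]
    with open(out_zff, "w") as fh:
        for name, feats in zff.items():
            fh.write(">" + name + "\n")
            fh.writelines(feat + "\n" for feat in feats)
    with open(out_fasta, "w") as fh:
        for name in zff:
            if name in fasta:
                fh.write(">" + name + "\n" + fasta[name] + "\n")
    return missing
```

For example, `merge_training_sets([("cegma.ann", "cegma.dna"), ("maker.ann", "maker.dna")], "train.ann", "train.dna")` writes one combined training set; the file names are placeholders. A non-empty return value means some annotated contigs had no sequence in any input FASTA.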
>
>
> From: Nathaniel Jue <n.jue at uconn.edu>
> Date: Tuesday, July 8, 2014 at 9:56 AM
> To: Carson Holt <carsonhh at gmail.com>
> Cc: Daniel Ence <dence at genetics.utah.edu>, "<maker-devel at yandell-lab.org>"
> <maker-devel at yandell-lab.org>
>
> Subject: Re: [maker-devel] Couple quick questions about Maker
>
> Carson, one more question: any suggestions on how to combine the CEGMA and
> MAKER est2genome/protein2genome results? Can I just concatenate and sort the
> GFF files, or are there specific formatting issues I need to consider, such as
> overlapping regions?
>
> Thanks,
> Nate
>
>
> Nathaniel Jue, Ph.D.
>
> Department of Molecular and Cell Biology
>
> University of Connecticut
>
> Storrs, CT 06269
>
> On Mon, Jul 7, 2014 at 12:26 PM, Carson Holt <carsonhh at gmail.com> wrote:
>> I don't think RepeatMasker produces GFF3. I believe it writes GFF2 with the -gff
>> option (which is pretty different). Also, if you provide GFF3 files for
>> repeats, you will still need to turn off repeat masking in the control files
>> by blanking out the options. MAKER also runs a step called RepeatRunner
>> against an internal transposable element protein database, which is probably
>> still running (and is slow because it searches in translated protein
>> space).
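Carson's point that RepeatMasker's -gff output is GFF2 rather than GFF3 is easy to check before handing a file to MAKER. A minimal heuristic sketch, relying only on the general file formats: it trusts a `##gff-version` pragma if one is present, and otherwise inspects column 9, which GFF3 requires to be `tag=value` pairs separated by `;`. The function name is hypothetical, and passing this check does not guarantee that MAKER will accept the file.

```python
import re

# GFF3 column 9: tag=value pairs separated by ';' (e.g. ID=rep1;Name=L2).
# GFF2 column 9 is free text, such as: Target "Motif:L2" 12 110
_GFF3_ATTRS = re.compile(r"^[^=;\s]+=[^;]*(;[^=;\s]+=[^;]*)*;?$")


def guess_gff_version(lines):
    """Return 3, 2, or None (undecided) for an iterable of GFF lines."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##gff-version"):
            parts = line.split()
            # An explicit version pragma wins over the column-9 heuristic.
            if len(parts) > 1 and parts[1].startswith("3"):
                return 3
            if len(parts) > 1 and parts[1].startswith("2"):
                return 2
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a 9-column feature line
        # Decide from the first feature line found.
        return 3 if _GFF3_ATTRS.match(cols[8].strip()) else 2
    return None


# Usage sketch (file name is a placeholder):
# with open("genome.fa.out.gff") as fh:
#     print(guess_gff_version(fh))
```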
>>
>> For performance, you may want to give a larger max_dna_len for the MAKER run,
>> given that you have a large-RAM machine. Also, set all the depth_blast options in
>> maker_bopts.ctl to 15 or 20.
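The control-file changes discussed above (blanking the repeat-masking options, raising max_dna_len, and setting every depth_blast option to 15 or 20) can be scripted. A minimal sketch: the helper names are hypothetical, and the option names in the usage comments (model_org, rmlib, repeat_protein, max_dna_len, depth_blastn and similar) are assumed from MAKER 2.x control files. The option for a precomputed repeat GFF3 is assumed here to be rm_gff, whereas the thread calls it repeat_gff, so check every name against the files `maker -CTL` generates for your version.

```python
import re

# key=value, optionally followed by a '#comment' (MAKER .ctl layout).
_CTL_LINE = re.compile(r"^(\s*)([A-Za-z0-9_]+)=([^#]*?)(\s*#.*)?$")


def set_ctl_options(text, updates):
    """Return .ctl text with each key in `updates` set to its new value.

    An empty string blanks the option. Comments after '#' are kept.
    Keys that are not already in the file are NOT added.
    """
    out = []
    for line in text.splitlines():
        m = _CTL_LINE.match(line)
        if m and m.group(2) in updates:
            indent, key, _old, comment = m.groups()
            line = f"{indent}{key}={updates[key]}{comment or ''}"
        out.append(line)
    return "\n".join(out) + "\n"


def depth_blast_keys(text):
    """All depth_*blast* option names in maker_bopts.ctl-style text."""
    return re.findall(r"^\s*(depth_\w*blast\w*)=", text, flags=re.M)


# Usage sketch (paths and values are placeholders; option names assumed):
# with open("maker_opts.ctl") as fh:
#     opts = fh.read()
# opts = set_ctl_options(opts, {
#     "model_org": "", "rmlib": "", "repeat_protein": "",  # skip masking
#     "rm_gff": "repeats.gff3",   # assumed name of the repeat GFF3 option
#     "max_dna_len": "300000",    # illustrative value only
# })
# with open("maker_bopts.ctl") as fh:
#     bopts = fh.read()
# bopts = set_ctl_options(bopts, {k: "20" for k in depth_blast_keys(bopts)})
```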
>>
>> CEGMA is convenient for training predictors because it finds genes that will
>> always be in every eukaryote (i.e., high confidence). You can combine these
>> with est2genome/protein2genome results from MAKER if you want. You can then
>> use the resulting HMM for a larger MAKER run with experimental evidence, and
>> then train again on those results. But beware that there is rarely any
>> benefit from training beyond that second round. More training actually tends
>> to make things worse (the overtraining paradox).
>>
>> --Carson
>>
>>
>>
>> From: Daniel Ence <dence at genetics.utah.edu>
>> Date: Monday, July 7, 2014 at 10:00 AM
>> To: Nathaniel Jue <n.jue at uconn.edu>
>> Cc: "<maker-devel at yandell-lab.org>" <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] Couple quick questions about Maker
>>
>> Hi Nathaniel,
>>
>> 1) We'll need to see the error messages that MAKER was giving to understand
>> what might have gone wrong with the RepeatMasker GFF3 file. If you could run
>> MAKER on one of your scaffolds with your current settings and send us the
>> complete output, we can start to figure out what happened.
>>
>> 2) MAKER interacts with its gene predictors (augustus, snap, and the other
>> ones listed in the control files) in a way that improves their performance
>> (with the hints and such). When you supply predictions through the pred_gff
>> parameter, MAKER can't give that performance improvement, so there's
>> something of a tradeoff there. I think the performance improvement is a key
>> part of MAKER's success, so I would definitely recommend running the
>> ab initio tools internally.
>>
>> MAKER tries to save you time by saving results from run to run and only
>> rerunning tools (usually blast tools) that had their parameters changed in
>> the control files. Taking advantage of that will probably be the biggest time
>> saver for you. Something else that could save you almost as much time would
>> be to set a reasonable lower bound on the size of contigs that MAKER will try
>> to annotate (usually 5 kbp or 10 kbp, depending on your genome). This
>> lower bound is set with the min_contig parameter.
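Before picking a min_contig value, it can help to see how much of a fragmented assembly a given cutoff would skip. A minimal sketch with hypothetical function names; it assumes contigs strictly shorter than min_contig are the ones MAKER skips, so treat the exact boundary as approximate.

```python
def contig_lengths(path):
    """Yield (name, length) for each record in a FASTA file."""
    name, length = None, 0
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, length
                words = line[1:].split()
                name, length = (words[0] if words else ""), 0
            elif line:
                length += len(line)
    if name is not None:
        yield name, length


def min_contig_summary(lengths, min_contig):
    """Count contigs and bases kept vs. skipped at a min_contig cutoff.

    Assumes contigs strictly shorter than min_contig are skipped.
    """
    kept = skipped = kept_bp = skipped_bp = 0
    for _name, n in lengths:
        if n < min_contig:
            skipped += 1
            skipped_bp += n
        else:
            kept += 1
            kept_bp += n
    return {"kept": kept, "skipped": skipped,
            "kept_bp": kept_bp, "skipped_bp": skipped_bp}
```

For example, `min_contig_summary(contig_lengths("genome.fasta"), 10000)` summarizes a 10 kbp cutoff; the file name is a placeholder. Comparing the 5 kbp and 10 kbp results shows how many small contigs each choice drops and how few bases they may represent.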
>>
>> I'll have to check with my lab mates about the Repeat ORF searching and how
>> they use CEGMA results. I think you can probably just put them all in there
>> at once though.
>>
>> ~Daniel
>>
>>
>>
>>
>> Daniel Ence
>> Graduate Student
>> dence at genetics.utah.edu
>> Eccles Institute of Human Genetics
>> University of Utah
>> 15 North 2030 East, Room 2100
>> Salt Lake City, UT 84112-5330
>>
>> On Jul 7, 2014, at 9:26 AM, Nathaniel Jue <n.jue at uconn.edu>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to run MAKER on a couple of genomes right now and was wondering if
>>> folks had any thoughts on ways to speed it up a bit. I'm running it on a
>>> 48-processor supercomputer (lots of RAM, usually use it for genome
>>> assembly). Both these genomes are a little fragmented, so there are lots of
>>> contigs, which slows down the whole process. I am looking for ways to speed
>>> things up and was wondering about a couple things:
>>>
>>> 1) I'm currently just at the first round of MAKER predictions, using EST and
>>> protein evidence to build models. I had already done RepeatMasking, so I thought
>>> I'd just input the resulting GFF to speed things up. It didn't seem to like the GFF,
>>> so two questions: i) any thoughts on why that GFF wasn't acceptable? It's
>>> the one that RepeatMasker outputs if you ask it to; and ii) providing this
>>> GFF should generally allow the program to bypass the RepeatMasking step,
>>> correct? Does it also bypass the Repeat ORF searching step?
>>>
>>> 2) I plan to run both SNAP and Augustus on these genomes as well. The
>>> two-step SNAP training from the tutorials seems straightforward, but I was
>>> wondering about the Augustus step. From what I can tell, simply providing an
>>> Augustus "trained" species name should turn on Augustus, and BLAST/BLAT-like
>>> hints generated within MAKER are then used in gene prediction. Any thoughts
>>> on if it's either more accurate or faster to do the Augustus predictions
>>> outside of the Maker pipeline and then import them using the pred_gff
>>> parameter in the maker_opts file?
>>>
>>> 3) Finally, I noticed that you had a script for converting CEGMA GFF files
>>> to ZFF files for SNAP training. Currently, I am using predicted transcripts
>>> for this species and protein sequences from related species for training.
>>> Does anyone have any insight into using CEGMA results as well? Do you work
>>> iteratively with them? For instance, start with hints from more
>>> distant taxa (i.e., CEGMA) and then work your way closer? Or just throw
>>> everything in at once and retrain after that?
>>>
>>> Thanks in advance for any advice and insight.
>>>
>>> Cheers,
>>> Nate
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>