[maker-devel] tradeoff between run time & file number

Carson Holt carsonhh at gmail.com
Thu Mar 20 11:53:24 MDT 2014


You may also want to try the GFF3 pass-through options.  Basically, you give
the GFF3 file from your past run to the maker_gff option and tell MAKER which
kinds of evidence to maintain by setting the corresponding 'pass' options to
1.  Then you can run without your fasta file inputs for ESTs, proteins, and
repeats (and blank out the RepeatMasker species option as well).  The values
will be passed forward from the GFF3 file into the current run.
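As a sketch, the relevant maker_opts.ctl settings might look like the
following (option names as in recent MAKER control files; the GFF3 path is a
placeholder):

```
maker_gff=previous_run.all.gff  #GFF3 output from the prior run (placeholder path)
est_pass=1       #reuse EST alignments from the GFF3
altest_pass=1    #reuse alternate-organism EST alignments
protein_pass=1   #reuse protein alignments
rm_pass=1        #reuse repeat masking
est=             #blank out the fasta evidence inputs
protein=
rmlib=
model_org=       #blank out the RepeatMasker species
```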

--Carson


From:  Daniel Ence <dence at genetics.utah.edu>
Date:  Wednesday, March 19, 2014 at 11:43 PM
To:  Rebecca Harris <rbharris at uw.edu>, "maker-devel at yandell-lab.org"
<maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] tradeoff between run time & file number

Hi Rebecca,

So, as far as pruning down the dataset goes, I think the biggest gains will
come from trimming the number of scaffolds that you annotate. What is the N50
of your 400,000-scaffold set? Usually, scaffolds shorter than 5 or 10 kbp
won't contribute much to the gene counts in the end.
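(A minimal sketch of that pruning step, with hypothetical helper names;
assumes plain FASTA input. A real pipeline would more likely use Biopython or
seqkit.)

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif line.strip():
            chunks.append(line.strip())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

def n50(lengths):
    """Smallest length L such that scaffolds >= L hold half the assembly."""
    total, running = sum(lengths), 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

def filter_scaffolds(records, min_len):
    """Keep only scaffolds at least min_len bp long."""
    return [(name, seq) for name, seq in records if len(seq) >= min_len]
```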

Also, if you can, try to avoid using the alt_est option. It works completely
fine, but the tblastx searches it requires take much longer than blastn or
blastp.

Otherwise, I'd need to see your maker_opts.ctl file to see how you've got
things set up. You can attach it to your reply (to the maker-devel list), and
I'll take a look. I don't know how to force MAKER to create fewer files. You
definitely want to be able to make use of the results from prior runs to save
time.

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Rebecca
Harris [rbharris at uw.edu]
Sent: Wednesday, March 19, 2014 7:19 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] tradeoff between run time & file number

Hi - 

I'm running MAKER on a dataset of >400,000 scaffolds with MPI (-n 64). I've
gone through it once and used the clean_up option, because otherwise MAKER
exceeds the cluster's file quota. However, now I'm retraining SNAP, and it is
taking a very long time, probably because it has to go through BLAST again.
Is there any way of getting around this? I expect I may have to train SNAP
and rerun MAKER multiple times, and it takes about 3 weeks to get through my
dataset. Is there a way to prune down my original dataset based on MAKER's
output?

Thanks,
Rebecca
_______________________________________________ maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


