[maker-devel] Profiling MAKER

Fields, Christopher J cjfields at illinois.edu
Thu Sep 17 20:05:11 MDT 2015


Carson,

Thanks!  Will pass this on to the folks at NCSA, that should help quite a bit.

Yeah, I kinda think it would be nice to come up with an alternative fasta indexing scheme, or at least add some more flexibility (I’m guessing this is still BioPerl?).

chris

On Sep 16, 2015, at 12:22 PM, Carson Holt <carsonhh at gmail.com> wrote:

Sorry for the slow reply.  I’m out of the lab right now and will be for the next two weeks.

MAKER uses MPI for parallelization.  So it is optimized for distributed non-shared memory systems, but should still work fine on a shared memory system.

With MPI, you specify the number of processes to start using the -n flag for mpiexec.  Each MAKER process will need about 2 GB of RAM.  It could be more or less depending on the amount of evidence it has to hold in memory (i.e. deep evidence alignments use more). By default each MAKER process will use a single CPU (even though it will start 3 threads - two of the threads will use close to 0% CPU).
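As a rough sketch of what this looks like on the command line (the process count of 8 is just an example, and this assumes an MPI-enabled MAKER build with the control files already in the working directory):

```shell
# Launch 8 MAKER processes under MPI.  Each process uses ~1 CPU and
# roughly 2 GB of RAM, so choose -n to fit the cores/memory of the node.
mpiexec -n 8 maker
```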

MAKER will use a lot of IO.  Each process reads and writes independently of the others, so the more processes you start, the more simultaneous IO you will have. I’ve tried to put most of the very heavy IO operations in /tmp or whatever temporary directory you specify. It is important that you never specify an NFS location for your temporary directory. The rest of the IO will occur in the working directory.

Also, the Berkeley DB implementation that sits behind the fasta indexes used for sequence access doesn’t always work well with in-memory scratch space. You should always try to set /tmp to a physical drive if possible. You will get several GB of files in /tmp.
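A minimal sketch of pointing MAKER’s temporary files at a local physical disk instead of NFS or an in-memory filesystem like /dev/shm (the scratch path here is hypothetical, and the option name in maker_opts.ctl is an assumption - check your control files):

```shell
# Create a per-user scratch directory on a local physical disk
# (path is an example; use whatever local scratch your system provides).
mkdir -p /scratch/local/$USER/maker_tmp

# Then in maker_opts.ctl, point MAKER's temporary directory at it, e.g.:
#   TMP=/scratch/local/$USER/maker_tmp
# Never point this at an NFS mount, and avoid /dev/shm because of the
# Berkeley DB issue described above.
```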

—Carson


On Sep 15, 2015, at 10:39 AM, Fields, Christopher J <cjfields at illinois.edu> wrote:

We have a group locally (at NCSA) that is interested in profiling MAKER with various performance analysis tools.  They would like to know CPU, RAM, and I/O patterns and usage.  In particular, we’re seeing some odd performance problems on a local system which uses a large shared-memory cache for storing temp/scratch data (/dev/shm).

The question is: are there any particular pain points users and developers know of or could point us to that we can start focusing on?  Any help would be greatly appreciated.

Thanks,

chris

Chris Fields
Technical Lead in Genome Informatics
High Performance Computing in Biology
University of Illinois at Urbana-Champaign
Roy J. Carver Biotechnology Center / W.M. Keck Center
Carl R. Woese Institute for Genomic Biology

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

