[maker-devel] MAKER on AWS

DECKER, KEITH F [AG/1005] keith.decker at bayer.com
Mon Feb 4 11:39:48 MST 2019


Thanks,
Do you have metrics on how MAKER performs when annotating a single chromosome on a single machine?  For example, will I see anything close to a 16X speed-up on a 16-core machine, and does the performance improvement saturate at a certain number of cores?

-Keith

From: Carson Holt <carsonhh at gmail.com>
Date: Monday, February 4, 2019 at 12:33 PM
To: "DECKER, KEITH F [AG/1005]" <keith.decker at bayer.com>
Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject: Re: [maker-devel] MAKER on AWS

You can try to stand up a cluster inside AWS or, as you said, just start independent instances, each with its own piece of the total dataset. There is a tool called fasta_tool inside MAKER that makes it easy to split the dataset into equal-sized chunks.
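For illustration only: fasta_tool's own options are not covered here, so the Python sketch below just shows the general idea of splitting a multi-FASTA file into N chunks of roughly equal total sequence length without cutting any sequence apart. The function names (read_fasta, split_balanced, to_fasta) are hypothetical, and the greedy longest-first assignment is an assumed balancing strategy, not necessarily what fasta_tool does.

```python
"""Hypothetical sketch: split a multi-FASTA into N length-balanced chunks.
Not fasta_tool; the balancing strategy here is an assumption."""
import heapq
from typing import List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) records."""
    records: List[Record] = []
    header = None
    parts: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def split_balanced(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Greedy longest-first assignment: each sequence goes to the chunk
    with the smallest total length so far. Sequences are never split."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    heap = [(0, i) for i in range(n_chunks)]  # (total bases, chunk index)
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        load, idx = heapq.heappop(heap)
        chunks[idx].append((header, seq))
        heapq.heappush(heap, (load + len(seq), idx))
    return chunks


def to_fasta(records: List[Record], width: int = 60) -> str:
    """Serialize records back to FASTA with fixed-width sequence lines."""
    lines: List[str] = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = ">chr1\n" + "A" * 100 + "\n>chr2\n" + "C" * 60 + "\n>chr3\n" + "G" * 50 + "\n"
    for i, chunk in enumerate(split_balanced(read_fasta(demo), 2)):
        # Print chunk index, the sequences it holds, and its total length
        print(i, [h for h, _ in chunk], sum(len(s) for _, s in chunk))
```

With the toy input, chr1 (100 bp) lands in one chunk and chr2 plus chr3 (110 bp) in the other. Each chunk could then be written out with to_fasta and handed to its own instance.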

Alternatively, CyVerse has set up an interesting MAKER wrapper (WQ-MAKER) that launches multiple cloud instances for MAKER and handles data chunking for you (they’ve been using XSEDE cloud resources through the NSF)  —>
http://ccl.cse.nd.edu/research/papers/maker-service-ic2e2018.pdf

Here is an example of an external project using their setup —> http://onsnetwork.org/kubu4/2018/08/07/genome-annotation-olympia-oyster-genome-using-wq-maker-instance-on-jetstream/

—Carson





On Feb 4, 2019, at 11:09 AM, DECKER, KEITH F [AG/1005] <keith.decker at bayer.com> wrote:

I would like to evaluate the use of MAKER on AWS, but I am unsure what the best approach to parallelization would be.
I found this old post on STARCLUSTER, http://efish.integrativebiology.msu.edu/2015/02/10/annotate.html
but my understanding is that STARCLUSTER and its successors (CfnCluster and AWS ParallelCluster) can be challenging to set up and use.

So my questions are

1.  Has anyone had recent success running MAKER on CfnCluster or AWS ParallelCluster?
2.  Would it be reasonable to just split N chromosomes across N EC2 instances and collect the results at the end?  If so, does it make sense to run each chromosome-level annotation on, for example, an m4.16xlarge instance with 64 cores and 256 GB of RAM? Or is there a maximum number of cores beyond which the benefits of parallelization saturate?

Thanks and sorry for the long question
Keith





This system contains confidential and copyrighted information.  Access to the system is limited to users only and only for approved business purposes.

Anyone obtaining access to and using this system acknowledges that all information on this system including but not limited to electronic mail, word processing, directories and files, constitutes private property belonging to the Company.

Anyone using or viewing this system is further advised that the use of this system may be recorded and the information contained herein may be monitored, retrieved and reviewed if, in the Company’s sole discretion, there is a business reason to do so.

If improper activity or use is suspected, all available information may be used by the Company for possible disciplinary action, prosecution, civil claim or any remedy or lawful purpose.



_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org


