[maker-devel] Maker in the cloud
Barry Moore
barry.moore at genetics.utah.edu
Thu Sep 5 12:06:05 MDT 2013
Hi Jasmin,
Like Carson, my only significant experience with MAKER in the cloud is using it for our training. However, I'll add some comments based on our experience running some of our other tools in the cloud:
There are several cloud architectures available now, but I only have experience with Amazon EC2, so all comments are only relevant there.
I wouldn't use any of the existing MAKER AMIs as they are. All of them were created for tutorial purposes, and while they should work fine for a real annotation job, they will be out of date. If you do use one, at the very least start from it, install the current MAKER code, and save the result as a new AMI.
You can use MPI on the Amazon nodes, but it's not set up by default to run MPI between nodes. That can presumably be done, but we haven't done it, so there may be headaches involved that we just don't know about. However, you could split your input fasta into several chunks of roughly equal size and fire up a separate EC2 node for each fasta file, then let MAKER use MPI to parallelize the work on each node individually.
MAKER is really good at restarting if things fail, so with that in mind I'd suggest starting spot nodes, which can be 10X cheaper than regularly priced nodes. Amazon will kill a spot node as soon as the spot price rises above what you bid, so you'd want a way (either manually checking and restarting nodes or scripting an AWS API solution) to check whether nodes finished and to restart them if they did not. You could save a lot of money by doing this.
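For the chunking step, here is a minimal sketch of one way to split a fasta file into chunks of roughly equal total sequence length. It is not part of MAKER; the function names (read_fasta, split_into_chunks, write_chunks) and the output file naming are made up for illustration, and the greedy "longest contig first" balancing is just one reasonable choice.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line and header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_into_chunks(records: Iterable[Record], n_chunks: int) -> List[List[Record]]:
    """Greedy balancing: take contigs longest first and always add the
    next contig to the chunk with the smallest total length so far."""
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    totals = [0] * n_chunks
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        i = totals.index(min(totals))
        chunks[i].append((header, seq))
        totals[i] += len(seq)
    return chunks


def write_chunks(chunks: List[List[Record]], prefix: str = "chunk") -> List[str]:
    """Write each chunk to '<prefix>_NNN.fasta' (hypothetical naming)."""
    paths = []
    for i, chunk in enumerate(chunks):
        path = f"{prefix}_{i:03d}.fasta"
        with open(path, "w") as out:
            for header, seq in chunk:
                out.write(f">{header}\n{seq}\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python split_fasta.py genome.fasta 8
    with open(sys.argv[1]) as fh:
        parts = split_into_chunks(read_fasta(fh), int(sys.argv[2]))
    for p in write_chunks(parts):
        print(p)
```

Each output file would then be handed to its own EC2 node as that node's genome input. Balancing by total sequence length rather than by contig count is only a rough proxy for run time, but it should keep the nodes from finishing at wildly different times.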
B
On Sep 5, 2013, at 9:58 AM, Jasmin Zohren wrote:
> Dear Maker developers,
>
> I’ve already contacted you a while ago about my annotation of the birch genome (Betula nana). As I am constantly running into problems using our cluster facilities at QMUL I thought of moving into the cloud. As I am rather inexperienced in cloud computing I have several questions:
>
> 1. To me it seems that there are two different Maker images on EC2 – ami-ea661f83 and ami-b10abed8 – which one is “the right one”?
> 2. Can I use this Maker AMI for the annotation of a whole genome or is it only suitable for the tutorial tasks?
> 3. Also, when I followed the steps outlined in the tutorial, there seemed to be a problem with RepeatMasker. Although Maker would run and produce output files, the log file stated that the contig had failed after the second attempt. I launched the image on a T1.micro instance, maybe that wasn’t enough computing power? Or do you have another explanation for this?
> 4. Would it be possible to run the annotation in parallel (e.g. using MPICH2) in the cloud? I’ve also recently heard about a parallelisation module for use in the cloud developed by Era7, called “nispero”. But I am not sure whether it is publicly available yet.
> 5. Do you have any experience of how long an annotation task in the cloud would take and also what the expected costs would be? The birch genome is only 500 MB in size and currently I am simply annotating it with a SNAP trained HMM. However, in the future I will feed it with RNAseq data as well.
>
> Many thanks in advance and kind regards,
> Jasmin
>
> -----------------------------
> Jasmin Zohren
> PhD student in the INTERCROSSING ITN
> Queen Mary University of London
>
> intercrossing.wikispaces.com
> evolve.sbcs.qmul.ac.uk
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543