[maker-devel] altest without MPI?
Carson Holt
carsonhh at gmail.com
Sun Jul 21 15:09:12 MDT 2013
Glad I could help. If you come across anything else, just let us know.
Thanks,
Carson
On 7/21/13 9:44 AM, "Jacqueline R M Doyle" <jmdoyle at purdue.edu> wrote:
>I just wanted to check in and say that this strategy worked really
>nicely. I ended up using MPI and 48 cpus. I trained SNAP with
>protein2genome and my contigs greater than 1200000 bases. Each of the
>training runs took less than 4 hours. My final run with all my contigs
>(and altest) took about 5 and 1/2 days. Not bad!
>
>----- Original Message -----
>From: Carson Holt <carsonhh at gmail.com>
>To: Jacqueline R M Doyle <jmdoyle at purdue.edu>, maker-devel at yandell-lab.org
>Sent: Wed, 19 Jun 2013 21:05:49 -0400 (EDT)
>Subject: Re: [maker-devel] altest without MPI?
>
>The throughput is based on contig length, so long contigs will take longer
>than short contigs. Any contig less than 10kb is mostly useless for
>annotation purposes (so you can filter those from your 800,000 right
>away). Take your contigs that finish, and sum up their length to get a
>better estimate of how long it will take to complete running. Most
>genomes can complete in a few days on a multi-core machine. Bigger
>genomes or bigger datasets take longer. (Note that altest evidence takes
>3-4x longer to align than proteins.)
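
The estimate described above (drop contigs under 10 kb, then extrapolate from the summed length of the contigs that have finished) can be sketched roughly as follows. This is only an illustration: the function names are hypothetical and not part of MAKER, and it assumes run time grows about linearly with contig length and that the elapsed time was measured on the same number of cpus the rest of the run will use.

```python
def read_fasta_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def estimate_remaining_hours(lengths, finished, elapsed_hours, min_len=10_000):
    """Extrapolate remaining run time from contigs that already finished.

    lengths: dict mapping contig name to length in bases
    finished: set of contig names that have completed
    elapsed_hours: wall time spent on the finished contigs
    min_len: contigs shorter than this are ignored (10 kb, per the advice above)
    """
    kept = {n: l for n, l in lengths.items() if l >= min_len}
    done_bases = sum(l for n, l in kept.items() if n in finished)
    if done_bases == 0:
        raise ValueError("no finished contigs at or above min_len")
    remaining_bases = sum(l for n, l in kept.items() if n not in finished)
    # Assumption: time scales linearly with the number of bases processed.
    return elapsed_hours * remaining_bases / done_bases


if __name__ == "__main__":
    fasta = [">a", "A" * 20000, ">b", "A" * 5000, ">c", "A" * 40000]
    lengths = dict(read_fasta_lengths(fasta))
    print(estimate_remaining_hours(lengths, {"a"}, elapsed_hours=2.0))
```

In practice the lengths would be read from the assembly FASTA file and the finished set from whatever record of completed contigs the run keeps; both are left to the reader here.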
>
>The advantage of proteins is that the species do not have to be closely
>related. Nucleotide sequence diverges quickly and proteins slowly (that's
>why proteins are used for phylogenetic trees).
>
>A good strategy would be to get ~10Mb of sequence (use your longest
>contigs). Run with Chicken, turkey, and pigeon proteins. Use the
>protein2genome option to generate annotations. Those annotations should
>now be sufficient to train SNAP and Augustus. Then you can finish by
>running all your contigs with the same dataset (protein2genome now turned
>off), using the newly trained SNAP and Augustus files along with any
>altest files you want to use. Note that the size of the dataset will
>determine the total run time.
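
Picking the training subset (the longest contigs, adding up to roughly 10 Mb) might look something like the sketch below. The function name and the exact stopping rule are assumptions made for illustration; MAKER itself does not provide this helper.

```python
def pick_training_contigs(lengths, target_bases=10_000_000):
    """Return (names, total_bases) for the longest contigs reaching the target.

    lengths: dict mapping contig name to length in bases
    target_bases: stop once the running total reaches this many bases
    """
    picked, total = [], 0
    # Walk contigs from longest to shortest.
    for name, length in sorted(lengths.items(), key=lambda kv: kv[1], reverse=True):
        if total >= target_bases:
            break
        picked.append(name)
        total += length
    return picked, total


if __name__ == "__main__":
    example = {"a": 6_000_000, "b": 3_000_000, "c": 2_000_000, "d": 500_000}
    names, total = pick_training_contigs(example)
    print(names, total)
```

The selected names can then be used to pull those sequences into a separate FASTA file for the protein2genome training run.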
>
>To get things to run faster, you can also run on your university's
>computer cluster (then you will have hundreds of cpus available to you).
>The purdue cluster supports MPI and with 30-50 cpus you could annotate
>even large genomes in a reasonable time. Alternatively, you can request a
>startup account at XSEDE, an NSF-funded computer resource open to all US
>institutions. A startup allocation with 50,000 cpu hours only takes 2
>weeks to approve. If you go that route, you should request an allocation
>on the Lonestar cluster, which has 64,000 cpus. I was able to annotate the
>maize genome (which is a very large genome at over 2 gigabases). I used
>abnormally large EST and protein datasets (~4 gigabases of evidence, which
>is much more than a normal annotation job), and it completed in under 3
>hours on 2,100 cpus.
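
As a rough check on whether a job fits inside an allocation, cpu hours can be approximated as cpus multiplied by wall-clock hours. The sketch below applies that to the figures in this thread; the helper name is hypothetical, and the simple product assumes every cpu is charged for the whole run.

```python
def cpu_hours(cpus, wall_hours):
    """Approximate cpu hours consumed, assuming all cpus are busy throughout."""
    return cpus * wall_hours


if __name__ == "__main__":
    startup_allocation = 50_000  # cpu hours in a startup allocation, per the message above

    # Maize run: under 3 hours on 2,100 cpus, so this is an upper bound.
    maize = cpu_hours(2_100, 3)

    # Final run reported at the top of this thread: 48 cpus for about 5.5 days.
    final_run = cpu_hours(48, 5.5 * 24)

    for label, used in [("maize", maize), ("48-cpu run", final_run)]:
        print(label, used, "fits" if used <= startup_allocation else "too large")
```

Both runs come out far below the 50,000 cpu hour startup allocation under this approximation.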
>
>--Carson
>
>
>On 13-06-19 5:12 PM, "Jacqueline R M Doyle" <jmdoyle at purdue.edu> wrote:
>
>>Hi Carson (and whoever else might be reading this!)
>>
>>Thanks so much, I think splitting the files up using fasta_tool will
>>definitely move things along. I did a trial version with altest this
>>weekend, and it seemed to be averaging about an hour a scaffold (with 1
>>cpu). I'm a little concerned, as we have ~800,000 scaffolds. Does this
>>seem like a reasonable estimate of the time it should take to annotate
>>one sequence? Could I be missing something in my maker_opts file?
>>
>>Let me back up for just a minute and describe the project a little more
>>generally. As I mentioned before, we have no protein sequences or ESTs
>>for our species of interest, which is an avian species. I could
>>potentially use proteins from chicken or turkey, but neither is closely
>>related to our species. Time is a bit of an issue... do you have any
>>thoughts on how much time per scaffold it should take to annotate using
>>protein2genome? If chicken and turkey are not closely related, is it
>>worth the time investment?
>>
>>Let me finish by saying I think MAKER is wonderful, and I really
>>appreciate the discussions on this group.
>>
>>Best wishes, Jackie
>
>
>