[maker-devel] Running MAKER on highly fragmented assembly
Mikael Brandström Durling
mikael.durling at slu.se
Mon Feb 25 03:57:21 MST 2013
Hi Carson and Daniel,
It seems that this issue might be related to the other issue, where maker does not find sequences in the protein and EST fasta files because the sequence ids are too long. For those annotation setups where I have truncated the ids to short unique names, maker seems to run fine on all allocated CPUs. On the other sets, the maker ranks seem to stall once they can't polish an EST or protein alignment. (I have no firm evidence of this, as I haven't found a way to separate the logs from the different ranks.)
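One possible way to separate the per-rank logs, sketched under a loud assumption: that the MPI launcher can be asked to prefix every output line with the rank that wrote it, in a form like `[3] some text`. Whether a given mpiexec offers such an option, and what the prefix looks like, needs to be checked in its documentation; the pattern and the function name `split_by_rank` below are hypothetical.

```python
import re
from collections import defaultdict

# ASSUMPTION: each line starts with "[<rank>] ". Adjust the pattern to
# whatever label format your MPI launcher actually emits.
RANK_PREFIX = re.compile(r"^\[(\d+)\]\s?(.*)$")


def split_by_rank(lines):
    """Group log lines by the rank label at the start of each line.

    Lines without a recognisable label are collected under the key None.
    """
    per_rank = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        match = RANK_PREFIX.match(line)
        if match:
            per_rank[int(match.group(1))].append(match.group(2))
        else:
            per_rank[None].append(line)
    return dict(per_rank)


def write_rank_logs(log_path, out_prefix):
    """Write one file per rank, e.g. <out_prefix>.rank3.log."""
    with open(log_path) as handle:
        grouped = split_by_rank(handle)
    for rank, rank_lines in grouped.items():
        label = "unlabelled" if rank is None else f"rank{rank}"
        with open(f"{out_prefix}.{label}.log", "w") as out:
            out.write("\n".join(rank_lines) + "\n")
```

With the combined output saved to a file, `write_rank_logs("maker.log", "maker")` would produce one log per rank, which should make it easier to see whether a stalled rank last reported a failed polishing step.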
Anyhow, now that I have truncated the names, I have not seen any messages of failed polishing (yet).
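For anyone wanting to do the same truncation, here is a minimal sketch of the idea: replace every fasta header with a short, unique id and keep a table mapping the short id back to the original header. The id scheme (`prefix` plus a zero-padded counter) and the function names are only illustrative choices, not anything maker requires.

```python
def shorten_fasta_ids(lines, prefix="seq"):
    """Replace each fasta header with a short unique id.

    Returns (renamed_lines, mapping), where mapping goes from the new
    short id to the original header text (without the leading '>').
    """
    renamed = []
    mapping = {}
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            short_id = f"{prefix}{count:06d}"  # e.g. seq000001
            mapping[short_id] = line[1:]
            renamed.append(f">{short_id}")
        else:
            # Sequence lines are passed through unchanged.
            renamed.append(line)
    return renamed, mapping


def rename_fasta_file(in_path, out_path, map_path, prefix="seq"):
    """Write the renamed fasta and a tab-separated id mapping."""
    with open(in_path) as handle:
        renamed, mapping = shorten_fasta_ids(handle, prefix)
    with open(out_path, "w") as out:
        out.write("\n".join(renamed) + "\n")
    with open(map_path, "w") as out:
        for short_id, original in mapping.items():
            out.write(f"{short_id}\t{original}\n")
```

Using a different prefix for each input file (for example `prot` for proteins and `est` for ESTs) keeps the ids unique across files, and the mapping file lets the original names be restored in the final annotation.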
Mikael
On 21 Feb 2013, at 16:58, Carson Holt <carsonhh at gmail.com> wrote:
> Which version of MAKER are you using? We recently deployed MAKER on the
> large NIH TACC computer cluster with some additional modifications for
> very large MPI jobs (> 1500 cpus). Some of the modifications focus on
> reannotating very large contigs as opposed to the small contig de novo
> annotation that MAKER already works well on. The modifications will be
> merged back into the MAKER downloadable release soon, but I could give you
> access for testing, especially if you are running large MPI jobs or working
> on very large multi-megabase contigs.
>
> --Carson
>
>
> On 13-02-21 10:39 AM, "Mikael Brandström Durling" <mikael.durling at slu.se>
> wrote:
>
>> Hi Daniel,
>> the genomes I work with are in the order of 30-60 Mb. Other assemblies
>> have been quick jobs for maker without any problems.
>>
>> If I run maker with -debugmpi, I get sets of debug printouts from the
>> different ranks now and then:
>>
>> COMM INITIALIZATION | SEND | who_I_am | 3
>> --> 0 | 46731
>> COMM INITIALIZATION | SEND | what_I_want | 3
>> --> 0 | 46732
>> COMM INITIALIZATION | RECV | what_I_want | 0
>> <-- 3 | 312850
>> COMM HAVE C_RESULT | SEND | c_res_status (no_c_res) | 0
>> --> 3 | 312851
>> HELPER/RESULT REQUESTED | RECV | work_order (num_helpers_req) | 0
>> <-- 3 | 312852
>> COMM HAVE C_RESULT | RECV | c_res_status (is c_res?) | 3
>> <-- 0 | 46733
>> HELPER/RESULT REQUESTED | SEND | work_order (num_helpers_req) | 3
>> --> 0 | 46734
>> HELPER/RESULT REQUESTED | SEND | req_stat (no_helpers_avail) | 0
>> --> 3 | 312853
>> HELPER/RESULT REQUESTED | RECV | req_stat (is helper avail?) | 3
>> <-- 0 | 46735
>> COMM INITIALIZATION | RECV | who_I_am | 0
>> <-- ANY | 312854
>>
>> and then they seem to stay waiting while a single rank continues to run
>> the normal analysis. I have filtered out of the assembly all contigs
>> shorter than the minimum length set in maker_opts.ctl.
>>
>> I ran strace on the ranks that do nothing, and it seems they loop over
>> running a subprocess that basically does a process listing.
>>
>> I might be completely off in my guesses of what the problem might be. I'm
>> sort of afraid that I'm bitten by some NFS-related problem, as I have been
>> quite a few times by now. I will soon try to reannotate a genome
>> sequenced by the JGI where we have 35 Mb in 15 scaffolds, just to make sure
>> that maker behaves as expected with that genome.
>> Mikael
>>
>>
>> On 20 Feb 2013, at 17:29, Daniel Ence <dence at genetics.utah.edu> wrote:
>>
>>> Hi Mikael, Depending on the genome size, the assembly you've described
>>> shouldn't be too difficult to work with. The process activity that
>>> you're describing sounds more like a race condition, where one process
>>> is hogging all the work and all the other processes keep trying to find
>>> work, but keep getting in each other's way.
>>>
>>> How much of the genome has maker completed when the processes start
>>> doing this?
>>>
>>> Thanks,
>>> Daniel
>>>
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>> ________________________________________
>>> From: maker-devel-bounces at yandell-lab.org
>>> [maker-devel-bounces at yandell-lab.org] on behalf of Mikael Brandström
>>> Durling [mikael.durling at slu.se]
>>> Sent: Wednesday, February 20, 2013 6:12 AM
>>> To: maker-devel at yandell-lab.org
>>> Subject: [maker-devel] Running MAKER on highly fragmented assembly
>>>
>>> Hi,
>>>
>>> I'm trying to run MAKER on a rather fragmented assembly. I know this is
>>> not optimal, as I will most likely miss a substantial part of the gene
>>> complement due to the fragmentation. Disregarding this, my question is
>>> if there are other problems with running maker on these kinds of genomes
>>> with roughly 1500 scaffolds and an N50 of 60 kb? I find that maker, run
>>> with MPI (mpich2), behaves in a rather strange way, with basically
>>> one of the ranks staying at 100% CPU and the others lingering at about
>>> 0%. Now and then I see a burst of activity in the other ranks before
>>> they get back to low activity. Could this be a result of the
>>> fragmentation level, or should I look for other problems? (Like the all
>>> too common problems of running over NFS, with locking etc.)
>>>
>>> cheers,
>>> Mikael
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org