[maker-devel] maker MPI problem
zl c
chzelin at gmail.com
Tue Aug 15 15:17:20 MDT 2017
This is a test run, so I used only 2 tasks. I'll try more tasks and your
options.
Thanks,
Zelin
On Tue, Aug 15, 2017 at 5:13 PM, Carson Holt <carsonhh at gmail.com> wrote:
> Some notes:
>
> First, the mpiexec command still needs the --mca parameters (either
> '--mca btl ^openib' or '--mca btl vader,tcp,self --mca btl_tcp_if_include
> ib0'). Otherwise, if you have InfiniBand on the nodes, it will try to use
> OpenFabrics-compatible libraries, which will kill code that makes system
> calls (as MAKER does).
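>
> For example (an illustrative sketch; adjust the flags to your interconnect):
>
> mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 \
>     -n $SLURM_NTASKS maker -c 1 -base genome -g genome.fasta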
>
> Second, try using a higher count than 2 in your batch. MAKER always
> sacrifices one process to act solely as a message manager for the other
> processes, so with -n 2 you have one process working and one managing data,
> and only one contig will run at a time. If you set it to a higher number,
> the issue will go away. The message manager process starts to get saturated
> at ~200 CPUs, so anything above that processor count adds little benefit to
> the job.
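>
> As a sketch, your own batch line with a higher task count (the value 8 is
> illustrative; size it to your cluster):
>
> sbatch --gres=lscratch:100 --time=8:00:00 --mem-per-cpu=8g -N 1-1 \
>     --ntasks=8 --ntasks-per-core=1 --job-name run06.mpi -o log/run06.mpi.o%A \
>     run06.maker.mpi.sh
>
> With 8 tasks, seven processes run contigs concurrently while the eighth
> manages messages.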
>
> Thanks,
> Carson
>
> On Aug 15, 2017, at 3:05 PM, zl c <chzelin at gmail.com> wrote:
>
> I submitted a job:
> sbatch --gres=lscratch:100 --time=8:00:00 --mem-per-cpu=8g -N 1-1
> --ntasks=2 --ntasks-per-core=1 --job-name run06.mpi -o log/run06.mpi.o%A
> run06.maker.mpi.sh
>
> CMD in run06.maker.mpi.sh
> mpiexec -n $SLURM_NTASKS maker -c 1 -base genome -g genome.fasta
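>
> For completeness, the whole script is roughly (a sketch; the OpenMPI module
> name is site-specific):
>
> #!/bin/bash
> module load openmpi   # site-specific module name/version
> mpiexec -n $SLURM_NTASKS maker -c 1 -base genome -g genome.fasta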
>
> Another question:
> How much temporary space and memory should I use for ~10 Mb of sequence and
> large databases like nr and UniRef90?
>
> Thanks,
> zelin
>
> --------------------------------------------
> Zelin Chen [chzelin at gmail.com]
>
>
> On Tue, Aug 15, 2017 at 4:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>
>> What is your command line? Are you running interactively or as a
>> submitted batch? If it's a batch job, what options did you give it?
>>
>> --Carson
>>
>> Sent from my iPhone
>>
>> On Aug 15, 2017, at 2:47 PM, zl c <chzelin at gmail.com> wrote:
>>
>> Hi Carson, Christopher, Daniel,
>>
>> Thank you for your kind help.
>>
>> Now it works without any other options on one node with 4 CPUs. I set
>> the number of tasks to 2, but only one contig is running. Shouldn't two
>> contigs be running at the same time?
>>
>> Zelin
>>
>> --------------------------------------------
>> Zelin Chen [chzelin at gmail.com]
>>
>>
>> NIH/NHGRI
>> Building 50, Room 5531
>> 50 SOUTH DR, MSC 8004
>> BETHESDA, MD 20892-8004
>>
>> On Tue, Aug 15, 2017 at 11:47 AM, Carson Holt <carsonhh at gmail.com> wrote:
>>
>>> Did it die or did you just get a warning?
>>>
>>> Here is a list of flags you can add to suppress warnings and other issues
>>> with OpenMPI. Add them all, or one at a time, depending on the issues you
>>> hit.
>>>
>>> # add if MPI is not using all of the CPUs it was given
>>> --oversubscribe --bind-to none
>>>
>>> # workaround for InfiniBand (use instead of '--mca btl ^openib')
>>> --mca btl vader,tcp,self --mca btl_tcp_if_include ib0
>>>
>>> # add to stop certain other warnings
>>> --mca orte_base_help_aggregate 0
>>>
>>> # stop fork warnings
>>> --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0
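>>>
>>> Combined into a single command, that might look like this (an illustrative
>>> sketch; pick only the flags you need):
>>>
>>> mpiexec --oversubscribe --bind-to none \
>>>     --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 \
>>>     --mca orte_base_help_aggregate 0 \
>>>     -n $SLURM_NTASKS maker -c 1 -base genome -g genome.fasta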
>>>
>>> —Carson
>>>
>>> On Aug 15, 2017, at 9:34 AM, zl c <chzelin at gmail.com> wrote:
>>>
>>> Here are the latest messages:
>>>
>>> [cn3360:57176] 1 more process has sent help message help-opal-runtime.txt / opal_init:warn-fork
>>> [cn3360:57176] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>
>>> --------------------------------------------
>>> Zelin Chen [chzelin at gmail.com]
>>>
>>> On Tue, Aug 15, 2017 at 10:39 AM, Carson Holt <carsonhh at gmail.com>
>>> wrote:
>>>
>>>> You may need to delete the .../maker/perl directory before reinstalling
>>>> if it is not a brand-new installation. Otherwise, you can ignore the
>>>> "subroutine redefined" warnings during compilation.
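>>>>
>>>> A sketch of that reinstall (the elided path stands for your MAKER install
>>>> location, as above):
>>>>
>>>> rm -rf .../maker/perl        # remove previously compiled modules
>>>> cd .../maker/src             # MAKER source directory
>>>> perl Build.PL && ./Build install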
>>>>
>>>> Have you been able to test the alternate flags on the command line for
>>>> MPI? How about an alternate Perl without threads?
>>>>
>>>> --Carson
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Aug 15, 2017, at 8:27 AM, zl c <chzelin at gmail.com> wrote:
>>>>
>>>> When I installed with './Build install', I got the following messages:
>>>> Configuring MAKER with MPI support
>>>> Installing MAKER...
>>>> Configuring MAKER with MPI support
>>>> Subroutine dl_load_flags redefined at (eval 125) line 8.
>>>> Subroutine Parallel::Application::MPI::C_MPI_ANY_SOURCE redefined at (eval 125) line 9.
>>>> Subroutine Parallel::Application::MPI::C_MPI_ANY_TAG redefined at (eval 125) line 9.
>>>> Subroutine Parallel::Application::MPI::C_MPI_SUCCESS redefined at (eval 125) line 9.
>>>> Subroutine Parallel::Application::MPI::C_MPI_Init redefined at (eval 125) line 9.
>>>> Subroutine Parallel::Application::MPI::C_MPI_Finalize redefined at (eval 125) line 9.
>>>> Subroutine Parallel::Application::MPI::C_MPI_Comm_rank redefined at (eval 125) line 9.
>>>> Subroutine Parallel::Application::MPI::C_MPI_Comm_size redefined at (eval 125) line 9.
>>>> Subroutine Parallel::Application::MPI::C_MPI_Send redefined at (eval 125) line 9.
>>>> Subroutine Parallel::Application::MPI::C_MPI_Recv redefined at (eval 125) line 9.
>>>> Subroutine Parallel::Application::MPI::_comment redefined at (eval 125) line 9.
>>>>
>>>> I'm not sure whether it's correctly installed.
>>>>
>>>> Thanks,
>>>>
>>>> --------------------------------------------
>>>> Zelin Chen [chzelin at gmail.com]
>>>>
>>>> NIH/NHGRI
>>>> Building 50, Room 5531
>>>> 50 SOUTH DR, MSC 8004
>>>> BETHESDA, MD 20892-8004
>>>>
>>>> On Mon, Aug 14, 2017 at 9:23 PM, Fields, Christopher J <
>>>> cjfields at illinois.edu> wrote:
>>>>
>>>>> Carson,
>>>>>
>>>>>
>>>>>
>>>>> It was attached to the initial message (named ‘run05.mpi.o47346077’).
>>>>> It looks like a Perl issue with threads, though I don’t see why this would
>>>>> crash a cluster. The fact there is a log file would suggest it just ended
>>>>> the job.
>>>>>
>>>>>
>>>>>
>>>>> chris
>>>>>
>>>>>
>>>>>
>>>>> *From: *maker-devel <maker-devel-bounces at yandell-lab.org> on behalf
>>>>> of Carson Holt <carsonhh at gmail.com>
>>>>> *Date: *Monday, August 14, 2017 at 2:18 PM
>>>>> *To: *zl c <chzelin at gmail.com>
>>>>> *Cc: *"maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>> *Subject: *Re: [maker-devel] maker MPI problem
>>>>>
>>>>> This is rather vague —> “crashed the computer cluster”
>>>>>
>>>>> Do you have a specific error?
>>>>>
>>>>> —Carson
>>>>>
>>>>> On Aug 14, 2017, at 12:59 PM, zl c <chzelin at gmail.com> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I ran MAKER 3.0 with OpenMPI 2.0.2 and it crashed the computer
>>>>> cluster. I attached the log file. Could you help me solve the problem?
>>>>>
>>>>> CMD:
>>>>>
>>>>> export LD_PRELOAD=/usr/local/OpenMPI/2.0.2/gcc-6.3.0/lib/libmpi.so
>>>>> export OMPI_MCA_mpi_warn_on_fork=0
>>>>> mpiexec -mca btl ^openib -n $SLURM_NTASKS maker -c 1 -base genome -g genome.fasta
>>>>>
>>>>> Thanks,
>>>>> Zelin Chen
>>>>>
>>>>> --------------------------------------------
>>>>> Zelin Chen [chzelin at gmail.com] Ph.D.
>>>>>
>>>>> NIH/NHGRI
>>>>> Building 50, Room 5531
>>>>> 50 SOUTH DR, MSC 8004
>>>>> BETHESDA, MD 20892-8004
>>>>>
>>>>> <run05.mpi.o47346077>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>