[maker-devel] maker MPI problem

Carson Holt carsonhh at gmail.com
Tue Aug 15 15:13:04 MDT 2017


Some notes:

First, the mpiexec command still needs the --mca parameters (either '--mca btl ^openib' or '--mca btl vader,tcp,self --mca btl_tcp_if_include ib0'). Otherwise, if the nodes have InfiniBand, OpenMPI will try to use the OpenFabrics-compatible libraries, which will kill code that makes system calls (as MAKER does).
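
For example, a full invocation with the InfiniBand workaround might look like this (only a sketch; the -n value is illustrative, and the maker options are the ones from your batch script):

mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 -n 40 maker -c 1 -base genome -g genome.fasta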

Second, use a task count higher than 2 in your batch. MAKER always sacrifices one process to act solely as the message manager for the others, so with -n 2 you have one process working and one managing data, which means only one contig runs at a time. Setting a higher count makes the issue go away. The message manager process starts to become saturated at ~200 CPUs, so processor counts above that give diminishing returns.
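
As a sketch based on your current submission (the numbers are only illustrative; adjust --ntasks, memory, and the node count to your cluster):

sbatch --gres=lscratch:100 --time=8:00:00 --mem-per-cpu=8g --ntasks=32 --ntasks-per-core=1 --job-name run06.mpi -o log/run06.mpi.o%A run06.maker.mpi.sh
mpiexec -n $SLURM_NTASKS maker -c 1 -base genome -g genome.fasta

With --ntasks=32 you get 31 worker processes plus 1 dedicated to message management, so many contigs can be worked on at once.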

Thanks,
Carson




> On Aug 15, 2017, at 3:05 PM, zl c <chzelin at gmail.com> wrote:
> 
> I submitted a job:
> sbatch --gres=lscratch:100 --time=8:00:00 --mem-per-cpu=8g -N 1-1 --ntasks=2 --ntasks-per-core=1 --job-name run06.mpi -o log/run06.mpi.o%A run06.maker.mpi.sh
> 
> CMD in run06.maker.mpi.sh:
> mpiexec -n $SLURM_NTASKS maker -c 1 -base genome -g genome.fasta
> 
> Another question:
> How much temporary space and memory should I use for ~10 Mb of sequence and large databases like nr and uniref90?
> 
> Thanks,
> zelin
> 
> --------------------------------------------
> Zelin Chen [chzelin at gmail.com]
> 
> 
> On Tue, Aug 15, 2017 at 4:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
> What is your command line? Are you running interactively or as a submitted batch? If it's a batch job, what options did you give it?
> 
> --Carson
> 
> Sent from my iPhone
> 
> On Aug 15, 2017, at 2:47 PM, zl c <chzelin at gmail.com> wrote:
> 
>> Hi Carson,  Christopher, Daniel,
>> 
>> Thank you for your kind help.
>> 
>> Now it works without any other options on one node and 4 CPUs. I set the number of tasks to 2, but only one contig is running. Shouldn't two contigs be running at the same time?
>> 
>> Zelin
>> 
>> --------------------------------------------
>> Zelin Chen [chzelin at gmail.com]
>> 
>> 
>> NIH/NHGRI
>> Building 50, Room 5531
>> 50 SOUTH DR, MSC 8004 
>> BETHESDA, MD 20892-8004
>> 
>> On Tue, Aug 15, 2017 at 11:47 AM, Carson Holt <carsonhh at gmail.com> wrote:
>> Did it die or did you just get a warning?
>> 
>> Here is a list of flags that suppress warnings and work around other issues with OpenMPI. You can add them all at once or one at a time, depending on the issues you see; a combined example follows the list.
>> 
>> #add if MPI not using all CPU given
>> --oversubscribe --bind-to none
>> 
>> #workaround for InfiniBand (use instead of '--mca btl ^openib')
>> --mca btl vader,tcp,self --mca btl_tcp_if_include ib0
>> 
>> #add to stop certain other warnings
>> --mca orte_base_help_aggregate 0
>> 
>> #stop fork warnings
>> --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0
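>>
>> #a combined example (only a sketch; the -n value is illustrative, and you only need the flags relevant to your setup)
>> mpiexec --oversubscribe --bind-to none --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca orte_base_help_aggregate 0 -n 40 maker -c 1 -base genome -g genome.fasta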
>> 
>> —Carson
>> 
>> 
>> 
>>> On Aug 15, 2017, at 9:34 AM, zl c <chzelin at gmail.com> wrote:
>>> 
>>> Here are some latest message:
>>> 
>>> [cn3360:57176] 1 more process has sent help message help-opal-runtime.txt / opal_init:warn-fork
>>> [cn3360:57176] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>> 
>>> --------------------------------------------
>>> Zelin Chen [chzelin at gmail.com]
>>> 
>>> 
>>> 
>>> On Tue, Aug 15, 2017 at 10:39 AM, Carson Holt <carsonhh at gmail.com> wrote:
>>> You may need to delete the .../maker/perl directory before reinstalling if this is not a brand new installation. Otherwise, you can ignore the 'subroutine redefined' warnings during the compile.
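>>>
>>> For example (a sketch; .../maker stands in for your actual install location):
>>>
>>> rm -rf .../maker/perl
>>> ./Build install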
>>> 
>>> Have you been able to test the alternate flags on the command line for MPI? How about an alternate Perl built without threads?
>>> 
>>> --Carson
>>> 
>>> Sent from my iPhone
>>> 
>>> On Aug 15, 2017, at 8:27 AM, zl c <chzelin at gmail.com> wrote:
>>> 
>>>> When I installed with './Build install', I got the following messages:
>>>> Configuring MAKER with MPI support
>>>> Installing MAKER...
>>>> Configuring MAKER with MPI support
>>>> Subroutine dl_load_flags redefined at (eval 125) line 8.
>>>> Subroutine Parallel::Application::MPI::C_MPI_ANY_SOURCE redefined at (eval 125) line 9.
>>>> Subroutine Parallel::Application::MPI::C_MPI_ANY_TAG redefined at (eval 125) line 9.
>>>> Subroutine Parallel::Application::MPI::C_MPI_SUCCESS redefined at (eval 125) line 9.
>>>> Subroutine Parallel::Application::MPI::C_MPI_Init redefined at (eval 125) line 9.
>>>> Subroutine Parallel::Application::MPI::C_MPI_Finalize redefined at (eval 125) line 9.
>>>> Subroutine Parallel::Application::MPI::C_MPI_Comm_rank redefined at (eval 125) line 9.
>>>> Subroutine Parallel::Application::MPI::C_MPI_Comm_size redefined at (eval 125) line 9.
>>>> Subroutine Parallel::Application::MPI::C_MPI_Send redefined at (eval 125) line 9.
>>>> Subroutine Parallel::Application::MPI::C_MPI_Recv redefined at (eval 125) line 9.
>>>> Subroutine Parallel::Application::MPI::_comment redefined at (eval 125) line 9.
>>>> 
>>>> I'm not sure whether it's correctly installed.
>>>> 
>>>> Thanks,
>>>> 
>>>> --------------------------------------------
>>>> Zelin Chen [chzelin at gmail.com]
>>>> 
>>>> NIH/NHGRI
>>>> Building 50, Room 5531
>>>> 50 SOUTH DR, MSC 8004 
>>>> BETHESDA, MD 20892-8004
>>>> 
>>>> On Mon, Aug 14, 2017 at 9:23 PM, Fields, Christopher J <cjfields at illinois.edu> wrote:
>>>> Carson,
>>>> 
>>>>  
>>>> 
>>>> It was attached to the initial message (named ‘run05.mpi.o47346077’). It looks like a Perl issue with threads, though I don’t see why this would crash a cluster. The fact that there is a log file suggests it just ended the job.
>>>> 
>>>>  
>>>> 
>>>> chris
>>>> 
>>>>  
>>>> 
>>>> From: maker-devel <maker-devel-bounces at yandell-lab.org> on behalf of Carson Holt <carsonhh at gmail.com>
>>>> Date: Monday, August 14, 2017 at 2:18 PM
>>>> To: zl c <chzelin at gmail.com>
>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>> Subject: Re: [maker-devel] maker MPI problem
>>>> 
>>>>  
>>>> 
>>>> This is rather vague —> “crashed the computer cluster”
>>>> 
>>>>  
>>>> 
>>>> Do you have a specific error?
>>>> 
>>>>  
>>>> 
>>>> —Carson
>>>> 
>>>>  
>>>> 
>>>>  
>>>> 
>>>>  
>>>> 
>>>> On Aug 14, 2017, at 12:59 PM, zl c <chzelin at gmail.com> wrote:
>>>> 
>>>>  
>>>> 
>>>> Hello,
>>>> 
>>>>  
>>>> 
>>>> I ran MAKER 3.0 with OpenMPI 2.0.2 and it crashed the computer cluster. I have attached the log file. Could you help me solve the problem?
>>>> 
>>>>  
>>>> 
>>>> CMD:
>>>> 
>>>> export LD_PRELOAD=/usr/local/OpenMPI/2.0.2/gcc-6.3.0/lib/libmpi.so
>>>> 
>>>> export OMPI_MCA_mpi_warn_on_fork=0
>>>> 
>>>> mpiexec -mca btl ^openib -n $SLURM_NTASKS maker -c 1 -base genome -g genome.fasta
>>>> 
>>>>  
>>>> 
>>>> Thanks,
>>>> 
>>>> Zelin Chen
>>>> 
>>>>  
>>>> 
>>>> --------------------------------------------
>>>> 
>>>> Zelin Chen [chzelin at gmail.com], Ph.D.
>>>> 
>>>>  
>>>> 
>>>> NIH/NHGRI
>>>> 
>>>> Building 50, Room 5531
>>>> 50 SOUTH DR, MSC 8004 
>>>> BETHESDA, MD 20892-8004
>>>> 
>>>> <run05.mpi.o47346077>
>>>>  
>>>> 
>>>> 
>>> 
>> 
>> 
> 
