
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] slowdown with infiniband and latest CentOS kernel
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-12-17 11:04:16


Are you binding the procs? We don't bind by default (this will change in 1.7.4), and binding can play a significant role when comparing across kernels.

Add "--bind-to-core" to your cmd line.
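A sketch of the suggested invocation, assuming an 8-rank single-node run like the one benchmarked below; the binary name `vasp` and `-np 8` are placeholders, and `--report-bindings` (available in the 1.7 series) is added here only to make the resulting core assignments visible:

```shell
# Bind each MPI rank to a core and print the bindings that were applied.
# (In 1.7.x "--bind-to core" is the newer spelling of "--bind-to-core".)
mpirun --bind-to-core --report-bindings -np 8 ./vasp
```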

On Dec 17, 2013, at 7:09 AM, Noam Bernstein <noam.bernstein_at_[hidden]> wrote:

> On Dec 16, 2013, at 5:40 PM, Noam Bernstein <noam.bernstein_at_[hidden]> wrote:
>
>>
>> Once I have some more detailed information I'll follow up.
>
> OK - I've tried to characterize the behavior with vasp, which accounts for
> most of our cluster usage, and it's quite odd. I ran my favorite benchmarking
> job repeated 4 times. As you can see below, in some
> cases using sm it's as fast as before (kernel 2.6.32-358.23.2.el6.x86_64),
> but mostly it's a factor of 2 slower. With openib and our older nodes it's always a
> factor of 2-4 slower. With the newer nodes in a situation where using sm is
> possible it's occasionally as fast as before, but sometimes it's 10-20 times
> slower. When using ib with the new nodes it's always much slower than before.
>
> openmpi is 1.7.3, recompiled with the new kernel. vasp is 5.3.3, which we've
> been using for months. Everything is compiled with an older stable version
> of the intel compiler, as we've been doing for a long time.
>
> Perhaps more useful information: I don't have actual data from the previous
> setup (perhaps I should roll back some nodes and check), but I generally
> expect to see 100% cpu usage on all the processes, either because they're
> doing numeric stuff, or doing a busy-wait for mpi. However, now I see a few
> of the vasp processes at 100%, and the others at 50-70% (say 4-6 on a given
> node at 100%, and the rest lower).
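One way to quantify the per-process CPU usage described above is with `ps`, which can also report the core (PSR) each process is currently scheduled on; `vasp` is the binary name from the report and may need adjusting:

```shell
# List the busiest processes with their current core (PSR) and CPU usage,
# sorted by CPU; grep for the MPI application name to narrow it down.
ps -eo pid,psr,pcpu,comm --sort=-pcpu | head -20
```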
>
> If anyone has any ideas on what's going on, or how to debug further, I'd
> really appreciate some suggestions.
>
> Noam
>
> 8 core nodes (dual Xeon X5550)
>
> 8 MPI procs (single node)
> used to be 5.74 s
> now:
> btl: default or sm only or sm+openib: 5.5-9.3 s, mostly the larger times
> btl: openib: 10.0-12.2 s
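The btl variants listed in these results can be pinned explicitly via the `btl` MCA parameter; a sketch, assuming the `vasp` binary name and rank count from the report (the `self` component is required for loopback in any btl list):

```shell
# Shared memory plus InfiniBand:
mpirun --mca btl self,sm,openib -np 8 ./vasp
# InfiniBand only (forces openib even within a node):
mpirun --mca btl self,openib -np 8 ./vasp
```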
>
> 16 MPI procs (2 nodes)
> used to be 2.88 s
> btl default or openib or sm+openib: 4.8 - 6.23 s
>
> 32 MPI procs (4 nodes)
> used to be 1.59 s
> btl default or openib or sm+openib: 2.73-4.49 s, but sometimes just fails
>
> at least once gave the errors (stack trace is incomplete, but probably on mpi_comm_rank, mpi_comm_size, or mpi_barrier)
> [compute-3-24:32566] [[59587,0],0]:route_callback trying to get message from [[59587,1],20] to [[59587,1],28]:102, routing loop
> [0] func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/libopen-pal.so.6(opal_backtrace_print+0x1f) [0x2b5940c2dd9f]
> [1] func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/openmpi/mca_rml_oob.so(+0x22b6) [0x2b5941f0f2b6]
> [2] func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_msg_recv_complete+0x27f) [0x2b594333341f]
> [3] func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/openmpi/mca_oob_tcp.so(+0x9d3a) [0x2b5943334d3a]
> [4] func:/usr/local/openmpi/1.7.3/x86_64/ib/gnu/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x8bc) [0x2b5940c3592c]
> [5] func:mpirun(orterun+0xe25) [0x404565]
> [6] func:mpirun(main+0x20) [0x403594]
> [7] func:/lib64/libc.so.6(__libc_start_main+0xfd) [0x3091c1ed1d]
> [8] func:mpirun() [0x4034b9]
>
>
> 16 core nodes (dual Xeon E5-2670)
>
> 8 MPI procs (single node)
> not sure what it used to be, but 3.3 s is plausible
> btl: default or sm or openib+sm: 3.3-3.4 s
> btl: openib 3.9-4.14 s
>
> 16 MPI procs (single node)
> used to be 2.07 s
> btl default or openib: 23.0-32.56 s
> btl sm or sm+openib: 1.94 s - 39.27 s (mostly the slower times)
>
> 32 MPI procs (2 nodes)
> used to be 1.24 s
> btl default or sm or openib or sm+openib: 30 s - 97 s
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users