
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] poor btl sm latency
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-02-15 13:29:38


Something is definitely wrong -- 1.4us is way too high for a 0 or 1 byte HRT ping pong. What is this all2all benchmark, btw? Is it measuring an MPI_ALLTOALL, or a pingpong?

FWIW, on an older Nehalem machine running NetPIPE/MPI, I'm getting about 0.27us latencies for short messages over sm and binding to socket.

On Feb 14, 2012, at 7:20 AM, Matthias Jurenz wrote:

> I've built Open MPI 1.5.5rc1 (tarball from Web) with CFLAGS=-O3.
> Unfortunately, also without any effect.
>
> Here some results with enabled binding reports:
>
> $ mpirun *--bind-to-core* --report-bindings -np 2 ./all2all_ompi1.5.5
> [n043:61313] [[56788,0],0] odls:default:fork binding child [[56788,1],1] to
> cpus 0002
> [n043:61313] [[56788,0],0] odls:default:fork binding child [[56788,1],0] to
> cpus 0001
> latency: 1.415us
>
> $ mpirun *-mca maffinity hwloc --bind-to-core* --report-bindings -np 2
> ./all2all_ompi1.5.5
> [n043:61469] [[49736,0],0] odls:default:fork binding child [[49736,1],1] to
> cpus 0002
> [n043:61469] [[49736,0],0] odls:default:fork binding child [[49736,1],0] to
> cpus 0001
> latency: 1.4us
>
> $ mpirun *-mca maffinity first_use --bind-to-core* --report-bindings -np 2
> ./all2all_ompi1.5.5
> [n043:61508] [[49681,0],0] odls:default:fork binding child [[49681,1],1] to
> cpus 0002
> [n043:61508] [[49681,0],0] odls:default:fork binding child [[49681,1],0] to
> cpus 0001
> latency: 1.4us
>
>
> $ mpirun *--bind-to-socket* --report-bindings -np 2 ./all2all_ompi1.5.5
> [n043:61337] [[56780,0],0] odls:default:fork binding child [[56780,1],1] to
> socket 0 cpus 0001
> [n043:61337] [[56780,0],0] odls:default:fork binding child [[56780,1],0] to
> socket 0 cpus 0001
> latency: 4.0us
>
> $ mpirun *-mca maffinity hwloc --bind-to-socket* --report-bindings -np 2
> ./all2all_ompi1.5.5
> [n043:61615] [[49914,0],0] odls:default:fork binding child [[49914,1],1] to
> socket 0 cpus 0001
> [n043:61615] [[49914,0],0] odls:default:fork binding child [[49914,1],0] to
> socket 0 cpus 0001
> latency: 4.0us
>
> $ mpirun *-mca maffinity first_use --bind-to-socket* --report-bindings -np 2
> ./all2all_ompi1.5.5
> [n043:61639] [[49810,0],0] odls:default:fork binding child [[49810,1],1] to
> socket 0 cpus 0001
> [n043:61639] [[49810,0],0] odls:default:fork binding child [[49810,1],0] to
> socket 0 cpus 0001
> latency: 4.0us
>
>
> If socket-binding is enabled, it seems that all ranks are bound to the very
> first core of one and the same socket. Is this intended? I expected that each
> rank would get its own socket (i.e., 2 ranks -> 2 sockets)...
>
> Matthias
>
> On Monday 13 February 2012 22:36:50 Jeff Squyres wrote:
>> Also, double check that you have an optimized build, not a debugging build.
>>
>> SVN and HG checkouts default to debugging builds, which add in lots of
>> latency.
>>
>> On Feb 13, 2012, at 10:22 AM, Ralph Castain wrote:
>>> Few thoughts
>>>
>>> 1. Bind to socket is broken in 1.5.4 - fixed in next release
>>>
>>> 2. Add --report-bindings to cmd line and see where it thinks the procs
>>> are bound
>>>
>>> 3. Sounds like memory may not be local - might be worth checking mem
>>> binding.
>>>
>>> Sent from my iPad
>>>
>>> On Feb 13, 2012, at 7:07 AM, Matthias Jurenz <matthias.jurenz_at_tu-
> dresden.de> wrote:
>>>> Hi Sylvain,
>>>>
>>>> thanks for the quick response!
>>>>
>>>> Here some results with enabled process binding. I hope I used the
>>>> parameters correctly...
>>>>
>>>> bind two ranks to one socket:
>>>> $ mpirun -np 2 --bind-to-core ./all2all
>>>> $ mpirun -np 2 -mca mpi_paffinity_alone 1 ./all2all
>>>>
>>>> bind two ranks to two different sockets:
>>>> $ mpirun -np 2 --bind-to-socket ./all2all
>>>>
>>>> All three runs resulted in similar bad latencies (~1.4us).
>>>>
>>>> :-(
>>>>
>>>> Matthias
>>>>
>>>> On Monday 13 February 2012 12:43:22 sylvain.jeaugey_at_[hidden] wrote:
>>>>> Hi Matthias,
>>>>>
>>>>> You might want to play with process binding to see if your problem is
>>>>> related to bad memory affinity.
>>>>>
>>>>> Try to launch pingpong on two CPUs of the same socket, then on
>>>>> different sockets (i.e. bind each process to a core, and try different
>>>>> configurations).
>>>>>
>>>>> Sylvain
>>>>>
>>>>>
>>>>>
>>>>> De : Matthias Jurenz <matthias.jurenz_at_[hidden]>
>>>>> A : Open MPI Developers <devel_at_[hidden]>
>>>>> Date : 13/02/2012 12:12
>>>>> Objet : [OMPI devel] poor btl sm latency
>>>>> Envoyé par : devel-bounces_at_[hidden]
>>>>>
>>>>>
>>>>>
>>>>> Hello all,
>>>>>
>>>>> on our new AMD cluster (AMD Opteron 6274, 2.2 GHz) we get very bad
>>>>> latencies (~1.5us) when performing 0-byte p2p communication on one
>>>>> single node using the Open MPI sm BTL. When using Platform MPI we get
>>>>> ~0.5us latencies, which is pretty good. The bandwidth results are
>>>>> similar for both MPI implementations (~3.3 GB/s) - this is okay.
>>>>>
>>>>> One node has 64 cores and 64 GB RAM; it doesn't matter how many ranks
>>>>> are allocated by the application - we get similar results with
>>>>> different numbers of ranks.
>>>>>
>>>>> We are using Open MPI 1.5.4, built with gcc 4.3.4 without any special
>>>>> configure options except the installation prefix and the location of
>>>>> the LSF stuff.
>>>>>
>>>>> As mentioned at http://www.open-mpi.org/faq/?category=sm we tried to
>>>>> use /dev/shm instead of /tmp for the session directory, but it had no
>>>>> effect. Furthermore, we tried the current release candidate 1.5.5rc1
>>>>> of Open MPI, which provides an option to use SysV shared memory
>>>>> (-mca shmem sysv) - this also results in similarly poor latencies.
>>>>>
>>>>> Do you have any idea? Please help!
>>>>>
>>>>> Thanks,
>>>>> Matthias
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> devel_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/