Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] poor btl sm latency
From: Matthias Jurenz (matthias.jurenz_at_[hidden])
Date: 2012-02-14 07:20:59


I've built Open MPI 1.5.5rc1 (tarball from the web) with CFLAGS=-O3.
Unfortunately, this also had no effect.
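
(For the record, the build was essentially the usual tarball procedure with
just the optimization flag added - the install prefix below is a placeholder:

$ ./configure CFLAGS=-O3 --prefix=<install-dir>
$ make all install
)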

Here are some results with binding reports enabled:

$ mpirun *--bind-to-core* --report-bindings -np 2 ./all2all_ompi1.5.5
[n043:61313] [[56788,0],0] odls:default:fork binding child [[56788,1],1] to
cpus 0002
[n043:61313] [[56788,0],0] odls:default:fork binding child [[56788,1],0] to
cpus 0001
latency: 1.415us

$ mpirun *-mca maffinity hwloc --bind-to-core* --report-bindings -np 2
./all2all_ompi1.5.5
[n043:61469] [[49736,0],0] odls:default:fork binding child [[49736,1],1] to
cpus 0002
[n043:61469] [[49736,0],0] odls:default:fork binding child [[49736,1],0] to
cpus 0001
latency: 1.4us

$ mpirun *-mca maffinity first_use --bind-to-core* --report-bindings -np 2
./all2all_ompi1.5.5
[n043:61508] [[49681,0],0] odls:default:fork binding child [[49681,1],1] to
cpus 0002
[n043:61508] [[49681,0],0] odls:default:fork binding child [[49681,1],0] to
cpus 0001
latency: 1.4us

$ mpirun *--bind-to-socket* --report-bindings -np 2 ./all2all_ompi1.5.5
[n043:61337] [[56780,0],0] odls:default:fork binding child [[56780,1],1] to
socket 0 cpus 0001
[n043:61337] [[56780,0],0] odls:default:fork binding child [[56780,1],0] to
socket 0 cpus 0001
latency: 4.0us

$ mpirun *-mca maffinity hwloc --bind-to-socket* --report-bindings -np 2
./all2all_ompi1.5.5
[n043:61615] [[49914,0],0] odls:default:fork binding child [[49914,1],1] to
socket 0 cpus 0001
[n043:61615] [[49914,0],0] odls:default:fork binding child [[49914,1],0] to
socket 0 cpus 0001
latency: 4.0us

$ mpirun *-mca maffinity first_use --bind-to-socket* --report-bindings -np 2
./all2all_ompi1.5.5
[n043:61639] [[49810,0],0] odls:default:fork binding child [[49810,1],1] to
socket 0 cpus 0001
[n043:61639] [[49810,0],0] odls:default:fork binding child [[49810,1],0] to
socket 0 cpus 0001
latency: 4.0us

If socket binding is enabled, it seems that all ranks are bound to the very
first core of one and the same socket. Is this intended? I expected each rank
to get its own socket (i.e. 2 ranks -> 2 sockets)...
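
Perhaps an explicit per-socket mapping is needed in addition to the binding?
Just a guess on my side (assuming the 1.5-series --bysocket mapping option
does what its name suggests), something like this might spread the two ranks
across the two sockets:

$ mpirun --bysocket --bind-to-socket --report-bindings -np 2 ./all2all_ompi1.5.5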

Matthias

On Monday 13 February 2012 22:36:50 Jeff Squyres wrote:
> Also, double check that you have an optimized build, not a debugging build.
>
> SVN and HG checkouts default to debugging builds, which add in lots of
> latency.
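
(One way to double-check this, assuming ompi_info exposes the debug setting in
this version:

$ ompi_info | grep -i debug

which should report something like "Internal debug support: no" for an
optimized build.)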
>
> On Feb 13, 2012, at 10:22 AM, Ralph Castain wrote:
> > Few thoughts
> >
> > 1. Bind to socket is broken in 1.5.4 - fixed in next release
> >
> > 2. Add --report-bindings to cmd line and see where it thinks the procs
> > are bound
> >
> > 3. Sounds like memory may not be local - might be worth checking mem
> > binding.
> >
> > Sent from my iPad
> >
> > On Feb 13, 2012, at 7:07 AM, Matthias Jurenz <matthias.jurenz_at_tu-dresden.de> wrote:
> >> Hi Sylvain,
> >>
> >> thanks for the quick response!
> >>
> >> Here are some results with process binding enabled. I hope I used the
> >> parameters correctly...
> >>
> >> bind two ranks to one socket:
> >> $ mpirun -np 2 --bind-to-core ./all2all
> >> $ mpirun -np 2 -mca mpi_paffinity_alone 1 ./all2all
> >>
> >> bind two ranks to two different sockets:
> >> $ mpirun -np 2 --bind-to-socket ./all2all
> >>
> >> All three runs resulted in similarly bad latencies (~1.4 us).
> >>
> >> :-(
> >>
> >> Matthias
> >>
> >> On Monday 13 February 2012 12:43:22 sylvain.jeaugey_at_[hidden] wrote:
> >>> Hi Matthias,
> >>>
> >>> You might want to play with process binding to see if your problem is
> >>> related to bad memory affinity.
> >>>
> >>> Try to launch pingpong on two CPUs of the same socket, then on
> >>> different sockets (i.e. bind each process to a core, and try different
> >>> configurations).
> >>>
> >>> Sylvain
> >>>
> >>>
> >>>
> >>> From: Matthias Jurenz <matthias.jurenz_at_[hidden]>
> >>> To: Open MPI Developers <devel_at_[hidden]>
> >>> Date: 13/02/2012 12:12
> >>> Subject: [OMPI devel] poor btl sm latency
> >>> Sent by: devel-bounces_at_[hidden]
> >>>
> >>>
> >>>
> >>> Hello all,
> >>>
> >>> on our new AMD cluster (AMD Opteron 6274, 2.2 GHz) we get very bad
> >>> latencies (~1.5 us) when performing 0-byte p2p communication on a single
> >>> node using the Open MPI sm BTL. With Platform MPI we get ~0.5 us
> >>> latencies, which is pretty good. The bandwidth results are similar for
> >>> both MPI implementations (~3.3 GB/s) - this is okay.
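
(For context, the numbers above come from a simple 0-byte ping-pong. The actual
all2all benchmark isn't attached; a minimal sketch of this kind of measurement,
purely my own illustration with warm-up iterations omitted for brevity, would
look roughly like this, run with mpirun -np 2 on one node:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int reps = 100000;
    int rank, i;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {
            /* rank 0 sends a 0-byte message and waits for the echo */
            MPI_Send(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            /* rank 1 echoes every message back */
            MPI_Recv(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)  /* one-way latency = half the round-trip time */
        printf("latency: %.3f us\n", (t1 - t0) / reps / 2.0 * 1e6);

    MPI_Finalize();
    return 0;
}
)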
> >>>
> >>> One node has 64 cores and 64 GB RAM; it doesn't matter how many ranks
> >>> the application allocates - we get similar results with different
> >>> numbers of ranks.
> >>>
> >>> We are using Open MPI 1.5.4, built with GCC 4.3.4 without any special
> >>> configure options except the installation prefix and the location of the
> >>> LSF installation.
> >>>
> >>> As mentioned at http://www.open-mpi.org/faq/?category=sm, we tried using
> >>> /dev/shm instead of /tmp for the session directory, but it had no effect.
> >>> Furthermore, we tried the current release candidate 1.5.5rc1 of Open MPI,
> >>> which provides an option to use SysV shared memory (-mca shmem sysv) -
> >>> this also results in similarly poor latencies.
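
(For the record, the sort of thing we tried was roughly the following - assuming
orte_tmpdir_base is the right MCA parameter for relocating the session
directory:

$ mpirun -mca orte_tmpdir_base /dev/shm -np 2 ./all2all
$ mpirun -mca shmem sysv -np 2 ./all2all
)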
> >>>
> >>> Do you have any idea? Please help!
> >>>
> >>> Thanks,
> >>> Matthias