Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] SM btl slows down bandwidth?
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-08-13 08:13:10


FWIW, we have made some improvements to shared memory performance in
the upcoming v1.3 series. I won't ask you to test a v1.3 tarball
right now because there's a gnarly bug in the shared memory support
that George is working to fix -- hopefully he'll fix it soon and you
can see if the performance is a bit better in v1.3.

On Aug 13, 2008, at 3:52 AM, Lenny Verkhovsky wrote:

> Hi,
>
> just to try it, can you run with -np 2?
>
> (the PingPong test uses only 2 processes)
>
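(As an aside, a minimal command along these lines runs the IMB PingPong
test over the shared memory BTL only, assuming the Intel MPI Benchmarks
binary is built as IMB-MPI1:

    mpirun -np 2 --mca btl sm,self ./IMB-MPI1 PingPong

Swapping in "--mca btl openib,self" runs the same test over InfiniBand,
which gives a direct sm-vs-openib comparison.)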
>
> On 8/13/08, Daniël Mantione <daniel.mantione_at_[hidden]> wrote:
>
> On Tue, 12 Aug 2008, Gus Correa wrote:
>
> > Hello Daniel and list
> >
> > Could it be a problem with memory bandwidth / contention in
> > multi-core?
>
>
> Yes, I believe we are somehow limited by memory performance. Here are
> some numbers from a dual Opteron 2352 system, which has much more
> memory bandwidth:
>
>
> #---------------------------------------------------
> # Benchmarking PingPong
> # #processes = 2
> # ( 6 additional processes waiting in MPI_Barrier)
> #---------------------------------------------------
>  #bytes  #repetitions      t[usec]    Mbytes/sec
>
>       0          1000         0.86          0.00
>       1          1000         0.97          0.98
>       2          1000         0.95          2.01
>       4          1000         0.96          3.97
>       8          1000         0.95          7.99
>      16          1000         0.96         15.85
>      32          1000         0.99         30.69
>      64          1000         0.97         63.09
>     128          1000         1.02        119.68
>     256          1000         1.18        207.25
>     512          1000         1.40        348.77
>    1024          1000         1.75        556.75
>    2048          1000         2.59        753.22
>    4096          1000         5.10        766.23
>    8192          1000         7.93        985.13
>   16384          1000        14.60       1070.57
>   32768          1000        27.92       1119.23
>   65536           640        46.67       1339.16
>  131072           320        86.03       1453.06
>  262144           160       163.16       1532.21
>  524288            80       310.01       1612.88
> 1048576            40       730.62       1368.69
> 2097152            20      1449.72       1379.57
> 4194304            10      2884.90       1386.53
>
> However, +/- 1200 MB/s (or +/- 1500 MB/s in the case of the AMD
> system) is not even close to the memory performance limits of these
> systems, so there should be room for optimization.
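(For a rough point of comparison, a single-process copy microbenchmark
along these lines, just a sketch with arbitrary buffer size and
repetition count rather than a proper STREAM run, gives an idea of what
the memory system itself can deliver:

    /* crude copy-bandwidth check; compile with gcc -O2 -std=c99 */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/time.h>

    int main(void)
    {
        const size_t n = 64 * 1024 * 1024;   /* 64 MB per buffer */
        const int reps = 20;
        char *src = malloc(n), *dst = malloc(n);
        struct timeval t0, t1;

        if (!src || !dst)
            return 1;
        memset(src, 1, n);                   /* touch the pages first */
        memset(dst, 0, n);

        gettimeofday(&t0, NULL);
        for (int i = 0; i < reps; i++)
            memcpy(dst, src, n);
        gettimeofday(&t1, NULL);

        double sec = (t1.tv_sec - t0.tv_sec)
                   + (t1.tv_usec - t0.tv_usec) / 1e6;
        /* reports bytes copied per second; bus traffic is roughly double */
        printf("copy bandwidth: %.1f MB/s\n", (double)n * reps / sec / 1e6);

        free(src);
        free(dst);
        return 0;
    }

Comparing its MB/s figure against the PingPong peaks above shows how
much headroom the sm btl is leaving on the table.)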
>
> After all, the openib btl manages to transfer the data from the memory
> of one process to the memory of another process just fine, at higher
> bandwidth.
>
>
> > It has been reported in many mailing lists (mpich, beowulf, etc.).
> > Here it seems to happen on dual-processor dual-core machines with
> > our memory-intensive programs.
>
>
> MPICH2 manages to get about 5GB/s in shared memory performance on the
> Xeon 5420 system.
>
>
> > Have you checked what happens to the shared memory runs as you
> > increase the number of active cores/processes?
> > Would it help to set the processor affinity in the shared memory
> > runs?
> >
> > http://www.open-mpi.org/faq/?category=building#build-paffinity
> > http://www.open-mpi.org/faq/?category=tuning#using-paffinity
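(For reference, with the 1.2 series processor affinity is typically
turned on straight from the mpirun command line, along the lines of:

    mpirun -np 2 --mca mpi_paffinity_alone 1 --mca btl sm,self ./IMB-MPI1 PingPong

where mpi_paffinity_alone is, if I recall correctly, the MCA parameter
that the tuning FAQ above describes.)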
>
>
> Neither has any effect on the scores.
>
>
> Daniël
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems