Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem
From: Blosch, Edwin L (edwin.l.blosch_at_[hidden])
Date: 2013-06-12 15:00:08


The MXM version reports as: 1.5.dc8c171
The OFED version reports as: MLNX_OFED_LINUX-2.0-2.0.5

Here are some revised scaling numbers after configuring OpenMPI to use MXM. I'm not sure whether I posted the medium or the small case last time, but this is the "small" case. By the time you get out to 800 cores, each process talks to between 10 and 16 other processes (this is a physical domain decomposition), and the message sizes are roughly distributed as follows: about 25% fall between 1 KB and 10 KB, about 50% are roughly 3 times larger, and about 25% are roughly 3 times smaller. On the "medium" case the difference between OpenMPI and MVAPICH is smaller, but OpenMPI is still doing better.

Scalability - 1 domain per process
   MPI | # cores | Ave. Rate | Std. Dev. % | # timings | Speedup | Efficiency
================================================================================================
MVAPICH | 16 | 7.5822 | 0.171 % | 3 | 16.000 | 1.0000
MVAPICH | 48 | 7.7416 | 0.804 % | 3 | 47.011 | 0.9794
MVAPICH | 80 | 7.6365 | 0.252 % | 3 | 79.431 | 0.9929
MVAPICH | 160 | 7.4802 | 0.887 % | 3 | 162.182 | 1.0136
MVAPICH | 256 | 7.7930 | 1.554 % | 3 | 249.073 | 0.9729
MVAPICH | 320 | 7.7346 | 0.423 % | 3 | 313.695 | 0.9803
MVAPICH | 480 | 7.9225 | 2.594 % | 3 | 459.378 | 0.9570
MVAPICH | 640 | 8.3111 | 2.416 % | 3 | 583.866 | 0.9123
MVAPICH | 800 | 8.9315 | 5.059 % | 3 | 679.137 | 0.8489
OpenMPI | 16 | 7.5919 | 0.879 % | 3 | 16.000 | 1.0000
OpenMPI | 48 | 7.7469 | 0.478 % | 3 | 47.040 | 0.9800
OpenMPI | 80 | 7.6654 | 0.544 % | 3 | 79.233 | 0.9904
OpenMPI | 160 | 7.7252 | 2.202 % | 3 | 157.239 | 0.9827
OpenMPI | 256 | 7.7043 | 0.563 % | 3 | 252.265 | 0.9854
OpenMPI | 320 | 7.6727 | 6.086 % | 3 | 316.629 | 0.9895
OpenMPI | 480 | 7.7016 | 0.450 % | 3 | 473.163 | 0.9858
OpenMPI | 640 | 8.0357 | 0.572 % | 3 | 604.651 | 0.9448
OpenMPI | 800 | 8.4328 | 3.198 % | 3 | 720.223 | 0.9003
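
For reference, here is roughly what the rebuild and launch look like. This is a sketch rather than the exact commands; the install prefix, the MXM path (wherever MLNX_OFED put it), the process count, and the executable name are placeholders:

  # rebuild Open MPI 1.6.4 with MXM support (paths are placeholders)
  ./configure --prefix=/opt/openmpi-1.6.4 --with-mxm=/opt/mellanox/mxm
  make -j8 all install

  # launch with core binding; the pml/mtl selection is made explicit here,
  # although Open MPI can also pick the MXM MTL on its own once it is built in
  mpirun -np 800 --bind-to-core -mca pml cm -mca mtl mxm ./my_solver

If MXM does not seem to engage at small process counts, I believe there is an mtl_mxm_np threshold parameter; forcing the pml/mtl as above sidesteps that question.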

From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Mike Dubman
Sent: Wednesday, June 12, 2013 7:01 AM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem

Also, what ofed version (ofed_info -s) and mxm version (rpm -qi mxm) do you use?

On Wed, Jun 12, 2013 at 3:30 AM, Ralph Castain <rhc_at_[hidden]> wrote:
Great! Would you mind showing the revised table? I'm curious as to the relative performance.

On Jun 11, 2013, at 4:53 PM, eblosch_at_[hidden] wrote:

> Problem solved. I did not configure with --with-mxm=/opt/mellanox/mcm and
> this location was not auto-detected. Once I rebuilt with this option,
> everything worked fine. Scaled better than MVAPICH out to 800. MVAPICH
> configure log showed that it had found this component of the OFED stack.
>
> Ed
>
>
>> If you run at 224 and things look okay, then I would suspect something in
>> the upper level switch that spans cabinets. At that point, I'd have to
>> leave it to Mellanox to advise.
>>
>>
>> On Jun 11, 2013, at 6:55 AM, "Blosch, Edwin L" <edwin.l.blosch_at_[hidden]> wrote:
>>
>>> I tried adding "-mca btl openib,sm,self" but it did not make any
>>> difference.
>>>
>>> Jesus' e-mail this morning has got me thinking. In our system, each
>>> cabinet has 224 cores, and we are reaching a different level of the
>>> system architecture when we go beyond 224. I got an additional data
>>> point at 256 and found that performance is already falling off. Perhaps
>>> I did not build OpenMPI properly to support the Mellanox adapters that
>>> are used in the backplane, or I need some configuration setting similar
>>> to FAQ #19 in the Tuning/Openfabrics section.
>>>
>>> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Ralph Castain
>>> Sent: Sunday, June 09, 2013 6:48 PM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance
>>> problem
>>>
>>> Strange - it looks like a classic oversubscription behavior. Another
>>> possibility is that it isn't using IB for some reason when extended to
>>> the other nodes. What does your cmd line look like? Have you tried
>>> adding "-mca btl openib,sm,self" just to ensure it doesn't use TCP for
>>> some reason?
>>>
>>>
>>> On Jun 9, 2013, at 4:31 PM, "Blosch, Edwin L" <edwin.l.blosch_at_[hidden]> wrote:
>>>
>>>
>>> Correct. 20 nodes, 8 cores per socket on each dual-socket node = 320.
>>>
>>> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Ralph Castain
>>> Sent: Sunday, June 09, 2013 6:18 PM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance
>>> problem
>>>
>>> So, just to be sure - when you run 320 "cores", you are running across
>>> 20 nodes?
>>>
>>> Just want to ensure we are using "core" the same way - some people
>>> confuse cores with hyperthreads.
>>>
>>> On Jun 9, 2013, at 3:50 PM, "Blosch, Edwin L" <edwin.l.blosch_at_[hidden]> wrote:
>>>
>>>
>>>
>>> 16. dual-socket Xeon, E5-2670.
>>>
>>> I am trying a larger model to see if the performance drop-off happens at
>>> a different number of cores.
>>> Also I'm running some intermediate core-count sizes to refine the curve
>>> a bit.
>>> I also added mpi_show_mca_params all, and at the same time,
>>> btl_openib_use_eager_rdma 1, just to see if that does anything.
>>>
>>> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Ralph Castain
>>> Sent: Sunday, June 09, 2013 5:04 PM
>>> To: Open MPI Users
>>> Subject: EXTERNAL: Re: [OMPI users] Need advice on performance problem
>>>
>>> Looks to me like things are okay thru 160, and then things fall apart
>>> after that point. How many cores are on a node?
>>>
>>>
>>> On Jun 9, 2013, at 1:59 PM, "Blosch, Edwin L" <edwin.l.blosch_at_[hidden]> wrote:
>>>
>>>
>>>
>>>
>>> I'm having some trouble getting good scaling with OpenMPI 1.6.4 and I
>>> don't know where to start looking. This is an Infiniband FDR network
>>> with Sandy Bridge nodes. I am using affinity (--bind-to-core) but no
>>> other options. As the number of cores goes up, the message sizes are
>>> typically going down. There seem to be lots of options in the FAQ, and I
>>> would welcome any advice on where to start. All these timings are on a
>>> completely empty system except for me.
>>>
>>> Thanks
>>>
>>>
>>> MPI | # cores | Ave. Rate | Std. Dev. % | # timings | Speedup | Efficiency
>>> ================================================================================================
>>> MVAPICH | 16 | 8.6783 | 0.995 % | 2 | 16.000 | 1.0000
>>> MVAPICH | 48 | 8.7665 | 1.937 % | 3 | 47.517 | 0.9899
>>> MVAPICH | 80 | 8.8900 | 2.291 % | 3 | 78.095 | 0.9762
>>> MVAPICH | 160 | 8.9897 | 2.409 % | 3 | 154.457 | 0.9654
>>> MVAPICH | 320 | 8.9780 | 2.801 % | 3 | 309.317 | 0.9666
>>> MVAPICH | 480 | 8.9704 | 2.316 % | 3 | 464.366 | 0.9674
>>> MVAPICH | 640 | 9.0792 | 1.138 % | 3 | 611.739 | 0.9558
>>> MVAPICH | 720 | 9.1328 | 1.052 % | 3 | 684.162 | 0.9502
>>> MVAPICH | 800 | 9.1945 | 0.773 % | 3 | 755.079 | 0.9438
>>> OpenMPI | 16 | 8.6743 | 2.335 % | 2 | 16.000 | 1.0000
>>> OpenMPI | 48 | 8.7826 | 1.605 % | 2 | 47.408 | 0.9877
>>> OpenMPI | 80 | 8.8861 | 0.120 % | 2 | 78.093 | 0.9762
>>> OpenMPI | 160 | 8.9774 | 0.785 % | 2 | 154.598 | 0.9662
>>> OpenMPI | 320 | 12.0585 | 16.950 % | 2 | 230.191 | 0.7193
>>> OpenMPI | 480 | 14.8330 | 1.300 % | 2 | 280.701 | 0.5848
>>> OpenMPI | 640 | 17.1723 | 2.577 % | 3 | 323.283 | 0.5051
>>> OpenMPI | 720 | 18.2153 | 2.798 % | 3 | 342.868 | 0.4762
>>> OpenMPI | 800 | 19.3603 | 2.254 % | 3 | 358.434 | 0.4480

_______________________________________________
users mailing list
users_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/users