
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] ompi mca mxm version
From: Mike Dubman (mike.ompi_at_[hidden])
Date: 2012-05-11 14:22:52


ob1/openib is RC (Reliable Connection) based, which has scalability issues; MXM 1.1 is UD
(Unreliable Datagram) based and kicks in at scale.
We observe that MXM outperforms ob1 on 8+ nodes.
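To compare the two paths side by side, a sketch like the following can help. The install path, process count, and hostfile are assumptions here; adjust them to your setup:

```shell
# Sketch: select the RC-based (ob1) vs UD-based (mxm) transport explicitly
# via MCA parameters. Install path and hostfile are assumptions.
MPIEXEC=/opt/openmpi/1.6.0/bin/mpiexec
HOSTS=all_hosts

# RC path: ob1 pml over the openib BTL (TCP disabled)
OB1_ARGS="--mca pml ob1 --mca btl ^tcp"

# UD path: cm pml with the mxm MTL; expected to pull ahead at 8+ nodes
MXM_ARGS="--mca pml cm --mca mtl mxm --mca btl ^tcp"

echo "$MPIEXEC -np 16 $OB1_ARGS -hostfile $HOSTS ./osu_mbw_mr"
echo "$MPIEXEC -np 16 $MXM_ARGS -hostfile $HOSTS ./osu_mbw_mr"
```

Running both against the same hosts and comparing the osu_mbw_mr output is the quickest way to see where the crossover happens on your fabric.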

We will update the docs as you mentioned, thanks.

Regards

On Thu, May 10, 2012 at 4:30 PM, Derek Gerstmann <derek.gerstmann_at_[hidden]
> wrote:

> On May 9, 2012, at 7:41 PM, Mike Dubman wrote:
>
> > you need latest OMPI 1.6.x and latest MXM (
> ftp://bgate.mellanox.com/hpc/mxm/v1.1/mxm_1.1.1067.tar)
>
> Excellent! Thanks for the quick response! Using the MXM v1.1.1067
> against OMPI v1.6.x did the trick. Please (!!!) add a note to the docs for
> OMPI 1.6.x to help out other users -- there's zero mention of this anywhere
> that I could find from scouring the archives and source code.
>
> Sadly, performance isn't what we'd expect. OB1 is outperforming CM MXM
> (consistently).
>
> Are there any suggested configuration settings? We tried all the obvious
> ones listed in the OMPI Wiki and mailing list archives, but few have had
> much of an effect.
>
> We seem to do better with the OB1 openib btl, than the lower level CM MXM.
> Any suggestions?
>
> Here are numbers from the OSU MicroBenchmarks (the MBW_MR test) running
> on 2 pairs (4 separate hosts, each using Mellanox ConnectX, one card
> per host, single port, single switch):
>
> -- OB1
> > /opt/openmpi/1.6.0/bin/mpiexec -np 4 --mca pml ob1 --mca btl ^tcp --mca
> mpi_use_pinned 1 -hostfile all_hosts ./osu-micro-benchmarks/osu_mbw_mr
> # OSU MPI Multiple Bandwidth / Message Rate Test v3.6
> # [ pairs: 2 ] [ window size: 64 ]
> # Size MB/s Messages/s
> 1 2.91 2909711.73
> 2 5.97 2984274.11
> 4 11.70 2924292.78
> 8 23.00 2874502.93
> 16 44.75 2796639.64
> 32 89.49 2796639.64
> 64 175.98 2749658.96
> 128 292.41 2284459.86
> 256 527.84 2061874.61
> 512 961.65 1878221.77
> 1024 1669.06 1629943.87
> 2048 2220.43 1084193.45
> 4096 2906.57 709611.68
> 8192 3017.65 368365.70
> 16384 5225.97 318967.95
> 32768 5418.98 165374.23
> 65536 5998.07 91523.27
> 131072 6031.69 46018.16
> 262144 6063.38 23129.97
> 524288 5971.77 11390.24
> 1048576 5788.75 5520.59
> 2097152 5791.39 2761.55
> 4194304 5820.60 1387.74
>
> -- MXM
> > /opt/openmpi/1.6.0/bin/mpiexec -np 4 --mca pml cm --mca mtl mxm --mca
> btl ^tcp --mca mpi_use_pinned 1 -hostfile all_hosts
> ./osu-micro-benchmarks/osu_mbw_mr
> # OSU MPI Multiple Bandwidth / Message Rate Test v3.6
> # [ pairs: 2 ] [ window size: 64 ]
> # Size MB/s Messages/s
> 1 2.07 2074863.43
> 2 4.14 2067830.81
> 4 10.57 2642471.39
> 8 23.16 2895275.37
> 16 38.73 2420627.22
> 32 66.77 2086718.41
> 64 147.87 2310414.05
> 128 284.94 2226109.85
> 256 537.27 2098709.64
> 512 1041.91 2034989.43
> 1024 1930.93 1885676.34
> 2048 1998.68 975916.00
> 4096 2880.72 703299.77
> 8192 3608.45 440484.17
> 16384 4027.15 245797.51
> 32768 4464.85 136256.47
> 65536 4594.22 70102.23
> 131072 4655.62 35519.55
> 262144 4671.56 17820.58
> 524288 4604.16 8781.74
> 1048576 4635.51 4420.77
> 2097152 3575.17 1704.78
> 4194304 2828.19 674.29
>
> Thanks!
>
> -[dg]
>
> Derek Gerstmann, PhD Student
> The University of Western Australia (UWA)
>
> w: http://local.ivec.uwa.edu.au/~derek
> e: derek.gerstmann [at] icrar.org
>
> On May 9, 2012, at 7:41 PM, Mike Dubman wrote:
>
> > you need latest OMPI 1.6.x and latest MXM (
> ftp://bgate.mellanox.com/hpc/mxm/v1.1/mxm_1.1.1067.tar)
> >
> >
> >
> > On Wed, May 9, 2012 at 6:02 AM, Derek Gerstmann <
> derek.gerstmann_at_[hidden]> wrote:
> > What versions of OpenMPI and the Mellanox MXM libraries have been tested
> and verified to work?
> >
> > We are currently trying to build OpenMPI v1.5.5 against the MXM 1.0.601
> (included in the MLNX_OFED_LINUX-1.5.3-3.0.0 distribution) and are getting
> build errors.
> >
> > Specifically, there's a single undefined type (mxm_wait_t) being used in
> the OpenMPI tree:
> >
> > openmpi-1.5.5/ompi/mca/mtl/mxm/mtl_mxm_send.c:44: mxm_wait_t wait;
> >
> > There is no mxm_wait_t defined anywhere in the current MXM API
> (/opt/mellanox/mxm/include/mxm/api), which suggests a version mismatch.
> >
> > The OpenMPI v1.6 branch has a note in the readme saying "Minor Fixes for
> Mellanox MXM" were added, but the same undefined mxm_wait_t is still being
> used.
> >
> > What versions of OpenMPI and MXM are verified to work?
> >
> > Thanks!
> >
> > -[dg]
> >
> > Derek Gerstmann, PhD Student
> > The University of Western Australia (UWA)
> >
> > w: http://local.ivec.uwa.edu.au/~derek
> > e: derek.gerstmann [at] icrar.org
> >
> >
> >
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
>
>
>