
Open MPI User's Mailing List Archives


From: Gleb Natapov (glebn_at_[hidden])
Date: 2007-10-18 07:56:40


On Wed, Oct 17, 2007 at 05:43:14PM -0400, Jeff Squyres wrote:
> Several users have noticed poor latency with Open MPI when using the
> new Mellanox ConnectX HCA hardware. Open MPI was getting about 1.9us
> latency with 0 byte ping-pong benchmarks (e.g., NetPIPE or
> osu_latency). This has been fixed in OMPI v1.2.4.
>
> Short version:
> --------------
>
> Open MPI v1.2.4 (and newer) will get around 1.5us latency with 0 byte
> ping-pong benchmarks on Mellanox ConnectX HCAs. Prior versions of
> Open MPI can also achieve this low latency by setting the
> btl_openib_use_eager_rdma MCA parameter to 1.

Actually, setting btl_openib_use_eager_rdma to 1 will not help. It is
already 1 by default, but Open MPI disables eager RDMA anyway because it
cannot find the HCA's description in the ini file and cannot distinguish
between the default value and a value that the user set explicitly.
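For what it's worth, a pre-1.2.4 installation can be given that
description by hand. Below is a minimal sketch of an entry for the
openib BTL ini file (typically share/openmpi/mca-btl-openib-hca-params.ini
in the install tree); the section name and IDs are illustrative and
should be checked against the vendor_id/vendor_part_id that ibv_devinfo
reports for the card:

    # Illustrative entry for Mellanox ConnectX (values are examples;
    # verify them with ibv_devinfo on the actual hardware).
    [Mellanox Hermon]
    vendor_id = 0x2c9
    vendor_part_id = 25408,25418,25428
    use_eager_rdma = 1
    mtu = 2048

With a matching entry present, the openib BTL no longer falls back to
the conservative configuration that disables eager RDMA for the HCA.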

>
> Longer version:
> ---------------
>
> Until OMPI v1.2.4, Open MPI did not include specific configuration
> information for ConnectX hardware, which forced Open MPI to choose
> the conservative/safe configuration of not using RDMA for short
> messages (using send/receive semantics instead). This increases
> point-to-point latency in benchmarks.
>
> OMPI v1.2.4 (and newer) includes the relevant configuration
> information that enables short message RDMA by default on Mellanox
> ConnectX hardware. This significantly improves Open MPI's latency on
> popular MPI benchmark applications.
>
> The same performance can be achieved on prior versions of Open MPI by
> setting the btl_openib_use_eager_rdma MCA parameter to 1. The main
> difference between v1.2.4 and prior versions is that the prior
> versions do not set this MCA parameter value by default for ConnectX
> hardware (because ConnectX did not exist when prior versions of Open
> MPI were released).
>
> This information is also now described on the FAQ:
>
> http://www.open-mpi.org/faq/?category=openfabrics#mellanox-connectx-poor-latency
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
			Gleb.