Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [EXTERNAL] Re: Question regarding osu-benchamarks 3.1.1
From: Jeffrey Squyres (jsquyres_at_[hidden])
Date: 2012-03-01 06:25:50


On Mar 1, 2012, at 1:17 AM, Jingcha Joba wrote:

> Aah...
> So when openMPI is compile with OFED, and run on a Infiniband/RoCE devices, I would use the mpi would simply direct to ofed to do point to point calls in the ofed way?

I'm not quite sure how to parse that. :-)

The openib BTL uses verbs functions to effect data transfers between MPI process peers. The BTL is one of the lower layers in Open MPI for point-to-point communication; BTL plugins are used to effect the device-specific transport stuff for MPI_SEND, MPI_RECV, MPI_PUT, etc. Hence, when you run with the openib BTL and call MPI_SEND (presumably to a peer that is reachable via an OpenFabrics device), the openib BTL will eventually be called to actually send the message. The openib BTL will send the message to the peer via some combination of calls to verbs functions.
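As a rough illustration of that layering, here is a toy model in Python. It is not Open MPI code: the class names, `can_reach()`, `send()`, and `mpi_send()` are all made up for this sketch, and the real BTL interface is considerably more involved. It only shows the idea that the upper layer picks a transport plugin that can reach the peer and hands the message to it.

```python
class Btl:
    """Hypothetical stand-in for a transport plugin (not Open MPI's real interface)."""
    name = "base"

    def can_reach(self, peer):
        raise NotImplementedError

    def send(self, peer, payload):
        raise NotImplementedError


class SelfBtl(Btl):
    """Toy loopback transport: only reaches the calling process itself."""
    name = "self"

    def __init__(self, my_rank):
        self.my_rank = my_rank

    def can_reach(self, peer):
        return peer == self.my_rank

    def send(self, peer, payload):
        return f"{self.name} -> rank {peer}: {len(payload)} bytes"


class OpenibBtl(Btl):
    """Toy transport for peers reachable via an OpenFabrics device."""
    name = "openib"

    def __init__(self, fabric_peers):
        self.fabric_peers = set(fabric_peers)

    def can_reach(self, peer):
        return peer in self.fabric_peers

    def send(self, peer, payload):
        # The real openib BTL would issue some combination of verbs calls here.
        return f"{self.name} -> rank {peer}: {len(payload)} bytes"


def mpi_send(btls, peer, payload):
    """Toy MPI_SEND: hand the message to the first plugin that can reach the peer."""
    for btl in btls:
        if btl.can_reach(peer):
            return btl.send(peer, payload)
    raise RuntimeError(f"no transport can reach rank {peer}")


if __name__ == "__main__":
    # Assumed setup: we are rank 0, and ranks 1 and 2 sit on the fabric.
    btls = [SelfBtl(my_rank=0), OpenibBtl(fabric_peers=[1, 2])]
    print(mpi_send(btls, 1, b"hello"))
    print(mpi_send(btls, 0, b"hi"))
```

The application only ever calls `mpi_send()`; which plugin carries the message is decided underneath it.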

Mellanox has also introduced a library called "MXM" that can be used for underlying MPI message transport (as opposed to using the openib BTL). See the Open MPI README for some explanations about the different transports that Open MPI can use (specifically: "ob1" vs. "cm").

> > More specifically: all things being equal, you don't care which is used. You just want your message to get to the receiver/target as fast as possible. One of the main ideas of MPI is to hide those kinds of details from the user. I.e., you call MPI_SEND. A miracle occurs. The message is received on the other side.
>
> True. It's just that I am digging into the OFED source code and the ompi source code, and trying to understand the way these two interact.

The openib BTL is probably one of the most complex sections of Open MPI, unfortunately. :-\ The verbs API is *quite* complex, and has many different options that do not work on all types of OpenFabrics hardware. This leads to many different blocks of code, not all of which are executed on all platforms. The verbs model of registering memory also leads to a lot of complications, especially since, for performance reasons, MPI has to cache memory registrations and interpose itself in the memory subsystem to catch when registered memory is freed (see the README for some details here).
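To make the registration-caching point a bit more concrete, here is a toy sketch in Python. Again, this is not how Open MPI implements it: `RegistrationCache`, `lookup()`, and `on_free()` are hypothetical names, and the `register`/`deregister` callbacks merely stand in for the expensive verbs memory registration and deregistration calls. The idea is just that a buffer is registered once and reused, and that a hook on "memory was freed" must evict any overlapping entries so a stale registration is never reused.

```python
class RegistrationCache:
    """Toy cache of memory registrations keyed by (address, length)."""

    def __init__(self, register, deregister):
        # register(addr, length) -> handle and deregister(handle) are
        # caller-supplied stand-ins for the expensive verbs operations.
        self._register = register
        self._deregister = deregister
        self._cache = {}  # (addr, length) -> registration handle
        self.hits = 0
        self.misses = 0

    def lookup(self, addr, length):
        """Return a registration for the buffer, registering only on a miss."""
        key = (addr, length)
        handle = self._cache.get(key)
        if handle is None:
            self.misses += 1
            handle = self._register(addr, length)
            self._cache[key] = handle
        else:
            self.hits += 1
        return handle

    def on_free(self, addr, length):
        """Hypothetical memory hook: evict entries overlapping the freed range."""
        end = addr + length
        stale = [k for k in self._cache if k[0] < end and addr < k[0] + k[1]]
        for key in stale:
            self._deregister(self._cache.pop(key))


if __name__ == "__main__":
    registered = []

    def fake_register(addr, length):
        registered.append((addr, length))
        return len(registered)  # fake handle

    cache = RegistrationCache(fake_register, lambda handle: None)
    cache.lookup(0x1000, 4096)   # miss: registers the buffer
    cache.lookup(0x1000, 4096)   # hit: reuses the registration
    cache.on_free(0x1000, 4096)  # buffer freed: entry is evicted
    cache.lookup(0x1000, 4096)   # miss again: must re-register
    print(cache.hits, cache.misses, len(registered))
```

Without the `on_free()` step, the third lookup would return a handle for memory that may since have been handed back to the OS and remapped, which is exactly the situation the interposition is there to catch.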

If you have any specific questions about the implementation, post over on the devel list.

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/