Subject: Re: [OMPI users] [EXTERNAL] Re: Question regarding osu-benchmarks 3.1.1
From: Jeffrey Squyres (jsquyres_at_[hidden])
Date: 2012-02-29 15:15:51

On Feb 29, 2012, at 2:57 PM, Jingcha Joba wrote:

> So if I understand correctly: if a message is smaller than some threshold, it will use the MPI way (non-RDMA, two-way communication), and if it's larger, it will use OpenFabrics directly via ibverbs (and the OFED stack) instead of MPI's stack?

Er... no.

So let's talk MPI-over-OpenFabrics-verbs specifically.

All MPI communication calls will use verbs under the covers. They may use verbs send/receive semantics in some cases, and RDMA semantics in other cases. "It depends" -- on a lot of things, actually. It's hard to come up with a good rule of thumb for when it uses one or the other; this is one of the reasons that the openib BTL code is so complex. :-)

The main points here are:

1. You can trust the openib BTL to do the best thing possible to get the message to the other side, regardless of whether that message is an MPI_SEND or an MPI_PUT (for example).

2. MPI_PUT does not necessarily == verbs RDMA write (and likewise, MPI_GET does not necessarily == verbs RDMA read).
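To make point 2 concrete, here's a minimal one-sided sketch (the 1 MB size, the fence synchronization, and the two-rank layout are my own illustrative choices, not anything Open MPI requires): nothing in this code tells you -- or lets you control -- whether the openib BTL moves the bytes with verbs send/receive or an RDMA write.

    #include <mpi.h>
    #include <stdlib.h>

    /* Run with (at least) 2 ranks: rank 0 puts a 1 MB buffer into
     * rank 1's window.  Whether this becomes a verbs RDMA write,
     * verbs send/receive, or a pipeline of both is decided inside
     * the openib BTL, not by the MPI_Put call itself. */
    int main(int argc, char **argv)
    {
        int rank;
        const int N = 1 << 20;    /* 1 MB: an arbitrary example size */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char *buf = malloc(N);
        MPI_Win win;
        MPI_Win_create(buf, N, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);
        if (rank == 0)
            MPI_Put(buf, N, MPI_CHAR, 1 /* target rank */, 0, N,
                    MPI_CHAR, win);
        MPI_Win_fence(0, win);    /* the transfer is complete here */

        MPI_Win_free(&win);
        free(buf);
        MPI_Finalize();
        return 0;
    }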

> If so, could that be the reason why MPI_Put "hangs" when sending a message larger than 512 KB (or maybe 1 MB)?

No. I'm guessing that there's some kind of bug in the MPI_PUT implementation.

> Also, is there a way to know whether, for a particular MPI call, OF uses send/recv or an RDMA exchange?

Not really.

More specifically: all things being equal, you don't care which is used. You just want your message to get to the receiver/target as fast as possible. One of the main ideas of MPI is to hide those kinds of details from the user. I.e., you call MPI_SEND. A miracle occurs. The message is received on the other side.
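As a sketch of just how small that contract is (the tag, buffer size, and message contents here are arbitrary choices of mine):

    #include <mpi.h>
    #include <string.h>

    /* Run with 2 ranks.  The sender calls MPI_Send, the receiver calls
     * MPI_Recv; everything in between (eager vs. rendezvous, verbs
     * send/receive vs. RDMA) is Open MPI's business, not the caller's. */
    int main(int argc, char **argv)
    {
        int rank;
        char msg[64] = {0};

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            strcpy(msg, "hello");
            MPI_Send(msg, (int)sizeof(msg), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(msg, (int)sizeof(msg), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);   /* the "miracle" happens in here */
        }

        MPI_Finalize();
        return 0;
    }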


Jeff Squyres