
Open MPI Development Mailing List Archives


From: Galen M. Shipman (gshipman_at_[hidden])
Date: 2006-06-05 12:34:17

On Jun 2, 2006, at 5:55 PM, Jonathan Day wrote:

> Hi,
> I'm working on developing some components for OpenMPI,
> but am a little unclear as to how to implement
> efficient sends and receives. I'm wanting to do
> zero-copy two-sided MPI, but as far as I can see, this
> is not going to be easy. As best as I can tell, the
> receive mechanism copies into a temporary user buffer
> then, on actually handling the receive, copies that
> into the application's buffer. Would I be correct in
> this interpretation?

This is really up to the implementer. If the BTL supports "send in
place", the PML will prepare a descriptor pointing at the user's memory
and then use send/receive to transfer the message.
The receive side is a bit more tricky. The zero-copy interconnects
which we support require that receive descriptors be posted to a
queue, and these are consumed in order. To receive with zero copy, the
user buffer would need to be posted to the receive queue, and the
buffers would have to be posted "in order". The MPI level cannot post
these receives such that ordering is obeyed in all cases. To get
around this, some interconnects allow you to post receives along with
matching information, and the interconnect ensures MPI ordering.
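
To illustrate the ordering problem, here is a toy model in C. It is
not Open MPI code: posted_recv_t, deliver_in_order and
deliver_with_matching are hypothetical names made up for this sketch,
and a single integer tag stands in for the full MPI matching
information.

```c
#include <stddef.h>

/* Hypothetical model of one posted receive descriptor. */
typedef struct {
    int tag;       /* tag the application asked to receive */
    int consumed;  /* nonzero once a message has landed in this buffer */
} posted_recv_t;

/* Plain in-order queue: an arriving message always takes the oldest
 * unconsumed descriptor, whatever its tag. Returns the slot used, or -1. */
static int deliver_in_order(posted_recv_t *q, size_t n, int msg_tag)
{
    (void)msg_tag;  /* the queue never looks at the tag */
    for (size_t i = 0; i < n; i++) {
        if (!q[i].consumed) {
            q[i].consumed = 1;
            return (int)i;
        }
    }
    return -1;
}

/* Queue with matching: an arriving message takes the oldest unconsumed
 * descriptor whose tag equals the message tag. Returns the slot, or -1. */
static int deliver_with_matching(posted_recv_t *q, size_t n, int msg_tag)
{
    for (size_t i = 0; i < n; i++) {
        if (!q[i].consumed && q[i].tag == msg_tag) {
            q[i].consumed = 1;
            return (int)i;
        }
    }
    return -1;
}
```

If receives for tags 10 and 20 are posted in that order and the tag 20
message arrives first, deliver_in_order places it in slot 0 (the
buffer meant for tag 10), while deliver_with_matching places it in
slot 1.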

In addition to the above issues with receiving directly into the
user's buffer, there is also a performance hit for most interconnects
because the memory must be registered (pinned and made resident).
These costs outweigh any benefit of zero copy for small and medium
messages. Open MPI therefore uses send/receive with copy in/out for
message sizes up to a configurable limit. Above this limit, RDMA is
used to provide zero copy.
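
As a rough sketch of that size-based switch (again not the actual
Open MPI code; proto_t, choose_protocol and limit_bytes are
hypothetical names, and the real limit is a runtime-configurable
value):

```c
#include <stddef.h>

/* Hypothetical protocol tags; not Open MPI identifiers. */
typedef enum {
    PROTO_COPY_IN_OUT,  /* send/receive with copy in/out */
    PROTO_RDMA          /* register the user buffer, transfer with zero copy */
} proto_t;

/* Pick a protocol from the message size and a configurable limit.
 * Messages up to and including the limit are copied; larger ones use RDMA. */
static proto_t choose_protocol(size_t msg_bytes, size_t limit_bytes)
{
    return (msg_bytes <= limit_bytes) ? PROTO_COPY_IN_OUT : PROTO_RDMA;
}
```

With an assumed limit of 64 KB, choose_protocol(1024, 65536) selects
copy in/out and choose_protocol(1048576, 65536) selects RDMA.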

> I'm also a little hazy on how to get information on
> messages being passed. What information on the sending
> process is visible to the receiving BTL components?
The BTLs are designed to be MPI agnostic. They are the "Byte
Transfer Layer", and the PML ("Point-to-Point Messaging Layer") hides
MPI from them.

> Finally, I'm assuming that developers have, over time,
> produced test harnesses and other useful (for
> developers) tools that would have no real value to
> general users. Has anyone put together a kit of
> development aids for coders of new components?
There have been some unit tests developed for various areas of Open
MPI. For point-to-point, however, this was not seen as a big benefit.
For us it was easier to begin testing with a simple MPI ping-pong and
then graduate to the Intel test suite or some other more
comprehensive set of point-to-point tests.

There is some information on the web that should help you in
understanding the Open MPI p2p architecture:

Take a look under Wednesday - Point to Point architecture. If you
have problems reading the slides, let me know and I can send them one
slide per page.

We are also working on another point-to-point architecture for
interconnects that provide matching and other MPI facilities but we
are a few weeks off from having this available.



> Jonathan Day