Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Using MPI_Put/Get correctly?
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-12-16 09:34:31


Open MPI uses RDMA under the covers for send/receive when it makes sense. See these FAQ entries for more details:

    http://www.open-mpi.org/faq/?category=openfabrics#large-message-tuning-1.2
    http://www.open-mpi.org/faq/?category=openfabrics#large-message-tuning-1.3
    http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned

The short version is that in many cases, we're doing the RDMA for you anyway and you can keep your send/receive semantics. In general, RDMA helps in 2 cases:

1. Sending lots of short messages to a small number of peer processes. With RDMA, these messages can be delivered with lower latency.

2. Sending large messages. With RDMA, you can get higher bandwidth and interrupt the receiver's CPU less often (i.e., more overlap of communication and computation).

If you don't have either of these cases, RDMA might not be that important to your application.

One crass generalization (which isn't entirely accurate, but it's probably close enough for this discussion) is that you should use whichever of one-sided or two-sided semantics is easier for your application's native abstractions. Software's complicated enough; if you don't need a (possible) 2% performance improvement (I made that number up as an example), then don't add a pile of incredibly complex code that will be a nightmare to maintain over time.

Additionally, since MPI-3 is updating the semantics of the one-sided stuff, it might be worth waiting for all those clarifications before venturing into the MPI one-sided realm. One-sided semantics are much more subtle and complex than two-sided semantics.
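
For illustration only, here is a minimal sketch of what keeping two-sided semantics could look like for the kind of exchange described in the original post below: send from QF(1,1,*) and receive into QF(1,2,*) using indexed datatypes with block lengths of 6. Everything beyond that is an assumption made for the example, not taken from the original code: the sizes n and nblk, the choice of which cells are exchanged, and the simple pairing of ranks (0 with 1, 2 with 3, and so on). Note that with send/receive, the receiving datatype describes the local layout, unlike the QFREC types in the original post, which describe the remote layout.

```fortran
program qf_exchange_sketch
  use mpi
  implicit none
  integer, parameter :: n = 8      ! cells per rank (hypothetical, tiny for the demo)
  integer, parameter :: nblk = 2   ! cells exchanged with the peer (hypothetical)
  double precision :: qf(6,2,n)
  integer :: blocklens(nblk), snddisp(nblk), rcvdisp(nblk)
  integer :: sndtype, rcvtype, reqs(2)
  integer :: rank, nprocs, peer, i, ierr

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Fill the "send" half with a rank-specific value; clear the "receive" half.
  qf = 0.0d0
  qf(:,1,:) = dble(rank + 1)

  ! Displacements are zero-based and counted in elements of QF.
  ! Element QF(a,b,c) sits at offset (a-1) + 6*(b-1) + 12*(c-1).
  blocklens = 6
  do i = 1, nblk
     snddisp(i) = 12*(i - 1)                 ! QF(1,1,i): first nblk cells
     rcvdisp(i) = 12*(n - nblk + i - 1) + 6  ! QF(1,2,n-nblk+i): last nblk cells
  end do

  call MPI_Type_indexed(nblk, blocklens, snddisp, MPI_DOUBLE_PRECISION, &
                        sndtype, ierr)
  call MPI_Type_commit(sndtype, ierr)
  call MPI_Type_indexed(nblk, blocklens, rcvdisp, MPI_DOUBLE_PRECISION, &
                        rcvtype, ierr)
  call MPI_Type_commit(rcvtype, ierr)

  ! Assumed pairing: 0<->1, 2<->3, ... A rank without a partner uses
  ! MPI_PROC_NULL, which turns its send and receive into no-ops.
  peer = ieor(rank, 1)
  if (peer >= nprocs) peer = MPI_PROC_NULL

  ! Post the receive first, then the send. The two datatypes touch
  ! disjoint parts of QF, so both can be in flight on the same array.
  call MPI_Irecv(qf, 1, rcvtype, peer, 0, MPI_COMM_WORLD, reqs(1), ierr)
  call MPI_Isend(qf, 1, sndtype, peer, 0, MPI_COMM_WORLD, reqs(2), ierr)
  call MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE, ierr)

  print *, 'rank', rank, 'received QF(1,2,n) =', qf(1,2,n)

  call MPI_Type_free(sndtype, ierr)
  call MPI_Type_free(rcvtype, ierr)
  call MPI_Finalize(ierr)
end program qf_exchange_sketch
```

Something like this should build with the mpif90 wrapper compiler and run with "mpirun -np 2". With several neighbors, the same pattern would loop over them with one receive and one send request per neighbor before a single MPI_Waitall.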

That's my $0.02. :-)

On Dec 16, 2010, at 9:15 AM, Matthew J. Grismer wrote:

> I found a presentation on the web that showed significant performance
> benefits for one-sided communication. I presumed they came from hardware
> RDMA support that the one-sided calls could take advantage of, but I gather
> from your question that is not necessarily the case. Are you aware of
> cases in which it has made a significant difference?
>
>
> On 12/15/10 9:18 PM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>
>> Is there a reason to convert your code from send/receive to put/get?
>>
>> The performance may not be that significantly different, and as you have
>> noted, the MPI-2 put/get semantics are a total nightmare to understand (I
>> personally advise people not to use them -- MPI-3 is cleaning up the put/get
>> semantics a LOT).
>>
>>
>> On Dec 15, 2010, at 3:15 PM, Grismer, Matthew J Civ USAF AFMC AFRL/RBAT wrote:
>>
>>> I am trying to modify the communication routines in our code to use
>>> MPI_Put's instead of sends and receives. This worked fine for several
>>> variable Put's, but now I have one that is causing seg faults. Reading
>>> through the MPI documentation it is not clear to me if what I am doing
>>> is permissible or not. Basically, the question is this - if I have
>>> defined all of an array as a window on each processor, can I PUT data
>>> from that array to remote processes at the same time as the remote
>>> processes are PUTing into the local copy, assuming no overlaps of any of
>>> the PUTs?
>>>
>>> Here are the details if that doesn't make sense. I have a (Fortran)
>>> array QF(6,2,N) on each processor, where N could be a very large number
>>> (100,000). I create a window QFWIN on the entire array on all the
>>> processors. I define MPI_Type_indexed "sending" datatypes (QFSND) with
>>> block lengths of 6 that send from QF(1,1,*), and MPI_Type_indexed
>>> "receiving" datatypes (QFREC) with block lengths of 6 that receive into
>>> QF(1,2,*). Here * is a non-repeating set of integers up to N. I create
>>> groups of processors that communicate, where these groups will all
>>> exchange QF data, PUTing local QF(1,1,*) to remote QF(1,2,*). So,
>>> processor 1 is PUTing QF data to processors 2,3,4 at the same time 2,3,4
>>> are putting their QF data to 1, and so on. Processors 2,3,4 are PUTing
>>> into non-overlapping regions of QF(1,2,*) on 1, and 1 is PUTing from
>>> QF(1,1,*) to 2,3,4, and so on. So, my calls look like this on each
>>> processor:
>>>
>>> assertion = 0
>>> call MPI_Win_post(group, assertion, QFWIN, ierr)
>>> call MPI_Win_start(group, assertion, QFWIN, ierr)
>>>
>>> do I=1,neighbors
>>>    call MPI_Put(QF, 1, QFSND(I), NEIGHBOR(I), 0, 1, QFREC(I), QFWIN, ierr)
>>> end do
>>>
>>> call MPI_Win_complete(QFWIN,ierr)
>>> call MPI_Win_wait(QFWIN,ierr)
>>>
>>> Note I did define QFREC locally on each processor to properly represent
>>> where the data was going on the remote processors. The error value
>>> ierr=0 after MPI_Win_post, MPI_Win_start, MPI_Put, and MPI_Win_complete,
>>> and the code seg faults in MPI_Win_wait.
>>>
>>> I'm using Open MPI 1.4.3 on Mac OS X 10.6.5, built with Intel XE (12.0)
>>> compilers, and running on just 2 (internal) processors of my Mac Pro.
>>> The code ran normally with this configuration up until the point I put
>>> the above in. Several other communications with MPI_Put similar to the
>>> above work fine, though the windows are only on a subset of the
>>> communicated array, and the origin data is being PUT from part of the
>>> array that is not within the window.
>>> _____________________________________________________
>>> Matt
>>>
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/