
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] How to reduce Isend & Irecv bandwidth?
From: Aurélien Bouteiller (bouteill_at_[hidden])
Date: 2013-05-01 18:29:45


Hi Jacky,

1. If you do not post a matching send, the wait(all) on the recv will stall forever.
2. You can match a recv(count, tag, src) with a send(0, tag, dst). The recv will complete, and the status can be inspected (e.g., with MPI_Get_count) to verify how many elements were actually received. It is illegal to send more than the receiver's count can hold, but it is perfectly fine to send less.

Hope it helps,
Aurelien

On May 1, 2013, at 18:05, Gus Correa <gus_at_[hidden]> wrote:

> Hi Thomas/Jacky
>
> Maybe use MPI_Probe (and maybe also MPI_Cancel)
> to probe the message size,
> and receive only the messages with size > 0?
> Anyway, I'm just code-guessing.
>
> I hope it helps,
> Gus Correa
>
> On 05/01/2013 05:14 PM, Thomas Watson wrote:
>> Hi Gus,
>>
>> Thanks for your suggestion!
>>
>> The problem with this two-phase data exchange is as follows. Each rank
>> can have data blocks that will be exchanged with potentially all other
>> ranks. So if a rank needs to tell all the other ranks which blocks
>> to receive, it would require an all-to-all collective communication
>> during phase one (e.g., MPI_Allgatherv). Because such collective
>> communication is blocking in the current stable Open MPI (MPI-2), it would
>> have a negative impact on the scalability of the application, especially
>> when we have a large number of MPI ranks. This negative impact would not
>> be compensated by the bandwidth saved :-)
>>
>> What I really need is something like this: Isend sets count to 0 if a
>> block is not dirty. On the receiving side, MPI_Waitall deallocates the
>> corresponding Irecv request immediately and sets the Irecv request
>> handle to MPI_REQUEST_NULL, just as for a normal Irecv. I am wondering
>> if someone could confirm this behavior? I could do an experiment
>> on this too...
>>
>> Regards,
>>
>> Jacky
>>
>>
>>
>>
>> On Wed, May 1, 2013 at 3:46 PM, Gus Correa <gus_at_[hidden]
>> <mailto:gus_at_[hidden]>> wrote:
>>
>> Maybe start the data exchange by sending a (presumably short)
>> list/array/index-function of the dirty/not-dirty block statuses
>> (say, 0=not-dirty, 1=dirty),
>> then put if conditionals before the Isend/Irecv so that only
>> dirty blocks are exchanged?
>>
>> I hope this helps,
>> Gus Correa
>>
>>
>>
>>
>> On 05/01/2013 01:28 PM, Thomas Watson wrote:
>>
>> Hi,
>>
>> I have a program where each MPI rank hosts a set of data blocks.
>> After
>> doing computation over *some of* its local data blocks, each MPI
>> rank
>> needs to exchange data with other ranks. Note that the
>> computation may
>> involve only a subset of the data blocks on an MPI rank. The data
>> exchange is achieved at each MPI rank through Isend and Irecv
>> and then
>> Waitall to complete the requests. Each pair of Isend and Irecv
>> exchanges
>> a corresponding pair of data blocks at different ranks. Right
>> now, we do
>> Isend/Irecv for EVERY block!
>>
>> The idea is that because the computation at a rank may only
>> involve a
>> subset of blocks, we could mark those blocks as dirty during the
>> computation. And to reduce data exchange bandwidth, we could
>> exchange only those *dirty* pairs across ranks.
>>
>> The problem is: if a rank does not compute on a block 'm', and if it
>> does not call Isend for 'm', then the receiving rank must
>> somehow know
>> this and either a) does not call Irecv for 'm' as well, or b)
>> let Irecv
>> for 'm' fail gracefully.
>>
>> My questions are:
>> 1. How will Irecv behave (actually, how will MPI_Waitall behave)
>> if the
>> corresponding Isend is missing?
>>
>> 2. If we still post Isend for 'm', but because we really do not
>> need to
>> send any data for 'm', can I just set a "flag" in Isend so that
>> MPI_Waitall on the receiving side will "cancel" the
>> corresponding Irecv
>> immediately? For example, I can set the count in Isend to 0, and
>> on the
>> receiving side, when MPI_Waitall sees a message with empty
>> payload, it
>> reclaims the corresponding Irecv? In my code, the correspondence
>> between
>> a pair of Isend and Irecv is established by a matching TAG.
>>
>> Thanks!
>>
>> Jacky
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>

--
* Dr. Aurélien Bouteiller
* Researcher at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 309b
* Knoxville, TN 37996
* 865 974 9375