Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How to reduce Isend & Irecv bandwidth?
From: Thomas Watson (exascale.system_at_[hidden])
Date: 2013-05-01 22:44:38


Hi Aurelien,

Excellent! Point 2) is exactly what I need - no data is actually sent and
Irecv completes normally.
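
For the archives, here is a minimal sketch of the pattern (untested;
NBLOCKS, BLKSIZE, and the per-block tags below are placeholders, not the
names from my real code):

    #include <mpi.h>

    #define NBLOCKS 4      /* placeholder: number of block pairs */
    #define BLKSIZE 1024   /* placeholder: elements per block    */

    /* Sender posts count = 0 for clean blocks; receiver always posts a
       full-size Irecv, which is legal because a receive may specify a
       larger count than the message that matches it. */
    void exchange(int peer, const int *dirty,
                  double sbuf[][BLKSIZE], double rbuf[][BLKSIZE])
    {
        MPI_Request reqs[2 * NBLOCKS];
        MPI_Status  stats[2 * NBLOCKS];

        for (int m = 0; m < NBLOCKS; m++) {
            int count = dirty[m] ? BLKSIZE : 0;
            /* the block index doubles as the matching tag */
            MPI_Isend(sbuf[m], count, MPI_DOUBLE, peer, m,
                      MPI_COMM_WORLD, &reqs[2 * m]);
            MPI_Irecv(rbuf[m], BLKSIZE, MPI_DOUBLE, peer, m,
                      MPI_COMM_WORLD, &reqs[2 * m + 1]);
        }
        /* completes every request, including the empty ones, and sets
           all handles to MPI_REQUEST_NULL */
        MPI_Waitall(2 * NBLOCKS, reqs, stats);

        for (int m = 0; m < NBLOCKS; m++) {
            int got;
            MPI_Get_count(&stats[2 * m + 1], MPI_DOUBLE, &got);
            if (got == 0) {
                /* peer's block 'm' was clean: nothing was shipped */
            }
        }
    }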

Thanks!

Jacky

On Wed, May 1, 2013 at 6:29 PM, Aurélien Bouteiller <bouteill_at_[hidden]> wrote:

> Hi Jacky,
>
> 1. If you do not post a matching send, the wait(all) on the recv will
> stall forever.
> 2. You can match a recv(count, tag, src) with a send(0, tag, dst). The
> recv will complete, and the status can be inspected to verify how many
> bytes were actually received. It is illegal to send more than count can
> hold at the receiver, but it is perfectly fine to send less.
>
> Hope it helps,
> Aurelien
>
>
> > On May 1, 2013, at 18:05, Gus Correa <gus_at_[hidden]> wrote:
>
> > Hi Thomas/Jacky
> >
> > Maybe using MPI_Probe (and maybe also MPI_Cancel)
> > to probe the message size,
> > and receive only those with size>0?
> > Anyway, I'm just code-guessing.
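> >
> > Roughly like this, maybe (untested sketch; src, tag, and buf are
> > placeholders):
> >
> >     MPI_Status st;
> >     int n;
> >     /* block until a matching message arrives, without receiving it */
> >     MPI_Probe(src, tag, MPI_COMM_WORLD, &st);
> >     MPI_Get_count(&st, MPI_DOUBLE, &n);
> >     if (n > 0)
> >         MPI_Recv(buf, n, MPI_DOUBLE, src, tag, MPI_COMM_WORLD,
> >                  MPI_STATUS_IGNORE);
> >     else
> >         /* zero-size message: still receive it to consume it */
> >         MPI_Recv(buf, 0, MPI_DOUBLE, src, tag, MPI_COMM_WORLD,
> >                  MPI_STATUS_IGNORE);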
> >
> > I hope it helps,
> > Gus Correa
> >
> > On 05/01/2013 05:14 PM, Thomas Watson wrote:
> >> Hi Gus,
> >>
> >> Thanks for your suggestion!
> >>
> >> The problem with this two-phase data exchange is as follows. Each rank
> >> can have data blocks that will be exchanged with potentially all other
> >> ranks. So if a rank needs to tell all the other ranks which blocks to
> >> receive, it would require an all-to-all collective communication
> >> during phase one (e.g., MPI_Allgatherv). Because such collective
> >> communication is blocking in current stable Open MPI (MPI-2), it would
> >> have a negative impact on the scalability of the application, especially
> >> when we have a large number of MPI ranks. This negative impact would not
> >> be compensated by the bandwidth saved :-)
> >>
> >> What I really need is something like this: Isend sets count to 0 if a
> >> block is not dirty. On the receiving side, MPI_Waitall completes the
> >> corresponding Irecv request immediately, deallocates it, and sets the
> >> request handle to MPI_REQUEST_NULL, just as it would for a normal
> >> Irecv. I am wondering if someone could confirm this behavior? I could
> >> do an experiment on this too...
> >>
> >> Regards,
> >>
> >> Jacky
> >>
> >>
> >>
> >>
> >> On Wed, May 1, 2013 at 3:46 PM, Gus Correa <gus_at_[hidden]> wrote:
> >>
> >> Maybe start the data exchange by sending a (presumably short)
> >> list/array/index-function of the dirty/not-dirty block status
> >> (say, 0 = not dirty, 1 = dirty),
> >> then put if conditionals before the Isend/Irecv so that only
> >> dirty blocks are exchanged? Something like the sketch below.
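> >>
> >> Untested, just a sketch; NBLOCKS, BLKSIZE, FLAG_TAG, and the buffers
> >> are placeholders:
> >>
> >>     char my_flags[NBLOCKS], peer_flags[NBLOCKS];
> >>     MPI_Request fr[2], br[2 * NBLOCKS];
> >>     int nreq = 0;
> >>
> >>     /* my_flags[m] was set to 1 during the compute phase if block m
> >>        is dirty; phase 1 swaps these short flag arrays first */
> >>     MPI_Isend(my_flags, NBLOCKS, MPI_CHAR, peer, FLAG_TAG,
> >>               MPI_COMM_WORLD, &fr[0]);
> >>     MPI_Irecv(peer_flags, NBLOCKS, MPI_CHAR, peer, FLAG_TAG,
> >>               MPI_COMM_WORLD, &fr[1]);
> >>     MPI_Waitall(2, fr, MPI_STATUSES_IGNORE);
> >>
> >>     /* phase 2: post Isend/Irecv only for dirty blocks */
> >>     for (int m = 0; m < NBLOCKS; m++) {
> >>         if (my_flags[m])
> >>             MPI_Isend(sbuf[m], BLKSIZE, MPI_DOUBLE, peer, m,
> >>                       MPI_COMM_WORLD, &br[nreq++]);
> >>         if (peer_flags[m])
> >>             MPI_Irecv(rbuf[m], BLKSIZE, MPI_DOUBLE, peer, m,
> >>                       MPI_COMM_WORLD, &br[nreq++]);
> >>     }
> >>     MPI_Waitall(nreq, br, MPI_STATUSES_IGNORE);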
> >>
> >> I hope this helps,
> >> Gus Correa
> >>
> >>
> >>
> >>
> >> On 05/01/2013 01:28 PM, Thomas Watson wrote:
> >>
> >> Hi,
> >>
> >> I have a program where each MPI rank hosts a set of data blocks. After
> >> doing computation over *some of* its local data blocks, each MPI rank
> >> needs to exchange data with other ranks. Note that the computation may
> >> involve only a subset of the data blocks on an MPI rank. The data
> >> exchange is achieved at each MPI rank through Isend and Irecv, and then
> >> Waitall to complete the requests. Each pair of Isend and Irecv exchanges
> >> a corresponding pair of data blocks at different ranks. Right now, we do
> >> Isend/Irecv for EVERY block!
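> >>
> >> In sketch form, the current per-peer exchange is roughly this
> >> (simplified; NBLOCKS, BLKSIZE, and the buffers are placeholders):
> >>
> >>     void exchange_all(int peer, double sbuf[][BLKSIZE],
> >>                       double rbuf[][BLKSIZE])
> >>     {
> >>         MPI_Request reqs[2 * NBLOCKS];
> >>         for (int m = 0; m < NBLOCKS; m++) {
> >>             /* the block index doubles as the matching tag */
> >>             MPI_Isend(sbuf[m], BLKSIZE, MPI_DOUBLE, peer, m,
> >>                       MPI_COMM_WORLD, &reqs[2 * m]);
> >>             MPI_Irecv(rbuf[m], BLKSIZE, MPI_DOUBLE, peer, m,
> >>                       MPI_COMM_WORLD, &reqs[2 * m + 1]);
> >>         }
> >>         MPI_Waitall(2 * NBLOCKS, reqs, MPI_STATUSES_IGNORE);
> >>     }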
> >>
> >> The idea is that because the computation at a rank may involve only a
> >> subset of blocks, we could mark those blocks as dirty during the
> >> computation. And to reduce data exchange bandwidth, we could exchange
> >> only those *dirty* pairs across ranks.
> >>
> >> The problem is: if a rank does not compute on a block 'm', and if it
> >> does not call Isend for 'm', then the receiving rank must somehow know
> >> this and either a) not call Irecv for 'm' as well, or b) let the Irecv
> >> for 'm' fail gracefully.
> >>
> >> My questions are:
> >> 1. How will Irecv behave (actually, how will MPI_Waitall behave) if
> >> the corresponding Isend is missing?
> >>
> >> 2. If we still post Isend for 'm', but we really do not need to send
> >> any data for 'm', can I just set a "flag" in Isend so that MPI_Waitall
> >> on the receiving side will "cancel" the corresponding Irecv
> >> immediately? For example, I can set the count in Isend to 0, and on
> >> the receiving side, when MPI_Waitall sees a message with an empty
> >> payload, it reclaims the corresponding Irecv? In my code, the
> >> correspondence between a pair of Isend and Irecv is established by a
> >> matching TAG.
> >>
> >> Thanks!
> >>
> >> Jacky
> >>
> >
>
> --
> * Dr. Aurélien Bouteiller
> * Researcher at Innovative Computing Laboratory
> * University of Tennessee
> * 1122 Volunteer Boulevard, suite 309b
> * Knoxville, TN 37996
> * 865 974 9375
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>