Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] trouble using --mca mpi_yield_when_idle 1
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-12-12 16:47:11

On Dec 12, 2008, at 3:22 PM, douglas.guptill_at_[hidden] wrote:

>> I could imagine another alternative.  Construct a wrapper function that
>> intercepts MPI_Recv and turns it into something like
>>
>> PMPI_Irecv
>> while ( ! done ) {
>>     nanosleep(short);
>>     PMPI_Test(&done);
>> }
>>
>> I don't know how useful this would be for your particular case.
>>
> Thank you for the suggestion. I didn't know about "PMPI_Irecv" (my
> question was what/where did the "P" prefix to MPI come from?) so I
> went back to the MPI standard, and re-read the description of
> "mpi_send" and "mpi_recv".

The "P" is MPI's profiling interface. See chapter 14 in the MPI-2.1 standard.

> Based on my re-read of the MPI standard, it appears that I may have
> slightly mis-stated my issue. The spin is probably taking place in
> "mpi_send". "mpi_send", according to my understanding of the MPI
> standard, may not exit until a matching "mpi_recv" has been initiated,
> or completed. At least that is the conclusion I came to.

Perhaps something like this:

int MPI_Send(...) {
    MPI_Request req;
    int flag;
    PMPI_Isend(..., &req);
    do {
        nanosleep(short);   /* give up the CPU between polls */
        PMPI_Test(&req, &flag, MPI_STATUS_IGNORE);
    } while (!flag);
    return MPI_SUCCESS;
}

That is, *you* provide MPI_Send and intercept all your app's calls to
MPI_Send. But you implement it by doing a non-blocking send and
sleeping and polling MPI to know when it's done. Of course, you don't
have to implement this as MPI_Send -- you could always have
your_func_prefix_send(...) instead of explicitly using the MPI
profiling interface. But using the profiling interface allows you to
swap in/out different implementations of MPI_Send (etc.) at link time,
if that's desirable to you.

Looping over sleep/test is not the most efficient way of doing it, but
it may be suitable for your purposes.

> However my complaint - sorry, I wish I could think of a better word -
> remains.

No worries! :-)

> It appears that openmpi spin-waits, as opposed to, say,
> going to sleep and waiting for a wake-up call. Like a semaphore - if
> those things still exist.

Correct. Most MPIs do at least some form of spin waiting (some do
have the ability to block after a while). As mentioned on this
thread, we have it on our roadmap, but the timing of when it happens
is -- as yet -- unknown. We are driven by customer/user input,
though, so if lots of people ask for this, there's more of a chance
for it getting done than if no one is asking for it. :-)

Jeff Squyres
Cisco Systems