On Dec 12, 2008, at 3:22 PM, douglas.guptill_at_[hidden] wrote:
>> I could imagine another alternative. Construct a wrapper function
>> that intercepts MPI_Recv and turn it into something like
>>
>>     PMPI_Irecv
>>     while ( ! done ) {
>>         nanosleep(short);
>>         PMPI_Test(&done);
>>     }
>>
>> I don't know how useful this would be for your particular case.
>>
>
> Thank you for the suggestion. I didn't know about "PMPI_Irecv" (my
> question was what/where did the "P" prefix to MPI come from?) so I
> went back to the MPI standard, and re-read the description of
> "mpi_send" and "mpi_recv".
The "P" is MPI's profiling interface. See chapter 14 in the MPI-2.1
doc.
> Based on my re-read of the MPI standard, it appears that I may have
> slightly mis-stated my issue. The spin is probably taking place in
> "mpi_send". "mpi_send", according to my understanding of the MPI
> standard, may not exit until a matching "mpi_recv" has been initiated,
> or completed. At least that is the conclusion I came to.
Perhaps something like this:
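    int MPI_Send(...) {
        MPI_Request req;
        int flag = 0, rc;
        rc = PMPI_Isend(..., &req);
        if (rc != MPI_SUCCESS) return rc;
        do {
            nanosleep(short);   /* pseudocode: sleep for a short interval */
            rc = PMPI_Test(&req, &flag, MPI_STATUS_IGNORE);
        } while (rc == MPI_SUCCESS && !flag);
        return rc;
    }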
That is, *you* provide MPI_Send and intercept all of your app's calls
to MPI_Send. But you implement it by doing a non-blocking send, then
sleeping and polling MPI to find out when it's done. Of course, you
don't have to implement this as MPI_Send -- you could always have
your_func_prefix_send(...) instead of explicitly using the MPI
profiling interface. But using the profiling interface allows you to
swap in/out different implementations of MPI_Send (etc.) at link time,
if that's desirable to you.
Looping over sleep/test is not the most efficient way of doing it, but
it may be suitable for your purposes.
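To make the sleep/test loop concrete without tying it to any particular MPI call, here is a minimal sketch of the polling part on its own. The names sleep_until_done, poll_fn, and countdown_done are made up for illustration and are not part of MPI or Open MPI. In a real wrapper the callback would call PMPI_Test on the request and return its flag. Unlike the loop above, this version tests before the first sleep, so an operation that has already completed never sleeps at all.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns nonzero once the operation is done.
 * In an MPI wrapper this would call PMPI_Test() and return its flag. */
typedef int (*poll_fn)(void *ctx);

/* Poll is_done(), sleeping interval_ns nanoseconds between polls.
 * Returns the number of times we slept. interval_ns must be < 1 second. */
static long sleep_until_done(poll_fn is_done, void *ctx, long interval_ns)
{
    assert(is_done != NULL);
    assert(interval_ns >= 0 && interval_ns < 1000000000L);

    struct timespec ts = { .tv_sec = 0, .tv_nsec = interval_ns };
    long sleeps = 0;

    while (!is_done(ctx)) {
        nanosleep(&ts, NULL);   /* give up the CPU instead of spinning */
        sleeps++;
    }
    return sleeps;
}

/* Stand-in for PMPI_Test in this sketch: reports "not done" for the
 * first *ctx polls, then reports "done". */
static int countdown_done(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return 0;
    }
    return 1;
}
```

The sleep interval is the knob to tune: a longer interval burns less CPU but adds up to one interval of latency to every send or receive that does not complete immediately.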
> However my complaint - sorry, I wish I could think of a better word -
> remains.
No worries! :-)
> It appears that openmpi spin-waits, as opposed to, say,
> going to sleep and waiting for a wake-up call. Like a semaphore - if
> those things still exist.
Correct. Most MPIs do at least some form of spin waiting (some do
have the ability to block after a while). As mentioned on this
thread, we have it on our roadmap, but the timing of when it happens
is -- as yet -- unknown. We are driven by customer/user input,
though, so if lots of people ask for this, there's more of a chance
of it getting done than if no one is asking for it. :-)
--
Jeff Squyres
Cisco Systems