Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Calling MPI_Test() too many times results in a time spike
From: Ioannis Papadopoulos (giannis.papadopoulos_at_[hidden])
Date: 2010-11-30 16:09:02


Eugene Loh wrote:
> Ioannis Papadopoulos wrote:
>
>> Has anyone observed similar behaviour? Is it something that I'll have
>> to deal with it in my code or does it indeed qualify as an issue to
>> be looked into?
>
> I would say this is NOT an issue that merits much attention. There
> are too many potential performance anomalies that you might be
> encountering and they aren't worth "fixing" (or even understanding)
> unless they impact your application's performance in a meaningful way.
>
> E.g., try timing "nothing". Here is a sample test program:
>
> #include <stdio.h>
> #include <mpi.h>
>
> #define N 1000000
>
> int main(int argc, char **argv) {
>   int i;
>   double t[N], tavg = 0, tmin = 1.e20, tmax = 0;
>
>   MPI_Init(&argc,&argv);
>   for ( i = 0; i < N; i++ ) {
>     t[i] = MPI_Wtime();
>     t[i] = MPI_Wtime() - t[i];
>   }
>   for ( i = 0; i < N; i++ ) {
>     tavg += t[i];
>     if ( tmin > t[i] ) tmin = t[i];
>     if ( tmax < t[i] ) tmax = t[i];
>   }
>   tavg /= N;
>
>   printf("avg %12.3lf\n", tavg * 1.e6);
>   printf("min %12.3lf\n", tmin * 1.e6);
>   printf("max %12.3lf\n", tmax * 1.e6);
>
>   MPI_Finalize();
>   return 0;
> }
>
> I find that the minimum is 0 (indicating non-infinitesimal granularity
> of the timer), the average is small (some overhead of the timer call),
> and the maximum is very large. Why? Because something will happen
> now and then. What it is doesn't matter unless your application's
> performance is suffering.
>
> You report that the overall time is about the same. That is, it takes
> just over a second to receive the message, which is expected if the
> sender delays a second before sending.

The overall time may be the same, but it is alarming (at least to me)
that calling MPI_Test() too many times increases the average time per
MPI_Test() call. After all, that is what I am trying to measure: how
much it costs, on average, to call MPI_Test().

In your MPI_Wtime() example, the average overhead of MPI_Wtime() stays
the same regardless of the min/max times, which is what I would expect.
This is not true for MPI_Test(): a small delay before calling the
latter lowers the average time per MPI_Test() call.

> One of the things you could do is look at total time to receive the
> message and total time spent in MPI_Test. Then, vary TIMEOUT more
> smoothly (0.000001, 0.000002, 0.000005, 0.00001, 0.00002, 0.00005,
> 0.0001, 0.0002, 0.0005, 0.001, 0.002, 0.005, 0.01, 0.02). You may
> also have to run many times to see how reproducible the results are.
> As TIMEOUT increases, the total time to get the message will roughly
> increase, but not by much until TIMEOUT gets pretty large. The total
> time spent in MPI_Test should fall as TIMEOUT increases. So, the idea
> is that by increasing TIMEOUT, you decrease the responsiveness of the
> receiver while you make more CPU time available for other tasks. With
> any luck, there will be a broad range of TIMEOUT values that degrade
> responsiveness negligibly while freeing a meaningful amount of time up
> for other computational tasks.
>
> The performance of MPI_Test() -- and of a particular MPI_Test() call
> -- is probably not very meaningful.

I have run my toy example many times on various systems with both
Open MPI and IBM MPI, and I always see the behavior I reported.

The performance of MPI_Test() is critical when you are not using MPI to
implement applications directly, but rather as a transport layer for
higher-level runtime systems. In that case, you need the non-blocking
MPI_Test() to avoid being stuck in MPI, so that the runtime can do the
other things it has to do.

Having an inconsistent overhead on MPI_Test() forces me to put a timer
before it, to avoid this performance degradation when my runtime is out
of things to do (but still cannot afford to block in MPI) and therefore
keeps hitting MPI_Test() until something arrives from the MPI layer or
the user gives the runtime something to do. In my opinion, that timer
should not need to exist.
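
To make the timer workaround concrete, here is a minimal sketch of the
idea in plain C. It is not the code from my runtime: poll_fn is a
hypothetical stand-in for a wrapper around MPI_Test(), stub_poll is a
fake request used only for illustration, and the caller passes in the
current time (in a real program that would come from MPI_Wtime()). All
names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a non-blocking completion check such as a
 * wrapper around MPI_Test(); returns true once the operation is done. */
typedef bool (*poll_fn)(void *ctx);

/* Throttle state. Illustrative names, not taken from any real runtime. */
typedef struct {
    double min_interval;  /* minimum seconds between two polls */
    double last_poll;     /* timestamp of the most recent poll */
    bool   polled_once;   /* false until the first poll has happened */
} poll_throttle;

void throttle_init(poll_throttle *t, double min_interval)
{
    assert(t != NULL && min_interval >= 0.0);
    t->min_interval = min_interval;
    t->last_poll = 0.0;
    t->polled_once = false;
}

/* Call poll() only if at least min_interval seconds have passed since
 * the previous poll. 'now' is supplied by the caller.
 * Returns true only if poll() ran and reported completion; returns
 * false if the poll was skipped or the operation is still pending. */
bool throttled_poll(poll_throttle *t, double now, poll_fn poll, void *ctx)
{
    assert(t != NULL && poll != NULL);
    if (t->polled_once && (now - t->last_poll) < t->min_interval)
        return false;  /* too soon: skip the poll entirely */
    t->polled_once = true;
    t->last_poll = now;
    return poll(ctx);
}

/* Fake request for illustration: completes after a fixed number of polls. */
typedef struct {
    int calls;           /* how many times the request was polled */
    int complete_after;  /* poll count at which it reports completion */
} stub_request;

bool stub_poll(void *ctx)
{
    stub_request *r = ctx;
    r->calls++;
    return r->calls >= r->complete_after;
}
```

In an idle loop the runtime would call throttled_poll() on every
iteration instead of calling MPI_Test() directly. This only hides the
symptom; it says nothing about why back-to-back MPI_Test() calls get
more expensive, and min_interval would have to be tuned per system.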

>
> Note that your MPI_Irecv calls should strictly speaking have
> MPI_ANY_SOURCE rather than MPI_ANY_TAG.

You're right. However, even with exactly matching
MPI_Isend/MPI_Irecv calls, the problem still exists.