Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Question about MPI_Barrier
From: Eugene Loh (eugene.loh_at_[hidden])
Date: 2010-10-21 13:06:41


My main point was that, while what Jeff said about the shortcomings of calling timers after barriers is true, I wanted to come to the defense of this timing strategy.  Otherwise, I was just agreeing with him that it seems implausible that commenting out B should influence the timing of A, but I'm equally clueless about what the real issue is.  I have seen cases where the presence or absence of code that isn't executed can influence timings (perhaps because code will come out of the instruction cache differently), but all that is speculation.  It's only a guess, but what you're seeing may not be MPI-related at all.

Storm Zhang wrote:
Hi Eugene, you said:
"The bottom line here is that from a causal point of view it would seem that B should not impact the timings.  Presumably, some other variable is actually responsible here."  Could you explain the second sentence in more detail?  Thanks a lot.

On Thu, Oct 21, 2010 at 9:58 AM, Eugene Loh <eugene.loh@oracle.com> wrote:
Jeff Squyres wrote:

MPI::COMM_WORLD.Barrier();
if(rank == master) t1 = clock();
"code A";
MPI::COMM_WORLD.Barrier();
if(rank == master) t2 = clock();
"code B";

Remember that the times at which individual processes exit the barrier are not guaranteed to be uniform (indeed, they most likely *won't* be the same).  MPI only guarantees that a process will not exit until after all processes have entered.  So taking t2 after the barrier might be a bit misleading, and may cause unexpected skew.
 
The barrier exit times are not guaranteed to be uniform, but in practice this style of timing is often the best (or only practical) tool one has for measuring the collective performance of a group of processes.
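
To make the trade-off concrete, here is a minimal sketch of the barrier-bracketed timing pattern. It is not the original poster's code, and it deliberately avoids MPI so that it can be compiled anywhere: C++ threads stand in for ranks, the hand-rolled SimpleBarrier stands in for MPI_Barrier, and std::chrono::steady_clock stands in for a wall-clock timer such as MPI_Wtime. All names in it (SimpleBarrier, time_between_barriers, TimingSummary, summarize) are hypothetical. Every worker takes its own t1 and t2, so the spread between workers becomes visible instead of being hidden behind a single master's reading.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier for a fixed number of threads.
// Like MPI_Barrier, it only guarantees that nobody leaves before
// everybody has arrived; it says nothing about when each one leaves.
class SimpleBarrier {
public:
    explicit SimpleBarrier(int count)
        : threshold_(count), waiting_(0), generation_(0) {}

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const unsigned long my_generation = generation_;
        if (++waiting_ == threshold_) {
            // Last arrival: open a new generation and release the others.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return generation_ != my_generation; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const int threshold_;
    int waiting_;
    unsigned long generation_;
};

struct TimingSummary {
    double min_seconds;  // fastest worker's barrier-to-barrier time
    double max_seconds;  // slowest worker's time, the usual "collective" figure
    double skew() const { return max_seconds - min_seconds; }
};

// Runs work(rank) on nworkers threads. Each worker brackets the work with
// barriers and reads a wall clock right after each barrier, mirroring the
// t1 / "code A" / t2 structure discussed in this thread.
inline std::vector<double> time_between_barriers(
        int nworkers, const std::function<void(int)>& work) {
    using Clock = std::chrono::steady_clock;
    SimpleBarrier barrier(nworkers);
    std::vector<double> elapsed(nworkers, 0.0);
    std::vector<std::thread> threads;
    for (int rank = 0; rank < nworkers; ++rank) {
        threads.emplace_back([&, rank] {
            barrier.wait();
            const Clock::time_point t1 = Clock::now();
            work(rank);  // "code A"
            barrier.wait();
            const Clock::time_point t2 = Clock::now();
            // Each worker writes only its own slot, so no lock is needed.
            elapsed[rank] = std::chrono::duration<double>(t2 - t1).count();
        });
    }
    for (std::thread& t : threads) t.join();
    return elapsed;
}

// Reduces the per-worker times to min and max; elapsed must be non-empty.
inline TimingSummary summarize(const std::vector<double>& elapsed) {
    assert(!elapsed.empty());
    const auto mm = std::minmax_element(elapsed.begin(), elapsed.end());
    return TimingSummary{*mm.first, *mm.second};
}
```

Calling summarize(time_between_barriers(4, work)) gives the slowest worker's time as the collective figure, and skew() gives a rough idea of how far the per-worker readings differ. In a real MPI program the same idea would use MPI_Barrier and MPI_Wtime on every rank, followed by a reduction with MPI_MAX (and MPI_MIN) to the master.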

Code B *probably* has no effect on time spent between t1 and t2.  But extraneous effects might cause it to do so -- e.g., are you running in an oversubscribed scenario?  And so on.
 
Right.  The bottom line here is that from a causal point of view it would seem that B should not impact the timings.  Presumably, some other variable is actually responsible here.