The order in which processes hit the barrier is only one factor in the time it takes for any given process to finish the barrier.
An easy way to think of a barrier implementation is a "fan in/fan out" model. When each nonzero rank process calls MPI_BARRIER, it sends a message saying "I have hit the barrier!" (it usually sends it to its parent in a tree of all MPI processes in the communicator, but you can simplify this model and consider that it sends it to rank 0). Rank 0 collects all of these messages. When it has messages from all processes in the communicator, it sends out "ok, you can leave the barrier now" messages (again, it's usually via a tree distribution, but you can pretend that it directly, linearly sends a message to each peer process in the communicator).
Hence, the time that any individual process spends in the barrier is relative to when every other process enters the barrier. But it's also dependent upon communication speed, congestion in the network, etc.
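If it helps, here is a rough sketch of that simplified model as a tiny Python simulation (no MPI involved). The function name barrier_wait_times and the single uniform one-way message latency are my own assumptions for illustration; a real MPI implementation typically uses a tree, and real latencies vary with congestion.

```python
def barrier_wait_times(arrival_times, latency=0.0):
    """Simulate the simplified linear fan-in/fan-out barrier.

    arrival_times[i] is the time at which rank i calls the barrier.
    latency is an assumed, uniform one-way message cost.
    Returns the time each rank spends inside the barrier.
    """
    # Fan in: every nonzero rank sends "I have hit the barrier!" to rank 0.
    # Rank 0 sends itself nothing, so only its own arrival time counts.
    all_in = max(
        [arrival_times[0]] + [t + latency for t in arrival_times[1:]]
    )
    # Fan out: rank 0 leaves as soon as it has heard from everyone;
    # every other rank leaves one message latency later.
    leave = [all_in] + [all_in + latency for _ in arrival_times[1:]]
    return [out - arrived for out, arrived in zip(leave, arrival_times)]


if __name__ == "__main__":
    arrivals = [0.0, 5.0, 12.0, 20.0]  # made-up arrival times
    for rank, waited in enumerate(barrier_wait_times(arrivals, latency=1.5)):
        print(f"rank {rank}: barrier time = {waited}")
```

With latency set to 0, the last rank to arrive spends no time in the barrier and the first spends the full spread. Any nonzero communication cost is added on top of that for every rank.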
On Sep 8, 2011, at 6:20 AM, Ghislain Lartigue wrote:
> at a given point in my (Fortran90) program, I write:
> start_time = MPI_Wtime()
> call MPI_BARRIER(...)
> new_time = MPI_Wtime() - start_time
> write(*,*) "barrier time =",new_time
> and then I run my code...
> I expected that the values of "new_time" would range from 0 to Tmax (1700 in my case)
> As I understand it, the first process that hits the barrier should print Tmax and the last process that hits the barrier should print 0 (or a very low value).
> But this is not the case: all processes print values in the range 1400-1700!
> Any explanation?
> This small code behaves perfectly in other parts of my code...