Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem with MPI_BARRIER
From: Ghislain Lartigue (ghislain.lartigue_at_[hidden])
Date: 2011-09-08 10:25:14


Thanks,

I understand this, but the delays I measure are huge compared to a classical ack procedure (about 1000x larger).
And the measurements are repeatable: as far as I understand it, that suggests the network is not involved.
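
For reference, here is a minimal, self-contained sketch of the kind of measurement I am doing (the MPI_REDUCE summary at the end is an illustrative addition, not my production code): every rank times the barrier, and rank 0 prints the smallest and largest wait, so the result does not depend on stdout ordering.

==================
program barrier_timing
  use mpi
  implicit none
  integer :: ierr, rank
  double precision :: start_time, new_time, tmin, tmax

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

  ! ... application work that desynchronizes the ranks goes here ...

  start_time = MPI_WTIME()
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  new_time = MPI_WTIME() - start_time

  ! Gather the extreme waits on rank 0 so the printed result does not
  ! depend on the order in which stdout lines reach mpirun.
  call MPI_REDUCE(new_time, tmin, 1, MPI_DOUBLE_PRECISION, MPI_MIN, &
                  0, MPI_COMM_WORLD, ierr)
  call MPI_REDUCE(new_time, tmax, 1, MPI_DOUBLE_PRECISION, MPI_MAX, &
                  0, MPI_COMM_WORLD, ierr)
  if (rank == 0) write(*,*) "barrier wait: min =", tmin, ", max =", tmax

  call MPI_FINALIZE(ierr)
end program barrier_timing
==================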

Ghislain.

On Sep 8, 2011, at 4:16 PM, Teng Ma wrote:

> I guess you forgot to count the "leaving time" (fan-out). When everyone
> hits the barrier, each process still needs an "ack" to leave. And remember,
> in most cases the leader process sends out the "acks" sequentially. It's
> very possible that:
>
> P0 barrier time = 29 + send/recv ack 0
> P1 barrier time = 14 + send ack 0 + send/recv ack 1
> P2 barrier time = 0 + send ack 0 + send ack 1 + send/recv ack 2
>
> That's the time you are measuring.
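>
> To put (hypothetical) numbers on it: if each ack takes roughly t_ack to
> send, a linear fan-out gives
>
>   P0 barrier time ~= 29 + 1*t_ack
>   P1 barrier time ~= 14 + 2*t_ack
>   P2 barrier time ~=  0 + 3*t_ack
>
> so when t_ack is large, no rank ever reports a time near zero.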
>
> Teng
>> This problem has nothing to do with stdout...
>>
>> Example with 3 processes:
>>
>> P0 hits barrier at t=12
>> P1 hits barrier at t=27
>> P2 hits barrier at t=41
>>
>> In this situation:
>> P0 waits 41-12 = 29
>> P1 waits 41-27 = 14
>> P2 waits 41-41 = 00
>>
>> So I should see something like this (no particular ordering expected):
>> barrier_time = 14
>> barrier_time = 00
>> barrier_time = 29
>>
>> But what I see is much more like
>> barrier_time = 22
>> barrier_time = 29
>> barrier_time = 25
>>
>> See? No process has a barrier_time equal to zero!!!
>>
>>
>>
>> On Sep 8, 2011, at 2:55 PM, Jeff Squyres wrote:
>>
>>> The order in which you see stdout printed from mpirun is not necessarily
>>> reflective of the order in which things were actually printed. Remember
>>> that the stdout from each MPI process needs to flow through at least 3
>>> processes, and potentially across the network, before it is actually
>>> displayed on mpirun's stdout.
>>>
>>> MPI process -> local Open MPI daemon -> mpirun -> printed to mpirun's
>>> stdout
>>>
>>> Hence, the ordering of stdout can get transposed.
>>>
>>>
>>> On Sep 8, 2011, at 8:49 AM, Ghislain Lartigue wrote:
>>>
>>>> Thank you for this explanation, but indeed it confirms that the LAST
>>>> process to hit the barrier should go through nearly instantaneously
>>>> (except for the broadcast time of the acknowledgment signal).
>>>> And this is not what happens in my code: EVERY process waits a very
>>>> long time before going through the barrier (thousands of times longer
>>>> than a broadcast)...
>>>>
>>>>
>>>> On Sep 8, 2011, at 2:26 PM, Jeff Squyres wrote:
>>>>
>>>>> Order in which processes hit the barrier is only one factor in the
>>>>> time it takes for that process to finish the barrier.
>>>>>
>>>>> An easy way to think of a barrier implementation is a "fan in/fan out"
>>>>> model. When each nonzero rank process calls MPI_BARRIER, it sends a
>>>>> message saying "I have hit the barrier!" (it usually sends it to its
>>>>> parent in a tree of all MPI processes in the communicator, but you can
>>>>> simplify this model and consider that it sends it to rank 0). Rank 0
>>>>> collects all of these messages. When it has messages from all
>>>>> processes in the communicator, it sends out "ok, you can leave the
>>>>> barrier now" messages (again, it's usually via a tree distribution,
>>>>> but you can pretend that it directly, linearly sends a message to each
>>>>> peer process in the communicator).
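>>>>>
>>>>> To make that model concrete, here is a rough Fortran sketch of a purely
>>>>> linear fan in/fan out barrier (illustrative only, and the subroutine
>>>>> name is mine; Open MPI's real implementation uses tuned tree-based
>>>>> algorithms):
>>>>>
>>>>> subroutine linear_barrier(comm)
>>>>>   use mpi
>>>>>   implicit none
>>>>>   integer, intent(in) :: comm
>>>>>   integer :: rank, nprocs, i, token, ierr
>>>>>
>>>>>   call MPI_COMM_RANK(comm, rank, ierr)
>>>>>   call MPI_COMM_SIZE(comm, nprocs, ierr)
>>>>>   token = 0
>>>>>   if (rank == 0) then
>>>>>      ! Fan-in: collect "I have hit the barrier!" from every other rank.
>>>>>      do i = 1, nprocs - 1
>>>>>         call MPI_RECV(token, 1, MPI_INTEGER, i, 0, comm, &
>>>>>                       MPI_STATUS_IGNORE, ierr)
>>>>>      end do
>>>>>      ! Fan-out: release the other ranks, one send at a time.
>>>>>      do i = 1, nprocs - 1
>>>>>         call MPI_SEND(token, 1, MPI_INTEGER, i, 1, comm, ierr)
>>>>>      end do
>>>>>   else
>>>>>      call MPI_SEND(token, 1, MPI_INTEGER, 0, 0, comm, ierr)
>>>>>      call MPI_RECV(token, 1, MPI_INTEGER, 0, 1, comm, &
>>>>>                    MPI_STATUS_IGNORE, ierr)
>>>>>   end if
>>>>> end subroutine linear_barrier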
>>>>>
>>>>> Hence, the time that any individual process spends in the barrier
>>>>> is relative to when every other process enters the barrier. But
>>>>> it's also dependent upon communication speed, congestion in the
>>>>> network, etc.
>>>>>
>>>>>
>>>>> On Sep 8, 2011, at 6:20 AM, Ghislain Lartigue wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> at a given point in my (Fortran90) program, I write:
>>>>>>
>>>>>> ===================
>>>>>> start_time = MPI_Wtime()              ! wall clock just before the barrier
>>>>>> call MPI_BARRIER(...)
>>>>>> new_time = MPI_Wtime() - start_time   ! elapsed wait inside the barrier
>>>>>> write(*,*) "barrier time =", new_time
>>>>>> ==================
>>>>>>
>>>>>> and then I run my code...
>>>>>>
>>>>>> I expected the values of "new_time" to range from 0 to Tmax
>>>>>> (1700 in my case).
>>>>>> As I understand it, the first process to hit the barrier should
>>>>>> print Tmax, and the last process to hit the barrier should print 0
>>>>>> (or a very low value).
>>>>>>
>>>>>> But this is not the case: all processes print values in the range
>>>>>> 1400-1700!
>>>>>>
>>>>>> Any explanation?
>>>>>>
>>>>>> Thanks,
>>>>>> Ghislain.
>>>>>>
>>>>>> PS:
>>>>>> This same snippet behaves perfectly in other parts of my code...
>>>>>>
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> jsquyres_at_[hidden]
>>>>> For corporate legal information go to:
>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>
>>>
>>
>
> Teng Ma, Univ. of Tennessee, Knoxville, TN
> tma_at_[hidden]
> http://web.eecs.utk.edu/~tma/