Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem with MPI_BARRIER
From: Ghislain Lartigue (ghislain.lartigue_at_[hidden])
Date: 2011-09-08 09:41:57


These "times" have no units, it's just an example...
Whatever units are used, at least one process should spend a very small amount of time in the barrier (compared to the other processes), and this is not what I see in my code.

The network is supposed to be excellent: my machine is #9 in the top500 supercomputers... (http://top500.org/system/10589)

Ghislain.

On 8 Sept. 2011, at 15:34, Jeff Squyres wrote:

> On Sep 8, 2011, at 9:17 AM, Ghislain Lartigue wrote:
>
>> Example with 3 processes:
>>
>> P0 hits barrier at t=12
>> P1 hits barrier at t=27
>> P2 hits barrier at t=41
>
> What is the unit of time here, and how well are these times synchronized?
>
>> In this situation:
>> P0 waits 41-12 = 29
>> P1 waits 41-27 = 14
>> P2 waits 41-41 = 00
>>
>> So I should see something like (no ordering is expected):
>> barrier_time = 14
>> barrier_time = 00
>> barrier_time = 29
>>
>> But what I see is much more like
>> barrier_time = 22
>> barrier_time = 29
>> barrier_time = 25
>>
>> See? No process has a barrier_time equal to zero !!!
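One possible way to read the observed numbers above, as a minimal Python sketch. It assumes every process pays the same fixed barrier overhead on top of its wait for the last arrival; that assumption is not established anywhere in this thread, and `split_barrier_times` is a hypothetical helper name.

```python
def split_barrier_times(observed):
    """Split observed barrier times into a common overhead and per-process skew.

    Assumes observed_i = overhead + (latest_arrival - arrival_i), with the
    same overhead for every process. The last process to arrive has zero
    skew, so the smallest observation estimates the overhead.
    """
    overhead = min(observed)
    skew = [t - overhead for t in observed]
    return overhead, skew


if __name__ == "__main__":
    # Observed values quoted in this message (no units, example only).
    overhead, skew = split_barrier_times([22, 29, 25])
    print("overhead:", overhead)  # -> 22
    print("skew:", skew)          # -> [0, 7, 3]
```

Under that assumption, a large minimum with a small spread would point at the cost of the barrier itself rather than at processes arriving at very different times.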
>
> No process will ever have a *zero* time in a barrier; it's just not possible (unless you're measuring in seconds, or something very coarse grained?).
>
> What type of network are you using?
>
>>
>>
>> On 8 Sept. 2011, at 14:55, Jeff Squyres wrote:
>>
>>> The order in which you see stdout printed from mpirun does not necessarily reflect the order in which things were actually printed. Remember that the stdout from each MPI process needs to flow through at least 3 processes and potentially across the network before it is actually displayed on mpirun's stdout.
>>>
>>> MPI process -> local Open MPI daemon -> mpirun -> printed to mpirun's stdout
>>>
>>> Hence, the ordering of stdout can get transposed.
>>>
>>>
>>> On Sep 8, 2011, at 8:49 AM, Ghislain Lartigue wrote:
>>>
>>>> Thank you for this explanation but indeed this confirms that the LAST process that hits the barrier should go through nearly instantaneously (except for the broadcast time for the acknowledgment signal).
>>>> And this is not what happens in my code: EVERY process waits for a very long time before going through the barrier (thousands of times longer than a broadcast)...
>>>>
>>>>
>>>> On 8 Sept. 2011, at 14:26, Jeff Squyres wrote:
>>>>
>>>>> The order in which processes hit the barrier is only one factor in the time it takes for a given process to finish the barrier.
>>>>>
>>>>> An easy way to think of a barrier implementation is a "fan in/fan out" model. When each nonzero rank process calls MPI_BARRIER, it sends a message saying "I have hit the barrier!" (it usually sends it to its parent in a tree of all MPI processes in the communicator, but you can simplify this model and consider that it sends it to rank 0). Rank 0 collects all of these messages. When it has messages from all processes in the communicator, it sends out "ok, you can leave the barrier now" messages (again, it's usually via a tree distribution, but you can pretend that it directly, linearly sends a message to each peer process in the communicator).
>>>>>
>>>>> Hence, the time that any individual process spends in the barrier is relative to when every other process enters the barrier. But it's also dependent upon communication speed, congestion in the network, etc.
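As a rough illustration of the fan-in/fan-out model described above, here is a minimal Python sketch. It assumes a single uniform one-way message latency and a root that releases all peers at once; both are simplifications, and `simulate_barrier` and `latency` are hypothetical names, not part of Open MPI. The arrival times are the 12/27/41 example used elsewhere in this thread.

```python
def simulate_barrier(arrivals, latency):
    """Linear fan-in/fan-out barrier with rank 0 as the root.

    arrivals[i] is the time at which rank i calls the barrier; latency is
    the (assumed uniform) one-way message time. Returns the time each rank
    spends inside the barrier.
    """
    # Fan-in: rank 0 knows its own arrival immediately; each other rank's
    # "I have hit the barrier" message lands one latency after it arrives.
    root_ready = max([arrivals[0]] + [t + latency for t in arrivals[1:]])
    # Fan-out: rank 0 leaves once it has heard from everyone; the others
    # leave one latency later, when the release message reaches them.
    exits = [root_ready] + [root_ready + latency for _ in arrivals[1:]]
    return [e - a for e, a in zip(exits, arrivals)]


if __name__ == "__main__":
    arrivals = [12, 27, 41]
    print(simulate_barrier(arrivals, latency=0))   # -> [29, 14, 0]
    print(simulate_barrier(arrivals, latency=10))  # -> [39, 34, 20]
```

In this sketch, with a nonzero latency even the last rank to arrive spends two message times in the barrier (one to report in, one to be released), so a barrier time of exactly zero should not be expected.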
>>>>>
>>>>>
>>>>> On Sep 8, 2011, at 6:20 AM, Ghislain Lartigue wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> at a given point in my (Fortran90) program, I write:
>>>>>>
>>>>>> ===================
>>>>>> start_time = MPI_Wtime()
>>>>>> call MPI_BARRIER(...)
>>>>>> new_time = MPI_Wtime() - start_time
>>>>>> write(*,*) "barrier time =",new_time
>>>>>> ==================
>>>>>>
>>>>>> and then I run my code...
>>>>>>
>>>>>> I expected that the values of "new_time" would range from 0 to Tmax (1700 in my case).
>>>>>> As I understand it, the first process that hits the barrier should print Tmax and the last process that hits the barrier should print 0 (or a very low value).
>>>>>>
>>>>>> But this is not the case: all processes print values in the range 1400-1700!
>>>>>>
>>>>>> Any explanation?
>>>>>>
>>>>>> Thanks,
>>>>>> Ghislain.
>>>>>>
>>>>>> PS:
>>>>>> This small code behaves perfectly in other parts of my code...
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> jsquyres_at_[hidden]
>>>>> For corporate legal information go to:
>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/