Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Asynchronous behaviour of MPI Collectives
From: Gabriele Fatigati (g.fatigati_at_[hidden])
Date: 2009-01-23 09:12:19


Thanks Jeff,
I'll try this flag.
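
For reference, MCA parameters like the one described on the FAQ page Jeff
links below are normally passed on the mpirun command line; the parameter
name here is only a placeholder for the one named on that page:

  mpirun --mca <mca_param_name> <value> -np 512 ./my_app

The same parameter can also be set through the environment as
OMPI_MCA_<mca_param_name>=<value> before launching the job.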

Regards.

2009/1/23 Jeff Squyres <jsquyres_at_[hidden]>:
> This is with the 1.2 series, right?
>
> Have you tried using what is described here:
>
>
> http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion
>
> I don't know if you can try OMPI v1.3 or not, but the issue described in the
> above FAQ item is fixed properly in the OMPI v1.3 series (i.e., that MCA
> parameter is unnecessary because we fixed it a different way).
>
> FWIW, if adding an MPI_Barrier is the difference between hanging and not
> hanging, it sounds like an Open MPI bug. You should never need to add an
> MPI_Barrier to make an MPI program correct.
>
>
>
> On Jan 23, 2009, at 8:09 AM, Gabriele Fatigati wrote:
>
>> Hi Igor,
>> My message size is 4096 KB and I have 4 procs per core.
>> There isn't any difference using different algorithms.
>>
>> 2009/1/23 Igor Kozin <i.n.kozin_at_[hidden]>:
>>>
>>> What is your message size and the number of cores per node?
>>> Is there any difference using different algorithms?
>>>
>>> 2009/1/23 Gabriele Fatigati <g.fatigati_at_[hidden]>
>>>>
>>>> Hi Jeff,
>>>> I would like to understand why, when I run on 512 procs or more, my
>>>> code hangs in an MPI collective, even with a small send buffer. All
>>>> processes are stuck in the call, doing nothing. But if I add an
>>>> MPI_Barrier after the MPI collective, it works! I run over an
>>>> InfiniBand network.
>>>>
>>>> I know many people with this strange problem; I think there is a
>>>> strange interaction between InfiniBand and Open MPI that causes it.
>>>>
>>>>
>>>>
>>>> 2009/1/23 Jeff Squyres <jsquyres_at_[hidden]>:
>>>>>
>>>>> On Jan 23, 2009, at 6:32 AM, Gabriele Fatigati wrote:
>>>>>
>>>>>> I've noticed that Open MPI has asynchronous behaviour in the collective
>>>>>> calls.
>>>>>> The processes don't wait for the other procs to arrive in the call.
>>>>>
>>>>> That is correct.
>>>>>
>>>>>> This behaviour can sometimes cause problems when there are a lot of
>>>>>> processes in the job.
>>>>>
>>>>> Can you describe what exactly you mean? The MPI spec specifically allows
>>>>> this behavior; OMPI made specific design choices and optimizations to
>>>>> support this behavior. FWIW, I'd be pretty surprised if any optimized MPI
>>>>> implementation defaults to fully synchronous collective operations.
>>>>>
>>>>>> Is there an Open MPI parameter to block all processes in the collective
>>>>>> call until it is finished? Otherwise I have to insert many MPI_Barrier
>>>>>> calls in my code, which is very tedious and strange.
>>>>>
>>>>> As you have noted, MPI_Barrier is the *only* collective operation that MPI
>>>>> guarantees to have any synchronization properties (and it's a fairly weak
>>>>> guarantee at that; no process will exit the barrier until every process has
>>>>> entered the barrier -- but there's no guarantee that all processes leave the
>>>>> barrier at the same time).
>>>>>
>>>>> Why do you need your processes to exit collective operations at the same
>>>>> time?
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> Cisco Systems
>>>>>
>
>
> --
> Jeff Squyres
> Cisco Systems
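
Below is a minimal sketch of the pattern discussed in the quoted messages
above: a collective call followed by an explicit MPI_Barrier as a workaround.
The MPI_Bcast, buffer contents and count are only illustrative, not the real
application code, and a correct MPI program should not need the extra barrier.

  #include <mpi.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      int rank, i;
      const int count = 1024;            /* illustrative size only */
      int *buf;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      buf = malloc(count * sizeof(int));
      if (rank == 0) {
          for (i = 0; i < count; ++i)
              buf[i] = i;
      }

      /* The collective: MPI allows a process to return from this call as
         soon as its local participation is complete, possibly before
         other processes have even entered it. */
      MPI_Bcast(buf, count, MPI_INT, 0, MPI_COMM_WORLD);

      /* Workaround discussed in the thread: an explicit barrier forces a
         synchronization point here.  The proper fix on OMPI 1.2 over
         InfiniBand is the MCA parameter from the FAQ item, or moving to
         the 1.3 series. */
      MPI_Barrier(MPI_COMM_WORLD);

      free(buf);
      MPI_Finalize();
      return 0;
  }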
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>

-- 
Ing. Gabriele Fatigati
Parallel programmer
CINECA Systems & Tecnologies Department
Supercomputing Group
Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
www.cineca.it                    Tel:   +39 051 6171722
g.fatigati [AT] cineca.it