
Subject: Re: [OMPI users] Asynchronous behaviour of MPI Collectives
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-01-23 08:50:02


This is with the 1.2 series, right?

Have you tried using what is described here:

     http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion

I don't know if you can try OMPI v1.3 or not, but the issue described
in the above FAQ item is fixed properly in the OMPI v1.3 series
(i.e., that MCA parameter is unnecessary because we fixed it a
different way).
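
In case it helps: that kind of MCA parameter can be set on the mpirun
command line or through the environment. The parameter name below is
the one I believe that FAQ item describes (please double-check it
against the FAQ), and the process count and application name are just
placeholders:

     # on the command line
     mpirun --mca pml_ob1_use_early_completion 0 -np 512 ./your_app

     # or equivalently via the environment
     export OMPI_MCA_pml_ob1_use_early_completion=0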

FWIW, if adding an MPI_Barrier is the difference between hanging and
not hanging, it sounds like an Open MPI bug. You should never need to
add an MPI_Barrier to make an MPI program correct.
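
Just to illustrate the pattern in question (this is only a sketch; the
collective, buffer size, datatype, and root rank are arbitrary): the
collective by itself is already a correct MPI program, and the extra
barrier is only the workaround that changes the hang behavior.

     #include <mpi.h>

     int main(int argc, char **argv)
     {
         int buf[1024] = {0};

         MPI_Init(&argc, &argv);

         /* Correct on its own: every process must call the collective,
            but the standard does not require them to leave it at the
            same time. */
         MPI_Bcast(buf, 1024, MPI_INT, 0, MPI_COMM_WORLD);

         /* The workaround reported in this thread: a forced
            synchronization point after the collective.  If adding this
            changes hang/no-hang behavior, that points at an
            implementation problem, not at incorrect user code. */
         MPI_Barrier(MPI_COMM_WORLD);

         MPI_Finalize();
         return 0;
     }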

On Jan 23, 2009, at 8:09 AM, Gabriele Fatigati wrote:

> Hi Igor,
> My message size is 4096kb and I have 4 procs per core.
> There isn't any difference using different algorithms.
>
> 2009/1/23 Igor Kozin <i.n.kozin_at_[hidden]>:
>> what is your message size and the number of cores per node?
>> is there any difference using different algorithms?
>>
>> 2009/1/23 Gabriele Fatigati <g.fatigati_at_[hidden]>
>>>
>>> Hi Jeff,
>>> I would like to understand why, when I run on 512 procs or more, my
>>> code hangs in an MPI collective, even with a small send buffer. All
>>> processes are stuck in the call, doing nothing. But if I add an
>>> MPI_Barrier after the MPI collective, it works! I run over an
>>> InfiniBand network.
>>>
>>> I know many people with this strange problem; I think there is a
>>> strange interaction between InfiniBand and Open MPI that causes it.
>>>
>>>
>>>
>>> 2009/1/23 Jeff Squyres <jsquyres_at_[hidden]>:
>>>> On Jan 23, 2009, at 6:32 AM, Gabriele Fatigati wrote:
>>>>
>>>>> I've noticed that Open MPI has asynchronous behaviour in the
>>>>> collective calls. The processes don't wait for the other procs to
>>>>> arrive in the call.
>>>>
>>>> That is correct.
>>>>
>>>>> This behaviour can sometimes cause problems in jobs with a lot of
>>>>> processors.
>>>>
>>>> Can you describe what exactly you mean? The MPI spec specifically
>>>> allows this behavior; OMPI made specific design choices and
>>>> optimizations to support this behavior. FWIW, I'd be pretty
>>>> surprised if any optimized MPI implementation defaults to fully
>>>> synchronous collective operations.
>>>>
>>>>> Is there an Open MPI parameter to keep all processes in the
>>>>> collective call until it is finished? Otherwise I have to insert
>>>>> many MPI_Barrier calls in my code, and that is very tedious and
>>>>> strange.
>>>>
>>>> As you have noted, MPI_Barrier is the *only* collective operation
>>>> that MPI guarantees to have any synchronization properties (and
>>>> it's a fairly weak guarantee at that: no process will exit the
>>>> barrier until every process has entered the barrier -- but there's
>>>> no guarantee that all processes leave the barrier at the same
>>>> time).
>>>>
>>>> Why do you need your processes to exit collective operations at
>>>> the same time?
>>>>
>>>> --
>>>> Jeff Squyres
>>>> Cisco Systems
>>>>
>
> --
> Ing. Gabriele Fatigati
>
> Parallel programmer
>
> CINECA Systems & Technologies Department
>
> Supercomputing Group
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it Tel: +39 051 6171722
>
> g.fatigati [AT] cineca.it

-- 
Jeff Squyres
Cisco Systems