
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Asynchronous behaviour of MPI Collectives
From: Gabriele Fatigati (g.fatigati_at_[hidden])
Date: 2009-01-23 07:58:51

Hi Jeff,
I would like to understand why, when I run on 512 or more processes, my
code hangs in an MPI collective, even with a small send buffer. All
processes are stuck inside the call, doing nothing. But if I add an
MPI_Barrier after the MPI collective, it works! I am running over InfiniBand.

I know many people who have hit this strange problem; I think there is a
strange interaction between InfiniBand and Open MPI that causes it.
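
For readers following the thread, here is a minimal sketch of the two behaviours discussed here: a collective whose root does not wait for the other ranks, and the same collective followed by an explicit barrier. It is an analogy only. Python threads stand in for MPI ranks, a queue stands in for the network, and the names `run` and `root_left_early` are hypothetical. It does not use Open MPI or InfiniBand and does not reproduce the hang described above.

```python
import queue
import threading
import time


def run(num_ranks=4, slow_rank=3, delay=0.2, barrier_after=False):
    """Simulate a broadcast in which the root does not wait for receivers.

    Threads play the role of MPI ranks (an assumption made for illustration).
    Returns the times at which each rank entered and left the "collective".
    """
    inbox = [queue.Queue() for _ in range(num_ranks)]  # one mailbox per rank
    barrier = threading.Barrier(num_ranks)
    entered = [0.0] * num_ranks
    exited = [0.0] * num_ranks

    def rank_main(rank):
        if rank == slow_rank:
            time.sleep(delay)            # this rank reaches the call late
        entered[rank] = time.monotonic()
        if rank == 0:
            for q in inbox[1:]:
                q.put("payload")         # root posts the data and moves on
        else:
            inbox[rank].get()            # receivers block until data arrives
        if barrier_after:
            barrier.wait()               # workaround: explicit synchronization
        exited[rank] = time.monotonic()

    threads = [threading.Thread(target=rank_main, args=(r,))
               for r in range(num_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return entered, exited


def root_left_early(entered, exited):
    """True if rank 0 left the call before the last rank had entered it."""
    return exited[0] < max(entered)


if __name__ == "__main__":
    for flag in (False, True):
        entered, exited = run(barrier_after=flag)
        print("barrier_after =", flag,
              "| root left early:", root_left_early(entered, exited))
```

Without the barrier, the root normally returns long before the slow rank arrives; with it, no rank leaves until all have arrived, at the cost of an extra synchronization on every call.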

2009/1/23 Jeff Squyres <jsquyres_at_[hidden]>:
> On Jan 23, 2009, at 6:32 AM, Gabriele Fatigati wrote:
>> I've noticed that Open MPI has asynchronous behaviour in the collective
>> calls.
>> The processes don't wait for the other procs to arrive in the call.
> That is correct.
>> This behaviour can sometimes cause problems with a lot of
>> processes in the job.
> Can you describe what exactly you mean? The MPI spec specifically allows
> this behavior; OMPI made specific design choices and optimizations to
> support this behavior. FWIW, I'd be pretty surprised if any optimized MPI
> implementation defaults to fully synchronous collective operations.
>> Is there an Open MPI parameter to hold all processes in the collective
>> call until it is finished? Otherwise I have to insert many MPI_Barrier
>> calls in my code, and that is very tedious and strange.
> As you have noted, MPI_Barrier is the *only* collective operation that MPI
> guarantees to have any synchronization properties (and it's a fairly weak
> guarantee at that; no process will exit the barrier until every process has
> entered the barrier -- but there's no guarantee that all processes leave the
> barrier at the same time).
> Why do you need your processes to exit collective operations at the same
> time?
> --
> Jeff Squyres
> Cisco Systems
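
The barrier guarantee quoted above (no process leaves until every process has entered, with no promise about when each one leaves) can be checked in a small sketch. This again uses Python threads and `threading.Barrier` as a stand-in for MPI ranks and MPI_Barrier. The function name `barrier_times` and the delay values are assumptions chosen for illustration.

```python
import threading
import time


def barrier_times(delays):
    """Run one thread per entry in `delays`; each sleeps, then hits a barrier.

    Returns (entered, exited): the time each thread reached the barrier and
    the time it got out again. Threads stand in for MPI ranks here.
    """
    n = len(delays)
    barrier = threading.Barrier(n)
    entered = [0.0] * n
    exited = [0.0] * n

    def rank_main(rank):
        time.sleep(delays[rank])         # ranks arrive at different times
        entered[rank] = time.monotonic()
        barrier.wait()                   # blocks until all n have arrived
        exited[rank] = time.monotonic()

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return entered, exited


if __name__ == "__main__":
    entered, exited = barrier_times([0.0, 0.05, 0.1])
    # The guarantee: the earliest exit is not before the latest entry.
    print("earliest exit after latest entry:", min(exited) >= max(entered))
    # Exit times may still differ from rank to rank.
    print("spread of exit times (s):", max(exited) - min(exited))
```

The only property checked is the ordering of entries and exits; the spread of exit times is printed but not constrained, which matches the weak guarantee described in the reply.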

Ing. Gabriele Fatigati
Parallel programmer
CINECA Systems & Technologies Department
Supercomputing Group
Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
Tel: +39 051 6171722
g.fatigati [AT]