Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Asynchronous behaviour of MPI Collectives
From: Igor Kozin (i.n.kozin_at_[hidden])
Date: 2009-01-23 08:05:30


What is your message size, and how many cores per node are you using?
Is there any difference if you use different algorithms?

2009/1/23 Gabriele Fatigati <g.fatigati_at_[hidden]>

> Hi Jeff,
> I would like to understand why, when I run on 512 or more processes, my
> code hangs in an MPI collective, even with a small send buffer. All
> processes are stuck inside the call, doing nothing. But if I add an
> MPI_Barrier after the MPI collective, it works! I am running over an
> InfiniBand network.
>
> I know many people who have hit this strange problem. I think there is a
> strange interaction between InfiniBand and Open MPI that causes it.
>
>
>
> 2009/1/23 Jeff Squyres <jsquyres_at_[hidden]>:
> > On Jan 23, 2009, at 6:32 AM, Gabriele Fatigati wrote:
> >
> >> I've noticed that Open MPI behaves asynchronously in collective
> >> calls.
> >> Processes do not wait for the other processes to arrive at the call.
> >
> > That is correct.
> >
> >> This behaviour can sometimes cause problems in jobs with a large
> >> number of processes.
> >
> > Can you describe what exactly you mean? The MPI spec specifically allows
> > this behavior; OMPI made specific design choices and optimizations to
> > support this behavior. FWIW, I'd be pretty surprised if any optimized MPI
> > implementation defaults to fully synchronous collective operations.
> >
> >> Is there an Open MPI parameter that holds all processes in the collective
> >> call until it has finished? Otherwise I have to insert many MPI_Barrier
> >> calls in my code, which is very tedious and strange.
> >
> > As you have noted, MPI_Barrier is the *only* collective operation that MPI
> > guarantees to have any synchronization properties (and it's a fairly weak
> > guarantee at that; no process will exit the barrier until every process has
> > entered the barrier -- but there's no guarantee that all processes leave the
> > barrier at the same time).
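
The barrier guarantee quoted above can be illustrated without an MPI installation. The sketch below is a thread-based analogy in Python, not MPI code: it assumes the standard library's `threading.Barrier` as a stand-in for MPI_Barrier, and the function name `run_barrier_demo` is made up for this example. It records the order in which workers enter and leave the barrier, showing that every entry precedes every exit while the exit order itself is left unspecified.

```python
import threading
import time


def run_barrier_demo(num_workers=4):
    """Return the observed order of ("enter" | "exit", rank) events."""
    barrier = threading.Barrier(num_workers)  # stand-in for MPI_Barrier (assumption)
    events = []
    lock = threading.Lock()

    def worker(rank):
        # Stagger arrivals, like ranks with uneven amounts of work.
        time.sleep(0.01 * rank)
        with lock:
            events.append(("enter", rank))
        # Blocks until all num_workers threads have called wait().
        barrier.wait()
        with lock:
            events.append(("exit", rank))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    events = run_barrier_demo(4)
    last_enter = max(i for i, (kind, _) in enumerate(events) if kind == "enter")
    first_exit = min(i for i, (kind, _) in enumerate(events) if kind == "exit")
    print("every enter precedes every exit:", last_enter < first_exit)
    # The exit order may differ from run to run; nothing fixes it.
    print("exit order:", [rank for kind, rank in events if kind == "exit"])
```

The first property holds on every run, because no thread can pass `barrier.wait()` before the last one has recorded its entry. The exit order depends on thread scheduling, which mirrors the point that processes need not leave the barrier at the same time.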
> >
> > Why do you need your processes to exit collective operations at the same
> > time?
> >
> > --
> > Jeff Squyres
> > Cisco Systems
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
>
>
>
> --
> Ing. Gabriele Fatigati
>
> Parallel programmer
>
> CINECA Systems & Technologies Department
>
> Supercomputing Group
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it Tel: +39 051 6171722
>
> g.fatigati [AT] cineca.it