Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Asynchronous behaviour of MPI Collectives
From: Gabriele Fatigati (g.fatigati_at_[hidden])
Date: 2009-01-23 08:09:23


Hi Igor,
My message size is 4096 KB and I have 4 procs per core.
There isn't any difference using different algorithms.
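In case it helps others reading the thread: the hang described below is consistent with unconsumed collective traffic piling up at slow ranks. A toy Python model (not MPI -- the queue stands in for a rank's receive buffering, and `simulate`, `rounds`, and `recv_every` are names of my own) shows how a fast root that never waits can build a growing backlog:

```python
# Toy model (not MPI): a fast "root" repeatedly posts broadcast messages to a
# slow "receiver" with no synchronization in between. Because nothing forces
# the root to wait, unconsumed messages pile up -- analogous to the buffer
# exhaustion that can occur when many unsynchronized collectives are issued
# back to back.
from collections import deque

def simulate(rounds, recv_every):
    """Root posts one message per round; the receiver drains one message
    every `recv_every` rounds. Returns the peak backlog observed."""
    inbox = deque()
    peak = 0
    for r in range(rounds):
        inbox.append(r)                  # root "broadcasts" without waiting
        if r % recv_every == 0 and inbox:
            inbox.popleft()              # slow receiver consumes one message
        peak = max(peak, len(inbox))
    return peak

print(simulate(100, 4))  # backlog grows when the receiver can't keep up
```

The slower the receiver relative to the root, the larger the peak backlog -- which is why inserting an occasional barrier (forcing the root to wait) can make the symptom disappear without fixing the underlying pacing.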

2009/1/23 Igor Kozin <i.n.kozin_at_[hidden]>:
> what is your message size and the number of cores per node?
> is there any difference using different algorithms?
>
> 2009/1/23 Gabriele Fatigati <g.fatigati_at_[hidden]>
>>
>> Hi Jeff,
>> I would like to understand why, if I run on 512 procs or more, my
>> code stops in an MPI collective, even with a small send buffer. All
>> processors are locked in the call, doing nothing. But if I add an
>> MPI_Barrier after the MPI collective, it works! I run over an
>> Infiniband network.
>>
>> I know many people with this strange problem; I think there is a
>> strange interaction between Infiniband and Open MPI that causes it.
>>
>>
>>
>> 2009/1/23 Jeff Squyres <jsquyres_at_[hidden]>:
>> > On Jan 23, 2009, at 6:32 AM, Gabriele Fatigati wrote:
>> >
>> >> I've noted that Open MPI has asynchronous behaviour in collective
>> >> calls: the processors don't wait for the other procs to arrive in
>> >> the call.
>> >
>> > That is correct.
>> >
>> >> This behaviour can sometimes cause problems in jobs with a lot of
>> >> processors.
>> >
>> > Can you describe exactly what you mean? The MPI spec specifically
>> > allows this behavior; OMPI made specific design choices and
>> > optimizations to support it. FWIW, I'd be pretty surprised if any
>> > optimized MPI implementation defaulted to fully synchronous
>> > collective operations.
>> >
>> >> Is there an Open MPI parameter to block all processes in the
>> >> collective call until it is finished? Otherwise I have to insert
>> >> many MPI_Barriers in my code, which is very tedious and strange.
>> >
>> > As you have noted, MPI_Barrier is the *only* collective operation
>> > that MPI guarantees to have any synchronization properties (and it's
>> > a fairly weak guarantee at that: no process will exit the barrier
>> > until every process has entered the barrier -- but there's no
>> > guarantee that all processes leave the barrier at the same time).
>> >
>> > Why do you need your processes to exit collective operations at the same
>> > time?
>> >
>> > --
>> > Jeff Squyres
>> > Cisco Systems
>> >
>> > _______________________________________________
>> > users mailing list
>> > users_at_[hidden]
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
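The weak barrier guarantee Jeff describes can be sketched with threads standing in for MPI processes -- a rough analogy using Python's standard `threading.Barrier`, not Open MPI itself: no participant leaves the barrier before the last one enters, but nothing says they all leave at the same instant.

```python
# Sketch of the barrier guarantee: no participant exits the barrier before
# every participant has entered it. Threads stand in for MPI processes.
import threading
import time

N = 4
barrier = threading.Barrier(N)
enter_times, exit_times = [], []
lock = threading.Lock()

def worker(rank):
    time.sleep(0.01 * rank)          # ranks arrive at different times
    with lock:
        enter_times.append(time.monotonic())
    barrier.wait()                   # blocks until all N have entered
    with lock:
        exit_times.append(time.monotonic())

threads = [threading.Thread(target=worker, args=(r,)) for r in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every exit happens after the last entry -- the (weak) guarantee.
assert min(exit_times) >= max(enter_times)
```

Note that the guarantee says nothing about the exit times being equal; after the last rank enters, the others are released in whatever order the scheduler chooses.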

-- 
Ing. Gabriele Fatigati
Parallel programmer
CINECA Systems & Tecnologies Department
Supercomputing Group
Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
www.cineca.it    Tel: +39 051 6171722
g.fatigati [AT] cineca.it