Subject: [OMPI users] MPI_Reduce performance
From: Gabriele Fatigati (g.fatigati_at_[hidden])
Date: 2010-09-08 05:21:49


Dear OpenMPI users,

I'm using Open MPI 1.3.3 on an InfiniBand 4x interconnection network. My
parallel application makes intensive use of MPI_Reduce over a
communicator created with MPI_Comm_split.
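For reference, a minimal sketch of how job_communicator is built so that it
contains the same ranks as MPI_COMM_WORLD (this is simplified, not my
production code; the color/key values are only illustrative):

  /* Simplified sketch: all ranks use the same color, so the resulting
   * job_communicator holds the same ranks as MPI_COMM_WORLD. */
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      MPI_Init(&argc, &argv);

      int world_rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

      MPI_Comm job_communicator;
      /* color 0 for everyone -> one communicator containing all ranks;
         key = world_rank keeps the original rank ordering. */
      MPI_Comm_split(MPI_COMM_WORLD, 0, world_rank, &job_communicator);

      /* ... intensive MPI_Reduce traffic over job_communicator ... */

      MPI_Comm_free(&job_communicator);
      MPI_Finalize();
      return 0;
  }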

I've noticed some strange behaviour during execution. My code is instrumented
with Scalasca 1.3 to report subroutine execution times. The first run measures
the elapsed time with 128 processors (job_communicator is created with
MPI_Comm_split; in both runs it contains the same ranks as MPI_COMM_WORLD):

MPI_Reduce(.....,job_communicator)

The elapsed time is 2671 sec.

The second run uses MPI_Barrier before MPI_Reduce:

MPI_Barrier(job_communicator..)
MPI_Reduce(.....,job_communicator)

The elapsed time of Barrier+Reduce is 2167 sec (about 8 minutes less).
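To make the comparison concrete, a simplified benchmark along these lines
reproduces the two variants I timed (the buffer size, datatype and iteration
count here are only placeholders, not the values from my application):

  /* Times MPI_Reduce with and without a preceding MPI_Barrier. */
  #include <mpi.h>
  #include <stdio.h>

  static double timed_reduce(MPI_Comm comm, int use_barrier, int iters)
  {
      double sendbuf = 1.0, recvbuf = 0.0;
      double t0 = MPI_Wtime();
      for (int i = 0; i < iters; i++) {
          if (use_barrier)
              MPI_Barrier(comm);   /* synchronize all ranks before the reduce */
          MPI_Reduce(&sendbuf, &recvbuf, 1, MPI_DOUBLE, MPI_SUM, 0, comm);
      }
      return MPI_Wtime() - t0;
  }

  int main(int argc, char **argv)
  {
      MPI_Init(&argc, &argv);

      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      MPI_Comm job_communicator;
      MPI_Comm_split(MPI_COMM_WORLD, 0, rank, &job_communicator);

      double t_plain   = timed_reduce(job_communicator, 0, 10000);
      double t_barrier = timed_reduce(job_communicator, 1, 10000);

      if (rank == 0)
          printf("Reduce: %.3f s   Barrier+Reduce: %.3f s\n",
                 t_plain, t_barrier);

      MPI_Comm_free(&job_communicator);
      MPI_Finalize();
      return 0;
  }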

So, in my opinion, it is better to put an MPI_Barrier before any MPI_Reduce to
mitigate the "asynchronous" behaviour of MPI_Reduce in Open MPI. I suspect the
same holds for other collective communications. Can someone explain to me why
MPI_Reduce shows this strange behaviour?

Thanks in advance.

-- 
Ing. Gabriele Fatigati
Parallel programmer
CINECA Systems & Technologies Department
Supercomputing Group
Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
www.cineca.it                    Tel:   +39 051 6171722
g.fatigati [AT] cineca.it