Doing Reduce without a Barrier first allows one process to call Reduce and exit immediately, without waiting for the other processes to call Reduce. This lets one process advance faster than the others. I suspect the 2671 second result reflects the difference between the fastest and slowest process. Adding a Barrier reduces the time difference because it forces the faster processes to wait for the slower ones.
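To make that reasoning concrete, here is a toy model in plain C (no MPI calls) of where the waiting time ends up in a profile. It is only a sketch under stated assumptions: the function names are made up for illustration, the root is assumed to wait for the slowest rank, and every other rank is assumed to hand off its contribution and return after a fixed cost. Open MPI's real reduce algorithms are more involved and may behave differently.

```c
#include <stddef.h>

/* Toy model only: these helpers are hypothetical and do not reflect
 * how Open MPI actually implements MPI_Barrier or MPI_Reduce. */

/* Latest time at which any of the n ranks reaches the collective. */
double latest_arrival(const double *arrival, size_t n)
{
    double latest = arrival[0];
    for (size_t i = 1; i < n; ++i) {
        if (arrival[i] > latest) {
            latest = arrival[i];
        }
    }
    return latest;
}

/* Time a rank spends inside a barrier: it waits for the last arrival. */
double barrier_wait(const double *arrival, size_t n, size_t rank)
{
    return latest_arrival(arrival, n) - arrival[rank];
}

/* Time attributed to the reduce when no barrier precedes it.
 * Assumption: only the root waits for the slowest rank; every other
 * rank returns after reduce_cost and can run ahead. */
double reduce_time_no_barrier(const double *arrival, size_t n,
                              size_t rank, size_t root,
                              double reduce_cost)
{
    double wait = (rank == root) ? barrier_wait(arrival, n, rank) : 0.0;
    return wait + reduce_cost;
}
```

In this model, putting a Barrier in front moves the waiting out of the Reduce: every rank then spends barrier_wait() in the Barrier and only reduce_cost in the Reduce. So part of the difference between the two runs may simply be load imbalance being charged to a different call, which is worth checking by looking at the Barrier and Reduce times separately in the profile.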
Dear OpenMPI users,

I'm using OpenMPI 1.3.3 on an Infiniband 4x interconnection network. My parallel application makes intensive use of MPI_Reduce over a communicator created with MPI_Comm_split.

I've noticed strange behaviour during execution. My code is instrumented with Scalasca 1.3 to report subroutine execution time. The first execution shows the elapsed time with 128 processors (job_communicator is created with MPI_Comm_split). In both cases it is composed of the same ranks as MPI_COMM_WORLD:

MPI_Reduce(.....,job_communicator)

The elapsed time is 2671 sec.

The second run uses MPI_Barrier before MPI_Reduce:

MPI_Barrier(job_communicator..)
MPI_Reduce(.....,job_communicator)

The elapsed time of Barrier+Reduce is 2167 sec (about 8 minutes less).

So, in my opinion, it is better to put MPI_Barrier before any MPI_Reduce to mitigate the "asynchronous" behaviour of MPI_Reduce in OpenMPI. I suspect the same holds for other collective communications. Can someone explain to me why MPI_Reduce has this strange behaviour?

Thanks in advance.
Ing. Gabriele Fatigati
CINECA Systems & Tecnologies Department
Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
www.cineca.it Tel: +39 051 6171722
g.fatigati [AT] cineca.it