Hi Santhosh,

 

Numeric differences are to be expected with parallel applications. The basic reason is that on many architectures floating-point operations are performed using a higher internal precision than that of the arguments, and only the final result is rounded back to the lower output precision. When the same operation is performed in parallel, intermediate results are communicated in the lower precision, so the final result can differ. How much it differs depends on the stability of the algorithm: it could be a slight difference in the last 1-2 significant bits, or it could be a completely different result (e.g. when integrating chaotic dynamical systems).
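To make the effect visible, here is a minimal sketch in plain C (no MPI). The function names are mine and purely illustrative, and the double accumulator only stands in for whatever higher internal precision your hardware actually uses. The first function rounds to float once at the end; the second rounds after every addition.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulate in double (standing in for "higher internal precision")
 * and round to float only once, at the very end. */
float sum_round_once(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += x[i];
    return (float)acc;
}

/* Accumulate in float. The volatile qualifier forces every intermediate
 * result to be stored, and therefore rounded to float, after each step. */
float sum_round_each_step(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    volatile float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc = acc + x[i];
    return acc;
}
```

For the deliberately extreme input {1.0e8f, 1.0f, -1.0e8f}, sum_round_once returns 1 while sum_round_each_step returns 0, because 1.0e8f + 1.0f is not representable as a float and rounds back to 1.0e8f. With well-scaled data the difference is typically confined to the last few bits.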

 

In your particular case, with one process the MPI_Reduce is effectively a no-op and the summing is done entirely in the preceding loop. With two processes, the sum is broken into two parts, each of which is computed with higher precision but converted to float before being communicated.
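I have not run your attached code, so the following is only a rough simulation of that behaviour in plain C, without MPI. The function name, the contiguous split of the data, and the use of double for the local sums are all assumptions made for illustration. Each simulated rank sums its chunk in double, the partial sum is rounded to float as it would be when sent as MPI_FLOAT, and the float partials are then added.

```c
#include <assert.h>
#include <stddef.h>

/* Simulate a sum reduction over nprocs ranks (hypothetical helper, not MPI).
 * Each rank sums a contiguous chunk of x in double, then its partial sum is
 * rounded to float before the partials are combined. */
float simulated_reduce_sum(const float *x, int n, int nprocs)
{
    assert(x != NULL && n >= 0 && nprocs >= 1);
    volatile float total = 0.0f;  /* volatile: keep the running total in float */

    for (int rank = 0; rank < nprocs; ++rank) {
        /* Contiguous block owned by this simulated rank. */
        int begin = (int)((long long)n * rank / nprocs);
        int end   = (int)((long long)n * (rank + 1) / nprocs);

        double local = 0.0;       /* local loop uses the wider precision */
        for (int i = begin; i < end; ++i)
            local += x[i];

        volatile float sent = (float)local;  /* rounded before "communication" */
        total = total + sent;                /* float addition, like MPI_SUM on MPI_FLOAT */
    }
    return total;
}
```

With the contrived input {1.0e8f, 1.0f, -1.0e8f, 1.0f}, calling it with nprocs = 1 gives 2, while nprocs = 2 gives 0, since each partial sum loses its small term when it is rounded to float. Your data is far better behaved, which is why you only see a change in the last digits.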

 

You could try to “cure” this (non-problem) by telling your compiler to not use higher precision for intermediate results.
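Which option does this depends on your compiler and target, so please check its documentation. With GCC on 32-bit x86, for example, -ffloat-store or -mfpmath=sse -msse2 are the usual candidates. As a quick check, the C99 macro FLT_EVAL_METHOD from <float.h> tells you how your current build evaluates intermediate results. The helper below is only a sketch and its name is made up.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Print how this build evaluates floating-point expressions and return
 * the value of FLT_EVAL_METHOD (C99, <float.h>). */
int report_float_evaluation(FILE *out)
{
    assert(out != NULL);
    switch (FLT_EVAL_METHOD) {
    case 0:
        fprintf(out, "0: operations are evaluated in the precision of their type\n");
        break;
    case 1:
        fprintf(out, "1: float and double operations are evaluated as double\n");
        break;
    case 2:
        fprintf(out, "2: all operations are evaluated as long double\n");
        break;
    default:
        fprintf(out, "%d: indeterminate or implementation-defined\n",
                (int)FLT_EVAL_METHOD);
        break;
    }
    return (int)FLT_EVAL_METHOD;
}
```

A value of 0 means intermediate results are not kept in a higher precision by the language-level evaluation rules. A value of 2 is what you would typically see when x87 extended precision is in use.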

 

Hope that helps,

Hristo

--

Hristo Iliev, Ph.D. -- High Performance Computing

RWTH Aachen University, Center for Computing and Communication

Rechen- und Kommunikationszentrum der RWTH Aachen

Seffenter Weg 23,  D 52074  Aachen (Germany)

 

From: devel-bounces@open-mpi.org [mailto:devel-bounces@open-mpi.org] On Behalf Of Santhosh Kokala
Sent: Monday, October 15, 2012 8:07 AM
To: Open MPI Developers
Subject: [OMPI devel] MPI_Reduce() is losing precision

 

Hi All,

I am having a strange problem with floating-point precision. I get the correct precision when I launch with one process, but when the same code is launched with two or more processes I lose precision in the MPI_Reduce(…, MPI_FLOAT, MPI_SUM..); call. Here is the output from my code:

 

(admin)host:~$ mpirun -np 1 string 10 0.1 0.9 10 3

sum = 1

sum = 0.999992

sum = 1.00043

 

(admin)host:~$ mpirun -np 2 string 10 0.1 0.9 10 3

sum = 1

sum = 1

sum = 1.00049

 

As you can see, I am losing precision. Can someone help me fix this code? The last parameter to my code is the number of iterations. I am attaching the source code to this email.

 

Santhosh