On Oct 15 2012, Iliev, Hristo wrote:
>
> Numeric differences are to be expected with parallel applications. The
> basic reason for that is that on many architectures floating-point
> operations are performed using higher internal precision than that of the
> arguments and only the final result is rounded back to the lower output
> precision. When performing the same operation in parallel, intermediate
> results are communicated using the lower precision and thus the final
> result could differ. ...
Not quite. That's ONE reason.
> You could try to "cure" this (non-problem) by telling your compiler to not
> use higher precision for intermediate results.
But that wouldn't help if the cause is the other reason, which is that
floating-point arithmetic is not associative. That means that the actual
order of the operations makes a difference to the final result, and that
order is (correctly) unspecified for MPI_Reduce.
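To make the non-associativity concrete, here is a minimal sketch in plain
Python (not MPI). It assumes IEEE 754 double precision with the default
round-to-nearest mode. The function names left_to_right and pairwise_tree
are hypothetical, chosen only to mimic two orders in which a reduction
might combine the same four values.

```python
import math
from functools import reduce


def left_to_right(values):
    """Sum in sequence order, as a single serial loop would."""
    return reduce(lambda a, b: a + b, values, 0.0)


def pairwise_tree(values):
    """Sum by combining halves, as a tree-shaped reduction might."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_tree(values[:mid]) + pairwise_tree(values[mid:])


# Illustrative data only: the same four numbers, summed in two orders.
values = [1.0e16, 1.0, -1.0e16, 1.0]

serial = left_to_right(values)   # ((1e16 + 1) - 1e16) + 1
tree = pairwise_tree(values)     # (1e16 + 1) + (-1e16 + 1)
exact = math.fsum(values)        # correctly rounded sum, for reference

print(serial, tree, exact)

# The textbook three-term case shows the same effect.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))
```

Neither order gives the correctly rounded sum here, and the two orders
disagree with each other. No compiler precision setting changes that.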
I have had long arguments with people who believe in deterministic
floating-point (i.e. that consistency implies correctness), but the
fact is that order-dependent results are an unavoidable consequence of
parallel floating-point, or indeed of any serious numeric optimisation.
So the summary is that anyone doing floating-point work has to learn
to live with it. Any traditional book on numerical programming (i.e.
one written before 1980) will take that for granted.
Regards,
Nick Maclaren.