
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] MPI_Reduce() is losing precision
From: N.M. Maclaren (nmm1_at_[hidden])
Date: 2012-10-15 05:45:03


On Oct 15 2012, Iliev, Hristo wrote:
>
> Numeric differences are to be expected with parallel applications. The
> basic reason for that is that on many architectures floating-point
> operations are performed using higher internal precision than that of the
> arguments and only the final result is rounded back to the lower output
> precision. When performing the same operation in parallel, intermediate
> results are communicated using the lower precision and thus the final
> result could differ. ...

Not quite. That's ONE reason.

> You could try to "cure" this (non-problem) by telling your compiler to not
> use higher precision for intermediate results.

But it wouldn't help if the problem is the other reason, which is that
floating-point arithmetic is not associative. That means that the actual
order of the operations makes a difference to the final result, and that
is (correctly) unspecified for MPI_Reduce.
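Non-associativity is easy to demonstrate without MPI at all. A minimal sketch in plain Python (not part of the original message; the values are chosen to expose double-precision rounding):

```python
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # (1e16 - 1e16) + 1.0 = 1.0
right = a + (b + c)  # -1e16 + 1.0 rounds back to -1e16, so the 1.0 is lost
print(left, right)   # 1.0 0.0
```

Mathematically the two expressions are equal; in IEEE 754 double precision they are not, because the spacing between representable numbers near 1e16 is larger than 1.0.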

I have had long arguments with people who believe in deterministic
floating-point (i.e. that consistency implies correctness), but the
fact is that non-determinism is unavoidable with parallel use of
floating-point, or indeed with any serious numeric optimisation.
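To see how ordering affects a reduction, here is a small illustration (plain Python standing in for MPI; the pairwise scheme is only one plausible order an implementation might use). The same data summed left to right, as one process would, and summed as a pairwise tree, as a parallel reduction across ranks effectively imposes, agree only to within rounding error:

```python
import random

random.seed(12345)
values = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

# Left-to-right summation, as a single process would do it.
seq = 0.0
for v in values:
    seq += v

# Pairwise (tree) summation, the kind of order a reduction
# distributed over many ranks effectively imposes.
def pairwise(xs):
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return pairwise(xs[:mid]) + pairwise(xs[mid:])

tree = pairwise(values)

# Both are "correct" sums of the same numbers, yet the low-order
# bits generally differ because the rounding order differs.
print(seq - tree)
```

Neither result is wrong; they are simply different roundings of the same exact sum, which is why demanding bitwise reproducibility from a parallel reduce is misguided.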

So the summary is that anyone doing floating-point work has to learn
to live with it. Any traditional book on numerical programming (i.e.
one written before 1980) will take that for granted.

Regards,
Nick Maclaren.