On Mon, Apr 26, 2010 at 8:01 PM, Ashley Pittman <ashley@pittman.co.uk> wrote:
On 25 Apr 2010, at 22:27, Asad Ali wrote:
> Yes, I use different machines:
>
> machine 1 uses AMD Opterons. (Fedora)
>
> machine 2 and 3 use Intel Xeons. (CentOS)
>
> machine 4 uses slightly older Intel Xeons. (Debian)
>
> Only machine 1 gives correct results. The CentOS and Debian results
> agree with each other but are wrong and differ from those of machine 1.
Have you verified that they are actually wrong, or are they just
different? It is perfectly possible for the same program to get
different results from run to run, even on the same hardware and the
same OS. All floating point operations by the MPI library are expected
to be deterministic, but changing the process layout or MPI settings
can affect this, and of course anything the application does can
introduce differences as well.
Ashley.
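
To illustrate the point about process layout: floating point addition
is not associative, so grouping the same numbers into different partial
sums can change the final result. The sketch below is only an
illustration under stated assumptions. The helper names `naive_sum` and
`chunked_sum` are hypothetical, and the chunking merely mimics how a
reduction spread over several processes might group its additions; it
does not reproduce what any particular MPI library does.

```python
def naive_sum(values):
    """Add values strictly left to right in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Sum each chunk separately, then add up the partial sums.

    Hypothetical stand-in for a reduction over n_chunks processes:
    each "process" sums its own slice, then the partials are combined.
    """
    size = (len(values) + n_chunks - 1) // n_chunks  # ceiling division
    partials = [naive_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return naive_sum(partials)


# Values of very different magnitude make the rounding visible.
values = [1e16, 1.0, -1e16, 1.0]

one_process = chunked_sum(values, 1)  # all additions in one sequence
two_process = chunked_sum(values, 2)  # two partial sums, then combined

print(one_process, two_process, one_process == two_process)
```

Same code, same input, no random numbers involved, yet the two groupings
disagree with each other (and with the exact sum of 2). In an iterative
code a discrepancy like this can start in the last bits and grow.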
The code is the same, with the same input/output and the same constants
etc. From run to run the results can only differ if you use either
different input/output or different random number seeds. In my case the
random number seeds are the same as well. This means that the code must
give (and it does give) the same results no matter how many times you
run it. I didn't tamper with the MPI settings for any run. I have
verified that only the Fedora results are correct, because I know what
is in my data and how my model should behave, and I get nearly perfect
convergence on Fedora. Even my dual-core laptop with Ubuntu 9.10 gives
correct results. The other OSs give the same results as Fedora for a
few hundred iterations, but then something unusual happens and the
results start going wrong.