Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] open-mpi behaviour on Fedora, Ubuntu, Debian and CentOS
From: Gus Correa (gus_at_[hidden])
Date: 2010-04-26 12:54:15


Dave Love wrote:
> Asad Ali <asad06_at_[hidden]> writes:
>
>> From run to run the results can only be different if you either use
>> different input/output or use different random number seeds. Here in my case
>> the random number seeds are the same as well.
>
> Sorry, but that's naïve, even if you can prove your code is well-defined
> according to the language and floating-point standards. You should
> listen to Ashley, and if it worries you, you really need just to debug
> it. If you believe it's a problem with open-mpi, you at least have to
> demonstrate results with a different MPI.

Or run a serial version on the same set of machines,
compiled in a similar way (same compiler version, optimization
flags, etc.) as the parallel version, and compare results.
If the serial results don't differ but the parallel ones do,
then you can start blaming MPI.
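
As a rough illustration (a sketch I am making up here, not code from
any program discussed in this thread), here is a narrower version of
the same check: compute one global sum with MPI_Reduce and again by
adding the gathered partial sums in a fixed rank order, and print
both to full precision. Any difference between the two numbers comes
only from the order in which the additions were associated, which
the MPI library is free to choose.

/* Compile with something like:  mpicc reduce_check.c
   (the file name and the test data are just examples) */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each rank builds one partial sum from made-up, reproducible data. */
    double local_sum = 0.0;
    for (int i = 0; i < 100000; i++)
        local_sum += 1.0 / (1.0 + rank * 100000.0 + i);

    /* Parallel sum: the association order is up to the MPI library. */
    double mpi_sum = 0.0;
    MPI_Reduce(&local_sum, &mpi_sum, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);

    /* Reference: gather the partial sums and add them on rank 0
       in a fixed, rank-by-rank order. */
    double parts[512];                 /* sketch assumes nprocs <= 512 */
    MPI_Gather(&local_sum, 1, MPI_DOUBLE, parts, 1, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    if (rank == 0) {
        double ordered_sum = 0.0;
        for (int i = 0; i < nprocs; i++)
            ordered_sum += parts[i];
        printf("MPI_Reduce sum = %.17g\n", mpi_sum);
        printf("in-order sum   = %.17g\n", ordered_sum);
    }

    MPI_Finalize();
    return 0;
}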

In my experience, most of the time MPI doesn't
contribute significantly, or at all, to numerical
differences in results.
On the other hand, compiler flags (particularly optimization),
compiler versions, different hardware,
a different OS, and different libraries (e.g. math libraries)
can all have a significant effect.
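
The root of it is that floating-point addition is not associative,
so any reordering introduced by the compiler, the vector unit, or a
different CPU can legitimately change the last bits. A tiny
illustration (made-up numbers, nothing to do with your code):

#include <stdio.h>

int main(void)
{
    double a = 1.0e16, b = -1.0e16, c = 1.0;

    /* Same three numbers, two groupings, two different answers. */
    printf("(a + b) + c = %.17g\n", (a + b) + c);   /* prints 1 */
    printf("a + (b + c) = %.17g\n", a + (b + c));   /* prints 0: c is
                                                       absorbed by b */
    return 0;
}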

Bit-for-bit matching can hardly be achieved in complex programs.
It is a chimera.
You only give it a chance if you enforce strict IEEE (754)
floating-point semantics (which somebody already suggested to you),
and hope that the compiler really gets it right.
However, beware that enforcing the IEEE standard brings along a
performance penalty.
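
To make that concrete: one of the things strict IEEE mode typically
turns off is contraction of a*b + c into a single fused multiply-add,
which rounds once instead of twice. A small sketch of my own (the flag
names are from memory, so check your compiler's manual; e.g. GCC has
-ffp-contract=off and Intel has -fp-model precise / strict):

/* Compile with something like:  cc fma_demo.c -lm */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double eps = ldexp(1.0, -30);        /* 2^-30, exactly representable */
    double a = 1.0 + eps, b = 1.0 - eps, c = -1.0;

    double separate = a * b + c;     /* two roundings under strict IEEE
                                        rules; may be contracted to an
                                        FMA otherwise */
    double fused    = fma(a, b, c);  /* a*b + c with a single rounding */

    /* Under strict evaluation the first line prints 0 and the second
       prints -2^-60 (about -8.7e-19); with contraction they agree. */
    printf("a*b + c      = %.17g\n", separate);
    printf("fma(a, b, c) = %.17g\n", fused);
    return 0;
}

The performance penalty comes from exactly this kind of restriction:
the compiler has to give up contractions and reorderings that would
otherwise be free speed.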

Well-designed algorithms are also important.
There are some famous (infamous?) old FFTs still in use out there
that can amplify your round-off errors within a few iterations.
On different hardware, or with different optimization flags,
the amount of error amplification can also differ.
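
Compensated (Kahan) summation is the textbook example of how the
algorithm, rather than MPI, controls round-off growth. A self-contained
sketch with made-up data (and note that an aggressive flag like
-ffast-math is allowed to optimize the compensation away, which again
ties the answer to compiler flags):

#include <stdio.h>

int main(void)
{
    const int n = 10000000;
    double naive = 0.0;
    double kahan = 0.0, comp = 0.0;   /* running compensation term */

    for (int i = 1; i <= n; i++) {
        double x = 1.0 / i;           /* arbitrary test series */

        naive += x;

        /* Kahan summation: recover the low-order bits lost in each add. */
        double y = x - comp;
        double t = kahan + y;
        comp = (t - kahan) - y;
        kahan = t;
    }

    printf("naive sum = %.17g\n", naive);
    printf("kahan sum = %.17g\n", kahan);
    return 0;
}

The two sums typically differ in the last few digits; the compensated
one is the one to trust.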

We run many complex programs whose results differ slightly
across machines, compilers, and compiler flags.
The good ones keep the differences at round-off level.
But the world is not always so good.

I hope this helps.
Gus Correa
