Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] How could OpenMPI (or MVAPICH) affect floating-point results?
From: Reuti (reuti_at_[hidden])
Date: 2011-09-20 07:25:28


On 20.09.2011 at 00:41, Blosch, Edwin L wrote:

> I am observing differences in floating-point results from an application program that appear to be related to whether I link with OpenMPI 1.4.3 or MVAPICH 1.2.0. Both packages were built with the same installation of Intel 11.1, as well as the application program; identical flags passed to the compiler in each case.
> I’ve tracked down some differences in a compute-only routine where I’ve printed out the inputs to the routine (to 18 digits); the inputs are identical. The output numbers are different in the 16th place (perhaps a few in the 15th place). These differences only show up for optimized code, not for -O0.
> My assumption is that some optimized math intrinsic is being replaced dynamically, but I do not know how to confirm this. Anyone have guidance to offer? Or similar experience?

Yes, I run into this often, but always at a magnitude where it is of no concern (and not related to any MPI). Due to the limited precision of floating-point arithmetic, a simple reordering of operations (although equivalent in a mathematical sense) can lead to different results. That the differences disappear with -O0 would be consistent with that.
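To illustrate the reordering effect, here is a minimal sketch in Python (not the original Fortran/C application; the values are chosen only for demonstration). It shows that summing the same doubles in a different order can change the last digit, which is the kind of change an optimizer may introduce when it reassociates or vectorizes a loop.

```python
import math

# Floating-point addition is not associative: the same three numbers,
# grouped differently, round to two different doubles.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # 0.6000000000000001
right_to_left = a + (b + c)   # 0.6

print(repr(left_to_right), repr(right_to_left))
print("identical:", left_to_right == right_to_left)   # identical: False

# A more drastic case: the small terms are absorbed by the large one
# when summed naively, while math.fsum returns the correctly rounded sum.
values = [1.0, 1e100, 1.0, -1e100]
naive = sum(values)        # 0.0
exact = math.fsum(values)  # 2.0
print(naive, exact)
```

Neither ordering is "wrong"; both are valid roundings of the same mathematical expression, which is why differences in the 15th or 16th digit alone do not point to a bug.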

The other point I have heard, especially for the x86 instruction set, is that the internal x87 FPU still works with 80 bits, while the representation in memory is only 64 bits. Hence, when everything can be done in registers, the result can differ from the case where some intermediate results have to be stored to RAM. For the Portland compiler there are the switches -Kieee -pc64 to force it to always stay in 64 bits, and similar ones for Intel are -mp (now -fltconsistency) and -mp1. (page 42) (page 260)
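The register-versus-memory effect can be sketched by analogy. Python floats are always 64-bit, so the 80-bit case cannot be reproduced directly; in the sketch below, 64-bit doubles stand in for the wide register format and 32-bit singles stand in for the narrower memory format. The helper `store_narrow` is a hypothetical name introduced only for this illustration.

```python
import struct

def store_narrow(x: float) -> float:
    """Round a double to IEEE single precision, mimicking a store to a
    narrower memory slot followed by a reload."""
    return struct.unpack("f", struct.pack("f", x))[0]

a = 1.0
b = 2.0 ** -30   # fits in single precision on its own, but 1 + 2**-30 does not

# Case 1: the intermediate sum stays in the wide format; only the final
# result is rounded to the narrow format.
kept_wide = store_narrow((a + b) - a)

# Case 2: the intermediate sum is spilled to the narrow format first,
# which rounds 1 + 2**-30 down to exactly 1.0.
spilled = store_narrow(store_narrow(a + b) - a)

print(kept_wide)   # 2**-30, about 9.3e-10
print(spilled)     # 0.0
```

The same expression gives two answers depending only on where the intermediate value lived. Switches that force consistent precision aim to remove exactly this dependence.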

You could try the switches mentioned above and check whether you get more consistent output.
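When comparing the output of two builds, it can help to look at the bit patterns rather than at printed decimal digits. The following sketch (function names are hypothetical, and the two sample values are made up rather than taken from the application) counts how many representable doubles lie between two results, i.e. their distance in ULPs.

```python
import struct

def to_ordered_int(x: float) -> int:
    """Map a double to an integer such that adjacent doubles map to
    adjacent integers (both zeros map to 0)."""
    bits = struct.unpack("<q", struct.pack("<d", x))[0]
    # Negative doubles are stored sign-magnitude; mirror them below zero.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)

def ulp_distance(x: float, y: float) -> int:
    """Number of representable doubles between x and y."""
    return abs(to_ordered_int(x) - to_ordered_int(y))

# Made-up sample values standing in for the same output from two builds.
result_build_a = 0.6000000000000001
result_build_b = 0.6

print(result_build_a.hex(), result_build_b.hex())
print("ULP distance:", ulp_distance(result_build_a, result_build_b))  # ULP distance: 1
```

A distance of a few ULPs after a long computation is typical for reordering or precision effects; a large distance would suggest that something else differs between the builds.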

If there were an MPI ABI, so that you could just drop in any MPI library, it would be quite easy to spot the exact point where the discrepancy occurs.

-- Reuti

> Thanks very much
> Ed
> Just for what it’s worth, here’s the output of ldd:
> % ldd application_mvapich
> => (0x00007fffe3746000)
> => /usr/lib64/ (0x00002b5b45fc1000)
> => /usr/mpi/intel/mvapich-1.2.0/lib/shared/ (0x00002b5b462cd000)
> => /usr/lib64/ (0x00002b5b465ed000)
> => /usr/lib64/ (0x00002b5b467fc000)
> => /lib64/ (0x00002b5b46a04000)
> => /lib64/ (0x00002b5b46c21000)
> => /lib64/ (0x00002b5b46e2a000)
> => /lib64/ (0x00002b5b47081000)
> => /lib64/ (0x00002b5b47285000)
> => /lib64/ (0x00002b5b475e3000)
> /lib64/ (0x00002b5b45da0000)
> => /opt/intel/Compiler/11.1/072/lib/intel64/ (0x00002b5b477fb000)
> => /opt/intel/Compiler/11.1/072/lib/intel64/ (0x00002b5b47b8f000)
> => /opt/intel/Compiler/11.1/072/lib/intel64/ (0x00002b5b47da5000)
> % ldd application_openmpi
> => (0x00007fff6ebff000)
> => /usr/lib64/ (0x00002b6e7c17d000)
> => /usr/mpi/intel/openmpi-1.4.3/lib64/ (0x00002b6e7c489000)
> => /usr/mpi/intel/openmpi-1.4.3/lib64/ (0x00002b6e7c68d000)
> => /usr/mpi/intel/openmpi-1.4.3/lib64/ (0x00002b6e7c8ca000)
> => /usr/mpi/intel/openmpi-1.4.3/lib64/ (0x00002b6e7cb9c000)
> => /usr/mpi/intel/openmpi-1.4.3/lib64/ (0x00002b6e7ce01000)
> => /lib64/ (0x00002b6e7d077000)
> => /lib64/ (0x00002b6e7d27c000)
> => /lib64/ (0x00002b6e7d494000)
> => /lib64/ (0x00002b6e7d697000)
> => /lib64/ (0x00002b6e7d8ee000)
> => /lib64/ (0x00002b6e7db0b000)
> => /lib64/ (0x00002b6e7de69000)
> /lib64/ (0x00002b6e7bf5c000)
> => /opt/intel/Compiler/11.1/072/lib/intel64/ (0x00002b6e7e081000)
> => /opt/intel/Compiler/11.1/072/lib/intel64/ (0x00002b6e7e1ba000)
> => /opt/intel/Compiler/11.1/072/lib/intel64/ (0x00002b6e7e45f000)
> => /opt/intel/Compiler/11.1/072/lib/intel64/ (0x00002b6e7e7f4000)
> => /opt/intel/Compiler/11.1/072/lib/intel64/ (0x00002b6e7ea0a000)