Thank you for opening a ticket and taking care of this.
Jeff Squyres wrote:
> On Jul 28, 2010, at 5:07 PM, Gus Correa wrote:
>> Still, the alignment under Intel may or may not be right.
>> And this may or may not explain the errors that Hugo has got.
>> FYI, the ompi_info from my OpenMPI 1.3.2 and 1.2.8
>> report exactly the same as OpenMPI 1.4.2, namely
>> Fort dbl prec size: 4 and
>> Fort dbl prec align: 4,
>> except that *if the Intel Fortran compiler (ifort) was used*
>> I get 1 byte alignment:
>> Fort dbl prec align: 1
>> So, this issue has been around for a while,
>> and involves both the size and the alignment (in Intel)
>> of double precision.
> Yes, it's quite problematic to try to determine the alignment
> of Fortran types -- compilers can do different things
> and there's no reliable way (that I know of, at least)
> to absolutely get the "native" alignment.
I can imagine this is not easy, especially with the large variety
of architectures, compilers, and environments that OpenMPI handles.
> That being said, we didn't previously find any correctness
> issues with using an alignment of 1.
Does it affect only the information
provided by ompi_info, as Martin Siegert suggested?
Or does it really affect the actual alignment of
MPI types when OpenMPI is compiled with Intel,
as Martin, Ake Sandgren, Hugo Gagnon, and I
thought it might?
>> We have a number of pieces of code here where grep shows these types in use.
>> Not sure how much of it has actually been active, as there are always
>> lots of cpp directives to select active code.
>> In particular I found this interesting snippet:
>> if (MPI_DOUBLE_PRECISION==20 .and. MPI_REAL8==18) then
>>    ! LAM MPI defined MPI_REAL8 differently from MPI_DOUBLE_PRECISION
>>    ! and LAM MPI's allreduce does not accept on MPI_REAL8
>>    MPIreal_t = MPI_DOUBLE_PRECISION
>> else
>>    MPIreal_t = MPI_REAL8
>> endif
> This kind of thing shouldn't be an issue with Open MPI, right?
Yes, you are right.
Actually, I checked (and wrote in my posting)
that OpenMPI's MPI_DOUBLE_PRECISION = 17, so the code above
boils down to redefining everything as MPI_REAL8
(the "else" part), and MPI_DOUBLE_PRECISION
is never actually used *in this source file*.
BTW, I didn't write this code or the comments.
The source file is part of CCSM4/CAM4,
a widely used, public-domain climate/atmosphere model.
This particular source file (parallel_mod.F90, circa line 169)
hasn't been used in previous incarnations of these programs
(CAM3/CCSM3), which we ran extensively here, using OpenMPI.
In the old CAM3/CCSM3 most (perhaps all) of the 8-byte
floating point data are declared as real*8 or with the "kind" attribute,
not as double precision.
However, not only this source file but many other source files
in the new CCSM4/CAM4 declare 8-byte floating point data
as double precision,
and use MPI_DOUBLE_PRECISION in MPI function calls.
This style is a bit outdated, as Fortran 90 tends to prefer
"real(kind(0.d0))" over "double precision",
which is what Hugo did in his example.
My concern arises because we have just started experimenting
with CAM4/CCSM4, and the plan was to use OpenMPI libraries
compiled with Intel.
> FWIW, OMPI uses different numbers for MPI_DOUBLE_PRECISION and MPI_REAL8
> than LAM. They're distinct MPI datatypes because they *could* be different.
Yes, I understand that two distinct MPI_ constants should be kept,
although their actual size and alignment may be the same
on specific architectures (e.g. x86_64).