Beginning with OpenMPI 1.3.2, we have had a problem where our program's output
can be correct, junk, or NaNs, and which one we get is not predictable. The
output is the solution of a matrix equation solved by ScaLAPACK. We are using
the Intel Fortran compiler (version 11.1) and the GNU compiler (version 4.1.2)
on Gentoo Linux. So far, the problem shows up for N x N matrices with N of
about 10,000 or more, run on about 64 or more processors. Note that the problem
still occurs with OpenMPI 1.4.1.
We build the ScaLAPACK and BLACS
libraries locally and use the LAPACK and BLAS libraries supplied by Intel.
We wrote a test program to demonstrate
the problem. The matrix is built on each processor (no communication). Then,
the matrix is factored and solved. The solution vector is collected from the
processors and printed to a file by the master processor. The program and
associated OpenMPI information (ompi_info --all) are available at:
http://www.em-stuff.com/files/files.tar.gz
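For anyone reproducing this, a quick way to tell a "NaN" run from a "junk" run is to scan the solution file written by the master processor. The following is only a minimal sketch, not part of the test program: it assumes the file holds whitespace-separated real numbers as Fortran would print them, and the function names are made up for illustration. The actual file layout in the tarball may differ.

```python
import math


def parse_fortran_real(token):
    """Parse one real number as Fortran might print it.

    Assumption: values may use a D exponent (1.0D+00), which Python's
    float() does not accept, so D is mapped to E before parsing.
    """
    return float(token.replace("D", "E").replace("d", "e"))


def scan_solution(lines):
    """Count finite, NaN, infinite and unparsable entries in a solution dump.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    counts = {"finite": 0, "nan": 0, "inf": 0, "unparsable": 0}
    for line in lines:
        for token in line.split():
            try:
                value = parse_fortran_real(token)
            except ValueError:
                # e.g. a field of asterisks from an overflowed format
                counts["unparsable"] += 1
                continue
            if math.isnan(value):
                counts["nan"] += 1
            elif math.isinf(value):
                counts["inf"] += 1
            else:
                counts["finite"] += 1
    return counts
```

Passing an open file object to scan_solution() returns the four counts. Any nonzero count other than "finite" marks a bad run. A run with only finite values can still be junk, so it needs to be compared against a known-good solution as well.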
The file "compile" in the
"test" directory is used to create the executable. Edit it to point to the
libraries on your local machine. Data created using OpenMPI 1.3.1 and 1.4.1 are
in the "output" directory for reference.
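To decide whether a new run matches the reference data, it helps to compare the two solution vectors entry by entry rather than by eye. The sketch below is one hedged way to do that. It assumes both vectors are real-valued and already loaded into Python lists, which is an assumption about the test case, not something the tarball guarantees. The function name and the `tiny` guard value are illustrative.

```python
import math


def max_relative_difference(reference, candidate, tiny=1e-300):
    """Return the largest entrywise relative difference between two vectors.

    Returns math.inf if the candidate contains a NaN or an infinity, so a
    failed run is always reported as a mismatch.
    """
    if len(reference) != len(candidate):
        raise ValueError("solution vectors have different lengths")
    worst = 0.0
    for r, c in zip(reference, candidate):
        if math.isnan(c) or math.isinf(c):
            return math.inf
        # Scale by the larger magnitude; `tiny` avoids dividing by zero
        # when both entries are exactly zero.
        scale = max(abs(r), abs(c), tiny)
        worst = max(worst, abs(r - c) / scale)
    return worst
```

Two correct runs should differ only at the level of rounding error, while a junk run will typically differ by an amount of order one. The threshold that separates the two depends on the conditioning of the matrix, so any specific cutoff would be a judgment call.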
We would appreciate any help.
Thanks,
Nathan