Starting with OpenMPI 1.3.2, we have a problem where the program output is unpredictable: it may be correct, junk, or NaNs. The output is the solution of a matrix equation solved with ScaLAPACK. We are using the Intel Fortran compiler (version 11.1) and the GNU compiler (version 4.1.2) on Gentoo Linux. So far, we have seen the problem with N x N matrices where N is roughly 10,000 or more, running on roughly 64 or more processors. The problem still occurs with OpenMPI 1.4.1.

We build the ScaLAPACK and BLACS libraries locally and use the LAPACK and BLAS libraries supplied by Intel.

We wrote a test program that demonstrates the problem. Each processor builds the matrix locally (no communication). The matrix is then factored and solved. Finally, the solution vector is gathered from the processors and written to a file by the master processor. The program and the associated OpenMPI information (ompi_info --all) are available at:

http://www.em-stuff.com/files/files.tar.gz

The file "compile" in the "test" directory builds the executable; edit it to point to the libraries on your local machine. Output produced with OpenMPI 1.3.1 and 1.4.1 is in the "output" directory for reference.
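Since reference output is included, a run can be checked quickly by comparing its solution vector against a known-good file. The script below is only a hypothetical sketch: it assumes each output file holds the solution as whitespace-separated real numbers (possibly with Fortran-style "D" exponents), and the default tolerance of 1e-8 is an arbitrary assumption. Two correct runs built against different MPI versions may still differ at round-off level, so the tolerance may need adjusting.

```python
import math
import sys


def parse_solution(text):
    """Parse whitespace-separated reals (assumed file layout).

    Fortran may write exponents as 1.0D+00, so D/d is mapped to E/e.
    """
    return [float(tok.replace("D", "E").replace("d", "e")) for tok in text.split()]


def compare_solutions(reference, candidate):
    """Return (number of NaNs in candidate, max relative difference)."""
    if len(reference) != len(candidate):
        raise ValueError("length mismatch: %d vs %d" % (len(reference), len(candidate)))
    n_nan = sum(1 for x in candidate if math.isnan(x))
    max_rel = 0.0
    for r, c in zip(reference, candidate):
        if math.isnan(c):
            continue  # already counted above
        denom = max(abs(r), 1e-300)  # avoid division by zero
        max_rel = max(max_rel, abs(c - r) / denom)
    return n_nan, max_rel


def main(argv):
    # Usage (hypothetical): compare.py reference.txt new_run.txt [tolerance]
    ref_path, new_path = argv[1], argv[2]
    tol = float(argv[3]) if len(argv) > 3 else 1e-8
    with open(ref_path) as f:
        ref = parse_solution(f.read())
    with open(new_path) as f:
        new = parse_solution(f.read())
    n_nan, max_rel = compare_solutions(ref, new)
    print("NaNs: %d, max relative difference: %.3e" % (n_nan, max_rel))
    # Non-zero exit status flags a bad run.
    return 1 if n_nan or max_rel > tol else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

A non-zero exit status makes it easy to repeat the job in a loop and count how often a given OpenMPI version produces junk or NaNs.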

We appreciate any help.

Thanks,

Nathan