I'm trying to run a small proof-of-concept program using OpenMPI 1.3. I am using Solaris 8 with SPARC processors across 2 nodes. It appears that MPI_Reduce() is hanging. If I run the same program with 4 instances on 1 node, or 2 instances across 2 nodes, it works fine. The problem only appears with 4 instances across 2 nodes.
First, I had some issues compiling OpenMPI. I resolved them, and I would like to share my fixes with you. I believe these compile-time issues are related to running an older version of Solaris, and probably not to any major issue in OpenMPI. These fixes are not related to my run-time problem, but I thought you might want to see them in case they provide insight into what the problem is.
_SC_CPUID_MAX was undefined. I made the following change in 2 locations in the source:
cpuid_max = 7; /* sysconf(_SC_CPUID_MAX); */ /* Running on 8 CPU nodes */
vfscanf was undefined. I had to comment out the following code (it appears that fscanf() was not required anyway):
a) /* #include <stdint.h> */
b) /* VT_IOWRAP_INIT_FUNC(fscanf); */
c) I commented out the entire fscanf() function
Now I seem to be stuck on a run-time issue. I wrote a program (located in the attached bz2 file) called sieve.c which calculates prime numbers using the sieve algorithm (I copied the code from somewhere). When I run the program on a single node with 4 instances, it works fine. If I run the program with 2 instances across 2 nodes, it also works fine. If I run the program with 4 instances across 2 nodes, it hangs. I made the following observations:
1) It is definitely hanging during the call to MPI_Reduce().
2) Some instances do exit MPI_Reduce(), while other instances enter but never exit this function.
3) If I added a delay right before calling MPI_Reduce(), the problem went away. Delaying the destination (root) instance of the reduce operation from making the call seems to avoid the hang. However, I realize this is a kludge, and there is no guarantee that it will work all the time.
4) If I change the MPI_Reduce() call to an MPI_Allreduce(), the sieve program also works with 4 instances across 2 nodes.
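For reference, the delay I added in observation 3 was roughly of this shape (a reconstructed sketch, not my exact code; the one-second sleep is illustrative and the real delay value may have differed):

```c
#include <mpi.h>
#include <unistd.h>   /* sleep() */

/* Kludge from observation 3: hold back the root rank briefly so the
   non-root ranks enter MPI_Reduce() first. */
static void reduce_with_delay(int *local, int *total)
{
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        sleep(1);  /* delay the destination (root) instance */
    MPI_Reduce(local, total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    /* Observation 4: replacing the rooted reduce with an all-reduce
       also avoids the hang (every rank then receives the total):
       MPI_Allreduce(local, total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    */
}
```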
I searched your archives and found someone else with a similar issue, but I didn't see any response.
My PATH includes:
My LD_LIBRARY_PATH includes:
I used the following in my configure parameters:
./configure --prefix=/home/username/mpi/openmpi-1.3.local --disable-mpi-f77 --disable-mpi-f90 CFLAGS=-xarch=v8plus CXXFLAGS=-xarch=v8plus
I compiled the program with:
mpicc -g -o sieve sieve.c
I ran the program with:
mpirun -np 4 -H node1,node2 sieve 100
Please let me know if you need any additional information. And thanks in advance for any help you can provide.