This problem went away yesterday. There was no intervening reboot of
my cluster and no recompile of the code, so all I can surmise is that
something got cleaned up by a cron script. Weird.
Anyway, I've now benchmarked HPL using OpenMPI vs. LAM-MPI. OpenMPI
runs about 13% slower than LAM-MPI, and sometimes as much as 50%
slower. I'm running over TCP and using SSH.
Does anyone expect OpenMPI to be slower than LAM-MPI under these
conditions?
On Apr 10, 2006, at 9:57 PM, Lee D. Peterson wrote:
> Dear OpenMPI,
> I'm transitioning from LAM-MPI to OpenMPI and have just compiled
> OMPI 1.0.2 on OS X Server 10.4.6. I'm using gcc 3.3 and XLF (both
> f77 and f90), and I'm using ssh to run the jobs. My cluster is all
> dual 2GHz+ G5 Xserves, and I am using both Ethernet ports for
> communication. One is used for NFS and the other is for MPI.
> I've had few problems over the past year running this config with
> LAM-MPI (latest release). But what worked before doesn't work with
> OpenMPI.
> When I run any parallel job that spans multiple machines, the
> process runs indefinitely. I've checked this using the BLACS and
> PBLAS test routines, the HPL benchmark, and even a simple mpi-pong
> program. All of them execute but produce no output past some
> initial output, consuming 100% of the CPU on every node that's
> launched. In contrast, all of these programs run in a few seconds
> on a single node, with two processors, and up to -np 8. When I
> Ctrl-C to stop the program, OpenMPI safely stops all the
> processes, no matter how many machines have been used.
> I noticed a couple of postings from the past few months that seem
> related but don't describe quite the same symptoms. Any ideas
> what could be going on?
> OpenMPI is a really great project, and the quality of the software
> development that has gone into it is obvious. I appreciate all your
> help. My config.log and omni-info.out files are attached.
> Lee Peterson
> Aerospace Engineering Sciences
> University of Colorado
> Boulder, CO