Sorry for the delay. I am now able to reproduce this behavior when I
do not specify HPL_NO_DATATYPE. If I do specify HPL_NO_DATATYPE, the
run completes. We are looking into this now.
On Oct 21, 2005, at 5:03 PM, Troy Telford wrote:
> I've been trying out the RC4 builds of OpenMPI; I've been using Myrinet
> (gm), Infiniband (mvapi), and TCP.
> When running a benchmark such as IMB (formerly PALLAS, IIRC), or even a
> simple hello world, there are no problems.
> However, when running HPL (and HPCC, which is a superset of HPL), I
> run into a problem: when running HPL (or when the execution reaches the
> HPL portion of HPCC), the process seems to get wedged...
> I have no problems compiling and building HPL and HPCC for MPICH
> (including MVAPICH, MPICH-GM/MX) and LAM; no problems with the gcc,
> Intel, PGI, or Pathscale compilers.
> The HPL.dat (and hpccinf.txt) can be identical across the machines. The
> machines are identically configured (except for the interconnect).
> However, when running the HPL code (on OpenMPI), HPL will peg the CPU
> and run until I feel like killing it. If the 'N' size is larger than a
> fraction of a percent of free system memory (0.1% of free memory; the system
> has 2 GB/CPU, in my case), HPL and HPCC will not finish computing that
> problem size. (Case in point -- an N size that is small enough that it
> takes 1-2 seconds with MPICH, MPICH-GM, MVAPICH, or LAM -- doesn't
> complete after several minutes on OpenMPI.)
> I'm therefore somewhat confused; I've seen posts from people who claim to
> have run HPL with OpenMPI. I've had no issues running other benchmarks on
> OpenMPI, but HPL-based code seems to wedge itself... The behavior is
> consistent when I use Myrinet, Infiniband, or Ethernet.
> I am running OpenMPI on Linux (SuSE Enterprise 9, SP2, x86_64).
> Dual-Opteron 248; 2 GB/CPU