Subject: Re: [OMPI users] HPL with OpenMPI: Do I have a memory leak?
From: Gus Correa (gus_at_[hidden])
Date: 2009-05-01 17:12:02

Hi Ralph

Thank you very much for the prompt answer.
Sorry for being so confusing in my original message.

Yes, I am saying that the inclusion of openib is causing the difference
in behavior.
It runs with "sm,self", it fails with "openib,sm,self".
I am as puzzled as you are, because I thought the "openib" parameter
was simply ignored when running on a single node, exactly like you said.
After your message arrived, I ran HPL once more with "openib",
just in case.
Sure enough it failed just as I described.

And yes, all the procs run on a single node in both cases.
It doesn't seem to be a problem with a particular
node's hardware either, as I have already
tried three different nodes with similar results.

BTW, I successfully ran HPL across the whole cluster two days ago,
with IB ("openib,sm,self"),
but using a modest (for the cluster) problem size: N=50,000.
The total cluster memory is 24*16=384GB,
which gives a max HPL problem size N=195,000.
I have yet to try the large problem on the whole cluster,
but I am afraid I will stumble on the same memory problem.
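
For reference, the problem sizes quoted here follow from the usual rule of thumb that the N x N matrix of doubles should fill about 80% of physical memory. The lines below are only a minimal sketch of that arithmetic, assuming 8-byte doubles and decimal gigabytes; the function name is mine and is not part of HPL.

```python
import math

GB = 10**9  # assumption: decimal gigabytes, as in the figures quoted above


def hpl_problem_size(total_mem_bytes, percent=80, bytes_per_elem=8):
    """Largest N such that an N x N matrix of doubles fits in
    `percent` percent of the given memory (integer arithmetic only)."""
    usable_bytes = total_mem_bytes * percent // 100
    return math.isqrt(usable_bytes // bytes_per_elem)


if __name__ == "__main__":
    print(hpl_problem_size(16 * GB))       # one 16GB node: 40000
    print(hpl_problem_size(24 * 16 * GB))  # 24 nodes: 195959, which I round down to 195,000
```

This only bounds the size of the matrix itself; whatever the MPI library and the OS allocate on top of it has to fit in the remaining 20%.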

Finally, in your email you use the syntax "btl=openib,sm,self",
with an "=" sign between the btl key and its values.
However, the mpiexec man page uses the syntax "btl openib,sm,self",
with a blank space between the btl key and its values.
I've been following the man page syntax.
The "=" sign doesn't seem to work; mpiexec aborts with the error
"No executable was specified on the mpiexec command line."
Could this possibly be the issue (say, wrong parsing of mca options)?
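
My guess at why the "=" form produces that particular error, and it is only a guess, is that "-mca" always consumes the next two tokens as key and value, as the man page syntax suggests. The toy model below is a sketch of that assumption only; it is not Open MPI's actual parser, and the function name is made up for illustration.

```python
def split_mca_args(argv):
    """Toy model: '-mca' swallows the next two tokens as (key, value);
    everything else is left over as the executable and its arguments."""
    mca, rest = {}, []
    i = 0
    while i < len(argv):
        if argv[i] in ("-mca", "--mca"):
            pair = argv[i + 1:i + 3]
            if len(pair) < 2:
                raise ValueError("-mca needs a key and a value")
            mca[pair[0]] = pair[1]
            i += 3
        else:
            rest.append(argv[i])
            i += 1
    return mca, rest


if __name__ == "__main__":
    # Man page syntax: key and value are separate tokens.
    print(split_mca_args(["-mca", "btl", "openib,sm,self", "xhpl"]))
    # "=" syntax: 'xhpl' is taken as the value, so no executable is left.
    print(split_mca_args(["-mca", "btl=openib,sm,self", "xhpl"]))
```

If that model is right, the "=" form never sets the btl list at all, so it would not explain the memory behavior I see with the man page syntax.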

Many thanks!
Gus Correa
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA

Ralph Castain wrote:
> If you are running on a single node, then btl=openib,sm,self would be
> equivalent to btl=sm,self. OMPI is smart enough to know not to use IB if
> you are on a single node, and instead uses the shared memory subsystem.
> Are you saying that the inclusion of openib is causing a difference in
> behavior, even though all procs are on the same node??
> Just want to ensure I understand the problem.
> Thanks
> Ralph
> On Fri, May 1, 2009 at 11:16 AM, Gus Correa <gus_at_[hidden]
> <mailto:gus_at_[hidden]>> wrote:
> Hi OpenMPI and HPC experts
> This may or may not be the right forum to post this,
> and I am sorry to bother those that think it is not.
> I am trying to run the HPL benchmark on our cluster,
> compiling it with Gnu and linking to
> GotoBLAS (1.26) and OpenMPI (1.3.1),
> both also Gnu-compiled.
> I have got failures that suggest a memory leak when the
> problem size is large, but still within the memory limits
> recommended by HPL.
> The problem only happens when "openib" is among the OpenMPI
> MCA parameters (and the problem size is large).
> Any help is appreciated.
> Here is a description of what happens.
> For starters I am trying HPL on a single node, to get a feeling for
> the right parameters (N & NB, P & Q, etc) on a dual-socket quad-core
> AMD Opteron 2376 "Shanghai" node.
> The HPL recommendation is to use close to 80% of your physical memory,
> to reach top Gigaflop performance.
> Our physical memory on a node is 16GB, and this gives a problem size
> N=40,000 to keep the 80% memory use.
> I tried several block sizes, somewhat correlated to the size of the
> processor cache: NB=64 80 96 128 ...
> When I run HPL with N=20,000 or smaller all works fine,
> and the HPL run completes, regardless of whether "openib"
> is present or not on my MCA parameters.
> However, when I move to N=40,000, or even N=35,000,
> the run starts OK with NB=64,
> but as NB is switched to larger values
> the total memory use increases in jumps (as shown by Ganglia),
> and becomes uneven across the processors (as shown by "top").
> The problem happens if "openib" is among the MCA parameters,
> but doesn't happen if I remove "openib" from the MCA list and use
> only "sm,self".
> For N=35,000, when NB reaches 96 memory use is already above the
> physical limit (16GB), having increased from 12.5GB to over 17GB.
> For N=40,000 the problem happens even earlier, with NB=80.
> At this point memory swapping kicks in,
> and eventually the run dies with memory allocation errors:
> ================================================================================
> T/V                N    NB     P     Q               Time                 Gflops
> --------------------------------------------------------------------------------
> WR01L2L4       35000   128     8     1             539.66              5.297e+01
> --------------------------------------------------------------------------------
> ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0043992 ...... PASSED
> HPL ERROR from process # 0, on line 172 of function HPL_pdtest:
> >>> [7,0] Memory allocation failed for A, x and b. Skip. <<<
> ...
> ***
> The code snippet that corresponds to HPL_pdtest.c is this,
> although the leak is probably somewhere else:
>    /*
>     * Allocate dynamic memory
>     */
>    vptr = (void*)malloc( ( (size_t)(ALGO->align) +
>                            (size_t)(mat.ld+1) * (size_t)(mat.nq) ) *
>                          sizeof(double) );
>    info[0] = (vptr == NULL); info[1] = myrow; info[2] = mycol;
>    (void) HPL_all_reduce( (void *)(info), 3, HPL_INT, HPL_max,
>                           GRID->all_comm );
>    if( info[0] != 0 )
>    {
>       if( ( myrow == 0 ) && ( mycol == 0 ) )
>          HPL_pwarn( TEST->outfp, __LINE__, "HPL_pdtest",
>                     "[%d,%d] %s", info[1], info[2],
>                     "Memory allocation failed for A, x and b. Skip." );
>       (TEST->kskip)++;
>       return;
>    }
> ***
> I found this continued increase in memory use rather strange,
> and suggestive of a memory leak in one of the codes being used.
> Everything (OpenMPI, GotoBLAS, and HPL)
> was compiled using Gnu only (gcc, gfortran, g++).
> I haven't changed anything on the compiler's memory model,
> i.e., I haven't used or changed the "-mcmodel" flag of gcc
> (I don't know if the Makefiles on HPL, GotoBLAS, and OpenMPI use it.)
> No additional load is present on the node
> other than the OS (Linux CentOS 5.2); HPL is running alone.
> The cluster has Infiniband.
> However, I am running on a single node.
> The surprising thing is that if I run on shared memory only
> (-mca btl sm,self) there is no memory problem,
> the memory use is stable at about 13.9GB,
> and the run completes.
> So there is a workaround for running on a single node.
> (Actually shared memory is presumably the way to go on a single node.)
> However, if I introduce IB (-mca btl openib,sm,self)
> among the MCA btl parameters, then memory use blows up.
> This is bad news for me, because I want to extend the experiment
> to run HPL also across the whole cluster using IB,
> which is actually the ultimate goal of HPL, of course!
> It also suggests that the problem is somehow related to Infiniband,
> maybe hidden under OpenMPI.
> Here is the mpiexec command I use (with and without openib):
> /path/to/openmpi/bin/mpiexec \
>     -prefix /the/run/directory \
>     -np 8 \
>     -mca btl [openib,]sm,self \
>     xhpl
> Any help, insights, suggestions, reports of previous experiences,
> are much appreciated.
> Thank you,
> Gus Correa
