Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] HPL with OpenMPI: Do I have a memory leak?
From: Gus Correa (gus_at_[hidden])
Date: 2009-05-01 21:30:21

Hi Brian

Thank you very much for the instant help!

I just tried "-mca btl openib,sm,self" and
"-mca mpi_leave_pinned 0" together (still with OpenMPI 1.3.1).

So far so good, it passed through two NB cases/linear system solutions,
it is running the third NB, and the memory use hasn't increased.
On the failed runs the second NB already used more memory than the
first, and the third would blow up memory use.

If the run were bound to fail, it would be swapping memory at this point,
and it is not.
This is a good sign, I hope I am not speaking too early,
but it looks like your suggestion fixed the problem.

It was interesting to observe using Ganglia
that on the failed runs the memory use "jumps"
happened whenever HPL switched from one NB to another.
At every NB transition (i.e., each time HPL started to solve a
new linear system, and probably generated a new random matrix)
the memory use would jump to a (significantly) higher value.
Anyway, this is just in case the info tells you something about what
might be going on.
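
Editor's note: for readers without Ganglia, the same jumps can be watched by sampling /proc/meminfo on the node during the run. The following is only a rough sketch, assuming a Linux node; the function names are hypothetical, and "used" is approximated as MemTotal - MemFree - Buffers - Cached, which is not necessarily the metric Ganglia plots.

```python
def used_memory_kb(meminfo_text):
    """Approximate used memory (kB) from the text of /proc/meminfo.

    Assumption: used = MemTotal - MemFree - Buffers - Cached.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        parts = rest.split()
        if not parts:
            continue
        try:
            fields[key.strip()] = int(parts[0])
        except ValueError:
            continue  # skip any non-numeric entry
    return (fields["MemTotal"] - fields["MemFree"]
            - fields.get("Buffers", 0) - fields.get("Cached", 0))


def sample_used_memory_kb(path="/proc/meminfo"):
    """Read the given meminfo file once and return the approximate used kB."""
    with open(path) as f:
        return used_memory_kb(f.read())
```

Calling sample_used_memory_kb() every few seconds while xhpl runs, and logging the result with a timestamp, should show whether memory use steps up when HPL moves from one NB to the next.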

I will certainly follow your advice and upgrade to OpenMPI 1.3.2,
which I just downloaded.
You guys are prolific, a new edition per month! :)

Many thanks!
Gus Correa

Brian W. Barrett wrote:
> Gus -
> Open MPI 1.3.0 & 1.3.1 attempted to use some controls in the glibc
> malloc implementation to handle memory registration caching for
> InfiniBand. Unfortunately, it was not only buggy in that it didn't
> work, but it also had the side effect that certain memory usage patterns
> can cause the memory allocator to use much more memory than it normally
> would. The configuration options were set any time the openib module
> was loaded, even if it wasn't used in communication. Can you try
> running with the extra option:
> -mca mpi_leave_pinned 0
> I'm guessing that will fix the problem. If you're using InfiniBand, you
> probably want to upgrade to 1.3.2, as there are known data corruption
> issues in 1.3.0 and 1.3.1 with openib.
> Brian
> On Fri, 1 May 2009, Gus Correa wrote:
>> Hi Ralph
>> Thank you very much for the prompt answer.
>> Sorry for being so confusing in my original message.
>> Yes, I am saying that the inclusion of openib is causing the difference
>> in behavior.
>> It runs with "sm,self", it fails with "openib,sm,self".
>> I am as puzzled as you are, because I thought the "openib" parameter
>> was simply ignored when running on a single node, exactly like you said.
>> After your message arrived, I ran HPL once more with "openib",
>> just in case.
>> Sure enough it failed just as I described.
>> And yes, all the procs run on a single node in both cases.
>> It doesn't seem to be a problem caused by a particular
>> node hardware either, as I already
>> tried three different nodes with similar results.
>> BTW, I successfully ran HPL across the whole cluster two days ago,
>> with IB ("openib,sm,self"),
>> but using a modest (for the cluster) problem size: N=50,000.
>> The total cluster memory is 24*16=384GB,
>> which gives a max HPL problem size N=195,000.
>> I have yet to try the large problem on the whole cluster,
>> but I am afraid I will stumble on the same memory problem.
>> Finally, in your email you use the syntax "btl=openib,sm,self",
>> with an "=" sign between the btl key and its values.
>> However, the mpiexec man page uses the syntax "btl openib,sm,self",
>> with a blank space between the btl key and its values.
>> I've been following the man page syntax.
>> The "=" sign doesn't seem to work, and aborts with the error:
>> "No executable was specified on the mpiexec command line.".
>> Could this possibly be the issue (say, wrong parsing of mca options)?
>> Many thanks!
>> Gus Correa
>> ---------------------------------------------------------------------
>> Gustavo Correa
>> Lamont-Doherty Earth Observatory - Columbia University
>> Palisades, NY, 10964-8000 - USA
>> ---------------------------------------------------------------------
>> Ralph Castain wrote:
>>> If you are running on a single node, then btl=openib,sm,self would be
>>> equivalent to btl=sm,self. OMPI is smart enough to know not to use IB
>>> if you are on a single node, and instead uses the shared memory
>>> subsystem.
>>> Are you saying that the inclusion of openib is causing a difference
>>> in behavior, even though all procs are on the same node??
>>> Just want to ensure I understand the problem.
>>> Thanks
>>> Ralph
>>> On Fri, May 1, 2009 at 11:16 AM, Gus Correa <gus_at_[hidden]> wrote:
>>> Hi OpenMPI and HPC experts
>>> This may or may not be the right forum to post this,
>>> and I am sorry to bother those that think it is not.
>>> I am trying to run the HPL benchmark on our cluster,
>>> compiling it with Gnu and linking to
>>> GotoBLAS (1.26) and OpenMPI (1.3.1),
>>> both also Gnu-compiled.
>>> I have got failures that suggest a memory leak when the
>>> problem size is large, but still within the memory limits
>>> recommended by HPL.
>>> The problem only happens when "openib" is among the OpenMPI
>>> MCA parameters (and the problem size is large).
>>> Any help is appreciated.
>>> Here is a description of what happens.
>>> For starters I am trying HPL on a single node, to get a feeling for
>>> the right parameters (N & NB, P & Q, etc) on dual-socket quad-core
>>> AMD Opteron 2376 "Shanghai" nodes.
>>> The HPL recommendation is to use close to 80% of your physical
>>> memory,
>>> to reach top Gigaflop performance.
>>> Our physical memory on a node is 16GB, and this gives a problem size
>>> N=40,000 to keep the 80% memory use.
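
Editor's note: the sizing rule quoted above can be sketched in a few lines. This is a minimal sketch, assuming the N x N matrix of 8-byte doubles dominates memory use and that 1 GB means 10^9 bytes (which matches the numbers quoted in this thread); the function name max_hpl_n is made up for illustration and is not part of HPL.

```python
import math

BYTES_PER_DOUBLE = 8  # HPL solves the system in double precision


def max_hpl_n(mem_bytes, percent=80):
    """Largest N such that an N x N matrix of doubles fits in
    `percent` percent of `mem_bytes` (integer arithmetic only)."""
    usable_doubles = mem_bytes * percent // (100 * BYTES_PER_DOUBLE)
    return math.isqrt(usable_doubles)


print(max_hpl_n(16 * 10**9))       # one 16 GB node -> 40000
print(max_hpl_n(24 * 16 * 10**9))  # 24 nodes, 384 GB -> 195959
```

The second call corresponds to the whole-cluster case mentioned elsewhere in the thread, where the result is rounded down to N=195,000.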
>>> I tried several block sizes, somewhat correlated to the size of the
>>> processor cache: NB=64 80 96 128 ...
>>> When I run HPL with N=20,000 or smaller all works fine,
>>> and the HPL run completes, regardless of whether "openib"
>>> is present or not on my MCA parameters.
>>> However, when I move to N=40,000, or even N=35,000,
>>> the run starts OK with NB=64,
>>> but as NB is switched to larger values
>>> the total memory use increases in jumps (as shown by Ganglia),
>>> and becomes uneven across the processors (as shown by "top").
>>> The problem happens if "openib" is among the MCA parameters,
>>> but doesn't happen if I remove "openib" from the MCA list and use
>>> only "sm,self".
>>> For N=35,000, when NB reaches 96 memory use is already above the
>>> physical limit
>>> (16GB), having increased from 12.5GB to over 17GB.
>>> For N=40,000 the problem happens even earlier, with NB=80.
>>> At this point memory swapping kicks in,
>>> and eventually the run dies with memory allocation errors:
>>> ================================================================================
>>> T/V                N    NB     P     Q               Time             Gflops
>>> --------------------------------------------------------------------------------
>>> WR01L2L4       35000   128     8     1             539.66          5.297e+01
>>> --------------------------------------------------------------------------------
>>> ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0043992 ...... PASSED
>>> HPL ERROR from process # 0, on line 172 of function HPL_pdtest:
>>> >>> [7,0] Memory allocation failed for A, x and b. Skip. <<<
>>> ...
>>> ***
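
Editor's note: the Gflops figure in the result line above can be cross-checked from N and the run time. This is a small sketch, assuming HPL's usual operation count of 2/3 N^3 + 3/2 N^2 floating-point operations for an LU solve; the function name hpl_gflops is hypothetical.

```python
def hpl_gflops(n, seconds):
    """Gflop/s for an HPL run of order n that took `seconds`,
    using the operation count 2/3 n^3 + 3/2 n^2."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


# N=35000, Time=539.66 s, taken from the WR01L2L4 line above
print(f"{hpl_gflops(35000, 539.66):.2f}")  # 52.97
```

This agrees with the reported 5.297e+01, so the NB=128 case itself ran to completion and the allocation error belongs to the next test case.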
>>> The corresponding code snippet in HPL_pdtest.c is this,
>>> although the leak is probably somewhere else:
>>> /*
>>>  * Allocate dynamic memory
>>>  */
>>>    vptr = (void*)malloc( ( (size_t)(ALGO->align) +
>>>                            (size_t)(mat.ld+1) * (size_t)(mat.nq) ) *
>>>                          sizeof(double) );
>>>    info[0] = (vptr == NULL); info[1] = myrow; info[2] = mycol;
>>>    (void) HPL_all_reduce( (void *)(info), 3, HPL_INT, HPL_max,
>>>                           GRID->all_comm );
>>>    if( info[0] != 0 )
>>>    {
>>>       if( ( myrow == 0 ) && ( mycol == 0 ) )
>>>          HPL_pwarn( TEST->outfp, __LINE__, "HPL_pdtest",
>>>                     "[%d,%d] %s", info[1], info[2],
>>>                     "Memory allocation failed for A, x and b. Skip." );
>>>       (TEST->kskip)++;
>>>       return;
>>>    }
>>> ***
>>> I found this continued increase in memory use rather strange,
>>> and suggestive of a memory leak in one of the codes being used.
>>> Everything (OpenMPI, GotoBLAS, and HPL)
>>> was compiled using Gnu only (gcc, gfortran, g++).
>>> I haven't changed anything on the compiler's memory model,
>>> i.e., I haven't used or changed the "-mcmodel" flag of gcc
>>> (I don't know if the Makefiles on HPL, GotoBLAS, and OpenMPI use
>>> it.)
>>> No additional load is present on the node,
>>> other than the OS (Linux CentOS 5.2); HPL is running alone.
>>> The cluster has Infiniband.
>>> However, I am running on a single node.
>>> The surprising thing is that if I run on shared memory only
>>> (-mca btl sm,self) there is no memory problem,
>>> the memory use is stable at about 13.9GB,
>>> and the run completes.
>>> So, there is a workaround for running on a single node.
>>> (Actually shared memory is presumably the way to go on a single
>>> node.)
>>> However, if I introduce IB (-mca btl openib,sm,self)
>>> among the MCA btl parameters, then memory use blows up.
>>> This is bad news for me, because I want to extend the experiment
>>> to run HPL also across the whole cluster using IB,
>>> which is actually the ultimate goal of HPL, of course!
>>> It also suggests that the problem is somehow related to Infiniband,
>>> maybe hidden under OpenMPI.
>>> Here is the mpiexec command I use (with and without openib):
>>> /path/to/openmpi/bin/mpiexec \
>>> -prefix /the/run/directory \
>>> -np 8 \
>>> -mca btl [openib,]sm,self \
>>> xhpl
>>> Any help, insights, suggestions, reports of previous experiences,
>>> are much appreciated.
>>> Thank you,
>>> Gus Correa
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]