
Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] my leak or OpenMPI's leak?
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-10-18 08:48:35


On Oct 18, 2010, at 1:41 AM, jody wrote:

> I had this leak with OpenMPI 1.4.2
>
> But in my case, there is no accumulation - when I repeat the same call,
> no additional leak is reported for the second call.

That's because Open MPI grabs a larger-than-required chunk of memory just in case you call again. This helps performance by reducing the number of malloc calls in your application.
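As an illustration of that pattern (a minimal sketch in C, not Open MPI's actual code), a free list that grows in chunks does one malloc for many objects the first time it is needed and serves later requests from the same chunk, so a leak checker sees a single allocation that is never individually freed:

    #include <stdlib.h>

    #define CHUNK_ITEMS 64            /* one malloc covers many items */

    struct item { struct item *next; char payload[120]; };

    static struct item *free_list = NULL;

    /* Grow the pool by one chunk. The chunk is kept for the lifetime of
       the process; unless it is released at finalize time, a leak checker
       reports it as lost or still reachable. */
    static int pool_grow(void)
    {
        struct item *chunk = malloc(CHUNK_ITEMS * sizeof(*chunk));
        if (chunk == NULL)
            return -1;
        for (int i = 0; i < CHUNK_ITEMS; ++i) {
            chunk[i].next = free_list;
            free_list = &chunk[i];
        }
        return 0;
    }

    /* Later requests are served from the pool: no further mallocs. */
    static struct item *pool_get(void)
    {
        if (free_list == NULL && pool_grow() != 0)
            return NULL;
        struct item *it = free_list;
        free_list = it->next;
        return it;
    }

    static void pool_put(struct item *it)
    {
        it->next = free_list;
        free_list = it;
    }

Repeated pool_get()/pool_put() cycles after the first grow trigger no new allocations, which matches the observation above that a second identical call adds no new leak report.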

>
> Jody
>
> On Mon, Oct 18, 2010 at 1:57 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>> There is no OMPI 2.5 - do you mean 1.5?
>>
>> On Oct 17, 2010, at 4:11 PM, Brian Budge wrote:
>>
>>> Hi Jody -
>>>
>>> I noticed this exact same thing the other day when I used OpenMPI v
>>> 2.5 built with valgrind support. I actually ran out of memory due to
>>> this. When I went back to v 2.43, my program worked fine.
>>>
>>> Are you also using 2.5?
>>>
>>> Brian
>>>
>>> On Wed, Oct 6, 2010 at 4:32 AM, jody <jody.xha_at_[hidden]> wrote:
>>>> Hi
>>>> I regularly use valgrind to check for leaks, but I ignore the leaks
>>>> clearly created by OpenMPI, because I think most of them are deliberate
>>>> efficiency trade-offs (no time is lost cleaning up unimportant
>>>> allocations). But I want to make sure no leaks come from my own apps.
>>>> In most cases, leaks I am responsible for have the name of one of my
>>>> files at the bottom of the stack printed by valgrind and no internal
>>>> OpenMPI calls above it, whereas leaks clearly caused by OpenMPI have
>>>> something like ompi_mpi_init, mca_pml_base_open, PMPI_Init, etc. at or
>>>> very near the bottom.
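One way to keep reports of that kind out of the way is a valgrind suppressions file that matches leaks whose stack contains Open MPI's setup routines. A minimal sketch, using the frame names mentioned above; the exact frames (and the "..." wildcard, which needs a reasonably recent valgrind) may need adjusting for a given build:

    {
       ompi_init_internal_leaks
       Memcheck:Leak
       ...
       fun:ompi_mpi_init
    }
    {
       ompi_pml_internal_leaks
       Memcheck:Leak
       ...
       fun:mca_pml_base_open
    }

Such a file is passed to valgrind with --suppressions=<file>, so the remaining reports are more likely to come from the application itself.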
>>>>
>>>> Now I have an application where I am completely unsure where the
>>>> responsibility for a particular leak lies. Valgrind shows (among
>>>> others) this report:
>>>>
>>>> ==2756== 9,704 (8,348 direct, 1,356 indirect) bytes in 1 blocks are
>>>> definitely lost in loss record 2,033 of 2,036
>>>> ==2756== at 0x4005943: malloc (vg_replace_malloc.c:195)
>>>> ==2756== by 0x4049387: ompi_free_list_grow (in
>>>> /opt/openmpi-1.4.2.p/lib/libmpi.so.0.0.2)
>>>> ==2756== by 0x41CA613: ???
>>>> ==2756== by 0x41BDD91: ???
>>>> ==2756== by 0x41B0C3D: ???
>>>> ==2756== by 0x408AC9C: PMPI_Send (in
>>>> /opt/openmpi-1.4.2.p/lib/libmpi.so.0.0.2)
>>>> ==2756== by 0x8123377: ConnectorBase::send(CollectionBase*,
>>>> std::pair<std::pair<unsigned short, unsigned short>,
>>>> std::pair<unsigned short, unsigned short> >&) (ConnectorBase.cpp:39)
>>>> ==2756== by 0x8123CEE: TileConnector::sendTile() (TileConnector.cpp:36)
>>>> ==2756== by 0x80C6839: TDMaster::init(int, char**) (TDMaster.cpp:226)
>>>> ==2756== by 0x80C167B: main (TDMain.cpp:24)
>>>> ==2756==
>>>>
>>>> At first glance it looks like an OpenMPI-internal leak,
>>>> because it happens inside PMPI_Send,
>>>> but I call the function ConnectorBase::send()
>>>> several times from callers other than TileConnector,
>>>> and those calls don't show up in valgrind's output.
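One way to test that reading (a sketch under the assumption that the internal free list only grows on the first send) is to run a toy program like the one below under valgrind, or to reorder the callers in the real application. If the ompi_free_list_grow block is always attributed to whichever path performs the first MPI_Send, the allocation belongs to Open MPI's internal buffering rather than to the calling code:

    /* Minimal MPI program (illustrative only, not jody's code);
       run with e.g. mpirun -np 2 valgrind --leak-check=full ./a.out */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank;
        int buf[4] = {0, 1, 2, 3};

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* First send: Open MPI may grow internal free lists here, and a
               leak checker attributes that malloc to this call path. */
            MPI_Send(buf, 4, MPI_INT, 1, 0, MPI_COMM_WORLD);
            /* Subsequent sends reuse the grown lists: no further reports. */
            MPI_Send(buf, 4, MPI_INT, 1, 1, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(buf, 4, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Recv(buf, 4, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }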
>>>>
>>>> Does anybody have an idea what is happening here?
>>>>
>>>> Thank You
>>>> jody