Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] my leak or OpenMPI's leak?
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-10-18 15:09:24


I would guess the difference arises because the master process communicates with multiple slaves and therefore needs more memory. Eventually it has to allocate more than the initial block we allocated at startup. The slaves, each talking to only one process, never exceed the initial block and therefore never allocate more.

Just guessing without digging into your specific code.
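
To make the growth behavior described above concrete, here is a minimal sketch of a free list that allocates in chunks. It is an illustration only, not Open MPI's actual ompi_free_list code: the class name FreeList, its methods, and the chunk sizes are all hypothetical, and alignment handling is deliberately omitted. It shows why repeating a call reports no additional allocation (the first chunk still has spare items) and why a process with more items in flight eventually triggers a second malloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical chunked free list; NOT the real Open MPI implementation.
class FreeList {
public:
    FreeList(std::size_t item_size, std::size_t items_per_chunk)
        : item_size_(item_size), items_per_chunk_(items_per_chunk) {}

    // Chunks are only returned to the system here. A library that skips
    // this step at shutdown would show each chunk as "lost" in valgrind.
    ~FreeList() {
        for (void* chunk : chunks_) std::free(chunk);
    }

    FreeList(const FreeList&) = delete;
    FreeList& operator=(const FreeList&) = delete;

    // Hand out one item, growing by a whole chunk only when none are free.
    void* acquire() {
        if (free_items_.empty() && !grow()) return nullptr;
        void* item = free_items_.back();
        free_items_.pop_back();
        return item;
    }

    // Return an item to the list for reuse; no memory is freed.
    void release(void* item) {
        assert(item != nullptr);
        free_items_.push_back(item);
    }

    std::size_t chunks_allocated() const { return chunks_.size(); }

private:
    // One malloc per chunk: more than the caller asked for, "just in case".
    bool grow() {
        char* chunk =
            static_cast<char*>(std::malloc(item_size_ * items_per_chunk_));
        if (chunk == nullptr) return false;
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < items_per_chunk_; ++i) {
            free_items_.push_back(chunk + i * item_size_);
        }
        return true;
    }

    std::size_t item_size_;
    std::size_t items_per_chunk_;
    std::vector<void*> chunks_;      // every block obtained from malloc
    std::vector<void*> free_items_;  // items currently available
};
```

In this sketch, a caller that acquires and releases one item at a time never gets past the first chunk, which would match a process talking to a single peer. A caller that holds more items than one chunk provides forces grow() to run again, which would match the master's additional allocation.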

On Oct 18, 2010, at 10:01 AM, jody wrote:

> But shouldn't something like this show up in the other processes as well?
> I only see that in the master process, but the slave processes also
> send data to each other and to the master.
>
>
> On Mon, Oct 18, 2010 at 2:48 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>> On Oct 18, 2010, at 1:41 AM, jody wrote:
>>
>>> I had this leak with OpenMPI 1.4.2
>>>
>>> But in my case, there is no accumulation: when I repeat the same call,
>>> no additional leak is reported for the second call.
>>
>> That's because Open MPI grabs a larger-than-required chunk of memory just in case you call again. This helps performance by reducing the number of malloc calls in your application.
>>
>>
>>>
>>> Jody
>>>
>>> On Mon, Oct 18, 2010 at 1:57 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>> There is no OMPI 2.5 - do you mean 1.5?
>>>>
>>>> On Oct 17, 2010, at 4:11 PM, Brian Budge wrote:
>>>>
>>>>> Hi Jody -
>>>>>
>>>>> I noticed this exact same thing the other day when I used OpenMPI v
>>>>> 2.5 built with valgrind support. I actually ran out of memory due to
>>>>> this. When I went back to v 2.43, my program worked fine.
>>>>>
>>>>> Are you also using 2.5?
>>>>>
>>>>> Brian
>>>>>
>>>>> On Wed, Oct 6, 2010 at 4:32 AM, jody <jody.xha_at_[hidden]> wrote:
>>>>>> Hi
>>>>>> I regularly use valgrind to check for leaks, but I ignore the leaks
>>>>>> clearly created by OpenMPI,
>>>>>> because I think most of them are left in place for efficiency (no
>>>>>> time is spent cleaning up unimportant leaks).
>>>>>> But I want to make sure no leaks come from my own apps.
>>>>>> In most cases, leaks I am responsible for have the name of one
>>>>>> of my files at the bottom of the stack printed by valgrind,
>>>>>> with no internal OpenMPI calls above it, whereas leaks clearly caused by
>>>>>> OpenMPI have something like
>>>>>> ompi_mpi_init, mca_pml_base_open, PMPI_Init etc. at or very near the bottom.
>>>>>>
>>>>>> Now I have an application where I am completely unsure where the
>>>>>> responsibility for a particular leak lies. valgrind shows (among
>>>>>> others) this report:
>>>>>>
>>>>>> ==2756== 9,704 (8,348 direct, 1,356 indirect) bytes in 1 blocks are definitely lost in loss record 2,033 of 2,036
>>>>>> ==2756==    at 0x4005943: malloc (vg_replace_malloc.c:195)
>>>>>> ==2756==    by 0x4049387: ompi_free_list_grow (in /opt/openmpi-1.4.2.p/lib/libmpi.so.0.0.2)
>>>>>> ==2756==    by 0x41CA613: ???
>>>>>> ==2756==    by 0x41BDD91: ???
>>>>>> ==2756==    by 0x41B0C3D: ???
>>>>>> ==2756==    by 0x408AC9C: PMPI_Send (in /opt/openmpi-1.4.2.p/lib/libmpi.so.0.0.2)
>>>>>> ==2756==    by 0x8123377: ConnectorBase::send(CollectionBase*, std::pair<std::pair<unsigned short, unsigned short>, std::pair<unsigned short, unsigned short> >&) (ConnectorBase.cpp:39)
>>>>>> ==2756==    by 0x8123CEE: TileConnector::sendTile() (TileConnector.cpp:36)
>>>>>> ==2756==    by 0x80C6839: TDMaster::init(int, char**) (TDMaster.cpp:226)
>>>>>> ==2756==    by 0x80C167B: main (TDMain.cpp:24)
>>>>>> ==2756==
>>>>>>
>>>>>> At first glance it looks like an OpenMPI-internal leak,
>>>>>> because it happens inside PMPI_Send.
>>>>>> However, I call ConnectorBase::send()
>>>>>> several times from callers other than TileConnector,
>>>>>> and those calls don't show up in valgrind's output.
>>>>>>
>>>>>> Does anybody have an idea what is happening here?
>>>>>>
>>>>>> Thank You
>>>>>> jody
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users