Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Nearly unlimited growth of pml free list
From: Max Staufer (max.staufer_at_[hidden])
Date: 2013-09-13 07:06:23


Hi Rolf,

    I applied your patch. The full output is rather big (even gzipped it is
over 10 MB, which is not good for the mailing list), but the head and tail are
below for a 7- and an 8-processor run.
It seems that the send_requests list is growing fast, roughly 4000-fold in just 10 minutes.

Do you know of a method to bound the list so that it does not grow
excessively?
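
(For context, the cap referred to further down in this thread is the
pml_ob1_free_list_max MCA parameter, which can be set on the command line,
for example

    mpirun --mca pml_ob1_free_list_max 4096 -np 7 ./my_app

where the value and the application name are placeholders; as described
below, the run then stalls once the cap is reached instead of recycling
entries.)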

thanks

Max

7 Processor run
------------------
[gpu207.dev-env.lan:11236] Iteration = 0 sleeping
[gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0
[gpu207.dev-env.lan:11236]
[gpu207.dev-env.lan:11236] Iteration = 0 sleeping
[gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0
[gpu207.dev-env.lan:11236]
[gpu207.dev-env.lan:11236] Iteration = 0 sleeping
[gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0
[gpu207.dev-env.lan:11236]
[gpu207.dev-env.lan:11236] Iteration = 0 sleeping
[gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0
[gpu207.dev-env.lan:11236]
[gpu207.dev-env.lan:11236] Iteration = 0 sleeping
[gpu207.dev-env.lan:11236] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=send_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] Freelist=recv_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11236] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0

......

[gpu207.dev-env.lan:11243] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=send_requests, numAlloc=16324,
maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=recv_requests, numAlloc=68, maxAlloc=-1
[gpu207.dev-env.lan:11243] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0
[gpu207.dev-env.lan:11243]
[gpu207.dev-env.lan:11243] Iteration = 0 sleeping
[gpu207.dev-env.lan:11243] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=send_requests, numAlloc=16324,
maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=recv_requests, numAlloc=68, maxAlloc=-1
[gpu207.dev-env.lan:11243] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0
[gpu207.dev-env.lan:11243]
[gpu207.dev-env.lan:11243] Iteration = 0 sleeping
[gpu207.dev-env.lan:11243] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=send_requests, numAlloc=16324,
maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=recv_requests, numAlloc=68, maxAlloc=-1
[gpu207.dev-env.lan:11243] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0
[gpu207.dev-env.lan:11243]
[gpu207.dev-env.lan:11243] Iteration = 0 sleeping
[gpu207.dev-env.lan:11243] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=send_requests, numAlloc=16324,
maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=recv_requests, numAlloc=68, maxAlloc=-1
[gpu207.dev-env.lan:11243] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0
[gpu207.dev-env.lan:11243]
[gpu207.dev-env.lan:11243] Iteration = 0 sleeping
[gpu207.dev-env.lan:11243] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=send_requests, numAlloc=16324,
maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=recv_requests, numAlloc=68, maxAlloc=-1
[gpu207.dev-env.lan:11243] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0
[gpu207.dev-env.lan:11243]
[gpu207.dev-env.lan:11243] Iteration = 0 sleeping
[gpu207.dev-env.lan:11243] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=send_requests, numAlloc=16324,
maxAlloc=-1
[gpu207.dev-env.lan:11243] Freelist=recv_requests, numAlloc=68, maxAlloc=-1
[gpu207.dev-env.lan:11243] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0

8 Processor run
--------------------

[gpu207.dev-env.lan:11315] Iteration = 0 sleeping
[gpu207.dev-env.lan:11315] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=send_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=recv_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0
[gpu207.dev-env.lan:11315]
[gpu207.dev-env.lan:11315] Iteration = 0 sleeping
[gpu207.dev-env.lan:11315] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=send_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=recv_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0
[gpu207.dev-env.lan:11315]
[gpu207.dev-env.lan:11315] Iteration = 0 sleeping
[gpu207.dev-env.lan:11315] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=send_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=recv_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0
[gpu207.dev-env.lan:11315]
[gpu207.dev-env.lan:11315] Iteration = 0 sleeping
[gpu207.dev-env.lan:11315] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=send_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=recv_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0
[gpu207.dev-env.lan:11315]
[gpu207.dev-env.lan:11315] Iteration = 0 sleeping
[gpu207.dev-env.lan:11315] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=send_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] Freelist=recv_requests, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11315] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0

...

[gpu207.dev-env.lan:11322] Iteration = 0 sleeping
[gpu207.dev-env.lan:11322] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=send_requests, numAlloc=16708,
maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=recv_requests, numAlloc=68, maxAlloc=-1
[gpu207.dev-env.lan:11322] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0
[gpu207.dev-env.lan:11322]
[gpu207.dev-env.lan:11322] Iteration = 0 sleeping
[gpu207.dev-env.lan:11322] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=send_requests, numAlloc=16708,
maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=recv_requests, numAlloc=68, maxAlloc=-1
[gpu207.dev-env.lan:11322] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0
[gpu207.dev-env.lan:11322]
[gpu207.dev-env.lan:11322] Iteration = 0 sleeping
[gpu207.dev-env.lan:11322] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=send_requests, numAlloc=16708,
maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=recv_requests, numAlloc=68, maxAlloc=-1
[gpu207.dev-env.lan:11322] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0
[gpu207.dev-env.lan:11322]
[gpu207.dev-env.lan:11322] Iteration = 0 sleeping
[gpu207.dev-env.lan:11322] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=send_requests, numAlloc=16708,
maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=recv_requests, numAlloc=68, maxAlloc=-1
[gpu207.dev-env.lan:11322] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0
[gpu207.dev-env.lan:11322]
[gpu207.dev-env.lan:11322] Iteration = 0 sleeping
[gpu207.dev-env.lan:11322] Freelist=rdma_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=recv_frags, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=pending_pckts, numAlloc=4, maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=send_ranges_pckts, numAlloc=4,
maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=send_requests, numAlloc=16708,
maxAlloc=-1
[gpu207.dev-env.lan:11322] Freelist=recv_requests, numAlloc=68, maxAlloc=-1
[gpu207.dev-env.lan:11322] rdma_pending=0, pckt_pending=0,
recv_pending=0, send_pending=0, comm_pending=0

On 12.09.2013 17:04, Rolf vandeVaart wrote:
> Can you apply this patch and try again? It will print out the sizes of the free lists after every 100 calls into mca_pml_ob1_send. It would be interesting to see which one is growing.
> This might give us some clues.
>
> Rolf
>
>> -----Original Message-----
>> From: Max Staufer [mailto:max.staufer_at_[hidden]]
>> Sent: Thursday, September 12, 2013 3:53 AM
>> To: Rolf vandeVaart
>> Subject: Re: [OMPI devel] Nearly unlimited growth of pml free list
>>
>> Hi Rolf,
>>
>> The heap snapshots I take tell me where and when the memory was
>> allocated, and a simple source trace tells me that the calling
>> routine was mca_pml_ob1_send and that all of the ~100000 individual
>> allocations during the run were made because of an MPI_ALLREDUCE call in
>> exactly one place in the code.
>> The tool I use for this is MemoryScape, but I think Valgrind can tell you
>> the same thing. However, I have not been able to reproduce the problem in a
>> simpler program yet; I suspect it has something to do with the locking
>> mechanism of the list elements. I don't know enough about OMPI to comment
>> on that, but it looks like the list is growing because all elements are
>> locked.
>>
>> Really, any help is appreciated.
>>
>> Max
>>
>> PS:
>>
>> If I mimic MPI_ALLREDUCE with 2*Nproc SEND and RECV commands (aggregating
>> on proc 0 and then sending the result back out to all procs), I get the
>> same kind of behaviour.
>>
>> On 11.09.2013 17:12, Rolf vandeVaart wrote:
>>> Hi Max:
>>> You say that the function keeps "allocating memory in the pml free list."
>>> How do you know that is happening?
>>> Do you know which free list it is happening on? There are something like 8
>>> free lists associated with the pml ob1, so it would be interesting to know
>>> which one you observe is growing.
>>> Rolf
>>>
>>>> -----Original Message-----
>>>> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Max
>>>> Staufer
>>>> Sent: Wednesday, September 11, 2013 10:23 AM
>>>> To: devel_at_[hidden]
>>>> Subject: [OMPI devel] Nearly unlimited growth of pml free list
>>>>
>>>> Hi All,
>>>>
>>>> As I already asked on the users list (I was told that's not the
>>>> right place to ask): I came across a misbehaviour of Open MPI versions
>>>> 1.4.5 and 1.6.5 alike.
>>>> The mca_pml_ob1_send function keeps allocating memory in the pml free
>>>> list. It does that indefinitely. In my case the list grew to about 100 GB.
>>>>
>>>> I can control the maximum using the pml_ob1_free_list_max parameter,
>>>> but then the application just stops working when this number of
>>>> entries in the list is reached.
>>>>
>>>> The interesting part is that the growth only happens in a single
>>>> place in the code, which is inside a RECURSIVE SUBROUTINE.
>>>>
>>>> And the function called there is MPI_ALLREDUCE(... MPI_SUM).
>>>>
>>>> Apparently it's not easy to create a test program that shows the same
>>>> behaviour; recursion alone is not enough.
>>>>
>>>> Is there an MCA parameter that allows limiting the total list size
>>>> without making the application stop?
>>>>
>>>> Or is there a way to enforce the lock on the free list entries?
>>>>
>>>> Thanks for all the help
>>>>
>>>> Max
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
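
For anyone trying to reproduce the pattern Max describes in his PS above
(mimicking MPI_ALLREDUCE by aggregating on proc 0 with point-to-point sends
and then distributing the result back out), a minimal C sketch might look
like the following. The function name, the use of a single double, and the
message tags are illustrative assumptions, not Max's actual code:

#include <mpi.h>

/* Hypothetical stand-in for MPI_Allreduce(..., MPI_SUM, ...) built from
 * point-to-point calls: every rank sends its value to rank 0, rank 0 sums
 * them, then sends the total back to every rank. */
static double mimic_allreduce_sum(double local, MPI_Comm comm)
{
    int rank, nproc;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nproc);

    double total = local;
    if (rank == 0) {
        /* Aggregate: one receive per remote rank. */
        for (int src = 1; src < nproc; src++) {
            double tmp;
            MPI_Recv(&tmp, 1, MPI_DOUBLE, src, 0, comm, MPI_STATUS_IGNORE);
            total += tmp;
        }
        /* Distribute: one send per remote rank. */
        for (int dst = 1; dst < nproc; dst++)
            MPI_Send(&total, 1, MPI_DOUBLE, dst, 1, comm);
    } else {
        MPI_Send(&local, 1, MPI_DOUBLE, 0, 0, comm);
        MPI_Recv(&total, 1, MPI_DOUBLE, 0, 1, comm, MPI_STATUS_IGNORE);
    }
    return total;
}

Called from a deep recursion or a tight loop, this issues on the order of
2*Nproc point-to-point messages per "reduction", which is the pattern Max
reports as showing the same send_requests free-list growth as the real
MPI_ALLREDUCE.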