Hardware Locality Users' Mailing List Archives

Subject: Re: [hwloc-users] Thread binding problem
From: Gabriele Fatigati (g.fatigati_at_[hidden])
Date: 2012-09-07 03:43:20


Hi,

Good, so we have found the kernel limit that is being exceeded.

/proc/meminfo reports MemFree: 47834588 kB

numactl -H:

available: 2 nodes (0-1)
node 0 size: 24194 MB
node 0 free: 22702 MB
node 1 size: 24240 MB
node 1 free: 23997 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

Are you able to reproduce the error using my attached code?
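For reference, in case the attachment does not show up in the archive, this is a condensed single-threaded sketch of the failing pattern (the array size and PAGE_SIZE below are placeholders; the real code splits the array among OpenMP threads as in the snippet quoted further down):

    /* Condensed sketch, NOT the actual attachment: bind one page at a time
     * with hwloc_set_area_membind() and report how many calls succeed
     * before one fails with ENOMEM. */
    #include <hwloc.h>
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define PAGE_SIZE 4096     /* placeholder; real code should use sysconf(_SC_PAGESIZE) */
    #define NPAGES    100000L  /* enough pages to reach the ~65479-call failure */

    int main(void)
    {
        hwloc_topology_t topology;
        hwloc_topology_init(&topology);
        hwloc_topology_load(topology);

        char *array = malloc(NPAGES * PAGE_SIZE);

        /* bind the (single) thread to the first PU, as in the original code */
        hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, 0);
        hwloc_cpuset_t cpuset = hwloc_bitmap_dup(obj->cpuset);
        hwloc_bitmap_singlify(cpuset);
        hwloc_set_cpubind(topology, cpuset, HWLOC_CPUBIND_THREAD);

        long i;
        for (i = 0; i < NPAGES; i++) {
            int res = hwloc_set_area_membind(topology, array + i * PAGE_SIZE,
                                             PAGE_SIZE, cpuset,
                                             HWLOC_MEMBIND_BIND,
                                             HWLOC_MEMBIND_THREAD);
            if (res < 0) {
                fprintf(stderr, "set_area_membind failed after %ld pages: %s\n",
                        i, strerror(errno));
                break;
            }
        }

        hwloc_bitmap_free(cpuset);
        free(array);
        hwloc_topology_destroy(topology);
        return 0;
    }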

Another question: I'm trying the same code on another system, but hwloc
gives: "Function not implemented".

Maybe this is because the numa-devel package is not installed there? The
non-devel numa package is already installed.
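To check what that system actually supports, hwloc exposes its support flags; a minimal sketch that just prints whether area membind and the BIND policy are reported as supported (this uses the standard hwloc_topology_get_support() call, nothing system-specific assumed):

    /* Print whether the current hwloc/OS combination reports support for
     * hwloc_set_area_membind() and for the BIND policy. */
    #include <hwloc.h>
    #include <stdio.h>

    int main(void)
    {
        hwloc_topology_t topology;
        hwloc_topology_init(&topology);
        hwloc_topology_load(topology);

        const struct hwloc_topology_support *s = hwloc_topology_get_support(topology);
        printf("set_area_membind supported: %u\n", (unsigned) s->membind->set_area_membind);
        printf("BIND policy supported:      %u\n", (unsigned) s->membind->bind_membind);

        hwloc_topology_destroy(topology);
        return 0;
    }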

Thanks.
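PS: if the problem really is one kernel mempolicy per page, a possible workaround might be to bind each thread's whole contiguous chunk with a single call instead of one call per page. An untested sketch (the helper name and parameters are mine; the hwloc calls are the same ones used in the code quoted below):

    #include <hwloc.h>
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    /* Untested sketch: bind thread 'tid' to one PU and bind its whole
     * contiguous chunk of 'array' with a single membind call, instead of
     * one call per page.  'chunk' is the number of bytes owned by each
     * thread, as in the code quoted below. */
    static void bind_thread_chunk(hwloc_topology_t topology, char *array,
                                  size_t chunk, int tid)
    {
        hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, tid);
        hwloc_cpuset_t cpuset = hwloc_bitmap_dup(obj->cpuset);
        hwloc_bitmap_singlify(cpuset);
        hwloc_set_cpubind(topology, cpuset, HWLOC_CPUBIND_THREAD);

        if (hwloc_set_area_membind(topology, array + (size_t) tid * chunk, chunk,
                                   cpuset, HWLOC_MEMBIND_BIND,
                                   HWLOC_MEMBIND_THREAD) < 0)
            fprintf(stderr, "set_area_membind (tid %d): %s\n", tid, strerror(errno));

        hwloc_bitmap_free(cpuset);
    }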

2012/9/6 Brice Goglin <Brice.Goglin_at_[hidden]>

> On 06/09/2012 14:51, Gabriele Fatigati wrote:
>
> Hi Brice,
>
> the initial grep is:
>
> numa_policy  65671  65952  24  144  1 : tunables  120  60  8 : slabdata  458  458  0
>
> When set_membind fails, it is:
>
> numa_policy  482  1152  24  144  1 : tunables  120  60  8 : slabdata  8  8  288
>
> What does it mean?
>
>
> The first number is the number of active objects. That means 65000
> mempolicy objects were in use on the first line.
> (I wonder if you swapped the lines; I expected the higher numbers at the
> end of the run.)
>
> Anyway, having 65000 mempolicies in use is a lot. And that would roughly
> correspond to the number of set_area_membind calls that succeed before one
> fails. So the kernel might indeed be failing to merge those.
>
> That said, these objects are small (24 bytes here, if I am reading things
> correctly), so we're only talking about roughly 1.6 MB here. So there's
> still something else eating all the memory. /proc/meminfo (MemFree) and
> numactl -H should again help.
>
>
> Brice
>
>
>
>
>
> 2012/9/6 Brice Goglin <Brice.Goglin_at_[hidden]>
>
>> On 06/09/2012 12:19, Gabriele Fatigati wrote:
>>
>> I didn't find any strange numbers in /proc/meminfo.
>>
>> I've noticed that the program fails after exactly 65479 calls to
>> hwloc_set_area_membind, so it sounds like some kernel limit.
>> You can check that with just one thread as well.
>>
>> Maybe nobody has noticed this before because usually we bind a large
>> amount of contiguous memory a few times, instead of small, non-contiguous
>> pieces of memory many, many times.. :(
>>
>>
>> If you have root access, try (as root)
>> watch -n 1 grep numa_policy /proc/slabinfo
>> Put a sleep(10) in your program when set_area_membind() fails, and don't
>> let your program exit before you can read the content of /proc/slabinfo.
>>
>> Brice
>>
>>
>>
>>
>>
>> 2012/9/6 Brice Goglin <Brice.Goglin_at_[hidden]>
>>
>>> On 06/09/2012 10:44, Samuel Thibault wrote:
>>> > Gabriele Fatigati, on Thu 06 Sep 2012 10:12:38 +0200, wrote:
>>> >> The mbind call inside hwloc_linux_set_area_membind() fails:
>>> >>
>>> >> Error from HWLOC mbind: Cannot allocate memory
>>> > Ok. mbind is not really supposed to allocate much memory, but it still
>>> > does allocate some, to record the policy.
>>> >
>>> >> // hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, tid);
>>> >> hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, tid);
>>> >> hwloc_cpuset_t cpuset = hwloc_bitmap_dup(obj->cpuset);
>>> >> hwloc_bitmap_singlify(cpuset);
>>> >> hwloc_set_cpubind(topology, cpuset, HWLOC_CPUBIND_THREAD);
>>> >>
>>> >> for (i = chunk*tid; i < len; i += PAGE_SIZE) {
>>> >>     // res = hwloc_set_area_membind_nodeset(topology, &array[i], PAGE_SIZE, obj->nodeset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD);
>>> >>     res = hwloc_set_area_membind(topology, &array[i], PAGE_SIZE, cpuset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD);
>>> > and I'm afraid that calling set_area_membind for each page might be too
>>> > fine-grained: the kernel is probably allocating a memory policy record
>>> > for each page and not managing to merge adjacent equal policies.
>>> >
>>>
>>> It's supposed to merge VMAs with the same policy (from what I understand
>>> of the code), but I don't know if that actually works.
>>> Maybe Gabriele found a kernel bug :)
>>>
>>> Brice

-- 
Ing. Gabriele Fatigati
HPC specialist
SuperComputing Applications and Innovation Department
Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
www.cineca.it                    Tel:   +39 051 6171722
g.fatigati [AT] cineca.it