
Subject: Re: [hwloc-users] Thread binding problem
From: Gabriele Fatigati (g.fatigati_at_[hidden])
Date: 2012-09-07 03:43:20


Hi,

Good, so we have found the kernel limit that is being exceeded (see the small
check sketched after the numactl output below).

/proc/meminfo reports MemFree: 47834588 kB

numactl -H:

available: 2 nodes (0-1)
node 0 size: 24194 MB
node 0 free: 22702 MB
node 1 size: 24240 MB
node 1 free: 23997 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
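
One candidate for that limit, though this is only a guess on my side, is the
kernel's per-process mapping limit vm.max_map_count (65530 by default): if every
page gets its own memory policy, each page may end up as a separate VMA. The
sketch below just reads the limit and counts the current mappings of the process,
so the two numbers can be compared when set_area_membind starts failing:

#include <stdio.h>

/* Diagnostic sketch (assumption: the failing limit is vm.max_map_count).
 * Compare the limit with the number of mappings the process currently has;
 * if mbind() starts failing with ENOMEM once the two numbers meet, the
 * per-process VMA limit is the culprit. */
static long read_max_map_count(void)
{
    long limit = -1;
    FILE *f = fopen("/proc/sys/vm/max_map_count", "r");
    if (f) {
        if (fscanf(f, "%ld", &limit) != 1)
            limit = -1;
        fclose(f);
    }
    return limit;
}

static long count_mappings(void)
{
    long count = 0;
    int c;
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            count++;                /* one line per mapping */
    fclose(f);
    return count;
}

int main(void)
{
    printf("vm.max_map_count = %ld\n", read_max_map_count());
    printf("current mappings = %ld\n", count_mappings());
    return 0;
}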

Are you able to reproduce the error using my attached code?
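
In case the attachment does not make it to the list, here is a minimal standalone
version reconstructed from the loop quoted further down; the array size, the
hardcoded 4096-byte page size and the single-thread setup are simplifications of
mine, not the exact attached code:

#include <hwloc.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096    /* assumed; the real code should use sysconf(_SC_PAGESIZE) */
#define NPAGES    70000   /* enough pages to go past ~65000 membind calls */

int main(void)
{
    hwloc_topology_t topology;
    hwloc_obj_t node;
    char *array;
    size_t len = (size_t)NPAGES * PAGE_SIZE;
    size_t i;

    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    /* bind everything to the first NUMA node, one page at a time */
    node = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, 0);
    if (!node) {
        fprintf(stderr, "no NUMA node object found\n");
        return 1;
    }

    array = malloc(len);
    memset(array, 0, len);             /* touch the pages */

    for (i = 0; i < len; i += PAGE_SIZE) {
        int res = hwloc_set_area_membind_nodeset(topology, array + i, PAGE_SIZE,
                                                 node->nodeset, HWLOC_MEMBIND_BIND,
                                                 HWLOC_MEMBIND_THREAD);
        if (res < 0) {
            fprintf(stderr, "set_area_membind failed after %zu pages: %s\n",
                    i / PAGE_SIZE, strerror(errno));
            break;
        }
    }

    free(array);
    hwloc_topology_destroy(topology);
    return 0;
}

(compiled with: gcc repro.c -o repro -lhwloc)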

Another question: I'm trying the same code on another system, but hwloc
gives "Function not implemented".

Could it be because the numa-devel package is not installed there? The
non-devel numa package is already installed.
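
If it helps to narrow that down: hwloc reports at runtime which memory binding
operations are supported, so a build without NUMA support should show
set_area_membind = 0 in the output of the small check below (just plain hwloc
calls, nothing specific to my code):

#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topology;
    const struct hwloc_topology_support *support;

    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    /* Ask hwloc which memory-binding operations this system/build supports;
     * unsupported ones are those that return "Function not implemented". */
    support = hwloc_topology_get_support(topology);
    printf("set_area_membind supported: %d\n",
           (int) support->membind->set_area_membind);
    printf("BIND policy supported:      %d\n",
           (int) support->membind->bind_membind);

    hwloc_topology_destroy(topology);
    return 0;
}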

Thanks.

2012/9/6 Brice Goglin <Brice.Goglin_at_[hidden]>

> On 06/09/2012 14:51, Gabriele Fatigati wrote:
>
> Hi Brice,
>
> the initial grep is:
>
> numa_policy 65671 65952 24 144 1 : tunables 120 60 8 : slabdata 458 458 0
>
> When set_membind fails, it is:
>
> numa_policy 482 1152 24 144 1 : tunables 120 60 8 : slabdata 8 8 288
>
> What does that mean?
>
>
> The first number is the number of active objects. That means 65000
> mempolicy objects were in use on the first line.
> (I wonder if you swapped the lines; I expected higher numbers at the end
> of the run.)
>
> Anyway, having 65000 mempolicies in use is a lot. And it would roughly
> correspond to the number of set_area_membind calls that succeed before one
> fails. So the kernel might indeed fail to merge those.
>
> That said, these objects are small (24 bytes each, if I am reading things
> correctly), so we're talking about only 1.6 MB here. So there's still
> something else eating all the memory. /proc/meminfo (MemFree) and numactl
> -H should again help.
>
>
> Brice
>
> 2012/9/6 Brice Goglin <Brice.Goglin_at_[hidden]>
>
>> On 06/09/2012 12:19, Gabriele Fatigati wrote:
>>
>> I didn't find any strange numbers in /proc/meminfo.
>>
>> I've noticed that the program fails after exactly 65479
>> hwloc_set_area_membind calls, so it sounds like some kernel limit.
>> You can check that with just one thread as well.
>>
>> Maybe nobody has noticed it before because we usually bind a large amount
>> of contiguous memory a few times, instead of small and non-contiguous
>> pieces of memory many, many times.. :(
>>
>>
>> If you have root access, try (as root)
>> watch -n 1 grep numa_policy /proc/slabinfo
>> Put a sleep(10) in your program when set_area_membind() fails, and don't
>> let your program exit before you can read the content of /proc/slabinfo.
>>
>> Brice
>>
>> 2012/9/6 Brice Goglin <Brice.Goglin_at_[hidden]>
>>
>>> On 06/09/2012 10:44, Samuel Thibault wrote:
>>> > Gabriele Fatigati, on Thu 06 Sep 2012 10:12:38 +0200, wrote:
>>> >> The mbind call inside hwloc_linux_set_area_membind() fails:
>>> >>
>>> >> Error from HWLOC mbind: Cannot allocate memory
>>> > Ok. mbind is not really supposed to allocate much memory, but it still
>>> > does allocate some, to record the policy.
>>> >
>>> >>     // hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, tid);
>>> >>     hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, tid);
>>> >>     hwloc_cpuset_t cpuset = hwloc_bitmap_dup(obj->cpuset);
>>> >>     hwloc_bitmap_singlify(cpuset);
>>> >>     hwloc_set_cpubind(topology, cpuset, HWLOC_CPUBIND_THREAD);
>>> >>
>>> >>     for (i = chunk*tid; i < len; i += PAGE_SIZE) {
>>> >>         // res = hwloc_set_area_membind_nodeset(topology, &array[i], PAGE_SIZE,
>>> >>         //                                      obj->nodeset, HWLOC_MEMBIND_BIND,
>>> >>         //                                      HWLOC_MEMBIND_THREAD);
>>> >>         res = hwloc_set_area_membind(topology, &array[i], PAGE_SIZE,
>>> >>                                      cpuset, HWLOC_MEMBIND_BIND,
>>> >>                                      HWLOC_MEMBIND_THREAD);
>>> > and I'm afraid that calling set_area_membind for each page might be too
>>> > dense: the kernel is probably allocating a memory policy record for each
>>> > page, not being able to merge adjacent equal policies.
>>> >
>>>
>>> It's supposed to merge VMAs with the same policy (from what I understand of
>>> the code), but I don't know if that actually works.
>>> Maybe Gabriele found a kernel bug :)
>>>
>>> Brice
>>>
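
Following up on Samuel's point about one policy record per page: a coarser-grained
alternative is to bind each thread's whole contiguous chunk with a single
hwloc_set_area_membind call, so the kernel only has to keep one policy per chunk.
The sketch below only illustrates the idea for a single chunk; the variable names
mirror the quoted loop and it is not a drop-in replacement for the attached code:

#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>

/* Bind one thread's whole contiguous chunk in a single call, instead of
 * one hwloc_set_area_membind() per page. */
static int bind_chunk(hwloc_topology_t topology, char *array,
                      size_t chunk, unsigned tid)
{
    hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, tid);
    hwloc_cpuset_t cpuset = hwloc_bitmap_dup(obj->cpuset);
    int res;

    hwloc_bitmap_singlify(cpuset);
    hwloc_set_cpubind(topology, cpuset, HWLOC_CPUBIND_THREAD);

    /* one membind call covering the full chunk owned by this thread */
    res = hwloc_set_area_membind(topology, array + (size_t)tid * chunk, chunk,
                                 cpuset, HWLOC_MEMBIND_BIND,
                                 HWLOC_MEMBIND_THREAD);
    hwloc_bitmap_free(cpuset);
    return res;
}

int main(void)
{
    hwloc_topology_t topology;
    size_t chunk = 64 * 1024 * 1024;           /* arbitrary example size */
    char *array = malloc(chunk);               /* chunk of "thread 0" only */

    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    if (bind_chunk(topology, array, chunk, 0) < 0)
        perror("hwloc_set_area_membind");

    free(array);
    hwloc_topology_destroy(topology);
    return 0;
}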

-- 
Ing. Gabriele Fatigati
HPC specialist
SuperComputing Applications and Innovation Department
Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
www.cineca.it                    Tel:   +39 051 6171722
g.fatigati [AT] cineca.it