Hi Brice,
the initial grep output is:
numa_policy 65671 65952 24 144 1 : tunables 120 60 8 : slabdata 458 458 0
When set_membind fails, it is:
numa_policy 482 1152 24 144 1 : tunables 120 60 8 : slabdata 8 8 288
What does it mean?
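For reference, here is a minimal sketch of how the columns in those two lines can be decoded. It assumes the slabinfo 2.1 layout (name, active_objs, num_objs, objsize, objperslab, pagesperslab, then the tunables and slabdata groups); the struct and function names are made up for illustration. With that layout, the first line would mean 65671 live policy objects of 24 bytes each, i.e. roughly 1.5 MB, which is small in absolute terms.

```c
#include <assert.h>
#include <stdio.h>

/* One data line of /proc/slabinfo (assumed format version 2.1). */
struct slab_line {
    char name[64];
    unsigned long active_objs;   /* objects currently in use */
    unsigned long num_objs;      /* objects allocated in total */
    unsigned long objsize;       /* bytes per object */
    unsigned long objperslab;    /* objects per slab */
    unsigned long pagesperslab;  /* pages per slab */
    unsigned long active_slabs;  /* slabs with at least one live object */
    unsigned long num_slabs;     /* slabs allocated in total */
};

/* Parse one line; returns 1 on success, 0 if the line does not match. */
int parse_slab_line(const char *line, struct slab_line *out)
{
    assert(line != NULL && out != NULL);
    int n = sscanf(line,
                   "%63s %lu %lu %lu %lu %lu : tunables %*u %*u %*u"
                   " : slabdata %lu %lu",
                   out->name, &out->active_objs, &out->num_objs,
                   &out->objsize, &out->objperslab, &out->pagesperslab,
                   &out->active_slabs, &out->num_slabs);
    return n == 8;  /* the three tunables are skipped, not counted */
}

/* Bytes held by live objects of this cache. */
unsigned long slab_bytes_in_use(const struct slab_line *s)
{
    return s->active_objs * s->objsize;
}
```

Feeding each line printed by the grep to parse_slab_line() and comparing slab_bytes_in_use() between the two snapshots shows how much the cache grew or shrank.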
2012/9/6 Brice Goglin <Brice.Goglin@inria.fr>
On 06/09/2012 12:19, Gabriele Fatigati wrote:
> I didn't find any strange number in /proc/meminfo.
> I've noticed that the program fails exactly every 65479 hwloc_set_area_membind calls. So it sounds like some kernel limit. You can also check that with just one thread.
> Maybe nobody has noticed this before because we usually bind a large amount of contiguous memory a few times, instead of small, non-contiguous pieces of memory many, many times.. :(
If you have root access, try (as root)
watch -n 1 grep numa_policy /proc/slabinfo
Put a sleep(10) in your program when set_area_membind() fails, and don't let your program exit before you can read the content of /proc/slabinfo.
Brice
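The sleep-on-failure suggestion could look roughly like the sketch below. The helper name pause_on_failure is hypothetical, and the sketch assumes the binding call reports failure with a negative return value and errno set, as the "Cannot allocate memory" message in this thread suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * If res reports a failure (negative), print the errno message and the
 * pid, then sleep so that /proc/slabinfo can be read while the process
 * is still alive. Returns 1 if it paused, 0 otherwise.
 */
int pause_on_failure(int res, const char *what, unsigned int seconds)
{
    assert(what != NULL);
    if (res >= 0)
        return 0;

    int saved = errno;  /* save errno before stdio can overwrite it */
    fprintf(stderr, "%s failed: %s (pid %ld)\n",
            what, strerror(saved), (long)getpid());
    sleep(seconds);
    errno = saved;      /* leave errno as the caller saw it */
    return 1;
}
```

In the binding loop this would be called right after the binding call, for example pause_on_failure(res, "hwloc_set_area_membind", 10), followed by a break out of the loop when it returns 1.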
2012/9/6 Brice Goglin <Brice.Goglin@inria.fr>
On 06/09/2012 10:44, Samuel Thibault wrote:
> Gabriele Fatigati, on Thu 06 Sep 2012 10:12:38 +0200, wrote:
>> mbind hwloc_linux_set_area_membind() fails:
>>
>> Error from HWLOC mbind: Cannot allocate memory
> Ok. mbind is not really supposed to allocate much memory, but it still
> does allocate some, to record the policy
>
>> // hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, tid);
>> hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, tid);
>> hwloc_cpuset_t cpuset = hwloc_bitmap_dup(obj->cpuset);
>> hwloc_bitmap_singlify(cpuset);
>> hwloc_set_cpubind(topology, cpuset, HWLOC_CPUBIND_THREAD);
>>
>> for( i = chunk*tid; i < len; i+=PAGE_SIZE) {
>> // res = hwloc_set_area_membind_nodeset(topology, &array[i], PAGE_SIZE, obj->nodeset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD);
>> res = hwloc_set_area_membind(topology, &array[i], PAGE_SIZE, cpuset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD);
> and I'm afraid that calling set_area_membind for each page might be too
> dense: the kernel is probably allocating a memory policy record for each
> page, not being able to merge adjacent equal policies.
>
It's supposed to merge VMAs with the same policies (from what I understand in
the code), but I don't know if that actually works.
Maybe Gabriele found a kernel bug :)
Brice
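One way to see whether adjacent areas really get merged is to count the mappings of the process before and after the binding loop. The sketch below does that by counting lines in /proc/self/maps, where each line describes one mapping. Comparing the count with the vm.max_map_count sysctl (which commonly defaults to 65530) is only an assumption about which limit might be involved; nothing in this thread confirms it. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Count newline-terminated lines in a file; -1 if it cannot be opened. */
long count_lines(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long lines = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            lines++;
    fclose(f);
    return lines;
}

/* Read a single integer from a file; -1 if missing or malformed. */
long read_long(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long value = -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Print the current number of mappings next to the assumed limit. */
void report_mappings(void)
{
    long used = count_lines("/proc/self/maps");
    long limit = read_long("/proc/sys/vm/max_map_count");
    fprintf(stderr, "mappings: %ld, vm.max_map_count: %ld\n", used, limit);
}
```

Calling report_mappings() before the loop and again when the binding call fails shows whether the count grows by about one per bound page, which would suggest the areas are not being merged.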
_______________________________________________
hwloc-users mailing list
hwloc-users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
--
Ing. Gabriele Fatigati
HPC specialist
SuperComputing Applications and Innovation Department
Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
www.cineca.it Tel: +39 051 6171722
g.fatigati [AT] cineca.it