On 06/09/2012 14:51, Gabriele Fatigati wrote:
Hi Brice,

the initial grep is:

numa_policy        65671  65952     24  144    1 : tunables  120   60    8 : slabdata    458    458      0

When set_membind fails, it is:

numa_policy          482   1152     24  144    1 : tunables  120   60    8 : slabdata      8      8    288

What does it mean?

The first number is the number of active objects, so about 65000 mempolicy objects were in use on the first line.
(I wonder if you swapped the lines; I expected the higher numbers at the end of the run.)

Anyway, having 65000 mempolicies in use is a lot, and it roughly matches the number of set_area_membind calls that succeed before one fails. So the kernel might indeed be failing to merge them.

That said, these objects are small (24 bytes each, if I am reading this correctly), so we are only talking about 1.6 MB here. There is still something else eating all the memory; /proc/meminfo (MemFree) and numactl -H should again help.
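
For reference, here is a minimal sketch (not from the thread; it assumes the usual slabinfo column layout of name, active_objs, num_objs, objsize, ...) showing how those columns turn into a memory estimate:

    /* Hypothetical helper, not part of the test program: estimate the memory
     * held by numa_policy slab objects.  Reading /proc/slabinfo usually
     * requires root. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/slabinfo", "r");
        char line[512];
        unsigned long active, total, objsize;

        if (!f) { perror("fopen /proc/slabinfo"); return 1; }
        while (fgets(line, sizeof(line), f)) {
            /* lines look like: <name> <active_objs> <num_objs> <objsize> ... */
            if (sscanf(line, "numa_policy %lu %lu %lu",
                       &active, &total, &objsize) == 3) {
                printf("numa_policy: %lu active x %lu bytes ~= %lu kB\n",
                       active, objsize, active * objsize / 1024);
                break;
            }
        }
        fclose(f);
        return 0;
    }

With the first line above, that is 65671 x 24 bytes, i.e. about 1.6 MB.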

Brice





2012/9/6 Brice Goglin <Brice.Goglin@inria.fr>
On 06/09/2012 12:19, Gabriele Fatigati wrote:
I didn't find any strange numbers in /proc/meminfo.

I've noticed that the program fails exactly every 65479 hwloc_set_area_membind calls, so it sounds like some kernel limit. You can check that with just one thread, too.

Maybe nobody has noticed it before, because we usually bind a large amount of contiguous memory a few times, instead of many small, non-contiguous pieces of memory over and over.. :(

If you have root access, try (as root)
    watch -n 1 grep numa_policy /proc/slabinfo
Put a sleep(10) in your program when set_area_membind() fails, and don't let your program exit before you can read the content of /proc/slabinfo.
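
For example (a sketch only, assuming the loop quoted further down, and that <stdio.h>, <errno.h>, <string.h> and <unistd.h> are included), the hook could look like:

    res = hwloc_set_area_membind(topology, &array[i], PAGE_SIZE, cpuset,
                                 HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD);
    if (res < 0) {
        /* pause so that /proc/slabinfo can be read while the mempolicy
         * objects are still alive */
        fprintf(stderr, "set_area_membind failed at page index %lu: %s\n",
                (unsigned long) (i / PAGE_SIZE), strerror(errno));
        sleep(10);
    }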

Brice





2012/9/6 Brice Goglin <Brice.Goglin@inria.fr>
On 06/09/2012 10:44, Samuel Thibault wrote:
> Gabriele Fatigati, on Thu 06 Sep 2012 10:12:38 +0200, wrote:
>> mbind in hwloc_linux_set_area_membind() fails:
>>
>> Error from HWLOC mbind: Cannot allocate memory
> Ok. mbind is not really supposed to allocate much memory, but it still
> does allocate some, to record the policy.
>
>> //        hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, tid);
>>         hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, tid);
>>         hwloc_cpuset_t cpuset = hwloc_bitmap_dup(obj->cpuset);
>>         hwloc_bitmap_singlify(cpuset);
>>         hwloc_set_cpubind(topology, cpuset, HWLOC_CPUBIND_THREAD);
>>
>>         for( i = chunk*tid; i < len; i+=PAGE_SIZE) {
>> //           res = hwloc_set_area_membind_nodeset(topology, &array[i], PAGE_SIZE, obj->nodeset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD);
>>              res = hwloc_set_area_membind(topology, &array[i], PAGE_SIZE, cpuset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD);
> and I'm afraid that calling set_area_membind for each page might be too
> dense: the kernel probably allocates a memory policy record for each
> page and may not be able to merge adjacent equal policies.
>

It's supposed to merge VMAs with the same policy (from what I understand of
the code), but I don't know whether that actually works.
Maybe Gabriele found a kernel bug :)
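
(Not from the thread, just an illustration of the workaround that follows from the above: if each thread is really meant to bind only its own contiguous chunk, a single hwloc_set_area_membind call per chunk leaves the kernel with one mempolicy record per thread instead of one per page. Variable names follow the quoted program, and array is assumed to be a byte array.)

    size_t start = (size_t) chunk * tid;
    size_t bytes = (start + chunk <= len) ? chunk : len - start; /* last chunk may be short */
    res = hwloc_set_area_membind(topology, &array[start], bytes, cpuset,
                                 HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD);
    if (res < 0)
        perror("hwloc_set_area_membind");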

Brice




--
Ing. Gabriele Fatigati

HPC specialist

SuperComputing Applications and Innovation Department

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it                    Tel:   +39 051 6171722

g.fatigati [AT] cineca.it          


_______________________________________________
hwloc-users mailing list
hwloc-users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users