Hardware Locality Users' Mailing List Archives

Subject: Re: [hwloc-users] Thread binding problem
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-09-05 14:00:24


Perhaps you simply have run out of memory on that NUMA node, and therefore the malloc failed. Check "numactl --hardware", for example.

You might want to check the output of numastat to see if one or more of your NUMA nodes have run out of memory.
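
For example, a minimal sketch (assuming a standard Linux sysfs layout; it reads the same per-node counters those tools use) that prints the free memory of every NUMA node:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        int node;
        for (node = 0; ; node++) {
            char path[64], line[128];
            FILE *f;
            snprintf(path, sizeof(path),
                     "/sys/devices/system/node/node%d/meminfo", node);
            f = fopen(path, "r");
            if (!f)
                break;                        /* no more NUMA nodes */
            while (fgets(line, sizeof(line), f))
                if (strstr(line, "MemFree"))  /* e.g. "Node 0 MemFree: 12345 kB" */
                    fputs(line, stdout);
            fclose(f);
        }
        return 0;
    }

If one node's MemFree is near zero while the others still have plenty, a bind to that node is the likely culprit.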

On Sep 5, 2012, at 12:58 PM, Gabriele Fatigati wrote:

> I've reproduced the problem in a small MPI + OpenMP code.
>
> The error is the same: after some memory bindings, it gives "Cannot allocate memory".
>
> Thanks.
>
> 2012/9/5 Gabriele Fatigati <g.fatigati_at_[hidden]>
> If I downscale the matrix size, binding works well; but there is enough memory available even for the bigger matrix, so I'm a bit confused.
>
> With the same big matrix size but without binding, the code works well, so how can I explain this behaviour?
>
> Maybe hwloc_set_area_membind_nodeset introduces extra allocations of its own that persist after the call?
>
>
>
> 2012/9/5 Brice Goglin <Brice.Goglin_at_[hidden]>
> An internal malloc failed then. That would explain why your malloc failed too.
> Could it be that you malloc'ed too much memory in your program?
>
> Brice
>
>
>
>
> On 05/09/2012 15:56, Gabriele Fatigati wrote:
>> An update:
>>
>> Placing strerror(errno) after hwloc_set_area_membind_nodeset gives: "Cannot allocate memory".
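
For reference, a minimal sketch of that kind of check (the helper name and variables are illustrative, not taken from your program):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <hwloc.h>

    /* Bind [ptr, ptr+len) to nodeset and report the exact errno on failure. */
    static int bind_area_verbose(hwloc_topology_t topology, void *ptr,
                                 size_t len, hwloc_nodeset_t nodeset)
    {
        int err = hwloc_set_area_membind_nodeset(topology, ptr, len, nodeset,
                                                 HWLOC_MEMBIND_BIND,
                                                 HWLOC_MEMBIND_THREAD |
                                                 HWLOC_MEMBIND_MIGRATE);
        if (err < 0)
            fprintf(stderr, "hwloc_set_area_membind_nodeset: %s (errno %d)\n",
                    strerror(errno), errno);
        return err;
    }

Printing the numeric errno alongside the string makes it easier to see exactly which case was hit.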
>>
>> 2012/9/5 Gabriele Fatigati <g.fatigati_at_[hidden]>
>> Hi,
>>
>> I've noticed that hwloc_set_area_membind_nodeset returns -1, but errno is equal to neither EXDEV nor ENOSYS. I assumed those were the only two possible cases.
>>
>> From the hwloc documentation:
>>
>> -1 with errno set to ENOSYS if the action is not supported
>> -1 with errno set to EXDEV if the binding cannot be enforced
>>
>>
>> Is there any other reason why binding could fail? There is enough memory available.
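
There can be: hwloc may pass through whatever errno the underlying OS call sets, and on Linux mbind() can also fail with ENOMEM, whose message is exactly "Cannot allocate memory". A sketch distinguishing the cases (function name illustrative):

    #include <errno.h>
    #include <stdio.h>

    /* 'err' is the return value of hwloc_set_area_membind_nodeset;
       errno is assumed unchanged since that call. */
    static void explain_membind_error(int err)
    {
        if (err >= 0)
            return;
        switch (errno) {
        case ENOSYS:  /* documented: action not supported */
            fprintf(stderr, "binding not supported on this system\n");
            break;
        case EXDEV:   /* documented: binding cannot be enforced */
            fprintf(stderr, "binding cannot be enforced\n");
            break;
        case ENOMEM:  /* "Cannot allocate memory", from the kernel */
            fprintf(stderr, "kernel could not allocate or migrate the pages\n");
            break;
        default:
            perror("hwloc_set_area_membind_nodeset");
        }
    }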
>>
>> 2012/9/5 Brice Goglin <Brice.Goglin_at_[hidden]>
>> Hello Gabriele,
>>
>> The only limit I can think of is the available physical memory on each NUMA node (numactl -H will tell you how much memory is still available on each node).
>> malloc usually only fails (returning NULL) when there is no *virtual* memory left; that's different. Unless you allocate tons of terabytes of virtual memory, this shouldn't happen easily.
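
As a standalone illustration of the difference (the 8 GiB size is arbitrary and assumes Linux's default overcommit behaviour):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        size_t len = (size_t)8 << 30;  /* 8 GiB of *virtual* memory */
        char *buf = malloc(len);       /* typically succeeds even if free RAM is smaller */
        if (!buf) {
            perror("malloc");          /* fails only when virtual address space runs out */
            return 1;
        }
        memset(buf, 0, len);           /* *physical* pages are committed here, at first
                                          touch; a nearly full NUMA node matters only now */
        free(buf);
        return 0;
    }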
>>
>> Brice
>>
>>
>>
>>
>> On 05/09/2012 14:27, Gabriele Fatigati wrote:
>>> Dear Hwloc users and developers,
>>>
>>>
>>> I'm using hwloc 1.4.1 in a multithreaded program on a Linux platform, where each thread binds many non-contiguous pieces of a big matrix, calling the hwloc_set_area_membind_nodeset function very intensively:
>>>
>>> hwloc_set_area_membind_nodeset(topology, punt+offset, len, nodeset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD | HWLOC_MEMBIND_MIGRATE);
>>>
>>> Binding seems to work well, since the function's return code is 0 for every call.
>>>
>>> The problem is that, after binding, a simple small new malloc fails without any apparent reason.
>>>
>>> With memory binding disabled, the allocations work well. Is there any known problem when hwloc_set_area_membind_nodeset is used intensively?
>>>
>>> Is there some operating system limit on memory page binding?
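
One limit that can bite here (an educated guess, not a confirmed diagnosis): binding many non-contiguous areas splits the address space into many separate mappings, and Linux caps the number of mappings per process at vm.max_map_count; once the cap is reached, further mmap(), mbind() and even malloc() calls fail with ENOMEM. A sketch to compare your usage against the cap (standard Linux procfs paths):

    #include <stdio.h>

    int main(void)
    {
        char line[512];
        long maps = 0, cap = 0;
        FILE *f = fopen("/proc/self/maps", "r");
        if (f) {
            while (fgets(line, sizeof(line), f))
                maps++;                 /* roughly one line per mapping (VMA) */
            fclose(f);
        }
        f = fopen("/proc/sys/vm/max_map_count", "r");
        if (f) {
            if (fscanf(f, "%ld", &cap) != 1)
                cap = 0;
            fclose(f);
        }
        printf("mappings in use: %ld, vm.max_map_count: %ld\n", maps, cap);
        return 0;
    }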
>>>
>>> Thanks in advance.
>>>
>>> --
>>> Ing. Gabriele Fatigati
>>> HPC specialist
>>> SuperComputing Applications and Innovation Department
>>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>> www.cineca.it  Tel: +39 051 6171722
>>> g.fatigati [AT] cineca.it
>
> <main_hybrid_bind_mem.c>
> _______________________________________________
> hwloc-users mailing list
> hwloc-users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/