Hardware Locality Users' Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [hwloc-users] Thread binding problem
From: Gabriele Fatigati (g.fatigati_at_[hidden])
Date: 2012-09-06 04:13:04


After downsizing the array to 4 GB, valgrind gives many warnings,
which are reported in the attached file.

2012/9/6 Gabriele Fatigati <g.fatigati_at_[hidden]>

> Sorry,
>
> I used the wrong hwloc installation. Using the hwloc build with the printf
> checks, the mbind call in hwloc_linux_set_area_membind() fails:
>
> Error from HWLOC mbind: Cannot allocate memory
>
> so this is the origin of the bad allocation.
>
> I attach the correct valgrind output:
>
> valgrind --track-origins=yes --log-file=output_valgrind --leak-check=full
> --tool=memcheck --show-reachable=yes ./main_hybrid_bind_mem
>
>
> 2012/9/6 Gabriele Fatigati <g.fatigati_at_[hidden]>
>
>> Hi Brice, hi Jeff,
>>
>> >Can you add some printf inside hwloc_linux_set_area_membind() in
>> >src/topology-linux.c to see if ENOMEM comes from the mbind syscall or not?
>>
>> I added printf inside that function, but ENOMEM does not come from there.
>>
>> >Have you run your application through valgrind or another
>> >memory-checking debugger?
>>
>> I tried with valgrind:
>>
>> valgrind --track-origins=yes --log-file=output_valgrind --leak-check=full
>> --tool=memcheck --show-reachable=yes ./main_hybrid_bind_mem
>>
>> ==25687== Warning: set address range perms: large range [0x39454040,
>> 0x2218d4040) (undefined)
>> ==25687==
>> ==25687== Valgrind's memory management: out of memory:
>> ==25687== newSuperblock's request for 4194304 bytes failed.
>> ==25687== 34253180928 bytes have already been allocated.
>> ==25687== Valgrind cannot continue. Sorry.
>>
>>
>> I attach the full output.
>>
>>
>> The code also dies when using pure OpenMP code. Very mysteriously.
>>
>>
>>
>> 2012/9/5 Jeff Squyres <jsquyres_at_[hidden]>
>>
>>> On Sep 5, 2012, at 2:36 PM, Gabriele Fatigati wrote:
>>>
>>> > I don't think it is simply out of memory, since the NUMA node has 48 GB
>>> > and I'm allocating just 8 GB.
>>>
>>> Mmm. Probably right.
>>>
>>> Have you run your application through valgrind or another
>>> memory-checking debugger?
>>>
>>> I've seen cases of heap corruption lead to malloc incorrectly failing
>>> with ENOMEM.
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>
>>
>>
>> --
>> Ing. Gabriele Fatigati
>>
>> HPC specialist
>>
>> SuperComputing Applications and Innovation Department
>>
>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>
>> www.cineca.it Tel: +39 051 6171722
>>
>> g.fatigati [AT] cineca.it
>>
>

-- 
Ing. Gabriele Fatigati
HPC specialist
SuperComputing Applications and Innovation Department
Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
www.cineca.it                    Tel:   +39 051 6171722
g.fatigati [AT] cineca.it