On 06/09/2012 10:13, Gabriele Fatigati wrote:
> Downsizing the array down to 4GB, valgrind gives many warnings, reported in the attached file.

Adding hwloc_topology_destroy() at the end of the program would likely remove most of them (see the sketch below).

But that won't fix the problem since the leaks are small.
==28082== LEAK SUMMARY:
==28082==    definitely lost: 4,080 bytes in 3 blocks
==28082==    indirectly lost: 51,708 bytes in 973 blocks
==28082==      possibly lost: 304 bytes in 1 blocks
==28082==    still reachable: 1,786 bytes in 4 blocks
==28082==         suppressed: 0 bytes in 0 blocks

I don't know where to look, sorry.
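
For reference, a minimal sketch of where that destroy call would go (assuming the usual init/load sequence; this is not your actual code):

#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topology;

    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    /* ... allocate, bind and use memory here ... */

    /* releasing the topology frees hwloc's internal allocations,
     * which is what silences most of the valgrind leak reports */
    hwloc_topology_destroy(topology);
    return 0;
}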

Brice

2012/9/6 Gabriele Fatigati <g.fatigati@cineca.it>
Sorry,

I used the wrong hwloc installation. Using the hwloc build with the printf checks added:

the mbind call inside hwloc_linux_set_area_membind() fails:

Error from HWLOC mbind: Cannot allocate memory 

so this is the origin of the failed allocation.
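
For context, a minimal sketch of the kind of call that fails (assuming the public hwloc_set_area_membind() entry point and a hypothetical 8 GB buffer; this is not the actual test code):

#include <hwloc.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    hwloc_topology_t topology;
    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    size_t len = (size_t)8 * 1024 * 1024 * 1024;   /* hypothetical 8 GB array */
    char *buf = malloc(len);

    /* bind the area to the first NUMA node (illustrative only) */
    hwloc_obj_t node = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, 0);
    if (buf && node &&
        hwloc_set_area_membind(topology, buf, len, node->cpuset,
                               HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_STRICT) < 0)
        fprintf(stderr, "hwloc_set_area_membind: %s\n", strerror(errno));

    free(buf);
    hwloc_topology_destroy(topology);
    return 0;
}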

I attach the correct valgrind output:

valgrind --track-origins=yes --log-file=output_valgrind --leak-check=full --tool=memcheck  --show-reachable=yes ./main_hybrid_bind_mem

2012/9/6 Gabriele Fatigati <g.fatigati@cineca.it>
Hi Brice, hi Jeff,

> Can you add some printf inside hwloc_linux_set_area_membind() in src/topology-linux.c to see if ENOMEM comes from the mbind syscall or not?

I added printf inside that function, but ENOMEM does not come from there.
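
For reference, a standalone check (separate from the hwloc internals, assuming the libnuma <numaif.h> wrapper and a hypothetical 8 GB mapping) that calls the raw mbind() syscall directly, to see whether it returns ENOMEM on its own:

#define _GNU_SOURCE
#include <errno.h>
#include <numaif.h>          /* mbind(), MPOL_BIND; link with -lnuma */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = (size_t)8 * 1024 * 1024 * 1024;   /* hypothetical 8 GB */
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }

    unsigned long nodemask = 1UL;                  /* NUMA node 0 */
    if (mbind(addr, len, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0) != 0)
        fprintf(stderr, "mbind: %s\n", strerror(errno));   /* e.g. ENOMEM */
    else
        printf("mbind succeeded\n");

    munmap(addr, len);
    return 0;
}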

> Have you run your application through valgrind or another memory-checking debugger?

I tried with valgrind:

valgrind --track-origins=yes --log-file=output_valgrind --leak-check=full --tool=memcheck  --show-reachable=yes ./main_hybrid_bind_mem

==25687== Warning: set address range perms: large range [0x39454040, 0x2218d4040) (undefined)
==25687== 
==25687==     Valgrind's memory management: out of memory:
==25687==        newSuperblock's request for 4194304 bytes failed.
==25687==        34253180928 bytes have already been allocated.
==25687==     Valgrind cannot continue.  Sorry.


I attach the full output. 


The code also dies with pure OpenMP code. Very mysterious.



2012/9/5 Jeff Squyres <jsquyres@cisco.com>
On Sep 5, 2012, at 2:36 PM, Gabriele Fatigati wrote:

> I don't think it is simply out of memory, since the NUMA node has 48 GB and I'm allocating just 8 GB.

Mmm.  Probably right.

Have you run your application through valgrind or another memory-checking debugger?

I've seen cases of heap corruption leading to malloc incorrectly failing with ENOMEM.
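
(A contrived illustration of that failure mode, not taken from the program under discussion: an overflow past the end of a malloc'd block corrupts the allocator's bookkeeping, and a later malloc() may then fail or abort even though plenty of memory is free.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *buf = malloc(64);
    if (!buf) return 1;

    memset(buf, 'x', 128);          /* bug: writes 64 bytes past the block,
                                       clobbering heap metadata */

    char *next = malloc(1 << 20);   /* may now fail or crash, even though
                                       there is plenty of memory available */
    if (!next)
        perror("malloc");

    free(next);
    free(buf);
    return 0;
}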

--
Jeff Squyres
jsquyres@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/


--
Ing. Gabriele Fatigati

HPC specialist

SuperComputing Applications and Innovation Department

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it                    Tel:   +39 051 6171722

g.fatigati [AT] cineca.it          





_______________________________________________
hwloc-users mailing list
hwloc-users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users