
Hardware Locality Users' Mailing List Archives


Subject: Re: [hwloc-users] Thread binding problem
From: Gabriele Fatigati (g.fatigati_at_[hidden])
Date: 2012-09-05 14:36:37


Dear Jeff,

I don't think it is simply an out-of-memory condition, since the NUMA node has
48 GB and I'm allocating just 8 GB.

2012/9/5 Jeff Squyres <jsquyres_at_[hidden]>

> Perhaps you simply have run out of memory on that NUMA node, and therefore
> the malloc failed. Check "numactl --hardware", for example.
>
> You might want to check the output of numastat to see if one or more of
> your NUMA nodes have run out of memory.
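>
> As a side note, hwloc itself can list the NUMA nodes and the total amount of
> memory attached to each one -- it does not know how much is currently free,
> which is what numactl -H and numastat report. A minimal sketch against the
> hwloc 1.x API:
>
>     #include <stdio.h>
>     #include <hwloc.h>
>
>     int main(void)
>     {
>         hwloc_topology_t topo;
>         hwloc_topology_init(&topo);
>         hwloc_topology_load(topo);
>         int n = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_NODE);
>         for (int i = 0; i < n; i++) {
>             hwloc_obj_t node = hwloc_get_obj_by_type(topo, HWLOC_OBJ_NODE, i);
>             /* local_memory = bytes of RAM attached to this NUMA node */
>             printf("node %u: %llu MB total\n", node->os_index,
>                    (unsigned long long)(node->memory.local_memory >> 20));
>         }
>         hwloc_topology_destroy(topo);
>         return 0;
>     }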
>
>
> On Sep 5, 2012, at 12:58 PM, Gabriele Fatigati wrote:
>
> > I've reproduced the problem in a small MPI + OpenMP code.
> >
> > The error is the same: after a number of memory binds, it gives "Cannot
> allocate memory".
> >
> > Thanks.
> >
> > 2012/9/5 Gabriele Fatigati <g.fatigati_at_[hidden]>
> > Downscaling the matrix size, binding works well, but the available memory
> is enough even with the bigger matrix, so I'm a bit confused.
> >
> > Using the same big matrix size without binding, the code works well, so
> how can I explain this behaviour?
> >
> > Maybe hwloc_set_area_membind_nodeset introduces some extra allocations
> that persist after the call?
> >
> >
> >
> > 2012/9/5 Brice Goglin <Brice.Goglin_at_[hidden]>
> > An internal malloc failed then. That would explain why your malloc
> failed too.
> > It looks like you malloc'ed too much memory in your program?
> >
> > Brice
> >
> >
> >
> >
> > On 05/09/2012 15:56, Gabriele Fatigati wrote:
> >> An update:
> >>
> >> printing strerror(errno) after hwloc_set_area_membind_nodeset gives:
> "Cannot allocate memory"
> >>
> >> 2012/9/5 Gabriele Fatigati <g.fatigati_at_[hidden]>
> >> Hi,
> >>
> >> I've noted that hwloc_set_area_membind_nodeset returns -1 but errno is
> not equal to EXDEV or ENOSYS. I assumed that these were the only two
> possible cases.
> >>
> >> From the hwloc documentation:
> >>
> >> -1 with errno set to ENOSYS if the action is not supported
> >> -1 with errno set to EXDEV if the binding cannot be enforced
> >>
> >>
> >> Is there any other reason why a binding could fail? The available memory is enough.
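> >>
> >> For reference, the check I have in mind looks roughly like this (a sketch,
> >> not the real code); besides ENOSYS and EXDEV, the errno set by the
> >> underlying system call apparently comes through as well:
> >>
> >>     #include <errno.h>
> >>     #include <stdio.h>
> >>     #include <string.h>
> >>     #include <hwloc.h>
> >>
> >>     /* Hypothetical helper: bind an area and report exactly why it failed. */
> >>     static int bind_area(hwloc_topology_t topology, void *addr, size_t len,
> >>                          hwloc_nodeset_t nodeset)
> >>     {
> >>         int err = hwloc_set_area_membind_nodeset(topology, addr, len, nodeset,
> >>                       HWLOC_MEMBIND_BIND,
> >>                       HWLOC_MEMBIND_THREAD | HWLOC_MEMBIND_MIGRATE);
> >>         if (err < 0) {
> >>             if (errno == ENOSYS)
> >>                 fprintf(stderr, "membind: action not supported\n");
> >>             else if (errno == EXDEV)
> >>                 fprintf(stderr, "membind: binding cannot be enforced\n");
> >>             else
> >>                 /* e.g. ENOMEM ("Cannot allocate memory") from the OS */
> >>                 fprintf(stderr, "membind failed: %s (errno=%d)\n",
> >>                         strerror(errno), errno);
> >>         }
> >>         return err;
> >>     }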
> >>
> >> 2012/9/5 Brice Goglin <Brice.Goglin_at_[hidden]>
> >> Hello Gabriele,
> >>
> >> The only limit that I would think of is the available physical memory
> on each NUMA node (numactl -H will tell you how much of each NUMA node's
> memory is still available).
> >> malloc usually only fails (does it return NULL?) when there is no *virtual*
> memory left, which is different. Unless you allocate tons of terabytes of
> virtual memory, this shouldn't happen easily.
> >>
> >> Brice
> >>
> >>
> >>
> >>
> >> On 05/09/2012 14:27, Gabriele Fatigati wrote:
> >>> Dear Hwloc users and developers,
> >>>
> >>>
> >>> I'm using hwloc 1.4.1 in a multithreaded program on a Linux platform,
> where each thread binds many non-contiguous pieces of a big matrix, calling
> the hwloc_set_area_membind_nodeset function very intensively:
> >>>
> >>> hwloc_set_area_membind_nodeset(topology, punt+offset, len, nodeset,
> HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD | HWLOC_MEMBIND_MIGRATE);
> >>>
> >>> Binding seems to work well, since the return code from the function is 0
> for every call.
> >>>
> >>> The problem is that, after binding, a simple small new malloc fails
> for no apparent reason.
> >>>
> >>> With memory binding disabled, the allocations work well. Is there any
> known problem when hwloc_set_area_membind_nodeset is used intensively?
> >>>
> >>> Is there some operating system limit on binding memory pages?
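> >>>
> >>> For clarity, the access pattern looks roughly like this -- an illustrative
> >>> sketch with made-up sizes and thread/node mapping, not the actual code:
> >>>
> >>>     #include <hwloc.h>
> >>>     #include <omp.h>
> >>>     #include <stdio.h>
> >>>     #include <stdlib.h>
> >>>
> >>>     int main(void)
> >>>     {
> >>>         hwloc_topology_t topology;
> >>>         hwloc_topology_init(&topology);
> >>>         hwloc_topology_load(topology);
> >>>
> >>>         int nnodes = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NODE);
> >>>         size_t total = (size_t)8 << 30;   /* e.g. an 8 GB matrix */
> >>>         size_t len = 1 << 20;             /* size of each bound piece */
> >>>         char *punt = malloc(total);
> >>>         if (!punt) { perror("malloc"); return 1; }
> >>>
> >>>     #pragma omp parallel
> >>>         {
> >>>             int t = omp_get_thread_num();
> >>>             hwloc_obj_t node =
> >>>                 hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, t % nnodes);
> >>>
> >>>             /* each thread binds many small, non-contiguous pieces */
> >>>             for (size_t offset = (size_t)t * len; offset + len <= total;
> >>>                  offset += 16 * len) {
> >>>                 int err = hwloc_set_area_membind_nodeset(topology,
> >>>                               punt + offset, len, node->nodeset,
> >>>                               HWLOC_MEMBIND_BIND,
> >>>                               HWLOC_MEMBIND_THREAD | HWLOC_MEMBIND_MIGRATE);
> >>>                 if (err < 0)
> >>>                     fprintf(stderr, "bind failed at offset %zu\n", offset);
> >>>             }
> >>>         }
> >>>
> >>>         free(punt);
> >>>         hwloc_topology_destroy(topology);
> >>>         return 0;
> >>>     }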
> >>>
> >>> Thanks in advance.
> > <main_hybrid_bind_mem.c>
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>

-- 
Ing. Gabriele Fatigati
HPC specialist
SuperComputing Applications and Innovation Department
Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
www.cineca.it                    Tel:   +39 051 6171722
g.fatigati [AT] cineca.it