
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] known limitation or bug in hwloc?
From: nadia.derbey_at_[hidden]
Date: 2011-08-30 01:58:34


devel-bounces_at_[hidden] wrote on 08/29/2011 06:59:49 PM:

> From: Brice Goglin <Brice.Goglin_at_[hidden]>
> To: Open MPI Developers <devel_at_[hidden]>
> Date: 08/29/2011 07:00 PM
> Subject: Re: [OMPI devel] known limitation or bug in hwloc?
> Sent by: devel-bounces_at_[hidden]
>
> I am playing with those aspects right now (it's planned for hwloc v1.4).
> hwloc (even the 1.2 currently in OMPI) can already support topologies
> containing different machines,

I guess this is what corresponds to the HWLOC_OBJ_SYSTEM topology object?

> but there's no easy/automatic way to
> aggregate multiple machine topologies into a single global one. The
> important thing to understand is that the cpuset/bitmap structure does
> not span multiple machines; it remains local (because it's tightly
> coupled to binding processes/memory). So if a process running on A
> considers a topology containing nodes A and B, only the cpusets of
> objects corresponding to A are meaningful. Trying (on A) to bind on
> cpusets from B objects would actually bind on A (if the core numbers are
> similar). And the objects "above" the machine just have no cpusets at
> all (because there's no way to bind across multiple machines).
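
To illustrate the locality point, here is a minimal sketch against the
hwloc 1.x C API (the helper below is hypothetical, not OMPI code):

#include <hwloc.h>

/* Hypothetical helper: bind the calling process to core 'core_index'
 * of the *local* machine. hwloc 1.x names assumed. */
static int bind_to_local_core(unsigned core_index)
{
    hwloc_topology_t topology;
    hwloc_obj_t core;
    int rc = -1;

    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);  /* discovers the local machine only */

    core = hwloc_get_obj_by_type(topology, HWLOC_OBJ_CORE, core_index);
    if (core != NULL) {
        /* core->cpuset is only meaningful on this machine; a cpuset
         * taken from another machine's topology would silently bind
         * to whatever local PUs carry the same numbers. */
        rc = hwloc_set_cpubind(topology, core->cpuset, 0);
    }

    hwloc_topology_destroy(topology);
    return rc;
}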
>
> That said, my understanding is that this is not what this discussion is
> about. Doesn't OMPI use one topology for each node so far? Nadia might
> just be playing with a large node (more than 64 cores?) which causes the
> bit loop to end too early.

Exactly: the Bull folks are doing some tests on Westmere-EX nodes: 4 sockets
of 10 cores each, potentially with HT enabled.
The problem is that the BIOS has numbered the cores in the following way
(each pair x,y gives the two hardware thread ids of one physical core):

socket 0: 0,32 4,36 8,40 12,44 16,48 20,52 24,56 28,60 64,72 68,76
socket 1: 1,33 5,37 9,41 13,45 17,49 21,53 25,57 29,61 65,73 69,77
socket 2: 2,34 6,38 10,42 14,46 18,50 22,54 26,58 30,62 66,74 70,78
socket 3: 3,35 7,39 11,43 15,47 19,51 23,55 27,59 31,63 67,75 71,79

I hit the issue with a rankfile as soon as I reached the following line:

rank 8=my_host slot=p64
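
For reference, the arithmetic that makes p64 the first failing id, as a
tiny standalone sketch (assuming 64-bit mask words; illustrative only):

#include <stdio.h>

int main(void)
{
    unsigned cpu = 64;            /* the id from "slot=p64" */
    unsigned bits_per_word = 64;  /* assumed mask word width */

    /* cpu 64 -> word 1, bit 0: the first bit beyond a loop that
     * stops after the first 64 bits of the mask. */
    printf("cpu %u -> word %u, bit %u\n",
           cpu, cpu / bits_per_word, cpu % bits_per_word);
    return 0;
}

Any slot below p64 stays within the first word, which is why the earlier
rankfile lines worked.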

Regards,
Nadia

>
> Brice
>
>
>
>
> On 29/08/2011 18:47, Kenneth Lloyd wrote:
> > This might get interesting. I see "portable hardware locality" (hwloc) as
> > originating at the native cpuset, and "locality" working at the
> > machine level (machines in my world can have up to 8 CPUs, for example).
> >
> > But from an OMPI world view, the execution graph across myriad machines
> > might dictate a larger, yet still fine-grained approach. I haven't had a
> > chance to play with those aspects. Has anyone else?
> >
> > Ken
> >
> >
> > -----Original Message-----
> > From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]]
> > On Behalf Of Ralph Castain
> > Sent: Monday, August 29, 2011 8:21 AM
> > To: Open MPI Developers
> > Subject: Re: [OMPI devel] known limitation or bug in hwloc?
> >
> > Actually, I'll eat those words. I was looking at the wrong place.
> >
> > Yes, that is a bug in hwloc. It needs to loop over CPU_MAX for those
> > cases where the bit mask extends over multiple words.
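
To make "extends over multiple words" concrete, a generic sketch of such a
bitmask (an assumed shape, not the actual OPAL definitions):

#include <stdint.h>

#define SKETCH_CPU_MAX       1024  /* assumed total cpus supported */
#define SKETCH_BITS_PER_WORD 64
#define SKETCH_NUM_WORDS     (SKETCH_CPU_MAX / SKETCH_BITS_PER_WORD)

typedef struct {
    uint64_t words[SKETCH_NUM_WORDS];  /* cpu i lives in words[i / 64] */
} sketch_cpu_set_t;

/* Test whether 'cpu' is set. A loop bounded at 64 would never consult
 * words[1] and beyond, dropping every cpu numbered 64 or higher. */
static int sketch_isset(const sketch_cpu_set_t *set, unsigned cpu)
{
    return (int)((set->words[cpu / SKETCH_BITS_PER_WORD]
                  >> (cpu % SKETCH_BITS_PER_WORD)) & 1u);
}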
> >
> >
> > On Aug 29, 2011, at 7:16 AM, Ralph Castain wrote:
> >
> >> Actually, if you look closely at the definition of those two values,
> >> you'll see that it really doesn't matter which one we loop over. The
> >> NUM_BITS value defines the actual total number of bits in the mask. The
> >> CPU_MAX is the total number of cpus we can support, which was set to a
> >> value such that the two are equal (i.e., it's a power of two that
> >> happens to be an integer multiple of 64).
> >> I believe the original intent was to allow CPU_MAX to be independent
> >> of address-alignment questions, so NUM_BITS could technically be
> >> greater than CPU_MAX. Even if this happens, though, all that would do
> >> is cause the loop to run across more bits than required.
> >> So it doesn't introduce a limitation at all. In hindsight, we could
> >> simplify things by eliminating one of those values and just requiring
> >> that the remaining one be a multiple of 64 so it aligns with a memory
> >> address.
> >>
> >> On Aug 29, 2011, at 7:05 AM, Kenneth Lloyd wrote:
> >>
> >>> Nadia,
> >>>
> >>> Interesting. I haven't tried pushing this to levels above 8 on a
> >>> particular machine. Do you think that the cpuset / paffinity / hwloc
> >>> only applies at the machine level, at which time you need to employ
> >>> a graph with carto?
> >>>
> >>> Regards,
> >>>
> >>> Ken
> >>>
> >>> -----Original Message-----
> >>> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]]
> >>> On Behalf Of nadia.derbey
> >>> Sent: Monday, August 29, 2011 5:45 AM
> >>> To: Open MPI Developers
> >>> Subject: [OMPI devel] known limitation or bug in hwloc?
> >>>
> >>> Hi list,
> >>>
> >>> I'm hitting a limitation with paffinity/hwloc with cpu numbers >= 64.
> >>>
> >>> In opal/mca/paffinity/hwloc/paffinity_hwloc_module.c, module_set() is
> >>> the routine that sets the calling process's affinity to the mask given
> >>> as a parameter. Note that "mask" is an opal_paffinity_base_cpu_set_t
> >>> (so we allow the cpus to be potentially numbered up to
> >>> OPAL_PAFFINITY_BITMASK_CPU_MAX - 1).
> >>>
> >>> The problem with module_set() is that it loops over
> >>> OPAL_PAFFINITY_BITMASK_T_NUM_BITS bits to check whether these bits are
> >>> set in the mask:
> >>>
> >>> for (i = 0; ((unsigned int) i) < OPAL_PAFFINITY_BITMASK_T_NUM_BITS; ++i) {
> >>>     if (OPAL_PAFFINITY_CPU_ISSET(i, mask)) {
> >>>         hwloc_bitmap_set(set, i);
> >>>     }
> >>> }
> >>>
> >>> Given "mask"'s type, I think module_set() should instead loop over
> >>> OPAL_PAFFINITY_BITMASK_CPU_MAX bits.
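
Concretely, the fix would presumably be just the wider bound (a sketch;
same body as the loop quoted above):

for (i = 0; ((unsigned int) i) < OPAL_PAFFINITY_BITMASK_CPU_MAX; ++i) {
    if (OPAL_PAFFINITY_CPU_ISSET(i, mask)) {
        hwloc_bitmap_set(set, i);
    }
}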
> >>>
> >>> Note that module_set() uses a type for its internal mask
> >>> (hwloc_bitmap_t) that is consistent with OPAL_PAFFINITY_BITMASK_T_NUM_BITS.
> >>>
> >>> So I'm wondering whether this is a known limitation I've never heard
> >>> of or an actual bug?
> >>>
> >>> Regards,
> >>> Nadia
> >>>
> >>>
> >
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel