devel-bounces@open-mpi.org wrote on 08/29/2011 06:59:49 PM:
> From: Brice Goglin <Brice.Goglin@inria.fr>
> To: Open MPI Developers <devel@open-mpi.org>
> Date: 08/29/2011 07:00 PM
> Subject: Re: [OMPI devel] known limitation or bug in hwloc?
> Sent by: devel-bounces@open-mpi.org
>
> I am playing with those aspects right now (it's planned for hwloc v1.4).
> hwloc (even the 1.2 currently in OMPI) can already support topologies
> containing different machines,
I guess this is what corresponds to the HWLOC_OBJ_SYSTEM topology object?
> but there's no easy/automatic way to
> aggregate multiple machine topologies into a single global one. The
> important thing to understand is that the cpuset/bitmap structure does
> not span multiple machines; it remains local (because it's tightly
> coupled to binding processes/memory). So if a process running on A
> considers a topology containing nodes A and B, only the cpusets of
> objects corresponding to A are meaningful. Trying (on A) to bind on
> cpusets from B objects would actually bind on A (if the core numbers are
> similar). And the objects "above" the machine just have no cpusets at
> all (because there's no way to bind across multiple machines).
>
> That said, my understanding is that this is not what this discussion is
> about. Doesn't OMPI use one topology for each node so far? Nadia might
> just be playing with a large node (more than 64 cores?), which causes the
> bit loop to end too early.
Exactly: the Bull guys are running some tests on Westmere-EX nodes:
4 sockets of 10 cores each, potentially with HT enabled.
The problem is that the BIOS has numbered the cores in the following way
(each pair x,y gives the ids of the two hardware threads of one physical
core):
socket 0: 0,32 4,36 8,40 12,44 16,48 20,52 24,56 28,60 64,72 68,76
socket 1: 1,33 5,37 9,41 13,45 17,49 21,53 25,57 29,61 65,73 69,77
socket 2: 2,34 6,38 10,42 14,46 18,50 22,54 26,58 30,62 66,74 70,78
socket 3: 3,35 7,39 11,43 15,47 19,51 23,55 27,59 31,63 67,75 71,79
I hit the issue with a rankfile as soon as I reached the following line:
  rank 8=my_host slot=p64
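To make the numbering above concrete, here is a minimal C sketch that encodes only this listing. The helper names (`demo_socket_of`, `demo_ht_sibling`, `demo_word_index`) are hypothetical, not hwloc or OMPI calls, and the 64-bit word width is an assumption about the mask layout. It shows that the 16 OS ids 64..79 (including `p64` from the rankfile line) fall into the second word of a multiword mask.

```c
#include <assert.h>

/* Hypothetical helpers derived only from the Westmere-EX listing above:
 * 4 sockets x 10 cores x 2 hardware threads = 80 OS ids (0..79).
 * They encode the observed BIOS numbering; they are not an hwloc API. */
#define DEMO_NUM_OS_IDS 80
#define DEMO_WORD_BITS  64 /* assumed width of one bitmask word */

/* In the listing, every id on socket k satisfies id % 4 == k. */
int demo_socket_of(int os_id)
{
    assert(os_id >= 0 && os_id < DEMO_NUM_OS_IDS);
    return os_id % 4;
}

/* Hardware-thread sibling, read off the x,y pairs: ids 0..31 pair with
 * id+32, 32..63 with id-32, 64..71 with id+8, and 72..79 with id-8. */
int demo_ht_sibling(int os_id)
{
    assert(os_id >= 0 && os_id < DEMO_NUM_OS_IDS);
    if (os_id < 32) return os_id + 32;
    if (os_id < 64) return os_id - 32;
    if (os_id < 72) return os_id + 8;
    return os_id - 8;
}

/* Index of the 64-bit mask word that holds this id. */
int demo_word_index(int os_id)
{
    return os_id / DEMO_WORD_BITS;
}

/* Count the ids stored beyond the first word, i.e. the ids that a loop
 * bounded by a single word's bit count never visits. */
int demo_count_ids_beyond_first_word(void)
{
    int count = 0;
    for (int id = 0; id < DEMO_NUM_OS_IDS; ++id) {
        if (demo_word_index(id) > 0) {
            ++count;
        }
    }
    return count;
}
```

Under these assumptions `demo_word_index(64)` is 1 and `demo_count_ids_beyond_first_word()` is 16: one socket's worth of four ids per socket ends up outside the first word.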
Regards,
Nadia
>
> Brice
>
>
>
>
> On 29/08/2011 18:47, Kenneth Lloyd wrote:
> > This might get interesting. I see "locality" in "portable hardware
> > locality" (hwloc) as originating at the native cpuset and working at the
> > machine level (machines in my world can have up to 8 CPUs, for example).
> >
> > But from an OMPI world view, the execution graph across myriad machines
> > might dictate a larger, yet still fine-grained, approach. I haven't had a
> > chance to play with those aspects. Has anyone else?
> >
> > Ken
> >
> >
> > -----Original Message-----
> > From: devel-bounces@open-mpi.org [mailto:devel-bounces@open-mpi.org] On
> > Behalf Of Ralph Castain
> > Sent: Monday, August 29, 2011 8:21 AM
> > To: Open MPI Developers
> > Subject: Re: [OMPI devel] known limitation or bug in hwloc?
> >
> > Actually, I'll eat those words. I was looking at the wrong place.
> >
> > Yes, that is a bug in hwloc. It needs to loop over CPU_MAX for those
> > cases where the bit mask extends over multiple words.
> >
> >
> > On Aug 29, 2011, at 7:16 AM, Ralph Castain wrote:
> >
> >> Actually, if you look closely at the definition of those two values,
> >> you'll see that it really doesn't matter which one we loop over. The
> >> NUM_BITS value defines the actual total number of bits in the mask. The
> >> CPU_MAX is the total number of cpus we can support, which was set to a
> >> value such that the two are equal (i.e., it's a power of two that happens
> >> to be an integer multiple of 64).
> >> I believe the original intent was to allow CPU_MAX to be independent of
> >> address-alignment questions, so NUM_BITS could technically be greater
> >> than CPU_MAX. Even if this happens, though, all that would do is cause
> >> the loop to run across more bits than required.
> >> So it doesn't introduce a limitation at all. In hindsight, we could
> >> simplify things by eliminating one of those values and just putting a
> >> requirement on the number that it be a multiple of 64 so it aligns with
> >> a memory address.
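A minimal sketch of the simplification suggested above, assuming a fixed 64-bit word (`uint64_t`). Only one capacity constant remains, a C11 `static_assert` enforces the multiple-of-64 requirement at compile time, and the word count is derived from it. All `demo_*` names are hypothetical and are not the OPAL definitions. With a single constant, the scan bound cannot drift away from the storage size, which is the kind of mismatch the later reply in this thread identifies as the bug.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical single-knob layout: CPU_MAX is the only tunable, and the
 * word count is derived from it instead of being defined separately. */
#define DEMO_CPU_MAX       1024
#define DEMO_BITS_PER_WORD 64

/* Compile-time check of the "multiple of 64" requirement. */
static_assert(DEMO_CPU_MAX % DEMO_BITS_PER_WORD == 0,
              "DEMO_CPU_MAX must be a multiple of 64");

#define DEMO_NUM_WORDS (DEMO_CPU_MAX / DEMO_BITS_PER_WORD)

typedef struct {
    uint64_t words[DEMO_NUM_WORDS];
} demo_single_knob_set_t;

/* A full scan over the set must run to CPU_MAX, never to the width of
 * one word. */
int demo_scan_limit(void) { return DEMO_CPU_MAX; }

int demo_num_words(void) { return DEMO_NUM_WORDS; }
```

With `DEMO_CPU_MAX` at 1024 this gives 16 words, and the struct occupies exactly 1024 bits, so the storage and the loop bound agree by construction.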
> >>
> >> On Aug 29, 2011, at 7:05 AM, Kenneth Lloyd wrote:
> >>
> >>> Nadia,
> >>>
> >>> Interesting. I haven't tried pushing this beyond 8 CPUs on a particular
> >>> machine. Do you think that the cpuset / paffinity / hwloc only applies
> >>> at the machine level, at which point you need to employ a graph with
> >>> carto?
> >>>
> >>> Regards,
> >>>
> >>> Ken
> >>>
> >>> -----Original Message-----
> >>> From: devel-bounces@open-mpi.org [mailto:devel-bounces@open-mpi.org] On
> >>> Behalf Of nadia.derbey
> >>> Sent: Monday, August 29, 2011 5:45 AM
> >>> To: Open MPI Developers
> >>> Subject: [OMPI devel] known limitation or bug in hwloc?
> >>>
> >>> Hi list,
> >>>
> >>> I'm hitting a limitation with paffinity/hwloc with cpu numbers >= 64.
> >>>
> >>> In opal/mca/paffinity/hwloc/paffinity_hwloc_module.c, module_set() is
> >>> the routine that sets the calling process's affinity to the mask given
> >>> as a parameter. Note that "mask" is an opal_paffinity_base_cpu_set_t (so
> >>> we allow the cpus to be potentially numbered up to
> >>> OPAL_PAFFINITY_BITMASK_CPU_MAX - 1).
> >>>
> >>> The problem with module_set() is that it loops over
> >>> OPAL_PAFFINITY_BITMASK_T_NUM_BITS bits to check if these bits are set
> >>> in the mask:
> >>>
> >>>     for (i = 0; ((unsigned int) i) < OPAL_PAFFINITY_BITMASK_T_NUM_BITS; ++i) {
> >>>         if (OPAL_PAFFINITY_CPU_ISSET(i, mask)) {
> >>>             hwloc_bitmap_set(set, i);
> >>>         }
> >>>     }
> >>>
> >>> Given "mask"'s type, I think module_set() should instead loop over
> >>> OPAL_PAFFINITY_BITMASK_CPU_MAX bits.
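A minimal self-contained sketch of the failure mode, assuming a cpu set built from several `unsigned long` words as described in this thread. The `demo_*` names are hypothetical stand-ins, not the real OPAL macros, and a counter replaces the `hwloc_bitmap_set()` call. Scanning only one word's worth of bits skips every cpu id at or above the word width; scanning up to the set's capacity does not.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-ins modeled on this thread, not the real OPAL headers:
 * DEMO_WORD_BITS is the width of ONE mask word, DEMO_CPU_MAX the capacity
 * of the whole set, which spans several words. */
#define DEMO_CPU_MAX   1024
#define DEMO_WORD_BITS ((int) (sizeof(unsigned long) * CHAR_BIT))
#define DEMO_NUM_WORDS (DEMO_CPU_MAX / DEMO_WORD_BITS)

typedef struct {
    unsigned long bitmask[DEMO_NUM_WORDS];
} demo_cpu_set_t;

/* Set bit `cpu`: select the word, then the bit inside that word. */
void demo_cpu_set(int cpu, demo_cpu_set_t *mask)
{
    assert(cpu >= 0 && cpu < DEMO_CPU_MAX);
    mask->bitmask[cpu / DEMO_WORD_BITS] |= 1UL << (cpu % DEMO_WORD_BITS);
}

bool demo_cpu_isset(int cpu, const demo_cpu_set_t *mask)
{
    assert(cpu >= 0 && cpu < DEMO_CPU_MAX);
    return (mask->bitmask[cpu / DEMO_WORD_BITS] >> (cpu % DEMO_WORD_BITS)) & 1UL;
}

/* Mimics the module_set() loop: scan cpus [0, limit) and count the ones
 * that would be handed to hwloc_bitmap_set(). */
int demo_count_copied(const demo_cpu_set_t *mask, int limit)
{
    assert(limit >= 0 && limit <= DEMO_CPU_MAX);
    int copied = 0;
    for (int i = 0; i < limit; ++i) {
        if (demo_cpu_isset(i, mask)) {
            ++copied;
        }
    }
    return copied;
}
```

For a mask holding cpus 3, 64 and 79, `demo_count_copied(&mask, DEMO_WORD_BITS)` returns 1 (cpus 64 and 79 are silently dropped), while `demo_count_copied(&mask, DEMO_CPU_MAX)` returns 3.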
> >>>
> >>> Note that module_set() uses a type for its internal mask that is
> >>> coherent with OPAL_PAFFINITY_BITMASK_T_NUM_BITS (hwloc_bitmap_t).
> >>>
> >>> So I'm wondering whether this is a known limitation I've never heard
> >>> of or an actual bug?
> >>>
> >>> Regards,
> >>> Nadia
> >>>
> >>>
> >>> _______________________________________________
> >>> devel mailing list
> >>> devel@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel