Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] known limitation or bug in hwloc?
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2011-08-29 12:59:49


I am playing with those aspects right now (it's planned for hwloc v1.4).
hwloc (even the 1.2 currently in OMPI) can already support topologies
containing different machines, but there's no easy/automatic way to
aggregate multiple machine topologies into a single global one. The
important thing to understand is that the cpuset/bitmap structure does
not span multiple machines; it remains local (because it's tightly
coupled to binding processes/memory). So if a process running on A
considers a topology containing nodes A and B, only the cpusets of
objects corresponding to A are meaningful. Trying (on A) to bind on
cpusets from B objects would actually bind on A (if the core numbers are
similar). And the objects "above" the machine just have no cpusets at
all (because there's no way to bind across multiple machines).

That said, my understanding is that this is not what this discussion is
about. Doesn't OMPI use one topology for each node so far? Nadia might
just be playing with a large node (more than 64 cores?), which causes the
bit loop to end too early.
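
To make the "loop ends too early" hypothesis concrete, here is a minimal C sketch. It assumes the mask is stored as an array of 64-bit words and that the smaller loop bound equals the number of bits in one word. `cpu_mask_t`, `WORD_BITS`, `CPU_MAX` and the helper functions are made-up stand-ins for illustration, not the actual OPAL paffinity definitions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; names and values are assumptions, not OMPI's. */
#define WORD_BITS 64u                      /* bits in one mask word */
#define CPU_MAX   1024u                    /* highest CPU count the mask can hold */
#define NUM_WORDS (CPU_MAX / WORD_BITS)    /* words needed to cover CPU_MAX bits */

typedef struct {
    uint64_t bits[NUM_WORDS];
} cpu_mask_t;

/* Clear every bit in the mask. */
static void mask_zero(cpu_mask_t *m)
{
    memset(m, 0, sizeof(*m));
}

/* Mark one CPU as present in the mask. */
static void mask_set(cpu_mask_t *m, unsigned cpu)
{
    m->bits[cpu / WORD_BITS] |= (uint64_t)1 << (cpu % WORD_BITS);
}

/* Return 1 if the CPU is present in the mask, 0 otherwise. */
static int mask_isset(const cpu_mask_t *m, unsigned cpu)
{
    return (int)((m->bits[cpu / WORD_BITS] >> (cpu % WORD_BITS)) & 1u);
}

/* Copy the first `limit` bits of src into dst and return how many set
 * bits were copied. Any CPU numbered >= limit is silently dropped. */
static unsigned copy_bits(const cpu_mask_t *src, cpu_mask_t *dst, unsigned limit)
{
    unsigned i, copied = 0;

    mask_zero(dst);
    for (i = 0; i < limit; ++i) {
        if (mask_isset(src, i)) {
            mask_set(dst, i);
            ++copied;
        }
    }
    return copied;
}
```

With CPUs 3 and 70 set in `src`, `copy_bits(&src, &dst, WORD_BITS)` copies only CPU 3, whereas `copy_bits(&src, &dst, CPU_MAX)` copies both. If the two bounds really are equal in a given build, the choice makes no difference.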

Brice

On 29/08/2011 18:47, Kenneth Lloyd wrote:
> This might get interesting. I think of "portable hardware locality" (hwloc)
> as originating at the native cpuset, and I see "locality" working at the
> machine level (machines in my world can have up to 8 CPUs, for example).
>
> But from an ompi world view, the execution graph across myriad machines
> might dictate a larger, yet still fine grained approach. I haven't had a
> chance to play with those aspects. Has anyone else?
>
> Ken
>
>
> -----Original Message-----
> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
> Behalf Of Ralph Castain
> Sent: Monday, August 29, 2011 8:21 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] known limitation or bug in hwloc?
>
> Actually, I'll eat those words. I was looking at the wrong place.
>
> Yes, that is a bug in hwloc. It needs to loop over CPU_MAX for those cases
> where the bit mask extends over multiple words.
>
>
> On Aug 29, 2011, at 7:16 AM, Ralph Castain wrote:
>
>> Actually, if you look closely at the definition of those two values,
>> you'll see that it really doesn't matter which one we loop over. The
>> NUM_BITS value defines the actual total number of bits in the mask. The
>> CPU_MAX is the total number of cpus we can support, which was set to a value
>> such that the two are equal (i.e., it's a power of two that happens to be an
>> integer multiple of 64).
>>
>> I believe the original intent was to allow CPU_MAX to be independent of
>> address-alignment questions, so NUM_BITS could technically be greater than
>> CPU_MAX. Even if this happens, though, all that would do is cause the loop
>> to run across more bits than required.
>>
>> So it doesn't introduce a limitation at all. In hindsight, we could
>> simplify things by eliminating one of those values and just putting a
>> requirement on the number that it be a multiple of 64 so it aligns with a
>> memory address.
>>
>> On Aug 29, 2011, at 7:05 AM, Kenneth Lloyd wrote:
>>
>>> Nadia,
>>>
>>> Interesting. I haven't tried pushing this to levels above 8 on a
>>> particular machine. Do you think that the cpuset / paffinity / hwloc only
>>> applies at the machine level, at which time you need to employ a graph
>>> with carto?
>>>
>>> Regards,
>>>
>>> Ken
>>>
>>> -----Original Message-----
>>> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
>>> Behalf Of nadia.derbey
>>> Sent: Monday, August 29, 2011 5:45 AM
>>> To: Open MPI Developers
>>> Subject: [OMPI devel] known limitation or bug in hwloc?
>>>
>>> Hi list,
>>>
>>> I'm hitting a limitation with paffinity/hwloc with cpu numbers >= 64.
>>>
>>> In opal/mca/paffinity/hwloc/paffinity_hwloc_module.c, module_set() is
>>> the routine that sets the calling process affinity to the mask given as
>>> parameter. Note that "mask" is an opal_paffinity_base_cpu_set_t (so we
>>> allow the cpus to be potentially numbered up to
>>> OPAL_PAFFINITY_BITMASK_CPU_MAX - 1).
>>>
>>> The problem with module_set() is that it loops over
>>> OPAL_PAFFINITY_BITMASK_T_NUM_BITS bits to check if these bits are set in
>>> the mask:
>>>
>>>     for (i = 0; ((unsigned int) i) < OPAL_PAFFINITY_BITMASK_T_NUM_BITS; ++i) {
>>>         if (OPAL_PAFFINITY_CPU_ISSET(i, mask)) {
>>>             hwloc_bitmap_set(set, i);
>>>         }
>>>     }
>>>
>>> Given "mask"'s type, I think module_set() should instead loop over
>>> OPAL_PAFFINITY_BITMASK_CPU_MAX bits.
>>>
>>> Note that module_set() uses a type for its internal mask that is
>>> coherent with OPAL_PAFFINITY_BITMASK_T_NUM_BITS (hwloc_bitmap_t).
>>>
>>> So I'm wondering whether this is a known limitation I've never heard of
>>> or an actual bug?
>>>
>>> Regards,
>>> Nadia
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel