Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] known limitation or bug in hwloc?
From: Kenneth Lloyd (kenneth.lloyd_at_[hidden])
Date: 2011-08-29 12:47:35


This might get interesting. I think of "portable hardware locality" (hwloc)
as originating at the native cpuset, and I see "locality" working at the
machine level (machines in my world can have up to 8 CPUs, for example).

But from an Open MPI world view, the execution graph across myriad machines
might dictate a larger, yet still fine-grained, approach. I haven't had a
chance to play with those aspects. Has anyone else?

Ken

-----Original Message-----
From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
Behalf Of Ralph Castain
Sent: Monday, August 29, 2011 8:21 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] known limitation or bug in hwloc?

Actually, I'll eat those words. I was looking at the wrong place.

Yes, that is a bug in hwloc. It needs to loop over CPU_MAX for those cases
where the bit mask extends over multiple words.
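
To make the multi-word case concrete, here is a minimal, self-contained
sketch. It does not use the real OPAL or hwloc definitions: every sketch_*
and SKETCH_* name below is a hypothetical stand-in, and it assumes that
NUM_BITS is the bit width of a single mask word while CPU_MAX is the total
number of CPUs the mask can describe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; NOT the real OPAL macros. */
typedef uint64_t sketch_word_t;
#define SKETCH_NUM_BITS  64u    /* assumed: bits in ONE mask word */
#define SKETCH_CPU_MAX   1024u  /* assumed: total CPUs the mask can hold */
#define SKETCH_NUM_WORDS (SKETCH_CPU_MAX / SKETCH_NUM_BITS)

/* A CPU mask that spans several words. */
typedef struct {
    sketch_word_t words[SKETCH_NUM_WORDS];
} sketch_cpu_set_t;

/* Clear every bit in the mask. */
static void sketch_zero(sketch_cpu_set_t *s)
{
    memset(s, 0, sizeof(*s));
}

/* Set the bit for one CPU: pick the word, then the bit inside it. */
static void sketch_set(sketch_cpu_set_t *s, unsigned int cpu)
{
    assert(cpu < SKETCH_CPU_MAX);
    s->words[cpu / SKETCH_NUM_BITS] |=
        (sketch_word_t) 1 << (cpu % SKETCH_NUM_BITS);
}

/* Test the bit for one CPU. */
static bool sketch_isset(const sketch_cpu_set_t *s, unsigned int cpu)
{
    assert(cpu < SKETCH_CPU_MAX);
    return ((s->words[cpu / SKETCH_NUM_BITS] >> (cpu % SKETCH_NUM_BITS)) & 1u) != 0;
}

/* Copy the bits in [0, limit) from src to dst, the way the loop in
 * module_set() walks the mask. Returns how many set bits were copied. */
static unsigned int sketch_copy(const sketch_cpu_set_t *src,
                                sketch_cpu_set_t *dst,
                                unsigned int limit)
{
    unsigned int copied = 0;
    for (unsigned int i = 0; i < limit; ++i) {
        if (sketch_isset(src, i)) {
            sketch_set(dst, i);
            ++copied;
        }
    }
    return copied;
}
```

With CPUs 3 and 70 set in the source mask, calling sketch_copy() with
SKETCH_NUM_BITS as the limit carries over only CPU 3, because the loop never
leaves the first word. Calling it with SKETCH_CPU_MAX carries over both.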

On Aug 29, 2011, at 7:16 AM, Ralph Castain wrote:

> Actually, if you look closely at the definition of those two values,
> you'll see that it really doesn't matter which one we loop over. The
> NUM_BITS value defines the actual total number of bits in the mask. The
> CPU_MAX is the total number of cpus we can support, which was set to a value
> such that the two are equal (i.e., it's a power of two that happens to be an
> integer multiple of 64).
>
> I believe the original intent was to allow CPU_MAX to be independent of
> address-alignment questions, so NUM_BITS could technically be greater than
> CPU_MAX. Even if this happens, though, all that would do is cause the loop
> to run across more bits than required.
>
> So it doesn't introduce a limitation at all. In hindsight, we could
> simplify things by eliminating one of those values and just putting a
> requirement on the number that it be a multiple of 64 so it aligns with a
> memory address.
>
>
> On Aug 29, 2011, at 7:05 AM, Kenneth Lloyd wrote:
>
>> Nadia,
>>
>> Interesting. I haven't tried pushing this to levels above 8 on a particular
>> machine. Do you think that the cpuset / paffinity / hwloc only applies at
>> the machine level, at which time you need to employ a graph with carto?
>>
>> Regards,
>>
>> Ken
>>
>> -----Original Message-----
>> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
>> Behalf Of nadia.derbey
>> Sent: Monday, August 29, 2011 5:45 AM
>> To: Open MPI Developers
>> Subject: [OMPI devel] known limitation or bug in hwloc?
>>
>> Hi list,
>>
>> I'm hitting a limitation with paffinity/hwloc with cpu numbers >= 64.
>>
>> In opal/mca/paffinity/hwloc/paffinity_hwloc_module.c, module_set() is
>> the routine that sets the calling process affinity to the mask given as
>> parameter. Note that "mask" is an opal_paffinity_base_cpu_set_t (so we
>> allow the cpus to be potentially numbered up to
>> OPAL_PAFFINITY_BITMASK_CPU_MAX - 1).
>>
>> The problem with module_set() is that it loops over
>> OPAL_PAFFINITY_BITMASK_T_NUM_BITS bits to check if these bits are set in
>> the mask:
>>
>>     for (i = 0; ((unsigned int) i) < OPAL_PAFFINITY_BITMASK_T_NUM_BITS; ++i) {
>>         if (OPAL_PAFFINITY_CPU_ISSET(i, mask)) {
>>             hwloc_bitmap_set(set, i);
>>         }
>>     }
>>
>> Given "mask"'s type, I think module_set() should instead loop over
>> OPAL_PAFFINITY_BITMASK_CPU_MAX bits.
>>
>> Note that module_set() uses a type for its internal mask that is
>> coherent with OPAL_PAFFINITY_BITMASK_T_NUM_BITS (hwloc_bitmap_t).
>>
>> So I'm wondering whether this is a known limitation I've never heard of
>> or an actual bug?
>>
>> Regards,
>> Nadia
>>
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
