Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] known limitation or bug in hwloc?
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-08-29 09:16:08


Actually, if you look closely at the definitions of those two values, you'll see that it really doesn't matter which one we loop over. NUM_BITS defines the actual total number of bits in the mask. CPU_MAX is the total number of CPUs we can support, and it was set so that the two values are equal (i.e., it's a power of two that is also an integer multiple of 64).

I believe the original intent was to allow CPU_MAX to be independent of address-alignment questions, so NUM_BITS could technically be greater than CPU_MAX. Even if this happens, though, all that would do is cause the loop to run across more bits than required.

So it doesn't introduce a limitation at all. In hindsight, we could simplify things by eliminating one of those values and simply requiring that the remaining number be a multiple of 64, so that it aligns with a memory address.
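
As a rough illustration of that simplification, here is a minimal sketch in C. It is not the Open MPI code: every SKETCH_* name is hypothetical, the value 1024 is an assumption chosen only for the example, and the 64-bit word layout is one possible way to store the mask. It keeps a single constant, checks at compile time that the constant is a multiple of 64, and loops over that one constant when copying a mask.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical single constant standing in for the two values discussed
 * above. 1024 is an assumed value, not taken from the Open MPI headers. */
#define SKETCH_CPU_MAX   1024u
#define SKETCH_WORD_BITS 64u

/* Compile-time check (C11): the CPU count must fill whole 64-bit words. */
_Static_assert(SKETCH_CPU_MAX % SKETCH_WORD_BITS == 0,
               "SKETCH_CPU_MAX must be a multiple of 64");

#define SKETCH_NUM_WORDS (SKETCH_CPU_MAX / SKETCH_WORD_BITS)

/* Hypothetical CPU set: one bit per CPU, stored in 64-bit words. */
typedef struct {
    uint64_t words[SKETCH_NUM_WORDS];
} sketch_cpu_set_t;

/* Clear every bit in the set. */
static void sketch_zero(sketch_cpu_set_t *set)
{
    memset(set, 0, sizeof(*set));
}

/* Mark one CPU as present in the set. */
static void sketch_set(sketch_cpu_set_t *set, unsigned int cpu)
{
    assert(cpu < SKETCH_CPU_MAX);
    set->words[cpu / SKETCH_WORD_BITS] |=
        UINT64_C(1) << (cpu % SKETCH_WORD_BITS);
}

/* Return 1 if the CPU is in the set, 0 otherwise. */
static int sketch_isset(const sketch_cpu_set_t *set, unsigned int cpu)
{
    assert(cpu < SKETCH_CPU_MAX);
    return (int) ((set->words[cpu / SKETCH_WORD_BITS]
                   >> (cpu % SKETCH_WORD_BITS)) & 1u);
}

/* Copy every set bit from src to dst, looping over the single constant.
 * Returns the number of CPUs copied. */
static unsigned int sketch_copy(const sketch_cpu_set_t *src,
                                sketch_cpu_set_t *dst)
{
    unsigned int i, copied = 0;

    sketch_zero(dst);
    for (i = 0; i < SKETCH_CPU_MAX; ++i) {
        if (sketch_isset(src, i)) {
            sketch_set(dst, i);
            ++copied;
        }
    }
    return copied;
}
```

With only one constant, the loop bound and the storage size cannot drift apart, and the static assertion rejects a value that would leave a partially used word.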

On Aug 29, 2011, at 7:05 AM, Kenneth Lloyd wrote:

> Nadia,
>
> Interesting. I haven't tried pushing this to levels above 8 on a particular
> machine. Do you think that the cpuset / paffinity / hwloc only applies at
> the machine level, at which time you need to employ a graph with carto?
>
> Regards,
>
> Ken
>
> -----Original Message-----
> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
> Behalf Of nadia.derbey
> Sent: Monday, August 29, 2011 5:45 AM
> To: Open MPI Developers
> Subject: [OMPI devel] known limitation or bug in hwloc?
>
> Hi list,
>
> I'm hitting a limitation with paffinity/hwloc with cpu numbers >= 64.
>
> In opal/mca/paffinity/hwloc/paffinity_hwloc_module.c, module_set() is
> the routine that sets the calling process affinity to the mask given as
> parameter. Note that "mask" is an opal_paffinity_base_cpu_set_t (so we
> allow the cpus to be potentially numbered up to
> OPAL_PAFFINITY_BITMASK_CPU_MAX - 1).
>
> The problem with module_set() is that it loops over
> OPAL_PAFFINITY_BITMASK_T_NUM_BITS bits to check if these bits are set in
> the mask:
>
> for (i = 0; ((unsigned int) i) < OPAL_PAFFINITY_BITMASK_T_NUM_BITS; ++i) {
>     if (OPAL_PAFFINITY_CPU_ISSET(i, mask)) {
>         hwloc_bitmap_set(set, i);
>     }
> }
>
> Given "mask"'s type, I think module_set() should instead loop over
> OPAL_PAFFINITY_BITMASK_CPU_MAX bits.
>
> Note that module_set() uses a type for its internal mask that is
> coherent with OPAL_PAFFINITY_BITMASK_T_NUM_BITS (hwloc_bitmap_t).
>
> So I'm wondering whether this is a known limitation I've never heard of
> or an actual bug?
>
> Regards,
> Nadia
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel