Open MPI Development Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI devel] known limitation or bug in hwloc?
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-08-29 11:57:59


On Aug 29, 2011, at 8:35 AM, nadia.derbey_at_[hidden] wrote:

>
> devel-bounces_at_[hidden] wrote on 08/29/2011 04:20:30 PM:
>
> > > From: Ralph Castain <rhc_at_[hidden]>
> > > To: Open MPI Developers <devel_at_[hidden]>
> > > Date: 08/29/2011 04:26 PM
> > > Subject: Re: [OMPI devel] known limitation or bug in hwloc?
> > > Sent by: devel-bounces_at_[hidden]
> >
> > Actually, I'll eat those words. I was looking at the wrong place.
> >
> > Yes, that is a bug in hwloc. It needs to loop over CPU_MAX for those
> > cases where the bit mask extends over multiple words.
>
> But I'm afraid the fix won't be trivial at all: hwloc itself is consistent: it loops over NUM_BITS, but it uses masks that are NUM_BITS wide (hwloc_bitmap_t set)...

I guess I'm missing that - I just did a search and cannot find any reference to OPAL_PAFFINITY_BITMASK_T_NUM_BITS anywhere in paffinity/hwloc after the last change.

Can you point me to where you believe a problem exists? Or feel free to submit a patch to fix it :-) We can push it upstream to the hwloc folks for their consideration.
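
To make the question concrete, here is a minimal sketch of the loop-bound issue being discussed. It assumes the CPU set is an array of unsigned long words, that NUM_BITS is the width of one word, and that CPU_MAX is the width of the whole set. The demo_* names and the value 1024 are hypothetical stand-ins for illustration, not the actual OPAL or hwloc definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the paffinity definitions (assumptions). */
typedef unsigned long demo_word_t;
#define DEMO_NUM_BITS  (sizeof(demo_word_t) * CHAR_BIT) /* bits in ONE word */
#define DEMO_CPU_MAX   1024u                            /* bits in the whole set */
#define DEMO_NUM_WORDS (DEMO_CPU_MAX / DEMO_NUM_BITS)

typedef struct {
    demo_word_t word[DEMO_NUM_WORDS];
} demo_cpu_set_t;

/* Mark one CPU in the multi-word set. */
static void demo_set(demo_cpu_set_t *m, size_t cpu)
{
    assert(cpu < DEMO_CPU_MAX);
    m->word[cpu / DEMO_NUM_BITS] |= (demo_word_t)1 << (cpu % DEMO_NUM_BITS);
}

/* Return 1 if the CPU is marked in the set, 0 otherwise. */
static int demo_isset(const demo_cpu_set_t *m, size_t cpu)
{
    assert(cpu < DEMO_CPU_MAX);
    return (int)((m->word[cpu / DEMO_NUM_BITS] >> (cpu % DEMO_NUM_BITS)) & 1u);
}

/* Copy bits [0, limit) from src into dst, as module_set() does when it
 * translates the mask; returns how many set bits were copied. */
static size_t demo_copy(const demo_cpu_set_t *src, demo_cpu_set_t *dst,
                        size_t limit)
{
    size_t copied = 0;
    memset(dst, 0, sizeof(*dst));
    for (size_t i = 0; i < limit; ++i) {
        if (demo_isset(src, i)) {
            demo_set(dst, i);
            ++copied;
        }
    }
    return copied;
}
```

With CPUs 3 and 70 set in the source, calling demo_copy() with a limit of DEMO_NUM_BITS carries over only CPU 3, because the loop never reaches the second word. Calling it with DEMO_CPU_MAX carries over both.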

>
> Regards,
> Nadia
> >
> >
> > On Aug 29, 2011, at 7:16 AM, Ralph Castain wrote:
> >
> > > Actually, if you look closely at the definition of those two
> > > values, you'll see that it really doesn't matter which one we loop
> > > over. The NUM_BITS value defines the actual total number of bits in
> > > the mask. The CPU_MAX is the total number of cpus we can support,
> > > which was set to a value such that the two are equal (i.e., it's a
> > > power of two that happens to be an integer multiple of 64).
> > >
> > > I believe the original intent was to allow CPU_MAX to be
> > > independent of address-alignment questions, so NUM_BITS could
> > > technically be greater than CPU_MAX. Even if this happens, though,
> > > all that would do is cause the loop to run across more bits than
> > > required.
> > >
> > > So it doesn't introduce a limitation at all. In hindsight, we
> > > could simplify things by eliminating one of those values and just
> > > putting a requirement on the number that it be a multiple of 64 so
> > > it aligns with a memory address.
> > >
> > >
> > > On Aug 29, 2011, at 7:05 AM, Kenneth Lloyd wrote:
> > >
> > >> Nadia,
> > >>
> > >> Interesting. I haven't tried pushing this to levels above 8 on a particular
> > >> machine. Do you think that the cpuset / paffinity / hwloc only applies at
> > >> the machine level, at which time you need to employ a graph with carto?
> > >>
> > >> Regards,
> > >>
> > >> Ken
> > >>
> > >> -----Original Message-----
> > >> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
> > >> Behalf Of nadia.derbey
> > >> Sent: Monday, August 29, 2011 5:45 AM
> > >> To: Open MPI Developers
> > >> Subject: [OMPI devel] known limitation or bug in hwloc?
> > >>
> > >> Hi list,
> > >>
> > >> I'm hitting a limitation with paffinity/hwloc with cpu numbers >= 64.
> > >>
> > >> In opal/mca/paffinity/hwloc/paffinity_hwloc_module.c, module_set() is
> > >> the routine that sets the calling process affinity to the mask given as
> > >> parameter. Note that "mask" is an opal_paffinity_base_cpu_set_t (so we
> > >> allow the cpus to be potentially numbered up to
> > >> OPAL_PAFFINITY_BITMASK_CPU_MAX - 1).
> > >>
> > >> The problem with module_set() is that it loops over
> > >> OPAL_PAFFINITY_BITMASK_T_NUM_BITS bits to check if these bits are set in
> > >> the mask:
> > >>
> > >> for (i = 0; ((unsigned int) i) < OPAL_PAFFINITY_BITMASK_T_NUM_BITS; ++i) {
> > >>     if (OPAL_PAFFINITY_CPU_ISSET(i, mask)) {
> > >>         hwloc_bitmap_set(set, i);
> > >>     }
> > >> }
> > >>
> > >> Given "mask"'s type, I think module_set() should instead loop over
> > >> OPAL_PAFFINITY_BITMASK_CPU_MAX bits.
> > >>
> > >> Note that module_set() uses a type for its internal mask that is
> > >> consistent with OPAL_PAFFINITY_BITMASK_T_NUM_BITS (hwloc_bitmap_t).
> > >>
> > >> So I'm wondering whether this is a known limitation I've never heard of
> > >> or an actual bug?
> > >>
> > >> Regards,
> > >> Nadia
> > >>
> > >>
> > >> _______________________________________________
> > >> devel mailing list
> > >> devel_at_[hidden]
> > >> http://www.open-mpi.org/mailman/listinfo.cgi/devel