Thanks a lot Ralph!
Regards,
--
Nadia Derbey
Phone: +33 (0)4 76 29 77 62
devel-bounces@open-mpi.org wrote on 08/29/2011 06:12:13 PM:
> From: Ralph Castain <rhc@open-mpi.org>
> To: Open MPI Developers <devel@open-mpi.org>
> Date: 08/29/2011 06:12 PM
> Subject: Re: [OMPI devel] known limitation or bug in hwloc?
> Sent by: devel-bounces@open-mpi.org
>
> On Aug 29, 2011, at 10:08 AM, nadia.derbey@bull.net wrote:
>
> devel-bounces@open-mpi.org wrote on 08/29/2011 05:57:59 PM:
>
> > From: Ralph Castain <rhc@open-mpi.org>
> > To: Open MPI Developers <devel@open-mpi.org>
> > Date: 08/29/2011 05:58 PM
> > Subject: Re: [OMPI devel] known limitation or bug in hwloc?
> > Sent by: devel-bounces@open-mpi.org
> >
> > On Aug 29, 2011, at 8:35 AM, nadia.derbey@bull.net wrote:
> >
> >
> > devel-bounces@open-mpi.org wrote on 08/29/2011 04:20:30 PM:
> >
> > > From: Ralph Castain <rhc@open-mpi.org>
> > > To: Open MPI Developers <devel@open-mpi.org>
> > > Date: 08/29/2011 04:26 PM
> > > Subject: Re: [OMPI devel] known limitation or bug in hwloc?
> > > Sent by: devel-bounces@open-mpi.org
> > >
> > > Actually, I'll eat those words. I was looking at the wrong place.
> > >
> > > Yes, that is a bug in hwloc. It needs to loop over CPU_MAX for those
> > > cases where the bit mask extends over multiple words.
> >
> > But I'm afraid the fix won't be trivial at all: hwloc itself is
> > consistent: it loops over NUM_BITS, but it also uses masks that are
> > NUM_BITS wide (hwloc_bitmap_t set)...
> >
> > I guess I'm missing that - I just did a search and cannot find any
> > reference to OPAL_PAFFINITY_BITMASK_T_NUM_BITS anywhere in
> > paffinity/hwloc after the last change.
> >
> > Can you point me to where you believe a problem exists? Or feel free
> > to submit a patch to fix it :-) We can push it upstream to the
> > hwloc folks for their consideration.
>
> file: opal/mca/paffinity/hwloc/paffinity_hwloc_module.c
> routine: module_set()
>
> You have a reference to OPAL_PAFFINITY_BITMASK_T_NUM_BITS both in the
> trunk and in v1.5.
>
> But maybe this issue has been fixed already?
>
> I fixed it in the trunk (r25102) per this thread and filed a CMR to
> move it to v1.5. You should be copied on the CMR ticket.
>
>
> Regards,
> Nadia
>
> >
> >
> > Regards,
> > Nadia
> > >
> > >
> > > On Aug 29, 2011, at 7:16 AM, Ralph Castain wrote:
> > >
> > > > Actually, if you look closely at the definition of those two
> > > > values, you'll see that it really doesn't matter which one we loop
> > > > over. The NUM_BITS value defines the actual total number of bits in
> > > > the mask. The CPU_MAX is the total number of cpus we can support,
> > > > which was set to a value such that the two are equal (i.e., it's a
> > > > power of two that happens to be an integer multiple of 64).
> > > >
> > > > I believe the original intent was to allow CPU_MAX to be
> > > > independent of address-alignment questions, so NUM_BITS could
> > > > technically be greater than CPU_MAX. Even if this happens, though,
> > > > all that would do is cause the loop to run across more bits than
> > > > required.
> > > >
> > > > So it doesn't introduce a limitation at all. In hindsight, we
> > > > could simplify things by eliminating one of those values and just
> > > > putting a requirement on the number that it be a multiple of 64 so
> > > > it aligns with a memory address.
> > > >
> > > >
> > > > On Aug 29, 2011, at 7:05 AM, Kenneth Lloyd wrote:
> > > >
> > > >> Nadia,
> > > >>
> > > > >> Interesting. I haven't tried pushing this to levels above 8 on a
> > > > >> particular machine. Do you think that the cpuset / paffinity / hwloc
> > > > >> only applies at the machine level, at which time you need to employ
> > > > >> a graph with carto?
> > > >>
> > > >> Regards,
> > > >>
> > > >> Ken
> > > >>
> > > >> -----Original Message-----
> > > > >> From: devel-bounces@open-mpi.org [mailto:devel-bounces@open-mpi.org] On
> > > > >> Behalf Of nadia.derbey
> > > > >> Sent: Monday, August 29, 2011 5:45 AM
> > > > >> To: Open MPI Developers
> > > > >> Subject: [OMPI devel] known limitation or bug in hwloc?
> > > >>
> > > >> Hi list,
> > > >>
> > > > >> I'm hitting a limitation with paffinity/hwloc with cpu numbers >= 64.
> > > > >>
> > > > >> In opal/mca/paffinity/hwloc/paffinity_hwloc_module.c, module_set() is
> > > > >> the routine that sets the calling process affinity to the mask given as
> > > > >> parameter. Note that "mask" is an opal_paffinity_base_cpu_set_t (so we
> > > > >> allow the cpus to be potentially numbered up to
> > > > >> OPAL_PAFFINITY_BITMASK_CPU_MAX - 1).
> > > > >>
> > > > >> The problem with module_set() is that it loops over
> > > > >> OPAL_PAFFINITY_BITMASK_T_NUM_BITS bits to check if these bits are set in
> > > > >> the mask:
> > > >>
> > > >> for (i = 0; ((unsigned int) i) <
> OPAL_PAFFINITY_BITMASK_T_NUM_BITS; ++i)
> > > >> {
> > > >> if (OPAL_PAFFINITY_CPU_ISSET(i,
mask)) {
> > > >> hwloc_bitmap_set(set,
i);
> > > >> }
> > > >> }
> > > >>
> > > > >> Given the type of "mask", I think module_set() should instead loop over
> > > > >> OPAL_PAFFINITY_BITMASK_CPU_MAX bits.
> > > > >>
> > > > >> Note that module_set() uses a type for its internal mask that is
> > > > >> coherent with OPAL_PAFFINITY_BITMASK_T_NUM_BITS (hwloc_bitmap_t).
> > > > >>
> > > > >> So I'm wondering whether this is a known limitation I've never heard of
> > > > >> or an actual bug?
> > > >>
> > > >> Regards,
> > > >> Nadia
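
The effect reported above (CPUs numbered 64 and higher being dropped) can be reproduced with a minimal sketch. It is not the Open MPI code: every `demo_`/`DEMO_` name is a hypothetical stand-in, and it assumes that NUM_BITS is the width of a single mask word (64 here) while CPU_MAX is the width of the whole multi-word mask (1024 here).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the paffinity definitions discussed in this
 * thread; names and values are assumptions made for illustration only. */
typedef unsigned long long demo_word_t;

#define DEMO_NUM_BITS     (sizeof(demo_word_t) * 8)        /* bits in ONE word */
#define DEMO_CPU_MAX      1024                             /* bits in the whole mask */
#define DEMO_NUM_ELEMENTS (DEMO_CPU_MAX / DEMO_NUM_BITS)   /* words in the mask */

typedef struct {
    demo_word_t bitmask[DEMO_NUM_ELEMENTS];
} demo_cpu_set_t;

/* Clear every bit of the set. */
static void demo_zero(demo_cpu_set_t *set)
{
    memset(set, 0, sizeof(*set));
}

/* Set bit "cpu": pick the word first, then the bit inside that word. */
static void demo_set(demo_cpu_set_t *set, unsigned cpu)
{
    set->bitmask[cpu / DEMO_NUM_BITS] |= (demo_word_t)1 << (cpu % DEMO_NUM_BITS);
}

/* Return 1 if bit "cpu" is set, 0 otherwise. */
static int demo_isset(const demo_cpu_set_t *set, unsigned cpu)
{
    return (int)((set->bitmask[cpu / DEMO_NUM_BITS] >> (cpu % DEMO_NUM_BITS)) & 1u);
}

/* Copy the set bits of src into dst, scanning only bits [0, limit).
 * Returns the number of bits copied. A limit of DEMO_NUM_BITS mimics the
 * loop quoted in the thread; DEMO_CPU_MAX mimics the proposed fix. */
static unsigned demo_copy_bits(const demo_cpu_set_t *src, demo_cpu_set_t *dst,
                               unsigned limit)
{
    unsigned i, copied = 0;

    demo_zero(dst);
    for (i = 0; i < limit && i < DEMO_CPU_MAX; ++i) {
        if (demo_isset(src, i)) {
            demo_set(dst, i);
            ++copied;
        }
    }
    return copied;
}
```

With bits 3 and 70 set in the source, a scan limited to DEMO_NUM_BITS copies only bit 3, because bit 70 lives in the second word and is never visited. A scan limited to DEMO_CPU_MAX copies both.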
> > > >>
> > > >>
> > > >> _______________________________________________
> > > >> devel mailing list
> > > >> devel@open-mpi.org
> > > >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > > >> -----
> > > >> No virus found in this message.
> > > >> Checked by AVG - www.avg.com
> > > >> Version: 10.0.1392 / Virus Database: 1520/3864
- Release Date: 08/28/11
> > > >>
> > > >> _______________________________________________
> > > >> devel mailing list
> > > >> devel@open-mpi.org
> > > >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > > >
> > >
> > >