Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] [hwloc-announce] Hardware locality (hwloc) v1.1rc1 released
From: Jirka Hladky (jhladky_at_[hidden])
Date: 2010-11-10 20:00:31


On Wednesday, November 10, 2010 05:27:49 pm Brice Goglin wrote:
> On 10/11/2010 15:02, Jirka Hladky wrote:
> >>> 2) hwloc-bind --get --membind is not working for me (RHEL 6.0)
> >>> $ hwloc-bind --membind node:1 --mempolicy interleave -- hwloc-bind --get --membind
> >>> hwloc_get_membind failed (errno 22 Invalid argument)
> >>
> >> You get the same error when running only "hwloc-bind --get --membind",
> >> right?
> >
> > Yes:
> > $ hwloc-bind --get --membind
> > hwloc_get_membind failed (errno 22 Invalid argument)
> >
> >> I am not sure about this one. Do you have NUMA support in your kernel?
> >> Is your machine NUMA? Can you send the gather-topology tarball ? (if we
> >> don't have it already :))
> >
> > Yes, it's a NUMA box with NUMA support in kernel.
>
> Unfortunately, I can't reproduce. I tried with your tarball, with a
> Redhat 5 machine, with a similar Nehalem-based machine running Debian.
>
> Can you try to debug this? I'd like to know if EINVAL is returned by the
> kernel or by hwloc. You'd have to open src/topology-linux.c, go in
> function hwloc_linux_get_thisthread_membind() and add some printf there
> to check where EINVAL comes from.
>
> thanks,
> Brice

Hi Brice,

I have added some printf and perror calls. EINVAL is coming from the
get_mempolicy call:

==============================================================
  /* compute max_os_index */
  complete_nodeset = hwloc_topology_get_complete_nodeset(topology);
  if (complete_nodeset) {
    max_os_index = hwloc_bitmap_last(complete_nodeset);
    printf("max_os_index %u\n",max_os_index);
    if (max_os_index == (unsigned) -1)
      max_os_index = 0;
  } else {
    max_os_index = 0;
  }
  printf("max_os_index %u\n",max_os_index);
  /* round up to the nearest multiple of BITS_PER_LONG */
  max_os_index = (max_os_index + HWLOC_BITS_PER_LONG) & ~(HWLOC_BITS_PER_LONG - 1);
  printf("max_os_index %u\n",max_os_index);

  linuxmask = malloc(max_os_index/HWLOC_BITS_PER_LONG * sizeof(long));
  if (!linuxmask) {
    errno = ENOMEM;
    goto out;
  }

  err = get_mempolicy(&linuxpolicy, linuxmask, max_os_index, 0, 0);
  if (err < 0) {
    perror("get_mempolicy");
    goto out_with_mask;
  }
==========================================================================

On a system with 2 NUMA nodes:
$ utils/hwloc-bind --get --membind
max_os_index 1
max_os_index 1
max_os_index 64
get_mempolicy: Invalid argument
hwloc_get_membind failed (errno 22 Invalid argument)
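
The rounding step from the excerpt can be checked in isolation. This is a
minimal sketch, not hwloc code: SKETCH_BITS_PER_LONG, sketch_round_up and
sketch_mask_bytes are hypothetical names standing in for HWLOC_BITS_PER_LONG
and the inline expressions, and a 64-bit long is assumed (which is what the
value 64 in the output suggests).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for HWLOC_BITS_PER_LONG; assumes a 64-bit long. */
#define SKETCH_BITS_PER_LONG 64u

/* Same expression as in the excerpt: add one full word, then clear the
 * low bits. The result is the smallest multiple of the word size that is
 * strictly greater than max_os_index, so bit number max_os_index fits. */
unsigned sketch_round_up(unsigned max_os_index)
{
  return (max_os_index + SKETCH_BITS_PER_LONG) & ~(SKETCH_BITS_PER_LONG - 1);
}

/* Size in bytes of the mask the excerpt allocates for a rounded bit count. */
size_t sketch_mask_bytes(unsigned rounded)
{
  assert(rounded % SKETCH_BITS_PER_LONG == 0);
  return rounded / SKETCH_BITS_PER_LONG * sizeof(unsigned long);
}
```

With max_os_index 1 this gives 64 bits and a mask of one long, which matches
the third max_os_index line printed above, so the arithmetic itself behaves
as intended.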

I do not see any problem with your code. I don't know what's going on. Is
get_mempolicy itself buggy? How can I debug this?
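
One possible way to narrow this down is to see whether the kernel accepts a
larger maxnode. The get_mempolicy(2) man page lists a maxnode smaller than
the number of node IDs supported by the system as one cause of EINVAL;
whether that is what happens here is only an assumption to be tested. The
sketch below is not hwloc code, and every sketch_* name is hypothetical. It
doubles maxnode until the query stops failing with EINVAL. The fake callback
models a kernel that wants at least 1024 bits (an arbitrary number picked for
illustration), and the Linux-only callback invokes the raw system call so
that libnuma is not needed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef __linux__
#include <sys/syscall.h>
#include <unistd.h>
#endif

/* Hypothetical callback: returns 0 on success, -1 with errno set on failure. */
typedef int (*sketch_query_fn)(unsigned long *mask, unsigned long maxnode);

/* Doubles maxnode from start up to limit until the query stops failing with
 * EINVAL. Returns the first accepted maxnode, or 0 with errno set if no
 * value was accepted or another error occurred. */
unsigned long sketch_find_maxnode(sketch_query_fn query,
                                  unsigned long start, unsigned long limit)
{
  const unsigned long bits = 8 * sizeof(unsigned long);
  unsigned long maxnode;

  /* start must be a nonzero multiple of the word size */
  assert(start >= bits && start % bits == 0);

  for (maxnode = start; maxnode <= limit; maxnode *= 2) {
    unsigned long *mask = calloc(maxnode / bits, sizeof(unsigned long));
    int err, saved;

    if (!mask) {
      errno = ENOMEM;
      return 0;
    }
    err = query(mask, maxnode);
    saved = errno;          /* free() may clobber errno */
    free(mask);
    if (err == 0)
      return maxnode;
    if (saved != EINVAL) {
      errno = saved;
      return 0;
    }
  }
  errno = EINVAL;
  return 0;
}

/* Fake query for illustration only: rejects any mask below 1024 bits. */
int sketch_query_fake(unsigned long *mask, unsigned long maxnode)
{
  if (maxnode < 1024) {
    errno = EINVAL;
    return -1;
  }
  mask[0] = 1UL;
  return 0;
}

#ifdef __linux__
/* Real query: raw get_mempolicy system call for the calling thread. */
int sketch_query_kernel(unsigned long *mask, unsigned long maxnode)
{
  int policy;
  return (int) syscall(SYS_get_mempolicy, &policy, mask, maxnode, NULL, 0UL);
}
#endif
```

Calling sketch_find_maxnode(sketch_query_kernel, 64, 1UL << 16) on the
affected machine and printing the result would show whether a larger maxnode
makes the EINVAL go away. If it does, the mask size passed to the kernel is
the more likely culprit than the policy query itself.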

Thanks!
Jirka