Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] hwloc-distrib - please add the option to distribute the jobs in the reverse direction
From: Jiri Hladky (hladky.jiri_at_[hidden])
Date: 2013-08-28 10:20:43

On Tue, Aug 27, 2013 at 7:57 PM, Brice Goglin <Brice.Goglin_at_[hidden]> wrote:
> You just explained why I don't like weights. Some people will want to
> ignore L2, some won't. Specifying all this on the command-line would be
> horrible, and implementing it will be horrible too.

:-) Agreed.

> > I think that a --reverse option is much easier to implement, and it
> > gives a clear requirement and a clear understanding of what the output
> > should look like.
> Implementing reverse bitmap_singlify() isn't so easy.
> Also "--reverse" would have a semantics that no users ever requested,
> it's only a workaround for your actual need ("ignore core0 if
> possible"). What if somebody later comes with a machine where he wants to
> preferably ignore core 7 and maybe ignore core 11 too, because some
> special daemons are running there? We'd need to add
> --dont-reverse-but-ignore-some-cores-if-possible. Or what if somebody
> wants to ignore the first core but still get other cores in the normal
> order?

I get your point. On the other hand, I think that hwloc-distrib is
currently not flexible enough to handle such cases. I believe that the
current strategy - start from the first object - is not the best one. In my
experience, core 0 is always the core most used by the system, so a better
strategy would be to allocate the cores starting from the last one. So, for
example, when I say that I would like to avoid PU#0, I in fact mean that I
would like to avoid Socket#0 as well for as long as possible. The same
applies to NUMANode#0.

I was looking at the source code of hwloc-distrib and I believe that
only this part of the code would be affected:

      for (i = 0; i < chunks; i++)
        roots[i] = hwloc_get_obj_by_depth(topology, from_depth, i);
      => change this to roots[i] = hwloc_get_obj_by_depth(topology, from_depth,

      hwloc_distributev(topology, roots, chunks, cpuset, n, to_depth);
      => rewrite this to iterate in the reverse direction

MAX_COUNT seems to be known and accessible as topology->nb_levels.

Am I missing something? In case of infinite bitmap hwloc-distrib will error
out. This should solve the problems with hwloc_bitmap_singlify.
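
The index arithmetic behind the first change can be sketched on its own, without touching hwloc. This is only an illustration under assumptions: reverse_index() and pick_roots_reversed() are hypothetical helpers, not part of hwloc or hwloc-distrib, and `count` is assumed to be the number of objects at from_depth (the value that hwloc_get_nbobjs_by_depth() reports in the public API).

```c
#include <assert.h>

/* Hypothetical helper (not part of hwloc): map the i-th chunk to an object
 * index counted from the end of a level that holds `count` objects. */
static unsigned reverse_index(unsigned count, unsigned i)
{
    assert(i < count);
    return count - 1 - i;
}

/* Fill `out` with the indices of the last `chunks` objects of a level,
 * highest index first. Returns 0 on success, or -1 when more chunks are
 * requested than there are objects. */
static int pick_roots_reversed(unsigned count, unsigned chunks, unsigned *out)
{
    unsigned i;

    if (chunks > count)
        return -1;
    for (i = 0; i < chunks; i++)
        out[i] = reverse_index(count, i);
    return 0;
}
```

In hwloc-distrib each out[i] would then be passed as the last argument of hwloc_get_obj_by_depth(topology, from_depth, ...), so object 0 is picked only when every object of the level is needed.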

> I tend to think we should let the application handle these specific
> cases (finding what can be ignored while still having enough objects,
> and then calling distribute accordingly).

Actually, I believe that this change is more easily implemented directly in
the C code rather than with some workaround in Bash. And I believe that
the use case is not so exotic. As outlined above, starting from core#0 is
not always the best strategy....

Please let me know what you think.