Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] hwloc-distrib - please add the option to distribute the jobs in the reverse direction
From: Jiri Hladky (hladky.jiri_at_[hidden])
Date: 2013-08-28 10:20:43

On Tue, Aug 27, 2013 at 7:57 PM, Brice Goglin <Brice.Goglin_at_[hidden]> wrote:
> You just explained why I don't like weights. Some people will want to
> ignore L2, some won't. Specifying all this on the command-line would be
> horrible, and implementing it will be horrible too.

:-) Agreed.

> > I think that a --reverse option is much easier to implement, and its
> > requirement is clear, so it is easy to understand what the output
> > should look like.
> Implementing a reverse bitmap_singlify() isn't so easy.
> Also, "--reverse" would have semantics that no user ever requested;
> it's only a workaround for your actual need ("ignore core0 if
> possible"). What if somebody later comes with a machine where he wants to
> preferably ignore core 7, and maybe core 11 too, because some
> special daemons are running there? We'd need to add
> --dont-reverse-but-ignore-some-cores-if-possible. Or what if somebody
> wants to ignore the first core but still get the other cores in the normal
> order?

I get your point. On the other hand, I think that hwloc-distrib is at the
moment not flexible enough to handle such cases. I believe that the current
strategy (start from the first object) is not the best one. In my
experience, core 0 is always the most used by the system, so a better
strategy would seem to be allocating cores starting from the last one. For
example, when I say that I would like to avoid PU#0, it means that I would
in fact like to avoid Socket#0 as well, for as long as possible. The same
applies to NUMANode#0.
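To make the "start from the last core" idea concrete, here is a minimal C sketch on a hypothetical machine with contiguous core numbering (an assumption: socket s owns cores s*cores_per_socket to (s+1)*cores_per_socket-1; real machines may number cores differently). It packs jobs onto cores from the end rather than spreading them the way hwloc-distrib does, which is enough to show that Socket#0 stays untouched until every other socket is full. The function names are illustrative, not part of hwloc.

```c
#include <assert.h>

/* Hypothetical layout: cores 0..nsockets*cores_per_socket-1, numbered
 * contiguously per socket. Returns the core used by the k-th job when
 * cores are handed out starting from the last one. */
static unsigned reverse_core(unsigned k, unsigned nsockets,
                             unsigned cores_per_socket)
{
    unsigned ncores = nsockets * cores_per_socket;
    assert(k < ncores);
    return ncores - 1 - k;
}

/* Returns 1 if placing njobs jobs in reverse order leaves socket 0 unused. */
static int avoids_socket0(unsigned njobs, unsigned nsockets,
                          unsigned cores_per_socket)
{
    for (unsigned k = 0; k < njobs; k++)
        if (reverse_core(k, nsockets, cores_per_socket) / cores_per_socket == 0)
            return 0;
    return 1;
}
```

On 2 sockets with 4 cores each, the first four jobs land on cores 7, 6, 5 and 4 (all on Socket#1); only the fifth job reaches Socket#0.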

I was looking at the source code of hwloc-distrib, and I believe that
only this part of the code would be affected:

      for (i = 0; i < chunks; i++)
        roots[i] = hwloc_get_obj_by_depth(topology, from_depth, i);

  => change this to roots[i] = hwloc_get_obj_by_depth(topology, from_depth,

      hwloc_distributev(topology, roots, chunks, cpuset, n, to_depth);

  => rewrite this to iterate in the reverse direction

MAX_COUNT seems to be known and accessible as topology->nb_levels.
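The effect of iterating in the reverse direction can be modeled without hwloc. The C sketch below is a simplified, hypothetical model of a proportional split in the spirit of hwloc's distribution helper, not hwloc's actual code: it assumes a uniform tree (every object at depth d has arity[d] children, leaves are cores) and uses leaf counts as weights. The reverse flag only changes the order in which children are visited. Note that the model takes the number of objects per level from the arity array; in the public hwloc API, the number of objects at a given depth is what hwloc_get_nbobjs_by_depth() returns, whereas nb_levels in hwloc's internal topology structure appears to count depth levels rather than objects.

```c
#include <assert.h>

/* Hypothetical model: the subtree rooted here covers leaves
 * [first, first + nleaves). Each child gets
 * ceil(n * weight / tot_weight) jobs, where weight is its leaf count and
 * n / tot_weight are the jobs and weight still unassigned. With
 * reverse != 0 the children are visited from the last one. */
static void distrib(const unsigned *arity, unsigned depth, unsigned ndepths,
                    unsigned first, unsigned nleaves, unsigned n,
                    int reverse, unsigned *out, unsigned *nout)
{
    if (n == 0)
        return;
    if (depth == ndepths) {
        /* Leaf (core) reached: every remaining job goes here. */
        for (unsigned k = 0; k < n; k++)
            out[(*nout)++] = first;
        return;
    }
    unsigned nchildren = arity[depth];
    unsigned weight = nleaves / nchildren;   /* leaves per child */
    unsigned tot_weight = nleaves;
    for (unsigned c = 0; c < nchildren && n > 0; c++) {
        unsigned idx = reverse ? nchildren - 1 - c : c;
        unsigned chunk = (n * weight + tot_weight - 1) / tot_weight;
        distrib(arity, depth + 1, ndepths, first + idx * weight, weight,
                chunk, reverse, out, nout);
        n -= chunk;
        tot_weight -= weight;
    }
}
```

With arity {2, 4} (2 sockets of 4 cores) and 2 jobs, the normal order picks cores 0 and 4, while the reverse order picks cores 7 and 3. With 3 jobs in reverse order the result is cores 7, 6 and 3, so PU#0 is the last core to be used.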

Am I missing something? In the case of an infinite bitmap, hwloc-distrib
would simply error out. This should solve the problems with
hwloc_bitmap_singlify.
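The argument here is that once infinite bitmaps are rejected, a reverse "singlify" (keep only the last set bit instead of the first) is well defined. A minimal C sketch on a hypothetical fixed-size bitmap shows this; struct bitmap and singlify_last() are illustrative names, not hwloc's hwloc_bitmap_t API. With the real API, combining hwloc_bitmap_last() (which, as we understand it, returns -1 for an infinitely-set bitmap) with hwloc_bitmap_only() should give a similar effect, though that is our reading and not something discussed in this thread.

```c
#include <assert.h>
#include <limits.h>

#define NWORDS 2
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical bitmap: NWORDS stored words plus an "infinite" flag meaning
 * every bit beyond the stored words is also set. */
struct bitmap {
    unsigned long words[NWORDS];
    int infinite;
};

/* Keep only the LAST set bit. Returns its index, or -1 if the bitmap is
 * empty or infinite (an infinite set has no last bit, so we error out). */
static int singlify_last(struct bitmap *b)
{
    if (b->infinite)
        return -1;
    for (int w = NWORDS - 1; w >= 0; w--) {
        unsigned long word = b->words[w];
        if (word == 0)
            continue;
        /* Scan this word from its most significant bit downwards. */
        int bit = (int)WORD_BITS - 1;
        while (!(word & (1UL << bit)))
            bit--;
        for (int i = 0; i < NWORDS; i++)
            b->words[i] = 0;
        b->words[w] = 1UL << bit;
        return w * (int)WORD_BITS + bit;
    }
    return -1;
}
```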

> I tend to think we should let the application handle these specific
> cases (finding what can be ignored while still having enough objects,
> and then calling distribute accordingly).
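The application-side approach suggested here ("find what can be ignored while still having enough objects") could look like the following C sketch. pick_cores() and its avoid array are hypothetical; the chosen cores would then be handed to whatever distribution step the application uses. Avoided cores are used only when the remaining ones are not enough, which covers both "ignore core 0 if possible" and the case of preferably skipping cores 7 and 11.

```c
#include <assert.h>

/* Hypothetical helper: choose njobs cores out of ncores, preferring cores
 * whose avoid[] flag is 0. Avoided cores are used only as a fallback.
 * Writes the chosen core indices to out and returns how many were chosen. */
static unsigned pick_cores(unsigned ncores, const int *avoid,
                           unsigned njobs, unsigned *out)
{
    unsigned nout = 0;
    /* First pass: cores that are not on the avoid list. */
    for (unsigned c = 0; c < ncores && nout < njobs; c++)
        if (!avoid[c])
            out[nout++] = c;
    /* Second pass: fall back to avoided cores only if still short. */
    for (unsigned c = 0; c < ncores && nout < njobs; c++)
        if (avoid[c])
            out[nout++] = c;
    return nout;
}
```

With 8 cores and only core 0 avoided, 3 jobs get cores 1, 2 and 3; 8 jobs get cores 1 to 7 first and core 0 last.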

Actually, I believe that this change is more easily implemented directly in
the C code than with some workaround in Bash. And I believe that the use
case is not that exotic. As outlined above, starting from core#0 is not
always the best strategy.

Please let me know what you think.