On 28/08/2013 16:20, Jiri Hladky wrote:
I got your point. On the other hand, I think hwloc-distrib is currently not flexible enough to handle such a case. I believe the current strategy (start from the first object) is not the best one. In my experience, core 0 is always the most used by the system, so a better strategy would be to allocate cores starting from the last one.

Most people expect their jobs to be launched in order: process 0 on the first cores, process 1 on the next cores, etc.

Again, you don't want to reverse the output, you want to ignore the first core/socket if possible.

I was looking at the source code of hwloc-distrib and I believe that only this part of the code would be affected:

      for (i = 0; i < chunks; i++)
        /* proposed change: take roots from the end instead of the beginning,
         * with MAX_COUNT the number of objects at from_depth */
        roots[i] = hwloc_get_obj_by_depth(topology, from_depth, MAX_COUNT - 1 - i);

      /* and rewrite hwloc_distributev() to iterate in the reverse direction */
      hwloc_distributev(topology, roots, chunks, cpuset, n, to_depth);

You can do the exact same thing by reversing your loop in the caller.

Anyway, reversing the loop just moves the core you don't want to the end of the list. But if you use the entire list, you end up using the exact same cores; you have only reordered the processes among those cores.

Am I missing something? In the case of an infinite bitmap, hwloc-distrib will error out. This should solve the problems with hwloc_bitmap_singlify.

We need a new singlify() variant, one that doesn't use the first bit. That's what will make you use a core that is not the first one in each socket.

Problems with infinite bitmaps are unrelated to hwloc-distrib, but they need to be handled in that new bitmap API anyway.