Hardware Locality Users' Mailing List Archives

Subject: Re: [hwloc-users] Solaris and hwloc
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2012-09-13 04:09:56


(resending because the formatting was bad)

On 13/09/2012 00:26, Jeff Squyres wrote:
> On Sep 12, 2012, at 10:30 AM, Samuel Thibault wrote:
>
>>> Sidenote: if hwloc-bind fails to bind, should we still launch the child process?
>> Well, it's up to you to decide :)
>
> Anyone have an opinion? I'm 60/40 in favor of not letting it run, under the rationale that the user asked for something that we can't deliver, so we shouldn't continue.
>
> Any idea what numactl does if it can't bind?

Let me add taskset to the list of tools to compare to, and distinguish
several cases:

1) invalid command line
* taskset (with invalid list "2,") errors out
* numactl (with invalid list "2,") errors out
* hwloc-bind (with an invalid location followed by "-- executable") errors
out (it treats the invalid location as the executable name)

2) valid command-line containing *only* non-existing objects:
* taskset errors out
* numactl errors out
* hwloc-bind succeeds, binds to nothing

3) valid command-line containing some existing objects and some
non-existing:
* taskset succeeds (ignores non-existing objects, binds to the others)
* numactl errors out
* hwloc-bind succeeds (ignores non-existing objects, binds to the others)

4) valid command-line with only valid objects but missing OS support
* doesn't apply to taskset and numactl afaik
* hwloc-bind succeeds (ignores failure to bind)
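The difference between cases (2) and (3) is only what remains once the
non-existing objects have been dropped. Here is a minimal sketch of that
"ignore what doesn't exist" step, assuming a machine with at most 64 PUs so
a cpuset fits in a uint64_t. build_cpuset() is a hypothetical helper for
illustration, not hwloc-bind's actual parsing code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not hwloc's parser): build a cpuset mask from the
 * requested PU indices, skipping indices that do not exist on a machine
 * with nb_pus PUs, the way hwloc-bind behaves in cases (2) and (3).
 * Assumes at most 64 PUs so the whole set fits in a single uint64_t. */
uint64_t build_cpuset(const unsigned *requested, size_t n,
                      unsigned nb_pus, size_t *nb_ignored)
{
    uint64_t set = 0;

    *nb_ignored = 0;
    for (size_t i = 0; i < n; i++) {
        if (requested[i] < nb_pus && requested[i] < 64)
            set |= UINT64_C(1) << requested[i];   /* existing PU: keep it */
        else
            (*nb_ignored)++;                       /* non-existing: ignore */
    }
    return set;  /* an empty set means "bind to nothing", as in case (2) */
}
```

On a 4-PU machine, requesting PUs {1, 7} leaves only PU 1 (case 3), while
requesting {7, 9} leaves an empty set (case 2). That empty set is exactly
where taskset and numactl error out and hwloc-bind does not.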

We have a --strict option, which translates into the STRICT binding flag,
documented as
  "Request strict binding from the OS. The function will fail if the
binding can not be guaranteed / completely enforced."
I usually see "non-strict" as "if you can't do what I want, do something
similar". It wouldn't be too bad to say that this applies to (3) (binding
to a smaller set than requested).

But (2) and (4) are different. Not binding at all or binding to nothing
is far from "non-strict". But I wonder if adding a new command-line flag
to exit on such errors would be confusing with respect to the existing
--strict.
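To make the strict/non-strict distinction concrete, here is a sketch of the
acceptance rule described above. It uses a 64-bit mask as a stand-in for an
hwloc cpuset (an assumption for brevity; real code would use hwloc_bitmap_t),
and binding_acceptable() is a hypothetical name. Non-strict accepts any
non-empty subset of the request, as in (3); strict accepts only the exact
request.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rule: is the binding actually applied ("obtained")
 * acceptable for what the user asked for ("requested")?
 * Masks are a simplification of hwloc cpusets (at most 64 PUs). */
bool binding_acceptable(uint64_t requested, uint64_t obtained, bool strict)
{
    if (strict)
        /* must be enforced completely, and must bind to something */
        return obtained != 0 && obtained == requested;

    /* Non-strict: "do something similar", i.e. a smaller but non-empty
     * set inside the request, as in case (3). Binding to nothing (2), or
     * staying unbound on the whole machine (4), is rejected. */
    return obtained != 0 && (obtained & ~requested) == 0;
}
```

Under this rule, cases (2) and (4) fail even without --strict, which is the
gap a separate exit-on-error behavior would have to cover.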

We could also change the default to exit on error, and add --force to
launch the process even on failure to bind. But changing defaults isn't
always a good idea.
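The "exit on error by default, --force to launch anyway" alternative can be
sketched as a small decision function. Everything here is hypothetical:
--force is only a proposal, and should_launch() and the bind_outcome enum
are illustrative names mapping onto cases (2) to (4).

```c
#include <stdbool.h>

/* Outcome of the binding step, following the cases listed earlier. */
enum bind_outcome {
    BIND_OK,          /* everything requested was bound */
    BIND_PARTIAL,     /* case (3): some objects missing, bound to the rest */
    BIND_EMPTY,       /* case (2): only non-existing objects, nothing bound */
    BIND_UNSUPPORTED  /* case (4): the OS cannot bind at all */
};

/* Hypothetical policy for the proposed new default: return true if the
 * child process should still be launched. "force" models the proposed
 * --force option, which is not an existing hwloc-bind flag. */
bool should_launch(enum bind_outcome outcome, bool strict, bool force)
{
    if (force)
        return true;           /* launch even on failure to bind */

    switch (outcome) {
    case BIND_OK:
        return true;
    case BIND_PARTIAL:
        return !strict;        /* "something similar" is fine unless --strict */
    case BIND_EMPTY:
    case BIND_UNSUPPORTED:
    default:
        return false;          /* proposed default: exit on error */
    }
}
```

This sketch lets --force take precedence over --strict when both are given;
that is one possible choice, not something settled here.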

Brice