Greetings,
I use hwloc-1.4.1 stable on Red Hat 5 and am seeing a possible
concurrency issue not covered by the "Thread Safety" guidelines:
- I start a small number (4) of threads, each of which does
some work and periodically executes
hwloc_get_last_cpu_location() with HWLOC_CPUBIND_PROCESS
- occasionally, one or two of those threads will see the call
fail with ENOSYS (even though the same call has already executed
successfully a number of times)
These errors are transient and seem to occur only when some
of the threads in the group are terminating. I've skimmed
through the implementation in topology-linux.c and it seems
plausible to me that the errors could be caused by failure to
read /proc state "atomically" in the presence of concurrent
thread starts/exits.
Of course, the latter is hard (impossible?) to do because
the state is always changing and a snapshot can only be obtained
with a single read() (which in turn would require knowing in
advance how many thread entries to expect). However, returning
ENOSYS in such cases does not seem intended; it looks more like
a flaw in the retry logic. Similar issues may be present with other API
methods that rely on hwloc_linux_foreach_proc_tid() or hwloc_linux_get_proc_tids().
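
To make the retry idea concrete, here is a minimal sketch of the kind of
"read until two consecutive listings agree" loop I have in mind. This is
not hwloc's code: read_tids(), snapshot_tids(), MAX_TIDS and MAX_RETRIES
are hypothetical names, and reporting EAGAIN when the retry budget runs
out is my own assumption about what a caller would prefer over ENOSYS.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_TIDS    1024  /* arbitrary cap for this sketch */
#define MAX_RETRIES 10    /* arbitrary retry budget (assumption) */

/* Order thread ids ascending so two listings can be compared with memcmp. */
static int cmp_tid(const void *a, const void *b)
{
    pid_t x = *(const pid_t *)a, y = *(const pid_t *)b;
    return (x > y) - (x < y);
}

/* List the entries of /proc/<pid>/task into tids[], sorted.
 * Returns the number of entries, or -1 with errno set by opendir(). */
static int read_tids(pid_t pid, pid_t *tids, int max)
{
    char path[64];
    struct dirent *ent;
    DIR *dir;
    int n = 0;

    snprintf(path, sizeof(path), "/proc/%ld/task", (long)pid);
    dir = opendir(path);
    if (!dir)
        return -1;
    while (n < max && (ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;               /* skip "." and ".." */
        tids[n++] = (pid_t)atol(ent->d_name);
    }
    closedir(dir);
    qsort(tids, (size_t)n, sizeof(*tids), cmp_tid);
    return n;
}

/* Re-read the task list until two consecutive listings are identical.
 * Returns the number of tids, or -1 with errno set; errno is EAGAIN
 * (not ENOSYS) if the list kept changing for MAX_RETRIES rounds. */
static int snapshot_tids(pid_t pid, pid_t *tids, int max)
{
    pid_t again[MAX_TIDS];
    int n, m, attempt;

    assert(tids != NULL && max > 0);
    if (max > MAX_TIDS)
        max = MAX_TIDS;

    n = read_tids(pid, tids, max);
    if (n < 0)
        return -1;
    for (attempt = 0; attempt < MAX_RETRIES; attempt++) {
        m = read_tids(pid, again, max);
        if (m < 0)
            return -1;
        if (m == n && memcmp(tids, again, (size_t)n * sizeof(*tids)) == 0)
            return n;               /* two identical listings: accept */
        memcpy(tids, again, (size_t)m * sizeof(*tids));
        n = m;
    }
    errno = EAGAIN;                 /* transient: caller may retry */
    return -1;
}
```

With something along these lines, a caller could call
snapshot_tids(getpid(), tids, MAX_TIDS) and treat EAGAIN as "try again
later" rather than as "not supported", which is how ENOSYS reads to me.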