Open MPI Development Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-11-29 12:21:43


Greetings all. I'm writing this to ask for help from the general
development community. We've run into a problem with Linux processor
affinity, and although I've individually talked to a lot of people
about this, no one has been able to come up with a solution. So I
thought I'd open this to a wider audience.

This is a long-ish e-mail; bear with me.

As you may or may not know, Open MPI includes support for processor and
memory affinity. There are a number of benefits, but I'll skip that
discussion for now. For more information, see the following:

http://www.open-mpi.org/faq/?category=building#build-paffinity
http://www.open-mpi.org/faq/?category=building#build-maffinity
http://www.open-mpi.org/faq/?category=tuning#paffinity-defs
http://www.open-mpi.org/faq/?category=tuning#maffinity-defs
http://www.open-mpi.org/faq/?category=tuning#using-paffinity

Here's the problem: there are 3 different APIs for processor affinity
in Linux. I have not done exhaustive research on this, but which API
you have seems to depend on your version of kernel, glibc, and/or Linux
vendor (i.e., some vendors appear to port different versions of the API
to their particular kernel/glibc). The issue is that all 3 versions of
the API use the same function names (sched_setaffinity() and
sched_getaffinity()), but they change the number and types of the
parameters to these functions.
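
For concreteness, here's roughly what the three variants look like.
This is from memory, so don't hold me to the exact types; they
obviously can't all be declared in the same compilation unit -- they
are shown side-by-side here just for comparison:

    /* Variant A: oldest style -- length in bytes plus a raw bitmask
       of unsigned longs */
    int sched_setaffinity(pid_t pid, unsigned int len, unsigned long *mask);

    /* Variant B: two arguments -- a cpu_set_t and no length at all */
    int sched_setaffinity(pid_t pid, cpu_set_t *mask);

    /* Variant C: the "modern" style -- a length plus a cpu_set_t */
    int sched_setaffinity(pid_t pid, size_t cpusetsize, const cpu_set_t *mask);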

This is not a big problem for source distributions of Open MPI -- our
configure script figures out which one you have and uses preprocessor
directives to select the Right stuff in our code base for your
platform.
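
To give a flavor of what that looks like, here's a trimmed-down sketch
of the kind of thing we do.  The macro names below are made up for this
e-mail; they are *not* the actual symbols that our configure script
defines:

    #define _GNU_SOURCE
    #include <sched.h>

    /* Bind the calling process to a single CPU; which branch is
       compiled is decided by configure at build time. */
    static int bind_to_cpu(int cpu)
    {
    #if defined(HAVE_SETAFFINITY_ULONG_MASK)
        /* Variant A: length in bytes + array of unsigned longs
           (this sketch assumes cpu fits in one unsigned long) */
        unsigned long mask = 1UL << cpu;
        return sched_setaffinity(0, sizeof(mask), &mask);
    #elif defined(HAVE_SETAFFINITY_NO_LEN)
        /* Variant B: cpu_set_t, no length argument */
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return sched_setaffinity(0, &set);
    #else
        /* Variant C: length + cpu_set_t */
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return sched_setaffinity(0, sizeof(set), &set);
    #endif
    }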

What *is* a big problem, however, is that ISVs can therefore not ship a
binary Open MPI installation and reasonably expect the processor
affinity aspects of it to work on multiple Linux platforms. That is,
if the ISV compiles for API #X and ships a binary to a system that has
API #Y, there are two options:

1. Processor affinity is disabled. This means that the benefits of
processor affinity won't be visible (not hugely important on 2-way
SMPs, but as the number of processors/cores increases, this is going to
become more important), and Open MPI's NUMA-aware collectives can't be
used (because memory affinity may not be useful without processor
affinity guarantees).

2. Processor affinity is enabled, but the code invokes API #X on a
system with API #Y. This has unpredictable results: the best case is
that processor affinity is simply [effectively] ignored; the worst case
is that the application fails (e.g., seg faults).

Clearly, neither of these options is attractive.

My question to the developer crowd out there -- can you think of a way
around this? More specifically, is there a way to know -- at run time
-- which API to use? We can do some compiler trickery to compile all
three APIs into a single Open MPI installation and then run-time
dispatch to the Right one, but this is contingent upon being able to
determine which API to dispatch to. A bunch of us have poked around
on the system (e.g., in /proc and /sys) and not found anything that
indicates which API you have.
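
For what it's worth, here's a sketch of what the run-time dispatch
could look like *if* we could write the detection function.  The
bind_to_cpu_api_[abc]() names are hypothetical per-variant wrappers,
each compiled in its own file against the prototype it expects;
detect_affinity_api() is exactly the part none of us know how to
write:

    enum affinity_api {
        AFFINITY_API_A,
        AFFINITY_API_B,
        AFFINITY_API_C,
        AFFINITY_API_UNKNOWN
    };

    /* One wrapper per API variant, each compiled in a separate file */
    extern int bind_to_cpu_api_a(int cpu);
    extern int bind_to_cpu_api_b(int cpu);
    extern int bind_to_cpu_api_c(int cpu);

    /* The missing piece: we have not found anything in /proc, /sys,
       or elsewhere that tells us which variant this system has */
    static enum affinity_api detect_affinity_api(void)
    {
        return AFFINITY_API_UNKNOWN;
    }

    int bind_to_cpu(int cpu)
    {
        switch (detect_affinity_api()) {
        case AFFINITY_API_A: return bind_to_cpu_api_a(cpu);
        case AFFINITY_API_B: return bind_to_cpu_api_b(cpu);
        case AFFINITY_API_C: return bind_to_cpu_api_c(cpu);
        default:             return -1;  /* fall back to no affinity */
        }
    }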

Does anyone have any suggestions here?

Many thanks for your time.

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/