PLPA Users' Mailing List Archives

Subject: [PLPA users] Extensions to PLPA
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-08-08 17:35:41


Discussions came up in OMPI over the past 48 hours about processor
affinity that resulted in some proposed extensions to PLPA.

The gist of it is that OMPI wants to deal with "logical" processor,
socket, and core numbers, meaning "the 3rd processor" or "the 2nd
socket" -- regardless of what the underlying Linux ID numbers for
these entities are (i.e., it gets confusing if the Linux IDs have
"holes" in them, as we have discussed before). We felt that this
would be much easier for the end user.

This means 3 things to PLPA:

1. Clarify that all existing functions are dealing with Linux native
IDs -- in plpa.h, we currently say (socket,core) tuple; we should
probably clarify that to be (socket_id,core_id). And so on. This is
just changes in the comments/documentation.

2. Add 3 new functions to allow the mapping of "logical" numbers to
Linux ID numbers. We have this information in PLPA; we might as well
expose it for applications that don't want to deal with the complexity
of figuring out the mapping from the "Nth processor" to Linux
processor ID X.

-----
/* Returns the Linux processor ID for the Nth processor. For example,
    if the processor IDs have "holes", use this function to say "give
    me the Linux processor ID of the 4th processor." Returns 0 upon
    success. */
int PLPA_NAME(get_processor_id)(int processor_num, int *processor_id);

/* Returns the Linux socket ID for the Nth socket. For example, if
    the socket IDs have "holes", use this function to say "give me the
    Linux socket ID of the 2nd socket." Returns 0 upon success. */
int PLPA_NAME(get_socket_id)(int socket_num, int *socket_id);

/* Given a specific socket, returns the Linux core ID for the Nth
    core. For example, if the core IDs have "holes", use this function
    to say "give me the Linux core ID of the 4th core on socket ID 7."
    Returns 0 upon success. */
int PLPA_NAME(get_core_id)(int socket_id, int core_num, int *core_id);
-----

--> We know that Linux processor IDs can have holes. I am pretty
sure that socket IDs can have holes, too. I'm not sure if core IDs
can have holes, but I guess it's conceivable that in the manycore
world, you could have a socket where 1 core is dead and other cores
are fine...?
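
To illustrate what the "Nth entity" lookup amounts to, here is a minimal sketch. It is not PLPA code: nth_linux_id and the present_ids table are hypothetical names, and it assumes PLPA already holds a sorted list of the Linux IDs that exist. It also assumes N is 0-based, which the proposal above does not specify.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PLPA: given a sorted table of the
   Linux IDs that actually exist, return the ID of the Nth entry.
   Assumption: logical_num is 0-based.  Returns 0 upon success, -1 if
   the arguments are invalid or logical_num is out of range. */
static int nth_linux_id(const int *present_ids, size_t count,
                        int logical_num, int *linux_id)
{
    if (present_ids == NULL || linux_id == NULL ||
        logical_num < 0 || (size_t)logical_num >= count) {
        return -1;
    }
    /* Holes in the Linux numbering do not matter here: the table only
       contains IDs that exist, so indexing it skips the holes. */
    *linux_id = present_ids[logical_num];
    return 0;
}
```

With Linux processor IDs {0, 1, 4, 5}, asking for logical number 2 yields Linux ID 4, and asking for logical number 4 fails because only four processors exist.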

3. Given the advent of CPU hotplugging (I don't know where that was
introduced in the Linux kernel history, but it does seem to be
available in some recent kernels), three things would seem to be useful:

3a. PLPA should check the online status of the processor (if the
kernel supports it, there's an "online" file in the cpuX /sys
directory). If the "online" file does not exist, or if the cpuX
directory does not exist, assume that CPU hotplugging support is not
there and therefore all processors are online.
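
A rough sketch of the check described in 3a follows. It is not PLPA code: cpu_is_online is a hypothetical name, and the sysfs root (normally "/sys/devices/system/cpu") is passed as a parameter only so the logic can be tried against a scratch directory. Treating an unreadable "online" file as "online" is my assumption.

```c
#include <stdio.h>

/* Hypothetical helper, not part of PLPA.  Sets *online to 1 or 0 for
   the given Linux processor ID.  Returns 0 upon success, -1 on bad
   arguments or if the path does not fit in the buffer. */
static int cpu_is_online(const char *sysfs_cpu_root, int processor_id,
                         int *online)
{
    char path[512];
    FILE *fp;
    int value = 1;
    int len;

    if (sysfs_cpu_root == NULL || online == NULL || processor_id < 0) {
        return -1;
    }
    len = snprintf(path, sizeof(path), "%s/cpu%d/online",
                   sysfs_cpu_root, processor_id);
    if (len < 0 || (size_t)len >= sizeof(path)) {
        return -1;
    }

    fp = fopen(path, "r");
    if (fp == NULL) {
        /* No "online" file or no cpuX directory: assume there is no
           hotplug support and treat the processor as online. */
        *online = 1;
        return 0;
    }
    if (fscanf(fp, "%d", &value) != 1) {
        value = 1;  /* assumption: unreadable contents mean "online" */
    }
    fclose(fp);
    *online = (value != 0);
    return 0;
}
```

A caller would pass "/sys/devices/system/cpu" as the root and a Linux processor ID, then inspect *online.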

3b. Add functions to check the existence and online status of a
processor (using Linux native IDs):

/* Check to see if a given Linux processor ID exists / is online.
    Returns 0 on success. */
int PLPA_NAME(get_processor_flags)(int processor_id, int *exists,
                                    int *online);

/* Check to see if a given Linux (socket_id,core_id) tuple exists / is
    online. Returns 0 on success. */
int PLPA_NAME(get_core_flags)(int socket_id, int core_id,
                               int *exists, int *online);

--> I debated on returning a single int (or uint32_t) with some bits
set for "exists" and "online", but I chose to return 2 ints fairly
arbitrarily. I don't have strong feelings either way, so if someone
else does, I could be fairly easily convinced to change it.
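
For comparison, the single-bitmask alternative could look roughly like the sketch below. The flag names and helper functions are hypothetical and only show that the two forms carry the same information.

```c
#include <stdint.h>

/* Hypothetical flag names; the proposal above does not define them. */
#define EXAMPLE_FLAG_EXISTS ((uint32_t)1u << 0)
#define EXAMPLE_FLAG_ONLINE ((uint32_t)1u << 1)

/* Pack the two out-parameters into one bitmask. */
static uint32_t pack_flags(int exists, int online)
{
    uint32_t flags = 0;
    if (exists) {
        flags |= EXAMPLE_FLAG_EXISTS;
    }
    if (online) {
        flags |= EXAMPLE_FLAG_ONLINE;
    }
    return flags;
}

/* Unpack a bitmask back into the two-int form used in the proposal. */
static void unpack_flags(uint32_t flags, int *exists, int *online)
{
    *exists = (flags & EXAMPLE_FLAG_EXISTS) != 0;
    *online = (flags & EXAMPLE_FLAG_ONLINE) != 0;
}
```

The bitmask leaves room for more flags later without changing the signature; the two-int form is more direct for callers that only care about one of the answers.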

3c. PLPA currently caches all information up front and then only
returns cached data after that, but with CPU hotplugging, that
information may become stale. So we should offer new functionality
regarding PLPA's caching behavior: a) refresh it right now, b) don't
use the cache at all, c) do use the cache.

typedef enum {
     /* Use the cache (default behavior); fills the cache right now if
        it's not already full */
     PLPA_CACHE_USE,
     /* Never use the cache; always look up the information in
        the kernel */
     PLPA_CACHE_IGNORE,
     /* Refresh the cache right now */
     PLPA_CACHE_REFRESH
} PLPA_NAME(cache_behavior_t);
/* Set PLPA's cache behavior. Returns 0 upon success. */
int PLPA_NAME(cache)(PLPA_NAME(cache_behavior_t));
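
To make the three behaviors concrete, here is a toy model. Every name in it is hypothetical (the ex_ prefix marks that), and the "kernel" is simulated by a plain variable. One assumption worth flagging: in this sketch, REFRESH refills the cache but does not change whether later lookups use it; the proposal does not say either way.

```c
/* Hypothetical stand-ins; none of these names are part of PLPA. */
typedef enum {
    EX_CACHE_USE,
    EX_CACHE_IGNORE,
    EX_CACHE_REFRESH
} ex_cache_behavior_t;

static int ex_use_cache = 1;     /* current policy; USE is the default */
static int ex_cache_valid = 0;   /* has the cache been filled yet? */
static int ex_cached_count = 0;  /* cached processor count */
static int ex_kernel_count = 4;  /* simulated value "in the kernel" */

/* Simulated kernel lookup. */
static int ex_query_kernel(void)
{
    return ex_kernel_count;
}

static void ex_fill_cache(void)
{
    ex_cached_count = ex_query_kernel();
    ex_cache_valid = 1;
}

/* Set the cache behavior.  Returns 0 upon success, -1 otherwise. */
static int ex_cache(ex_cache_behavior_t behavior)
{
    switch (behavior) {
    case EX_CACHE_USE:
        ex_use_cache = 1;
        if (!ex_cache_valid) {
            ex_fill_cache();
        }
        return 0;
    case EX_CACHE_IGNORE:
        ex_use_cache = 0;
        return 0;
    case EX_CACHE_REFRESH:
        ex_fill_cache();  /* assumption: policy is left unchanged */
        return 0;
    }
    return -1;
}

/* A lookup that honors the current cache policy. */
static int ex_get_processor_count(void)
{
    if (!ex_use_cache) {
        return ex_query_kernel();
    }
    if (!ex_cache_valid) {
        ex_fill_cache();
    }
    return ex_cached_count;
}
```

In this model, a processor going offline after the cache is filled is invisible to ex_get_processor_count() until the caller requests EX_CACHE_REFRESH or switches to EX_CACHE_IGNORE.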

Thoughts?

-- 
Jeff Squyres
Cisco Systems