
PLPA Users' Mailing List Archives


From: Bert Wesarg (wesarg_at_[hidden])
Date: 2007-04-27 09:58:30


Jeff Squyres wrote:
> On Apr 24, 2007, at 2:11 PM, Bert Wesarg wrote:
>
>>> Hmm. Is that really a good idea? I'd think it was safer to say "I
>>> don't know" rather than "here's a fallback which may or may not be
>>> true."
>> Thinking out loud:
>>
>> What is the worst case that could happen if we have no thread
>> information, i.e. the file thread_siblings is missing?
>> If there are threads, then for two cpu ids cpu_i and cpu_j there should
>> be two tuples (x, y, a) and (x, y, b) with a != b. Without the thread
>> information we end up mapping both cpu_i and cpu_j to (x, y, 0).
>>
>> So the best option here is to return "topo unsupported", and the best
>> place to catch this is in the cache building.
>
> According to your prior post, we should always have the hmt
> information if we have the socket/core information, right? If so I
> think that this is a moot point.
>
> Here's the matrix of possible information in the kernel, assembled
> from a few prior posts:
>
> - number of linux virtual processors: always available
> - socket/core mappings: available in >=2.6.16
> - hmt mappings: available >=2.6.16
> - node mappings: available ?for a long time?
At least two years, but only on kernels that are configured for NUMA.

>
> ------
>
> I think I'm finally understanding what you mean by adding a 3rd
> member to the (socket,core) tuple (for the [hardware] thread ID).
Sorry, but I think you are misunderstanding: the kernel exports a cpumap
per processor ID in the file thread_siblings. In this map, at least the
bit of the current processor should be set. Other bits are set only if
those processors are in the same core and are thread siblings of each
other. So the kernel does not export a direct thread ID.
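
To make the cpumap format concrete, here is a minimal sketch of how such a map could be parsed. It assumes the usual sysfs text form (comma-separated 32-bit hex words, most significant word first) and is limited to 64 processors; parse_cpumap() and cpu_in_map() are hypothetical helper names, not part of PLPA.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a sysfs-style cpumap such as "00000003" or
 * "00000001,00000000" (comma-separated 32-bit hex words, most significant
 * word first). Assumption: at most 64 processors, so at most two words.
 * Returns 0 on success and -1 on malformed input. */
static int parse_cpumap(const char *text, uint64_t *mask_out)
{
    uint64_t mask = 0;
    const char *p = text;
    int words = 0;

    while (*p != '\0' && *p != '\n') {
        char *end;
        unsigned long word = strtoul(p, &end, 16);

        if (end == p || word > 0xffffffffUL || ++words > 2) {
            return -1;
        }
        /* Only a comma, a newline, or the end of string may follow a word. */
        if (*end != ',' && *end != '\0' && *end != '\n') {
            return -1;
        }
        mask = (mask << 32) | (uint64_t)word;
        p = (*end == ',') ? end + 1 : end;
    }
    if (words == 0) {
        return -1;
    }
    *mask_out = mask;
    return 0;
}

/* Returns non-zero if the bit for processor `cpu` is set in the map. */
static int cpu_in_map(uint64_t mask, int cpu)
{
    return cpu >= 0 && cpu < 64 && ((mask >> cpu) & 1u) != 0;
}
```

With the map "00000003", both cpu0 and cpu1 are reported as members, which is all the kernel tells us: who the siblings are, not which thread number each one has.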

A little picture: a Xeon processor with one core and hyperthreading enabled:

         +------------------------+
         |          Xeon          |
         |phys_id = 0 core_id = 0 |
         +------------------------+
               /           \
         +----------+ +----------+
         |  thread  | |  thread  |
         | siblings | | siblings |
         | 00000003 | | 00000003 |
         +----------+ +----------+
               |           |
             cpu0        cpu1

From this I would build the following mapping:

cpu0 <-> (0,0,0)
cpu1 <-> (0,0,1)

because cpu0 has zero bits set below bit 0 in its thread_siblings cpumap,
and cpu1 has one bit set below bit 1 in its thread_siblings cpumap.
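
The counting rule above can be sketched in a few lines of C. This is only an illustration under the assumption that the siblings map fits in 64 bits; thread_index() is a hypothetical name, not an existing PLPA function.

```c
#include <stdint.h>

/* Hypothetical helper: derive a per-core thread index for `cpu` from its
 * thread_siblings map by counting the sibling bits set below the cpu's
 * own bit. Returns -1 if the cpu's own bit is missing from the map,
 * which would mean the map is inconsistent. */
static int thread_index(uint64_t siblings, int cpu)
{
    int index = 0;

    if (cpu < 0 || cpu >= 64 || ((siblings >> cpu) & 1u) == 0) {
        return -1;
    }
    for (int bit = 0; bit < cpu; ++bit) {
        if ((siblings >> bit) & 1u) {
            ++index;
        }
    }
    return index;
}
```

For the Xeon in the picture, thread_index(0x3, 0) gives 0 and thread_index(0x3, 1) gives 1. The index is dense per core even if the sibling processor IDs are not adjacent.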

>
> So I think the question is: how do we want to expose the [hardware]
> thread information and node in the API?
>
> I think looking at the data relationships will help here:
>
> - one-to-one: (socket,core) -> processor ID
> - one-to-one: (socket,core) -> node ID
> - one-to-many: (socket,core) -> thread IDs
> - one-to-one: processor ID -> (socket,core)
> - one-to-one: processor ID -> node ID
> - one-to-one: processor ID -> thread ID
> - one-to-many: node ID -> (socket,core)
> - one-to-many: node ID -> processor IDs
> - one-to-many: node ID -> thread IDs (potentially not unique)
> - one-to-many: thread ID -> (socket,core)
> - one-to-many: thread ID -> processor IDs
> - one-to-many: thread ID -> node ID
>

Your old mappings with thread:
- one-to-one: (socket,core,thread) -> processor ID
- one-to-one: (socket,core,thread) -> node ID
- one-to-one: processor ID -> (socket,core,thread)
- one-to-one: processor ID -> node ID
- one-to-many: node ID -> (socket,core,thread)
- one-to-many: node ID -> processor ID

My opinion:

(1) one-to-one: (socket,core,thread) <-> processor ID

(2a) one-to-one: processor ID -> node ID
(2b) one-to-many: node ID -> processor ID

I hope this makes it clear.
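
As a rough illustration of (1), (2a) and (2b), the whole mapping could live in one flat table with one row per processor ID. Everything in this sketch is hypothetical: the type and function names are made up, and the table holds the single-core hyperthreaded Xeon from the picture with node 0 assumed for both processors.

```c
#include <stddef.h>

/* One row per processor ID; all names here are illustrative only. */
typedef struct {
    int socket, core, thread;  /* the (socket,core,thread) tuple */
    int processor_id;
    int node_id;               /* assumption: node 0 for the example */
} cpu_entry_t;

static const cpu_entry_t cpu_table[] = {
    { 0, 0, 0, 0, 0 },
    { 0, 0, 1, 1, 0 },
};
static const size_t cpu_table_len = sizeof(cpu_table) / sizeof(cpu_table[0]);

/* (1) (socket,core,thread) -> processor ID, or -1 if there is no such tuple */
static int tuple_to_processor(int socket, int core, int thread)
{
    for (size_t i = 0; i < cpu_table_len; ++i) {
        const cpu_entry_t *e = &cpu_table[i];
        if (e->socket == socket && e->core == core && e->thread == thread) {
            return e->processor_id;
        }
    }
    return -1;
}

/* (1) processor ID -> row holding its tuple, or NULL if unknown */
static const cpu_entry_t *processor_to_entry(int processor_id)
{
    for (size_t i = 0; i < cpu_table_len; ++i) {
        if (cpu_table[i].processor_id == processor_id) {
            return &cpu_table[i];
        }
    }
    return NULL;
}

/* (2a) processor ID -> node ID, or -1 if the processor is unknown */
static int processor_to_node(int processor_id)
{
    const cpu_entry_t *e = processor_to_entry(processor_id);
    return e != NULL ? e->node_id : -1;
}

/* (2b) node ID -> processor IDs; writes up to out_cap IDs into `out`
 * and returns the total number of processors on that node. */
static size_t node_to_processors(int node_id, int *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t i = 0; i < cpu_table_len; ++i) {
        if (cpu_table[i].node_id == node_id) {
            if (n < out_cap) {
                out[n] = cpu_table[i].processor_id;
            }
            ++n;
        }
    }
    return n;
}
```

Every other relation in the lists above can be derived from these two, which is why I would keep only them at the lowest level.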

> Looking at this list, I can imagine wanting to do lookups on any of
> them (looking up <foo> given a specific thread ID is probably
> somewhat dubious, but I can imagine *some* application may want to do
> it...?).
>
> If it really *is* useful to lookup based on any of these items,
> perhaps we're looking at this the wrong way -- what if there was a
> single function that did all lookups, and search terms and result
> information was expressed in terms of a struct? Perhaps something
> like this:
>
> -----
> #define PLPA_CPU_INFO_WILDCARD -1
> #define PLPA_CPU_INFO_UNKNOWN -2
> typedef struct _plpa_cpu_info_t {
>     int socket, core, processor_id, thread_id, node_id;
> } plpa_cpu_info_t;
>
> int plpa_lookup(plpa_cpu_info_t *search_term, plpa_cpu_info_t **results,
>                 int *result_count);
>
> /* Example usage */
>
> void foo() {
>     plpa_cpu_info_t search, *results;
>     int result_count;
>
>     /* Search on a (socket,core) tuple */
>     search.socket = desired_socket_id;
>     search.core = desired_core_id;
>     search.processor_id = search.thread_id = search.node_id =
>         PLPA_CPU_INFO_WILDCARD;
>     if (0 == plpa_lookup(&search, &results, &result_count)) {
>         /* can look through results[0] - results[result_count -1] */
>         free(results);
>     }
>
>     /* Search on a node ID */
>     search.node_id = desired_node_id;
>     search.processor_id = search.thread_id = search.core =
>         search.socket = PLPA_CPU_INFO_WILDCARD;
>     if (0 == plpa_lookup(&search, &results, &result_count)) {
>         /* can look through results[0] - results[result_count -1] */
>         free(results);
>     }
>
>     /* ...and so on */
This approach looks like a higher-level abstraction on top of the
lower-level mapping.
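
To show what I mean, here is a minimal sketch of such a wildcard lookup built as a plain filter over a low-level table. It is not the proposed plpa_lookup(): the names info_t, INFO_WILDCARD and lookup() are hypothetical, and the table contents (one socket, two cores, two threads each, all on node 0) are made up for the example.

```c
#include <stdlib.h>

#define INFO_WILDCARD (-1)

typedef struct {
    int socket, core, processor_id, thread_id, node_id;
} info_t;

/* Assumed low-level table: one socket, two cores, two threads per core. */
static const info_t table[] = {
    { 0, 0, 0, 0, 0 },
    { 0, 0, 1, 1, 0 },
    { 0, 1, 2, 0, 0 },
    { 0, 1, 3, 1, 0 },
};

/* A search field matches if it is a wildcard or equals the table value. */
static int field_matches(int want, int have)
{
    return want == INFO_WILDCARD || want == have;
}

/* Copies every table row matching `search` into a malloc'ed array.
 * Returns 0 on success (the caller frees *results, even if the count
 * is 0) and -1 if the allocation fails. */
static int lookup(const info_t *search, info_t **results, int *result_count)
{
    size_t n = sizeof(table) / sizeof(table[0]);
    info_t *out = malloc(n * sizeof(*out));
    int count = 0;

    if (out == NULL) {
        return -1;
    }
    for (size_t i = 0; i < n; ++i) {
        const info_t *e = &table[i];
        if (field_matches(search->socket, e->socket) &&
            field_matches(search->core, e->core) &&
            field_matches(search->processor_id, e->processor_id) &&
            field_matches(search->thread_id, e->thread_id) &&
            field_matches(search->node_id, e->node_id)) {
            out[count++] = *e;
        }
    }
    *results = out;
    *result_count = count;
    return 0;
}
```

The filter itself needs nothing beyond the one-row-per-processor table, so it can sit entirely on top of the lower-level mapping.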

> -----
>
> Hence, you get an array of structs back that you can examine for
> whatever you want, and you can eventually release in a single call to
> free(). This seems simple and straightforward.
>
> Thoughts?
>