Subject: [hwloc-devel] Support for new architecture
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-11-08 11:42:47

Hi folks

We expect a new architecture to appear in the very near future, and I'm not sure how hwloc will handle it. Consider the following case:

* I have a rack that contains multiple "hosts"

* each host consists of a box/shelf with common support infrastructure in it: some kind of controller, possibly some networking support, and maybe a pool of memory that can be allocated across the occupants.

* in the host, I have one or more "boards". Each board again has its own controller, plus some common infrastructure to support its local sockets - this might include networking that would look like NICs (though not necessarily on a PCIe interface), a board-level memory pool, etc.

* each socket contains one or more dies. Each die runs its own instance of an OS - probably a lightweight kernel - and that instance can vary between dies (e.g., it might have a tweaked configuration). Each die also has its own associated memory, which will physically reside outside the socket. You can think of each die as constituting a "shared memory locus" - i.e., processes running on that die can share memory between them, since they sit under the same OS instance.

* each die has some number of cores/hwthreads/caches etc.

Note that the sockets are not sitting on some PCIe bus - they appear to be directly connected to the overall network, just as a "node" appears today. However, there is a definite need for higher layers (resource managers and MPI implementations) to understand this overall hierarchy and the "distances" between the individual elements.
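
To make the hierarchy and the "distance" idea concrete, here is a minimal sketch in Python. It does not use hwloc at all; the `Node` class, the `hop_distance` function, and the element names are all hypothetical, and counting tree edges through the lowest common ancestor is just one assumed way to define distance. A real metric would presumably weight the levels differently (e.g., crossing a host boundary costs more than crossing a socket boundary).

```python
class Node:
    """One element of the hierarchy: rack, host, board, socket, or die.

    Hypothetical illustration only; this is not an hwloc object type.
    """

    def __init__(self, kind, name, parent=None):
        self.kind = kind
        self.name = name
        self.parent = parent
        self.children = []

    def add(self, kind, name):
        """Create a child element under this one and return it."""
        child = Node(kind, name, parent=self)
        self.children.append(child)
        return child


def ancestors(node):
    """Return the path from node up to the root, starting with node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def hop_distance(a, b):
    """Count tree edges between a and b via their lowest common ancestor.

    Assumed metric: every level crossing costs 1.
    """
    up_b = {id(n): depth for depth, n in enumerate(ancestors(b))}
    for depth_a, n in enumerate(ancestors(a)):
        if id(n) in up_b:
            return depth_a + up_b[id(n)]
    raise ValueError("elements do not belong to the same tree")


# Build a small example: one rack, two hosts, dies at the leaves.
rack = Node("rack", "rack0")
host0 = rack.add("host", "host0")
host1 = rack.add("host", "host1")
board0 = host0.add("board", "board0")
board1 = host0.add("board", "board1")
board2 = host1.add("board", "board2")
socket0 = board0.add("socket", "socket0")
socket1 = board1.add("socket", "socket1")
socket2 = board2.add("socket", "socket2")
die0 = socket0.add("die", "die0")
die1 = socket0.add("die", "die1")
die2 = socket1.add("die", "die2")
die3 = socket2.add("die", "die3")

print(hop_distance(die0, die1))  # same socket
print(hop_distance(die0, die2))  # same host, different board
print(hop_distance(die0, die3))  # different host
```

In this sketch, two dies in the same socket are closer than two dies on different boards, which in turn are closer than two dies in different hosts. That ordering is the kind of information an RM or MPI would want to query.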

Any thoughts on how we can support this?