I had a nice chat with Ralph this afternoon about this topic.
He pointed out a few things to me:
- I had forgotten (ahem) that carto has weights associated with each of its edges (and that's kind of a defining feature). hwloc, at present, does not. So perhaps hwloc would not initially replace carto -- maybe some future hwloc version would.
- He also pointed out that not only paffinity, but also sysinfo, could be replaced if hwloc comes in.
He also made a good point that hwloc is only "sorta" extensible right now -- meaning that, sure, you can add support for new OSes and platforms, but not in as easy/clean a way as we have in Open MPI. Specifically, adding new support right now means editing much of the current hwloc code: configure, adding #if's to the top-level tools and library core, etc. It's not nearly as clean as just adding a new plugin that is totally independent of the rest of the code base. He thought it would be [greatly] beneficial if hwloc used the same plugin system as Open MPI before we bring it in. Indeed, Open MPI may wish to extend hwloc in ways that the main hwloc project is not interested in (e.g., supporting some of Cisco's custom hardware). Fair point.
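To make the "plugin vs. #if" contrast concrete, here is a rough sketch of what a self-registering backend table could look like. Everything in it (topo_component_t, topo_register, topo_select, the toy backends) is made up for illustration; it is neither hwloc's API nor Open MPI's MCA interface, just the general shape of the idea: each backend describes itself in one struct, and the core never needs an #if to know about it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component descriptor: one per OS/platform backend.
 * None of these names come from hwloc or Open MPI. */
typedef struct topo_component {
    const char *name;
    int priority;              /* higher wins among usable components */
    int (*available)(void);    /* nonzero if usable on this machine */
    int (*count_cpus)(void);   /* stand-in for real discovery work */
} topo_component_t;

#define MAX_COMPONENTS 16
static const topo_component_t *registry[MAX_COMPONENTS];
static size_t registry_len = 0;

/* Register a component; returns 0 on success, -1 on bad input or full table. */
int topo_register(const topo_component_t *c)
{
    if (c == NULL || registry_len >= MAX_COMPONENTS) {
        return -1;
    }
    registry[registry_len++] = c;
    return 0;
}

/* Pick the highest-priority component that reports itself available.
 * Returns NULL if nothing is registered or nothing is usable. */
const topo_component_t *topo_select(void)
{
    const topo_component_t *best = NULL;
    for (size_t i = 0; i < registry_len; i++) {
        const topo_component_t *c = registry[i];
        assert(c->available != NULL);
        if (!c->available()) {
            continue;
        }
        if (best == NULL || c->priority > best->priority) {
            best = c;
        }
    }
    return best;
}

/* Toy backends. A real one would probe the OS or hardware instead. */
static int always_yes(void) { return 1; }
static int always_no(void)  { return 0; }
static int one_cpu(void)    { return 1; }
static int four_cpus(void)  { return 4; }

const topo_component_t generic_component   = { "generic",   10, always_yes, one_cpu };
const topo_component_t fake_os_component   = { "fake_os",   30, always_yes, four_cpus };
const topo_component_t custom_hw_component = { "custom_hw", 50, always_no,  four_cpus };
```

With something like this, supporting a new platform (or vendor-specific hardware that the main project doesn't care about) would mean writing one new descriptor and calling topo_register() on it; the selection loop and the rest of the core stay untouched.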
Additionally, the topic of plugins came up within the context of heterogeneity: have code to get the topology of the machine (RAM + processors), but have separate code to mix in accelerators/co-processors and other entities in the box. One could easily imagine plugins for each different type of entity that you would want to detect within a server.
To some extent, the hwloc crew has already been discussing these issues -- we can probably work much of this into what we're doing. For example, Brice and Samuel are working on adding PCI device support to hwloc (although I haven't been following the details of what they're doing). We've also talked about adding hwloc functions for editing the map that comes back. One possibility: hwloc could be used as the cornerstone for a new OPAL framework base, and new plugins in this base could use those functions to add more information to the initial map that is reported back by the hwloc core. [shrug] Need to think about that more.
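A rough sketch of the "core reports an initial map, plugins add to it" idea, to have something concrete to poke at. All of the names here (topo_obj_t, topo_add_child, core_discover, accel_plugin, topo_apply_plugins) are hypothetical, and the fixed 2-socket / 2-core layout is a stand-in for real detection; hwloc's actual data structures and any future map-editing functions may look quite different.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object types; not hwloc's real type list. */
typedef enum { OBJ_MACHINE, OBJ_SOCKET, OBJ_CORE, OBJ_ACCELERATOR } obj_type_t;

typedef struct topo_obj {
    obj_type_t type;
    int index;
    struct topo_obj *first_child;
    struct topo_obj *next_sibling;
} topo_obj_t;

/* Allocate a childless object; returns NULL on allocation failure. */
topo_obj_t *topo_new(obj_type_t type, int index)
{
    topo_obj_t *o = calloc(1, sizeof(*o));
    if (o != NULL) {
        o->type = type;
        o->index = index;
    }
    return o;
}

/* The "map editing" call: append a new child under parent. */
topo_obj_t *topo_add_child(topo_obj_t *parent, obj_type_t type, int index)
{
    assert(parent != NULL);
    topo_obj_t *child = topo_new(type, index);
    if (child == NULL) {
        return NULL;
    }
    topo_obj_t **slot = &parent->first_child;
    while (*slot != NULL) {
        slot = &(*slot)->next_sibling;
    }
    *slot = child;
    return child;
}

/* Count objects of a given type in the subtree rooted at root. */
int topo_count(const topo_obj_t *root, obj_type_t type)
{
    if (root == NULL) {
        return 0;
    }
    int n = (root->type == type) ? 1 : 0;
    for (const topo_obj_t *c = root->first_child; c != NULL; c = c->next_sibling) {
        n += topo_count(c, type);
    }
    return n;
}

/* Free a whole subtree. */
void topo_free(topo_obj_t *root)
{
    if (root == NULL) {
        return;
    }
    topo_obj_t *c = root->first_child;
    while (c != NULL) {
        topo_obj_t *next = c->next_sibling;
        topo_free(c);
        c = next;
    }
    free(root);
}

/* Stand-in for the core: reports 1 machine, 2 sockets, 2 cores per socket. */
topo_obj_t *core_discover(void)
{
    topo_obj_t *machine = topo_new(OBJ_MACHINE, 0);
    if (machine == NULL) {
        return NULL;
    }
    for (int s = 0; s < 2; s++) {
        topo_obj_t *sock = topo_add_child(machine, OBJ_SOCKET, s);
        if (sock == NULL) {
            topo_free(machine);
            return NULL;
        }
        for (int c = 0; c < 2; c++) {
            if (topo_add_child(sock, OBJ_CORE, s * 2 + c) == NULL) {
                topo_free(machine);
                return NULL;
            }
        }
    }
    return machine;
}

/* A plugin is a function that augments the map; returns 0 on success. */
typedef int (*topo_plugin_fn)(topo_obj_t *machine);

/* Toy plugin: pretends to have found one accelerator in the box. */
int accel_plugin(topo_obj_t *machine)
{
    return topo_add_child(machine, OBJ_ACCELERATOR, 0) != NULL ? 0 : -1;
}

/* Run each plugin over the map in order; stop at the first failure. */
int topo_apply_plugins(topo_obj_t *machine, const topo_plugin_fn *plugins, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (plugins[i](machine) != 0) {
            return -1;
        }
    }
    return 0;
}
```

The point of the split is that core_discover() knows nothing about accelerators; a caller would build the base map, then hand it to topo_apply_plugins() with whatever per-entity plugins are present. Where a plugin should attach its objects (under the machine, as here, or under a specific socket or bus) is exactly the kind of thing that still needs thought.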
This is all excellent feedback (I need to take it back to the hwloc crew); please let me know what else you think about these ideas tomorrow on the call.
On Dec 14, 2009, at 4:13 PM, Jeff Squyres wrote:
> Question for everyone (possibly a topic for tomorrow's call...):
> hwloc is evolving into a fairly nice package. It's not ready for inclusion into Open MPI yet, but it's getting there. I predict it will come in somewhere early in the 1.5 series (potentially not 1.5.0, though). hwloc will provide two things:
> 1. A listing of all processors and memory, to include caches (and cache sizes!) laid out in a map, so you can see what processors share what memory (e.g., caches). Open MPI currently does not have this capability. Additionally, hwloc is currently growing support to include PCI devices in the map; that may make it into hwloc v1.0 or not.
> 2. Cross-platform / OS support. hwloc currently supports a nice variety of OSs and hardware platforms.
> Given that hwloc is already cross-platform, do we really need the carto framework? I.e., do we really need multiple carto plugins? More specifically: should we just use hwloc directly -- with no framework?
> Random points:
> - I'm about halfway finished with "embedding" code for hwloc like PLPA has, so, for example, all of hwloc's symbols can be prepended with opal_ or orte_ or whatever. Hence, embedding hwloc in OMPI would be "safe".
> - If we keep the carto framework, then we'll have to translate from hwloc's map to carto's map; there may be subtleties involved in the translation.
> - I guarantee that [much] more thought has been put into the hwloc map data structure design than carto's. :-) Indeed, to make all of hwloc's data available to OMPI, carto's map data structures may end up evolving to look pretty much exactly like hwloc's. In which case -- what's the point of carto?
> hwloc also provides processor binding functions, so it might also make the paffinity framework moot...
> Jeff Squyres