Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: move hwloc code base to opal/hwloc
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-06-02 10:03:29


To follow up on this RFC...

We discussed this RFC on the weekly call and no one seemed to hate it. But there was a desire to:

a) be able to compile out hwloc for environments that don't want/need it (e.g., embedded environments)
b) have some degree of isolation in case hwloc ever dies
c) have some commonality of hwloc support (e.g., a central copy of the topology as an OPAL global variable, etc.)

The agreed-on compromise was to have a small set of OPAL wrappers that hide the real hwloc API. I.e., the OPAL/ORTE/OMPI code bases would use the OPAL wrappers, not hwloc itself. This allows OMPI to cleanly compile out hwloc (e.g., the wrappers return OPAL_ERR_NOT_AVAILABLE when hwloc is compiled out), both for platforms that do not want hwloc support and for platforms that hwloc does not support.
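
To make the wrapper idea concrete, here is a rough sketch of what one such wrapper could look like. Nothing here is a settled interface: the function names (opal_hwloc_sketch_init, opal_hwloc_sketch_get_num_cores), the SKETCH_HAVE_HWLOC switch, and the numeric values given to the OPAL error codes are all placeholders invented for illustration. Only the hwloc calls inside the #if branch (hwloc_topology_init, hwloc_topology_load, hwloc_topology_destroy, hwloc_get_nbobjs_by_type) are real hwloc API. The sketch also keeps a single central copy of the topology, per item (c).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build-time switch: configury would define this to 1
 * when hwloc support is built in.  Defaults to "compiled out". */
#ifndef SKETCH_HAVE_HWLOC
#define SKETCH_HAVE_HWLOC 0
#endif

/* Placeholder values for illustration only; the real OPAL constants
 * live in the OPAL headers and may have different values. */
#ifndef OPAL_SUCCESS
#define OPAL_SUCCESS            0
#define OPAL_ERROR              (-1)
#define OPAL_ERR_NOT_AVAILABLE  (-2)
#endif

#if SKETCH_HAVE_HWLOC
#include <hwloc.h>
/* One central copy of the topology, shared by all wrapper calls. */
static hwloc_topology_t sketch_topology;
static int sketch_topology_loaded = 0;
#endif

/* Load the topology once.  Callers never see hwloc types. */
int opal_hwloc_sketch_init(void)
{
#if SKETCH_HAVE_HWLOC
    if (sketch_topology_loaded) {
        return OPAL_SUCCESS;
    }
    if (0 != hwloc_topology_init(&sketch_topology)) {
        return OPAL_ERROR;
    }
    if (0 != hwloc_topology_load(sketch_topology)) {
        hwloc_topology_destroy(sketch_topology);
        return OPAL_ERROR;
    }
    sketch_topology_loaded = 1;
    return OPAL_SUCCESS;
#else
    /* hwloc compiled out: nothing to load. */
    return OPAL_ERR_NOT_AVAILABLE;
#endif
}

/* Report the number of cores, or OPAL_ERR_NOT_AVAILABLE when hwloc
 * is compiled out.  *num_cores is always set (0 on any failure). */
int opal_hwloc_sketch_get_num_cores(int *num_cores)
{
    assert(NULL != num_cores);
    *num_cores = 0;
#if SKETCH_HAVE_HWLOC
    int rc = opal_hwloc_sketch_init();
    if (OPAL_SUCCESS != rc) {
        return rc;
    }
    /* Returns -1 if cores appear at more than one topology depth. */
    int n = hwloc_get_nbobjs_by_type(sketch_topology, HWLOC_OBJ_CORE);
    if (n < 0) {
        return OPAL_ERROR;
    }
    *num_cores = n;
    return OPAL_SUCCESS;
#else
    return OPAL_ERR_NOT_AVAILABLE;
#endif
}
```

Calling code would check the return value and fall back to some default when it gets OPAL_ERR_NOT_AVAILABLE, so the same caller builds and runs whether or not hwloc is present.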

The ball is in my court to come up with a decent OPAL subset of the hwloc API that makes sense. On the one hand, the hwloc API is huge because it has many, many accessors for all different kinds of access patterns. But OTOH, we probably don't need all those accessors, even if having a smaller set of accessors may mean slightly less convenient/efficient access to the hwloc data.

I'll try to strike a balance and come back to the community with a proposal.

On May 13, 2010, at 8:35 PM, Jeff Squyres wrote:

> WHAT: hwloc is currently embedded in opal/mca/paffinity/hwloc/hwloc -- move it to be a first-class citizen in opal/hwloc.
>
> WHY: Let other portions of the OPAL, ORTE, and OMPI code bases use hwloc services (remember that hwloc provides detailed topology information, not just processor binding).
>
> WHERE: Move opal/mca/paffinity/hwloc/hwloc to opal/hwloc, and adjust associated configury
>
> WHEN: For v1.5.1
>
> TIMEOUT: Tuesday call, May 25
>
> -----------------------------------------------------------------------------
>
> MORE DETAILS:
>
> The hwloc code base is *much* more powerful and useful than PLPA -- it provides a wealth of information that PLPA did not. Specifically: hwloc provides data structures detailing the internal topology of a server. You can see cache line sizes, NUMA layouts, sockets, cores, hardware threads, etc.
>
> This information should be available to the entire OMPI code base -- not just locked up in a paffinity component. Putting hwloc up in opal/hwloc makes it available everywhere. Developers can just call hwloc_<foo>, and OMPI's build system will automatically do all the right symbol-shifting if the embedded hwloc is used in OMPI (and not symbol-shift if an external hwloc is used, obviously). It's magically delicious!
>
> One immediate use that I'd like to see is to have the openib BTL use hwloc's ibv functionality to find "nearby" HCAs (right now, you can only do this with rankfiles).
>
> I can foresee other components using cache line size information to help tune performance (e.g., sm btl and sm coll...?).
>
> To be clear: there will still be an hwloc paffinity component. It just won't embed its own copy of hwloc anymore. It'll use the hwloc services provided by the OMPI build system, just like the rest of the OPAL / ORTE / OMPI code bases.
>
> There will also be an option to compile hwloc out altogether -- some stubs will be left that return ERR_NOT_SUPPORTED, or somesuch (details TBD). The reason for this is that there are some systems where processor affinity and NUMA information aren't relevant (e.g., embedded systems). Memory footprint is key in such systems; hwloc would simply take up valuable RAM.
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/