Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: make hwloc be a 1st-class citizen
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-09-06 13:25:12


The ultimate goal is to not add an additional dependency for serialization of the hwloc topology information. One way or another, we'll get there.

On Sep 6, 2011, at 11:46 AM, George Bosilca wrote:

> I guess that as long as there is an option to have any need for XML support compiled out, there is no reason to complain.
>
> george.
>
> On Sep 6, 2011, at 17:36 , Jeff Squyres wrote:
>
>> Don't forget that this RFC has a timeout of today. I didn't think it would be controversial, which is why it had a short timeout.
>>
>> -----
>>
>> Josh brought up a good point on the teleconf today that he'd like to be able to have hwloc without the additional libxml dependency (i.e., the way it is on the trunk today).
>>
>> Remember that making hwloc a 1st class citizen is the first step of a multi-step plan (i.e., part of revamping paffinity in general). As part of the larger plan, we had planned to -- at least for a short while -- enable XML support in hwloc. Ralph and I will discuss this; I *think* we should be able to bring in the overall hwloc support without XML.
>>
>> For the future, hwloc is exploring either supporting some other text format that won't have an additional dependency (e.g., JSON), or re-writing its XML support to drop the libxml dependency.
>>
>>
>> On Aug 31, 2011, at 3:05 PM, Jeff Squyres wrote:
>>
>>> WHAT: Move hwloc up to be a first-class citizen in OPAL (while still making it possible to compile it out for platforms that don't need it)
>>>
>>> WHY: I previously sent a similar RFC to this one, but it got shot down in favor of hiding hwloc's functionality under abstraction. After playing with this for some time, we're now firmly of the belief that the additional abstraction doesn't buy OMPI anything.
>>>
>>> WHERE: A new compile-time-one-of-many framework like libevent: opal/mca/hwloc.
>>>
>>> WHEN: as part of the paffinity changes being worked on by Jeff, Josh, Terry, and Ralph.
>>>
>>> TIMEOUT: Teleconf, Tuesday, Sep 6.
>>>
>>> --> Short timeout because I *think* the only person that objected to the prior RFC (Ralph) has now been converted. Hence, I think this will be non-controversial. See below.
>>>
>>> --------------------------------------
>>>
>>> MORE DETAIL:
>>>
>>> There are many people who want to use hwloc within the OMPI code base for many different reasons. We've struggled with how to do so while meeting two goals:
>>>
>>> 1. avoid a complete dependence on hwloc
>>> 2. be able to compile it out for platforms that don't want/need it (e.g., Cray)
>>>
>>> The initial objection to my long-ago RFC was that you could hide hwloc under some abstraction and therefore easily be able to handle compiling hwloc out, supporting platforms that hwloc doesn't support, and potentially be able to replace hwloc with something else someday, if desired.
>>>
>>> After wrestling with this for a good long while, none of those goals seem workable via a thin layer of abstraction.
>>>
>>> Instead, let's just call a spade a spade: we'll be dependent upon hwloc. We'll provide a mechanism to compile it out for Cray and other embedded platforms.
>>>
>>> Here's the plan:
>>>
>>> 1. Make a new framework opal/mca/hwloc. We'll initially have 3 components:
>>> - hwloc121: hwloc distribution v1.2.1
>>> - system: the system-installed hwloc
>>> - none: for platforms that don't want hwloc support
>>>
>>> Just like the libevent framework, we can introduce new versions of hwloc (e.g., 1.3) as new components. Old versions/components can be deleted as new versions/components are stabilized.
>>>
>>> 2. The hwloc framework will be like the libevent framework; only one of these components will be compiled. The component's hwloc API will be directly available (via name-shifting) to the rest of OPAL/ORTE/OMPI. No need for the usual structs of function pointers and whatnot.
>>>
>>> 3. The rest of the OPAL / ORTE / OMPI code base can use the hwloc API in the following way:
>>>
>>> 3a. opal_init() will initialize hwloc and load a central copy of the local machine's topology in a global variable. Anyone in the code base can use this global variable; its use does not need to be protected by #if _whatever_. However, its value may be NULL for platforms that hwloc doesn't support or installations that used the "none" hwloc component.
>>>
>>> 3b. opal_config.h will contain the macro OPAL_HAVE_HWLOC, which will be either 0 or 1. Any code that uses the hwloc API must protect itself with #if OPAL_HAVE_HWLOC, because installations that use the "none" hwloc component won't be able to resolve any of the hwloc symbols at link time.
>>>
>>> Meaning that you could do something like:
>>>
>>> if (NULL != opal_hwloc_topology) {
>>> #if OPAL_HAVE_HWLOC
>>> // ...use hwloc API, etc.
>>> #endif
>>> }
>>>
>>> 4. After steps 1-3 are all done, the paffinity and maffinity frameworks can be deleted and replaced with the corresponding hwloc calls.
>>>
>>> Meaning: if we've got hwloc, the paffinity and maffinity frameworks now become redundant. So let's whack them. This can happen after 1-3 are done and stable in the trunk, however.
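
The guard pattern from steps 3a and 3b can be sketched as a small helper. This is only an illustration under stated assumptions, not code from the Open MPI tree: `example_num_cores` is a hypothetical name, the `void *` stand-in type for the "none" case is a guess at how the global could stay declarable without hwloc, and OPAL_HAVE_HWLOC is defaulted to 0 here only so the sketch compiles on a machine without hwloc installed (the RFC says opal_config.h would define it).

```c
#include <stddef.h>

/* Assumption: a real build gets OPAL_HAVE_HWLOC (0 or 1) from opal_config.h.
 * Default it to 0 so this sketch compiles without hwloc installed. */
#ifndef OPAL_HAVE_HWLOC
#define OPAL_HAVE_HWLOC 0
#endif

#if OPAL_HAVE_HWLOC
#include <hwloc.h>
/* Central copy of the local topology; per the RFC, opal_init() would load it. */
hwloc_topology_t opal_hwloc_topology = NULL;
#else
/* Stand-in type (assumption) so the global exists even with the "none"
 * component; its value simply stays NULL. */
void *opal_hwloc_topology = NULL;
#endif

/* Hypothetical helper: number of cores on the local machine,
 * or -1 when no topology information is available. */
int example_num_cores(void)
{
    /* Checking the global itself needs no #if protection. */
    if (NULL == opal_hwloc_topology) {
        return -1;
    }
#if OPAL_HAVE_HWLOC
    /* Only this part references hwloc symbols, so only it is guarded. */
    return hwloc_get_nbobjs_by_type(opal_hwloc_topology, HWLOC_OBJ_CORE);
#else
    return -1;
#endif
}
```

With the "none" component (or on a platform hwloc does not support) the helper returns -1 without touching any hwloc symbol, so callers need a fallback path either way.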
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/