Jeff Squyres wrote:
Just chatted with Ralph about this on the phone and he came up with a slightly better compromise...

He points out that we really don't need *all* of the hwloc API (there's a bajillion tiny little accessor functions).  We could provide a steady, OPAL/ORTE/OMPI-specific API (probably down in opal/util or somesuch) with a dozen or two (or whatever) functions that we really need.  These functions can either call their back-end hwloc counterparts or they could do something safe if hwloc is not present / not supported / etc.

That would alleviate the need to put #if OPAL_HAVE_HWLOC elsewhere in the code base.  But the code calling opal_hwloc_<foo>() needs to be able to gracefully handle the failure case where it returns OPAL_ERR_NOT_SUPPORTED (etc.).
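
As a rough illustration of the wrapper idea above, here is a minimal sketch. The function names opal_hwloc_get_num_pus() and pick_num_workers() are hypothetical, the two return-code values are stand-ins for whatever OPAL actually defines, and OPAL_HAVE_HWLOC is assumed to be set by configure (it defaults to 0 here so the sketch builds without hwloc installed). The only place that needs the #if is the wrapper itself; the caller just handles OPAL_ERR_NOT_SUPPORTED.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: normally set by configure; default to "no hwloc" in this sketch. */
#ifndef OPAL_HAVE_HWLOC
#define OPAL_HAVE_HWLOC 0
#endif

#if OPAL_HAVE_HWLOC
#include <hwloc.h>
#endif

/* Stand-in values for this sketch; the real code base defines its own. */
#define OPAL_SUCCESS            0
#define OPAL_ERR_NOT_SUPPORTED (-8)

/* Hypothetical wrapper: report the number of processing units, or
 * return OPAL_ERR_NOT_SUPPORTED when hwloc is absent or unusable. */
int opal_hwloc_get_num_pus(int *num_pus)
{
    assert(num_pus != NULL);
#if OPAL_HAVE_HWLOC
    hwloc_topology_t topo;
    if (0 != hwloc_topology_init(&topo)) {
        return OPAL_ERR_NOT_SUPPORTED;
    }
    if (0 != hwloc_topology_load(topo)) {
        hwloc_topology_destroy(topo);
        return OPAL_ERR_NOT_SUPPORTED;
    }
    *num_pus = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);
    hwloc_topology_destroy(topo);
    return (*num_pus > 0) ? OPAL_SUCCESS : OPAL_ERR_NOT_SUPPORTED;
#else
    return OPAL_ERR_NOT_SUPPORTED;
#endif
}

/* Hypothetical caller: no #if here, just graceful handling of failure. */
int pick_num_workers(void)
{
    int n = 0;
    if (OPAL_SUCCESS != opal_hwloc_get_num_pus(&n)) {
        return 1;   /* safe default when topology info is unavailable */
    }
    return n;
}
```

With this shape, a build without hwloc carries only the few wrappers that are actually called, rather than stubs for the whole hwloc API.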

The above sounds like you are replacing the whole paffinity framework with hwloc.  Is that true?  Or are the hwloc accessors you are talking about non-paffinity related?

On May 17, 2010, at 8:25 PM, Jeff Squyres (jsquyres) wrote:

On May 17, 2010, at 7:59 PM, Barrett, Brian W wrote:

HWLOC could be extended to support Red Storm, probably, but we don't have the need or time to do such an implementation. 
Fair enough.

Given that, I'm not really picky about what the method of not breaking an existing supported platform is, but I think having HAVE_HWLOC defines everywhere is a bad idea...
We need a mechanism to have hwloc *not* be there, particularly for embedded environments -- where hwloc would add no value.  This is apparently just like Red Storm, but even worse because we need to keep the memory footprint down as much as possible ( on linux is 104KB -- libhwloc.a is 139KB -- both are big numbers when you only have a few MB of usable RAM).  So even leaving stubs doesn't seem like a good idea -- they'll take up space, too.  And the hwloc API is fairly large -- maintaining stubs for all the API functions could be a daunting task.

I think embedding is the main reason I can't think of any better idea than #if OPAL_HAVE_HWLOC.

I anticipate that hwloc usage would be fairly localized in the OMPI code base:

int btl_sm_setup_stuff(...)
{
#if OPAL_HAVE_HWLOC
    ...interesting hwloc things...
    ...setup stuff on btl_sm_component...
    btl_sm_component.have_hwloc = 1;
#else
    btl_sm_component.have_hwloc = 0;
#endif
}

int btl_sm_other_stuff(...)
{
    if (btl_sm_component.have_hwloc) {
        ...use the hwloc info...
    }
}
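
For reference, the localized-flag pattern above could be fleshed out into something compilable along these lines. This is only a sketch: the struct layout and the num_pus field are hypothetical stand-ins for the real btl_sm_component, and OPAL_HAVE_HWLOC is assumed to come from configure (defaulting to 0 here so it builds without hwloc).

```c
/* Assumption: normally set by configure; default to "no hwloc" in this sketch. */
#ifndef OPAL_HAVE_HWLOC
#define OPAL_HAVE_HWLOC 0
#endif

#if OPAL_HAVE_HWLOC
#include <hwloc.h>
#endif

/* Hypothetical stand-in for the real component structure. */
struct btl_sm_component_t {
    int have_hwloc;   /* set once at setup time */
    int num_pus;      /* hypothetical cached topology info */
};

static struct btl_sm_component_t btl_sm_component;

/* The only function that contains the #if. */
int btl_sm_setup_stuff(void)
{
    btl_sm_component.have_hwloc = 0;
    btl_sm_component.num_pus = 0;
#if OPAL_HAVE_HWLOC
    hwloc_topology_t topo;
    if (0 == hwloc_topology_init(&topo)) {
        if (0 == hwloc_topology_load(topo)) {
            btl_sm_component.num_pus =
                hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);
            btl_sm_component.have_hwloc = (btl_sm_component.num_pus > 0);
        }
        hwloc_topology_destroy(topo);
    }
#endif
    return 0;
}

/* Later code tests the runtime flag instead of the preprocessor symbol. */
int btl_sm_other_stuff(void)
{
    if (btl_sm_component.have_hwloc) {
        return btl_sm_component.num_pus;   /* use the hwloc info */
    }
    return 1;                              /* fallback without hwloc */
}
```

The runtime flag also covers the case where hwloc was compiled in but failed to load a topology on a given machine.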

But I'm certainly open to other ideas -- got any?

Jeff Squyres

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803