Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] release status
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-10-05 08:27:28


On Oct 3, 2009, at 8:21 AM, Fawzi Mohamed wrote:

> OK, you are right that storing it in the struct might be overkill.
> About performance I fully agree; about space, not so much. Especially
> if you really want to cache the cpusets for all objects, this still
> grows quadratically and allocates a lot of objects.

I'm still not sure that I agree -- I still think we're just quibbling
over a few bytes here. It's commonplace to have 2GB RAM per core
these days; that number certainly isn't going to go down -- it's
likely that it's even going to go up.

So yes, if every process running on every core has a cpuset, you
multiply (for example) a 4KB cpuset data structure times 1,000
processors (cores): 4MB. But consider that each of those 1,000
processors will have 2GB or more of RAM. That's 2 terabytes; who
cares about 4MB when you have 2TB? That's nearly 6 orders of magnitude
difference; put differently, 4MB is 0.0002 percent of 2TB.

I agree that we shouldn't be wasteful, but the difference we're
talking about here is in the noise.
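To make the arithmetic above easy to re-check with other numbers, here is a small C sketch. The function names are made up for illustration, and the 4KB cpuset and 2GB-per-core figures are only the example values used above, not measurements of hwloc.

```c
#include <assert.h>
#include <stdint.h>

/* Total bytes consumed if each of `ncores` cores holds one cpuset
 * of `cpuset_bytes` bytes (illustrative helper, not part of hwloc). */
static uint64_t total_cpuset_bytes(uint64_t cpuset_bytes, uint64_t ncores)
{
    return cpuset_bytes * ncores;
}

/* Cpuset overhead as a percentage of the RAM available per core.
 * With one cpuset per core, the core count cancels out of the ratio. */
static double cpuset_overhead_percent(uint64_t cpuset_bytes,
                                      uint64_t ram_per_core_bytes)
{
    assert(ram_per_core_bytes > 0);
    return 100.0 * (double)cpuset_bytes / (double)ram_per_core_bytes;
}
```

With 4096-byte cpusets on 1,000 cores, `total_cpuset_bytes` gives about 4MB, and `cpuset_overhead_percent(4096, 2ULL << 30)` comes out at roughly 0.0002 percent, whatever the number of cores.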

> That was the reason I was advocating having a function returning the
> cpuset from an object (sparse cpuset would also be a solution).
>
> Anyway the real issue here is the API I think.
> I would say that the best solution is
> - keep cpuset a structure (not just void*), so it can be just a
> void* or something more complex in the future without API changes

I'm not sure I parsed the above sentence properly -- I read it as
advocating 2 different things. Can you explain?

> - add functions to allocate/deallocate/copy it, and make it clear
> that these should be called on the cpusets returned by other
> functions (i.e. clarify ownership transfers).

Such functions would be necessary only if there are non-public members
of the struct or if you want to deep copy the struct, right? They
would also apply if we return opaque handles, not public structures.
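As a rough sketch of the opaque-handle variant with explicit allocate/free/copy functions: every `demo_` name below is hypothetical and is not the hwloc API. The caller owns whatever `demo_cpuset_alloc()` or `demo_cpuset_dup()` returns and must release it with `demo_cpuset_free()`.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Public header part: callers only ever see this forward declaration. */
typedef struct demo_cpuset demo_cpuset_t;

/* Private part: today a fixed-size bitmask; it could later become a
 * dynamic or sparse representation without touching user code. */
enum { DEMO_CPUSET_WORDS = 4 };
struct demo_cpuset {
    unsigned long words[DEMO_CPUSET_WORDS];
};

/* Returns a new, empty set owned by the caller (NULL on failure). */
demo_cpuset_t *demo_cpuset_alloc(void)
{
    return calloc(1, sizeof(demo_cpuset_t));
}

/* Releases a set obtained from demo_cpuset_alloc() or demo_cpuset_dup(). */
void demo_cpuset_free(demo_cpuset_t *set)
{
    free(set);
}

/* Returns an independent copy owned by the caller (NULL on failure). */
demo_cpuset_t *demo_cpuset_dup(const demo_cpuset_t *src)
{
    demo_cpuset_t *copy;
    assert(src != NULL);
    copy = malloc(sizeof(*copy));
    if (copy != NULL)
        memcpy(copy, src, sizeof(*copy));
    return copy;
}

/* Marks `cpu` as a member; out-of-range indexes are ignored. */
void demo_cpuset_set(demo_cpuset_t *set, unsigned cpu)
{
    const unsigned bits = (unsigned)(sizeof(unsigned long) * CHAR_BIT);
    assert(set != NULL);
    if (cpu < DEMO_CPUSET_WORDS * bits)
        set->words[cpu / bits] |= 1UL << (cpu % bits);
}

/* Returns 1 if `cpu` is a member, 0 otherwise. */
int demo_cpuset_isset(const demo_cpuset_t *set, unsigned cpu)
{
    const unsigned bits = (unsigned)(sizeof(unsigned long) * CHAR_BIT);
    assert(set != NULL);
    if (cpu >= DEMO_CPUSET_WORDS * bits)
        return 0;
    return (int)((set->words[cpu / bits] >> (cpu % bits)) & 1UL);
}
```

Because callers never see the struct layout, the copy made by `demo_cpuset_dup()` stays valid and independent even if the private representation changes later.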

> - functions that are possibly inlined are OK (obviously changing
> them breaks binary compatibility), but recompilation fixes that,
> and other languages can still use the non-inline function that is
> part of the lib

The usual reason for inlining is a need for performance -- and I
honestly think that we don't need it. So if the usual question for
inlining is "why not?", I turn that question around and ask "if not
for performance, why?". :-)

> - macros I don't like, they make binding to other languages more
> difficult, as one has to write either a thin glue layer, or
> duplicate the macro, which will not stay in sync with lib changes
> automatically (cpuset has some macros, but the structure is so
> simple that I just used another bit-compatible type when binding to
> D).

Agreed. Macros = evil; should only be used where absolutely necessary.

> To make the release quickly I think that just adding the requested
> functions (alloc/dealloc would be no-ops at the moment) would be good.
> Then in the future one can switch to a dynamic or sparse cpuset
> without user-visible changes (apart from recompilation).

Agreed; that is a good goal (switch to a new back-end type without
needing to change user code).

-- 
Jeff Squyres
jsquyres_at_[hidden]