
Hardware Locality Development Mailing List Archives


Subject: Re: [hwloc-devel] release status
From: Fawzi Mohamed (fawzi_at_[hidden])
Date: 2009-10-05 09:23:23


On 5-Oct-09, at 14:27, Jeff Squyres wrote:

> On Oct 3, 2009, at 8:21 AM, Fawzi Mohamed wrote:
>
>> OK, you are right that storing it in the struct might be overkill.
>> About performance I fully agree; about space not so much, especially
>> if you really want to cache the cpusets for all objects: this still
>> grows quadratically and allocates a lot of objects.
>
> I'm still not sure that I agree -- I still think we're just
> quibbling over a few bytes here. It's commonplace to have 2GB RAM
> per core these days; that number certainly isn't going to go down --
> it's likely that it's even going to go up.
>
> So yes, if every process running on every core has a cpuset, you
> multiply (for example) a 4k cpuset data structure times 1,000
> processors (cores): 4MB. But consider that each of those 1,000
> processors will have 2GB or more of RAM. That's 2 terabytes; who
> cares about 4MB when you have 2TB? That's 6 orders of magnitude
> difference; put differently, 4MB is 0.0002 percent of 2TB.

Well, you assume there is a single copy of the whole system structure;
I am not sure that would be the case. And while the memory per core is
growing, the memory per thread is not growing much. But anyway, that
is not the important point.
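
To make the quadratic-growth concern from the quoted text concrete, here is
a rough sketch of the arithmetic. It assumes a dense bitmask with one bit
per core, and a hypothetical objects-per-core ratio for the topology tree;
neither the function names nor the ratio come from hwloc.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for one dense bitmask cpuset covering ncores cores. */
static uint64_t cpuset_bytes(uint64_t ncores)
{
    return (ncores + 7) / 8;
}

/* Total bytes if every topology object caches its own dense cpuset.
 * objects_per_core is an assumed ratio (objects / cores), not an hwloc
 * figure. Doubling ncores doubles both the object count and the mask
 * size, so the total grows roughly fourfold. */
static uint64_t cached_cpuset_bytes(uint64_t ncores, uint64_t objects_per_core)
{
    /* Keep the product comfortably inside 64 bits. */
    assert(ncores <= 1000000 && objects_per_core <= 1000);
    return ncores * objects_per_core * cpuset_bytes(ncores);
}
```

With 1,000 cores and two objects per core this comes to 250,000 bytes; with
2,000 cores it is 1,000,000 bytes. Small in absolute terms, as Jeff says,
but quadratic in shape.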

> I agree that we shouldn't be wasteful, but the difference we're
> talking about here is in the noise.

ok

>> That was the reason I was advocating having a function returning
>> the cpuset from an object (sparse cpuset would also be a solution).
>>
>> Anyway the real issue here is the API I think.
>> I would say that the best solution is
>> - keep cpuset a structure (not just void*), so it can be just a
>> void* or something more complex in the future without API changes
>
> I'm not sure I parsed the above sentence properly -- I read it as
> advocating 2 different things. Can you explain?

Yes, you are right, I was unclear. I meant that I would pass a cpu_set
struct by value (rather than always passing a pointer).
If one later wants to migrate to passing just a pointer, then
internally this struct can have a single pointer as its only field.
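
A minimal sketch of what I mean, with made-up names (sketch_cpuset_t and
friends are hypothetical, not the hwloc API). The public type is a struct
that is always passed by value; here it happens to wrap a single pointer,
but callers never depend on that.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fixed capacity, for illustration only. */
enum { SKETCH_NBITS = 1024 };

/* Private storage; user code never touches this directly. */
struct sketch_cpuset_storage {
    unsigned char bits[SKETCH_NBITS / 8];
};

/* The public type is a struct passed by value. In this sketch it wraps
 * one pointer; its contents could change later without changing any
 * function signature below. */
typedef struct {
    struct sketch_cpuset_storage *impl;
} sketch_cpuset_t;

static sketch_cpuset_t sketch_cpuset_alloc(void)
{
    sketch_cpuset_t set;
    set.impl = calloc(1, sizeof *set.impl);  /* all bits start cleared */
    return set;
}

static void sketch_cpuset_free(sketch_cpuset_t set)
{
    free(set.impl);
}

static void sketch_cpuset_set(sketch_cpuset_t set, unsigned cpu)
{
    assert(set.impl != NULL && cpu < SKETCH_NBITS);
    set.impl->bits[cpu / 8] |= (unsigned char)(1u << (cpu % 8));
}

static int sketch_cpuset_isset(sketch_cpuset_t set, unsigned cpu)
{
    assert(set.impl != NULL && cpu < SKETCH_NBITS);
    return (set.impl->bits[cpu / 8] >> (cpu % 8)) & 1;
}
```

Note that with a pointer inside, copying the struct by value copies the
handle, not the bits, which is one more reason to spell out ownership and
to provide an explicit copy function.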

>> - add functions to allocate/deallocate/copy it, and make it clear
>> that these should be called on the cpusets returned by other
>> functions (i.e. clarify ownership transfers).
>
> Such functions would be necessary only if there are non-public
> members of the struct or if you want to deep copy the struct,
> right? They would also apply if we return opaque handles, not
> public structures.

Indeed, if you alloc and it is fixed size (no sparse structure), then
one can just call free. But in general, having structure-specific
functions gives a lot more flexibility for the future (and a specific
copy function is needed to copy structs of unknown size).
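
As an illustration of the unknown-size case, here is a sketch using a C99
flexible array member. The dyn_cpuset names and layout are assumptions made
up for this example, not anything in hwloc. A plain struct assignment would
not copy the trailing bits, so a dedicated dup function is required, and
the caller owns whatever it returns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-size cpuset; not hwloc's real layout. */
struct dyn_cpuset {
    size_t nbytes;         /* length of bits[] in bytes */
    unsigned char bits[];  /* C99 flexible array member */
};

/* Allocate a zeroed set able to hold ncpus bits; NULL on failure.
 * The caller owns the result and releases it with dyn_cpuset_free(). */
static struct dyn_cpuset *dyn_cpuset_alloc(size_t ncpus)
{
    size_t nbytes = (ncpus + 7) / 8;
    struct dyn_cpuset *set = calloc(1, sizeof *set + nbytes);
    if (set != NULL)
        set->nbytes = nbytes;
    return set;
}

/* Deep copy, header plus trailing bits; NULL on failure.
 * The caller owns the copy and releases it with dyn_cpuset_free(). */
static struct dyn_cpuset *dyn_cpuset_dup(const struct dyn_cpuset *src)
{
    assert(src != NULL);
    size_t total = sizeof *src + src->nbytes;
    struct dyn_cpuset *copy = malloc(total);
    if (copy != NULL)
        memcpy(copy, src, total);
    return copy;
}

/* Today this is just free(); a sparse back end could do more here. */
static void dyn_cpuset_free(struct dyn_cpuset *set)
{
    free(set);
}
```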

>> - functions that are possibly inlined are ok (obviously changing
>> them breaks the binary compatibility), but recompilation fixes
>> them, and other languages can still use the non inline function
>> that is part of the lib
>
> The usual reason for inlining is a need for performance -- and I
> honestly think that we don't need it. So if the usual question for
> inlining is "why not?", I turn that question around and ask "if not
> for performance, why?". :-)

ok with me :)

>> - macros I don't like, they make binding to other languages more
>> difficult, as one has to write either a thin glue layer, or
>> duplicate the macro, which will not stay in sync with lib changes
>> automatically (cpuset has some macros, but the structure is so
>> simple that I just used another bit-compatible type when binding to
>> D).
>
> Agreed. Macros = evil; should only be used where absolutely
> necessary.
>
>> To make the release quickly I think that just adding the requested
>> functions (alloc/dealloc would be noops at the moment) would be good.
>> Then in the future one can switch to dynamic or sparse cpuset
>> without user-visible changes (apart from recompilation).
>
>
> Agreed; that is a good goal (switch to a new back-end type without
> needing to change user code).

Yes, and I think that was the reason behind Samuel's initial question
about a dynamic cpuset_t.

Fawzi