, I think it might be worthwhile to keep
something like size+pointer, so that if the set is small (say, it fits
in a size_t) the cpuset is stored inline where the pointer would
otherwise be... or something like that.
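
For illustration, the size+pointer idea could look roughly like the sketch below. All names here (small_cpuset, small_cpuset_init, ...) are hypothetical and not part of the library; the inline threshold of one unsigned long is just an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical layout, not the library's actual type: sets that fit in
 * one word are stored in the struct itself, larger ones on the heap. */
#define WORD_BITS   (sizeof(unsigned long) * CHAR_BIT)
#define INLINE_BITS WORD_BITS

typedef struct small_cpuset {
    size_t nbits;                     /* number of CPU indexes representable */
    union {
        unsigned long inline_bits;    /* used when nbits <= INLINE_BITS */
        unsigned long *heap_bits;     /* used otherwise */
    } u;
} small_cpuset;

/* Return the word array, wherever it currently lives. */
unsigned long *small_cpuset_words(small_cpuset *s) {
    return s->nbits <= INLINE_BITS ? &s->u.inline_bits : s->u.heap_bits;
}

/* Initialize an empty set able to hold nbits CPUs; returns 0 or -1. */
int small_cpuset_init(small_cpuset *s, size_t nbits) {
    s->nbits = nbits;
    if (nbits <= INLINE_BITS) {
        s->u.inline_bits = 0;         /* no allocation for small sets */
        return 0;
    }
    size_t nwords = (nbits + WORD_BITS - 1) / WORD_BITS;
    s->u.heap_bits = calloc(nwords, sizeof(unsigned long));
    return s->u.heap_bits ? 0 : -1;
}

/* Release heap storage, if any. */
void small_cpuset_destroy(small_cpuset *s) {
    if (s->nbits > INLINE_BITS)
        free(s->u.heap_bits);
    s->nbits = 0;
}

void small_cpuset_set(small_cpuset *s, size_t cpu) {
    assert(cpu < s->nbits);
    small_cpuset_words(s)[cpu / WORD_BITS] |= 1UL << (cpu % WORD_BITS);
}

int small_cpuset_isset(small_cpuset *s, size_t cpu) {
    assert(cpu < s->nbits);
    return (int)((small_cpuset_words(s)[cpu / WORD_BITS] >> (cpu % WORD_BITS)) & 1UL);
}
```

The caller never sees which branch of the union is in use, so the threshold could change later without touching user code.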

Indeed I would keep a minimal struct...

Especially with a large number of OS processor IDs, won't the size of the array dwarf that of the struct?  I think we're quibbling over just a few bytes here in an area where performance and space really aren't all that important...

OK, you are right that storing it in the struct might be overkill, and about performance I fully agree. About space, not so much: especially if you really want to cache the cpusets of all objects, this still grows quadratically and allocates a lot of objects. That was the reason I was advocating a function that returns the cpuset of an object (a sparse cpuset would also be a solution).

Anyway, I think the real issue here is the API.
I would say that the best solution is:
- keep cpuset a structure (not just a void*), so that it can be just a void* or something more complex in the future without API changes
- add functions to allocate/deallocate/copy it, and make it clear that these must be called on the cpusets returned by other functions (i.e. clarify ownership transfers).
- functions that may be inlined are OK: changing them obviously breaks binary compatibility, but recompilation fixes that, and other languages can still use the non-inline version that is part of the lib
- I don't like macros: they make binding to other languages more difficult, as one has to either write a thin glue layer or duplicate the macro, which will not automatically stay in sync with lib changes (cpuset has some macros, but the structure is so simple that I just used another bit-compatible type when binding to D).
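
To make the list above concrete, here is a minimal sketch of what such an API could look like. Every name in it (demo_cpuset_t, demo_cpuset_alloc, demo_object_cpuset, ...) is made up for illustration and is not the library's actual interface; the fixed 1024-bit size is also just an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define DEMO_NBITS     1024
#define DEMO_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NWORDS    (DEMO_NBITS / DEMO_WORD_BITS)

/* A real struct rather than a bare void*: its contents can change later
 * (pointer, sparse representation, ...) without changing the API. */
typedef struct demo_cpuset {
    unsigned long bits[DEMO_NWORDS];
} demo_cpuset_t;

/* With a fixed-size struct, "allocation" only has to empty the set. */
int demo_cpuset_alloc(demo_cpuset_t *set) {
    memset(set, 0, sizeof *set);
    return 0;
}

/* A no-op today; would release memory with a dynamic representation. */
void demo_cpuset_free(demo_cpuset_t *set) {
    (void)set;
}

/* Copy src into dst; the caller owns dst and must free it. */
int demo_cpuset_copy(demo_cpuset_t *dst, const demo_cpuset_t *src) {
    *dst = *src;
    return 0;
}

void demo_cpuset_set(demo_cpuset_t *set, unsigned cpu) {
    assert(cpu < DEMO_NBITS);
    set->bits[cpu / DEMO_WORD_BITS] |= 1UL << (cpu % DEMO_WORD_BITS);
}

int demo_cpuset_isset(const demo_cpuset_t *set, unsigned cpu) {
    assert(cpu < DEMO_NBITS);
    return (int)((set->bits[cpu / DEMO_WORD_BITS] >> (cpu % DEMO_WORD_BITS)) & 1UL);
}

/* Stand-in for a library function that returns a cpuset: ownership of
 * *out passes to the caller, who must call demo_cpuset_free() on it. */
int demo_object_cpuset(unsigned first_cpu, unsigned ncpus, demo_cpuset_t *out) {
    if (demo_cpuset_alloc(out) != 0)
        return -1;
    for (unsigned i = 0; i < ncpus; i++)
        demo_cpuset_set(out, first_cpu + i);
    return 0;
}

/* Usage: every set obtained from the API is freed by its owner. */
int demo_example(void) {
    demo_cpuset_t obj, mine;
    if (demo_object_cpuset(4, 4, &obj) != 0)      /* CPUs 4..7 */
        return -1;
    if (demo_cpuset_copy(&mine, &obj) != 0) {
        demo_cpuset_free(&obj);
        return -1;
    }
    demo_cpuset_free(&obj);
    int ok = demo_cpuset_isset(&mine, 5) && !demo_cpuset_isset(&mine, 8);
    demo_cpuset_free(&mine);
    return ok ? 0 : -1;
}
```

Since these are plain functions rather than macros, a binding in another language only has to declare them, and code that already pairs each returned set with a free call keeps working if the representation becomes dynamic.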

To get the release out quickly, I think that just adding the requested functions (alloc/dealloc would be no-ops at the moment) would be good.
Then in the future one can switch to a dynamic or sparse cpuset without user-visible changes (apart from recompilation).