
Hardware Locality Development Mailing List Archives


Subject: Re: [hwloc-devel] Some practical hwloc API feedback
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2011-09-22 16:25:08


On 22/09/2011 21:36, Jeff Squyres wrote:
> 1. The depth-specific accessors are Bad. Given the warning language in the docs paired with the practical realities that some people actually do mix and match CPUs in a single server (especially when testing new chips), the depth-based accessors *can/will* fail. Meaning: you have to write application code that can handle the non-uniform depth cases, making the depth-based accessors essentially useless.

I don't see any real problem with having depth accessors and mixed types
of CPUs in a server. You can have different levels of caches in
different CPUs, but you still have a uniform depth/level for important
things like PUs, Cores, Sockets.

The only problem so far is caches. But do you actually walk the list of
caches? People would walk the list of PUs, Cores, Sockets, NUMA nodes.
But when talking about Caches, I would rather see them ask "which cache
do I have above these cores?".
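
A minimal sketch of that "which cache is above this object?" query, walking
parent pointers instead of enumerating a cache level. The `toy_obj` struct
and `toy_cache_above()` are hypothetical stand-ins for illustration only,
not hwloc's own types or functions:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for topology objects; these are NOT hwloc's real types. */
enum toy_type { TOY_MACHINE, TOY_SOCKET, TOY_CACHE, TOY_CORE, TOY_PU };

struct toy_obj {
    enum toy_type type;
    unsigned cache_level;      /* meaningful only when type == TOY_CACHE */
    struct toy_obj *parent;    /* NULL at the root */
};

/* Return the closest cache ancestor of obj, or NULL if there is none. */
static struct toy_obj *toy_cache_above(struct toy_obj *obj)
{
    assert(obj != NULL);
    for (struct toy_obj *p = obj->parent; p != NULL; p = p->parent) {
        if (p->type == TOY_CACHE)
            return p;
    }
    return NULL;
}
```

The walk never needs a uniform cache depth across the machine: each core
simply finds whatever cache happens to sit above it.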

And I don't see how DFS would help. Any concrete example?

> 2. All caches are listed as HWLOC_OBJ_CACHE, regardless of their level. We would like to request changing these to having specific enums for each level of cache -- perhaps adding HWLOC_OBJ_CACHE_L1 through L10 to cover future possible platforms.
>
> The reasons we are asking for this are as follows:
>
> (2a) the depth-based accessors are automatically broken for any machine with more than one level of cache (i.e., they return -1 because caches exist at multiple levels). Yes, #1 expounded on how the depth-based accessors are bad, but I mention this point anyway. :-)
>
> (2b) by the same logic, calling get_nbobjs() on HWLOC_OBJ_CACHE fails.
>
> (2c) to find the set of any given Lx caches, you basically have to traverse the tree looking for HWLOC_OBJ_CACHE *and* attr->cache.depth==x. It would be cleaner if we could just look for HWLOC_OBJ_CACHE_L<whatever>.
>
> (2d) more specifically: since all caches are of type HWLOC_OBJ_CACHE, we find ourselves putting in special case logic for caches all over our code. Ick.
>
> Note: I'm not sure how to add new HWLOC_OBJ_CACHE_Lx types and preserve backwards compatibility. :-\

Long-standing problem, yes. Not only about caches, unfortunately. Also
about groups, and maybe others one day.

There's a trac ticket about basically having an "extended type" which
would contain the current type + a depth attribute. It could then be
converted into a string, a level depth, ...
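
A rough sketch of what such an "extended type" might look like, assuming it
is just a (type, depth) pair that can be matched during a traversal and
printed as a string. The ticket's actual design is not described here, so
every name below (`xtype`, `xobj`, `xobj_count`, `xtype_snprintf`) is
hypothetical and none of it is hwloc API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum x_kind { X_MACHINE, X_CACHE, X_CORE };

/* Hypothetical extended type: the plain type plus a depth attribute. */
struct xtype {
    enum x_kind kind;
    unsigned depth;            /* cache level; 0 when not applicable */
};

struct xobj {
    struct xtype xt;
    size_t arity;
    struct xobj **children;    /* NULL when arity == 0 */
};

/* Count the objects in the tree whose extended type equals want. */
static size_t xobj_count(const struct xobj *root, struct xtype want)
{
    assert(root != NULL);
    size_t n = (root->xt.kind == want.kind && root->xt.depth == want.depth);
    for (size_t i = 0; i < root->arity; i++)
        n += xobj_count(root->children[i], want);
    return n;
}

/* Render an extended type as a short string such as "L2Cache". */
static int xtype_snprintf(char *buf, size_t len, struct xtype xt)
{
    switch (xt.kind) {
    case X_CACHE: return snprintf(buf, len, "L%uCache", xt.depth);
    case X_CORE:  return snprintf(buf, len, "Core");
    default:      return snprintf(buf, len, "Machine");
    }
}
```

With a pair like this, "all L1 caches" becomes one comparison per object
rather than a type test plus a separate attribute test in every caller.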

> 3. It would be really great to have some kind of flag in each object that says whether all of its children are homogeneous or not.
>
> Specifically: if the flag is true, it means that the trees rooted by obj->children[i] are "the same", meaning that each contain the same number of same-typed objects in the same topology layout, and have the same attributes (e.g., their memory sizes are the same, etc.).
>
> Of course, the OS indexes and cpusets will be different between the objects in the different trees. The homogeneous flag does not apply to those kinds of things.
>
> But having this flag means that you might be able to traverse just the obj->children[0] tree and then be able to prune all other DFS searches and extrapolate the discovered results.
>
> We ended up implementing this kind of feature in a struct hanging off obj->userdata; it saved extra compute cycles and some extra logic in some cases.

Ack.
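
For reference, a minimal sketch of how such a flag could be computed,
assuming "the same" means equal type, equal memory size, and equal arity at
every level, with OS indexes and cpusets ignored as described above. The
`toy_node` struct and both functions are hypothetical and are not hwloc
API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy topology node; OS index and cpuset are deliberately left out. */
struct toy_node {
    int type;
    unsigned long memory_kb;
    size_t arity;
    struct toy_node **children;    /* NULL when arity == 0 */
};

/* True when the subtrees rooted at a and b have the same layout,
 * the same types, and the same attributes. */
static bool toy_same_subtree(const struct toy_node *a, const struct toy_node *b)
{
    if (a->type != b->type || a->memory_kb != b->memory_kb ||
        a->arity != b->arity)
        return false;
    for (size_t i = 0; i < a->arity; i++)
        if (!toy_same_subtree(a->children[i], b->children[i]))
            return false;
    return true;
}

/* True when every child subtree of obj matches children[0]. */
static bool toy_children_homogeneous(const struct toy_node *obj)
{
    assert(obj != NULL);
    for (size_t i = 1; i < obj->arity; i++)
        if (!toy_same_subtree(obj->children[0], obj->children[i]))
            return false;
    return true;
}
```

If the result were cached per object at load time, a caller could inspect
only the children[0] subtree and extrapolate to its siblings.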

> 4. src/topology-synthetic.c emits error messages on stderr when you try to import invalid XML. I am guessing that this was put there because it's a much more specific error message than simply returning, for example, EINVAL -- the stderr message tells you the XML file line number of the problem, for example.
>
> Could this be done in a different way? I ask because test suites that import synthetic XML to hwloc do not want their stdout/stderr interrupted.
>

You're talking about topology-xml.c, right? topology-synthetic is
something different.

All stderr warnings are gone in trunk (not in v1.3 iirc). The single one
that remains is the one saying "if you need full XML support, use
libxml2". There's an environment variable to re-enable them.

And we return better error values when failing to parse XML in trunk too.

> 5. The XML dump of the topology doesn't include all the support information, such as whether you can bind to threads/cores/etc. I'm guessing this was done because the emphasis on importing XML was for drawing pretty lstopo pictures.

Come on, the emphasis on importing XML is for remote debugging :)

> But we're using the XML export in OMPI to send the topology of compute nodes up to the scheduler, where decisions are made about how to lay out processes on the back-end compute nodes, what the binding width will be, etc. This front-end scheduler needs to know whether the back-end node is capable of supporting binding, for example.
>
> We manually added this information into the message that we send up to the scheduler, but it would be much nicer if the XML export/import just handled that automatically.

I guess we could add some "support" attributes to the XML.

Does your scheduler actually need to know if binding is supported? What
does it do if binding is not supported? Can't it just try to bind and get
an error if binding is not supported?

Brice