Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] Cgroup resource limits
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2012-11-03 03:53:01

On 02/11/2012 23:05, Ralph Castain wrote:
> Hi Brice
> I think Linux cgroups makes the most sense in terms of a mechanism for doing this. We don't already do it, but it is something our customers want to see in the platform - so we have to provide it.
> The basic use-case is for an application to specify a max memory requirement, thus allowing us to subdivide the node when allocating resources. In that case, we need to ensure that the application remains within that memory limit so we don't start swapping. This is a typical "big data" requirement, and the apps know how to handle the situation where they run up against the limit (e.g., what to do when malloc returns NULL).
> System resource managers don't usually provide this capability, so we will do it at the ORTE level. We already use hwloc there for resource discovery and process placement, so it seems natural to include the ability to specify limits. Since ORTE also does the process launching, it could do the final cgroup definition and pass it to Linux.
> We envision an API that basically is modeled after the cgroup structure. What we would want hwloc to do is the final step - we pass in the resource constraints, including bind and memory policy specs, and hwloc does the "magic" to tell Linux what needs to be done.

I had a quick look at cgroups and here's my feeling:

I see quite a lot of files under the "memory" cgroup virtual fs. If
we support the ones you need, users may then ask us to support other
files and/or other types of cgroups (we already support reading from
the cpuset cgroup, btw). And those files may have different input/output
formats. That may be an endless Pandora's box.
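
To illustrate the format problem, here is a rough sketch of three parsers
for three layouts found in the cgroup (v1) files: the list format used by
cpuset.cpus and cpuset.mems, the single integer in memory.limit_in_bytes,
and the "key value" lines in memory.stat. The function names are made up
for this example and are not hwloc API.

```python
def parse_cpu_list(text):
    """Parse a kernel list-format string such as "0-3,8" into sorted ints."""
    ids = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty file means an empty set
        first, _, last = chunk.partition("-")
        # A lone index like "8" has no upper bound, so reuse the lower one.
        ids.update(range(int(first), int(last or first) + 1))
    return sorted(ids)


def parse_single_value(text):
    """Parse a file that holds one integer, as memory.limit_in_bytes does."""
    return int(text.strip())


def parse_keyed_values(text):
    """Parse "key value" lines, the layout used by memory.stat."""
    result = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            result[key] = int(value)
    return result
```

Each new file we agree to support would likely need one more function of
this kind, which is the maintenance burden I am worried about.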

One easy solution would be to put only the minimal thing in hwloc
(setting/getting the list of CPUs, memory nodes and tasks inside a
cgroup) and let applications do everything else (read/write
into arbitrary files). hwloc could still retrieve the base directory to
help them, but the file-specific read/write format would remain in the
application that needs it.
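
The application side of that split could look roughly like the sketch
below. It assumes the application has already obtained the base directory
(for instance from hwloc) and knows the format each file expects. The
helper names, the mount point and the group name in the usage comments
are hypothetical; memory.limit_in_bytes and tasks are the cgroup v1 file
names.

```python
import os


def create_group(base_dir, name):
    """Create a child cgroup by making a directory under the base directory."""
    path = os.path.join(base_dir, name)
    os.makedirs(path, exist_ok=True)
    return path


def write_control_file(group_dir, filename, value):
    """Write one value to a control file; the format is up to the caller."""
    with open(os.path.join(group_dir, filename), "w") as f:
        f.write(str(value))


def read_control_file(group_dir, filename):
    """Return the raw contents of a control file, without trailing newline."""
    with open(os.path.join(group_dir, filename)) as f:
        return f.read().strip()


# Hypothetical usage, assuming the memory controller is mounted at
# /sys/fs/cgroup/memory and the caller is allowed to write there:
#
#   group = create_group("/sys/fs/cgroup/memory", "job42")
#   write_control_file(group, "memory.limit_in_bytes", 4 * 1024**3)
#   write_control_file(group, "tasks", os.getpid())
```

Nothing here depends on which file is being written, so hwloc would not
have to know about memory.limit_in_bytes or any other specific file.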

Also, do you want to add cgroup information to the topology? There are so
many files in there that it may be hard to decide which ones deserve
to be added to the topology.

Note that I couldn't use the memory cgroup yet. For some reason, it
fails to mount here. So I only looked at the cpuset cgroup and at the
memory documentation.

By the way, is root access required to modify a cgroup? Or is
it safe to make the directory world-writable so that anybody can manage
their own cgroups?