I think Linux cgroups make the most sense as the mechanism for doing this. We don't do it today, but our customers want to see it in the platform, so we have to provide it.
The basic use-case is for an application to specify a max memory requirement, thus allowing us to subdivide the node when allocating resources. In that case, we need to ensure that the application remains within that memory limit so we don't start swapping. This is a typical "big data" requirement, and the apps know how to handle the situation where they run up against the limit (e.g., what to do when malloc returns NULL).
System resource managers don't usually provide this capability, so we will do it at the ORTE level. We already use hwloc there for resource discovery and process placement, so it seems natural to include the ability to specify limits. Since ORTE also does the process launching, it could do the final cgroup definition and pass it to Linux.
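As a rough sketch of that last step, assuming a cgroup v1 memory controller is mounted and the group directory already exists, the launcher could write the limit into `memory.limit_in_bytes` and then add the child's pid to `tasks`. The helper names `cg_write` and `cg_limit_memory` are made up for illustration and are not part of ORTE or hwloc.

```c
#include <stdio.h>
#include <sys/types.h>

/* Write one value into a cgroup control file, e.g. <dir>/memory.limit_in_bytes.
 * Returns 0 on success, -1 on any failure. (Hypothetical helper.) */
int cg_write(const char *dir, const char *file, const char *value)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", dir, file);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;                      /* path would have been truncated */

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    int rc = (fputs(value, fp) == EOF) ? -1 : 0;
    if (fclose(fp) != 0)                /* the kernel may reject the value on flush */
        rc = -1;
    return rc;
}

/* Cap the memory of an existing cgroup (v1 file names assumed), then move
 * process `pid` into it. (Hypothetical helper.) */
int cg_limit_memory(const char *cgroup_dir, unsigned long long max_bytes, pid_t pid)
{
    char buf[32];

    snprintf(buf, sizeof(buf), "%llu", max_bytes);
    if (cg_write(cgroup_dir, "memory.limit_in_bytes", buf) != 0)
        return -1;

    snprintf(buf, sizeof(buf), "%ld", (long)pid);
    return cg_write(cgroup_dir, "tasks", buf);
}
```

A launcher would call something like `cg_limit_memory(group_dir, 1ULL << 30, child_pid)` right after fork, where `group_dir` is whatever directory it created under the memory controller's mount point. That location varies by distribution, so it is left as a parameter here.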
We envision an API modeled on the cgroup structure. What we would want hwloc to do is the final step: we pass in the resource constraints, including binding and memory policy specs, and hwloc does the "magic" to tell Linux what needs to be done.
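To make the idea concrete, here is one possible shape for such a constraint set, purely as an assumption: a struct with one field per cgroup v1 control file, plus a function that renders it as the file/value pairs that would be handed to Linux. Neither `struct res_constraints` nor `res_constraints_render` exists in hwloc; both names are hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical constraint set, one field per cgroup v1 control file. */
struct res_constraints {
    const char *cpus;                   /* cpuset.cpus, e.g. "0-3" */
    const char *mems;                   /* cpuset.mems, e.g. "0" (NUMA nodes) */
    unsigned long long mem_max_bytes;   /* memory.limit_in_bytes */
    unsigned int cpu_shares;            /* cpu.shares (relative CPU weight) */
};

/* Render the constraints as "file=value" lines into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
int res_constraints_render(const struct res_constraints *c, char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     "cpuset.cpus=%s\n"
                     "cpuset.mems=%s\n"
                     "memory.limit_in_bytes=%llu\n"
                     "cpu.shares=%u\n",
                     c->cpus, c->mems, c->mem_max_bytes, c->cpu_shares);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The caller would fill in the struct from the job's requirements and let the library decide how each line maps onto the mounted controllers. Keeping the fields close to the cgroup file names is only one option; a real design would need to agree on units and on how binding interacts with the existing hwloc cpuset types.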
On Nov 2, 2012, at 2:18 PM, Brice Goglin <Brice.Goglin_at_[hidden]> wrote:
> Hello Ralph,
> I am not very familiar with these features. What system mechanism do you
> currently use for this? Linux cgroups? Any concrete example of what you
> would like to do?
> On 02/11/2012 22:12, Ralph Castain wrote:
>> Hi folks
>> We (Greenplum) need to support resource limits (e.g., memory and CPU usage) on processes running under Open MPI's RTE. OMPI uses hwloc for processor and memory affinity, so this seems a likely place to add the required support. Jeff tells me that it doesn't yet exist in hwloc. Would you welcome, or be willing to consider, contributions from our engineers towards adding this capability?
>> Obviously, we'd need to discuss how and where to do the extension. Just wanted to first see if this is an option, or if we should do it directly in OMPI.