
Hardware Locality Development Mailing List Archives


Subject: Re: [hwloc-devel] Cgroup resource limits
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-11-05 21:01:16


On 11/5/12, Christopher Samuel <samuel_at_[hidden]> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 06/11/12 01:43, Ralph Castain wrote:
>
>> On Nov 4, 2012, at 7:28 PM, Christopher Samuel
>> <samuel_at_[hidden]> wrote:
>>
>>> I would argue that the resource managers *should* be doing it
>>
>> No argument from me - I would love for them to provide me with an
>> easy API that mpirun can use to specify the requirements for a
>> given application.
>
> Wouldn't it be the other way around with the resource manager setting
> limits and then having the job run inside it? Basically like the
> current cpuset support in Torque et al., but on steroids.
>
> That way mpirun and/or orted could learn from the kernel the details
> of the cgroup it is in and arrange itself appropriately.
>
> I believe that Slurm has some support for cgroups already:
>
> http://www.schedmd.com/slurmdocs/cgroups.html

Depends on the use-case. If you are going to direct-launch the
processes (e.g., using srun), then you are correct.
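
For the direct-launch case, here is a minimal sketch of how a process could learn from the kernel which cgroups the RM placed it in. It assumes a Linux system and relies only on the documented /proc/<pid>/cgroup line format (hierarchy-ID:controller-list:cgroup-path). The function names are hypothetical and not part of ORTE, hwloc, or Slurm.

```python
def parse_proc_cgroup(text):
    """Parse the content of /proc/<pid>/cgroup into {controller: cgroup_path}.

    Hypothetical helper for illustration only.
    """
    membership = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "hierarchy-ID:controller-list:cgroup-path".
        # Split at most twice so colons inside the path are preserved.
        _hier_id, controllers, path = line.split(":", 2)
        # Several controllers may be co-mounted, e.g. "cpuset,cpu".
        for controller in controllers.split(","):
            membership[controller] = path
    return membership


def current_cgroups(pid="self"):
    """Read cgroup membership for a pid (Linux only; assumes /proc is mounted)."""
    with open(f"/proc/{pid}/cgroup") as f:
        return parse_proc_cgroup(f.read())
```

A launcher started inside an RM-created container could call current_cgroups() and look up the "memory" or "cpuset" entry to find the path whose limits it should respect.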

However, that isn't the case in other scenarios. For example, if you
get an allocation and then use mpirun to launch your job, you
definitely do *not* want the RM setting the cgroup constraints as the
RM only launches the orteds - it never sees the MPI procs. The
constraints need to apply to the individual procs as separate entities
- if you apply them to the orteds, then all procs inherit them and end
up constrained to the same container. Ick.
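
To illustrate the difference, here is a rough sketch of what "per-process" confinement could look like: one child cgroup per launched proc, instead of one shared container around the daemon. It assumes the cgroup v1 memory controller layout (the "tasks" and "memory.limit_in_bytes" files). The function name, the "rankN" directory naming, and the parent path are hypothetical, and this is not how ORTE actually implements anything.

```python
import os


def confine_procs_individually(parent_cgroup, pids, mem_limit_bytes):
    """Place each pid in its own child cgroup under parent_cgroup.

    Assumes a cgroup v1 memory hierarchy; on a real system parent_cgroup
    would be something like a directory under the mounted memory controller.
    Returns {pid: child_cgroup_path}.
    """
    created = {}
    for rank, pid in enumerate(pids):
        # One directory (i.e. one cgroup) per process, not one for all.
        child = os.path.join(parent_cgroup, f"rank{rank}")
        os.makedirs(child, exist_ok=True)
        # Set the limit before moving the process in.
        with open(os.path.join(child, "memory.limit_in_bytes"), "w") as f:
            f.write(str(mem_limit_bytes))
        # Writing a pid to "tasks" moves that process into the cgroup.
        with open(os.path.join(child, "tasks"), "w") as f:
            f.write(str(pid))
        created[pid] = child
    return created
```

If the same limit were instead written once to the cgroup holding the orted, every child proc would inherit that single cgroup and share the one limit, which is the "same container" problem described above.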

Similarly, if you are running MapReduce, your application has to
figure out what nodes to run on, how much memory will be required,
etc. All that goes into the allocation request (made by the equivalent
of mpirun in that scenario) sent to the RM. Again, the orteds need to
set those constraints on a per-process basis.

So we need the capability in ORTE to support the non-direct-launch cases.

HTH
Ralph

>
> [memcg performance]
>> Yick! However, I would expect the community to reduce that impact
>> over time. If systems don't want that capability, then they can
>> and should disable it. On the other hand, if they do want it, then
>> we want to support it.
>
> Indeed!
>
> cheers,
> Chris
> - --
> Christopher Samuel Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/ http://twitter.com/vlsci
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://www.enigmail.net/
>
> iEYEARECAAYFAlCYReYACgkQO2KABBYQAh+BxQCbB1lbNCqotuA2paV+G6+cfAdP
> xxwAnAurUX8OoK1+4oJJJY7NV9cmIoRV
> =yrCv
> -----END PGP SIGNATURE-----