-----BEGIN PGP SIGNED MESSAGE-----
On 12/08/13 06:17, Ralph Castain wrote:
> 1. Slurm has no direct knowledge or visibility into the
> application procs themselves when launched by mpirun. Slurm only
> sees the ORTE daemons. I'm sure that Slurm rolls up all the
> resources used by those daemons and their children, so the totals
> should include them.
> 2. Since all Slurm can do is roll everything up, the resources
> shown in sacct will include those used by the daemons and mpirun as
> well as the application procs. Slurm doesn't include their daemons
> or the slurmctld in their accounting, so the two numbers will be
> significantly different. If you are attempting to limit overall
> resource usage, you may need to leave some slack for the daemons
> and mpirun.
Thanks for that explanation, makes a lot of sense.

In the end, due to time pressure, we decided to just do what we did with
Torque and patch Slurm to set RLIMIT_AS instead of RLIMIT_DATA for
jobs, so no single sub-process can request more RAM than the job has.

Works nicely and our users are used to it from Torque; we've not hit
any issues with it so far.
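
For anyone curious what the RLIMIT_AS approach looks like from the
process side, here is a minimal Python sketch. It is not our Slurm
patch (that lives in Slurm's C code); it only illustrates the idea of
capping address space before a job's processes start. It assumes Linux,
and the helper name run_with_address_space_limit and the 1 GiB / 2 GiB
sizes are made up for illustration.

```python
import resource
import subprocess
import sys


def run_with_address_space_limit(argv, limit_bytes):
    """Run argv in a child process whose RLIMIT_AS is capped at limit_bytes.

    Hypothetical helper for illustration; not part of Slurm or Torque.
    """
    def apply_limit():
        # Runs in the child between fork() and exec(), so the limit applies
        # to the launched program and is inherited by anything it spawns.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        # Never try to exceed an existing hard limit.
        cap = limit_bytes if hard == resource.RLIM_INFINITY else min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, cap))

    return subprocess.run(argv, preexec_fn=apply_limit,
                          capture_output=True, text=True)


# The child tries to allocate 2 GiB while capped at 1 GiB of address space.
CHILD_CODE = """
try:
    buf = bytearray(2 * 1024**3)
    print("allocated")
except MemoryError:
    print("refused")
"""

result = run_with_address_space_limit([sys.executable, "-c", CHILD_CODE], 1024**3)
print(result.stdout.strip())
```

Note that RLIMIT_AS caps virtual address space rather than resident
memory, so a program that maps far more than it touches may need a
higher limit than its real usage suggests.
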

In the long term, I suspect the jobacct_gather/cgroup plugin will give
better numbers once it's had more work.

All the best,
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
-----END PGP SIGNATURE-----