On 12/08/13 06:17, Ralph Castain wrote:
> 1. Slurm has no direct knowledge or visibility into the
> application procs themselves when launched by mpirun. Slurm only
> sees the ORTE daemons. I'm sure that Slurm rolls up all the
> resources used by those daemons and their children, so the totals
> should include them.
> 2. Since all Slurm can do is roll everything up, the resources
> shown in sacct will include those used by the daemons and mpirun as
> well as the application procs. Slurm doesn't include their daemons
> or the slurmctld in their accounting, so the two numbers will be
> significantly different. If you are attempting to limit overall
> resource usage, you may need to leave some slack for the daemons
> and mpirun.
Thanks for that explanation, makes a lot of sense.
In the end, due to time pressure, we decided to just do what we did with
Torque and patch Slurm to set RLIMIT_AS instead of RLIMIT_DATA for
jobs, so no single sub-process can request more RAM than the job has.
It works nicely and our users are used to it from Torque; we've not hit
any issues with it so far.
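
To illustrate the effect of RLIMIT_AS (this is not the Slurm patch itself, just a minimal sketch), the Python snippet below caps the virtual address space of a child process before it is exec'd. The helper name run_with_address_space_limit and the 1 GiB figure are made up for the example. The limit is inherited by anything the child forks, and it applies to each process individually rather than to the job as a whole, which is why it only stops a single sub-process from asking for more than the job's memory.

```python
import resource
import subprocess
import sys


def run_with_address_space_limit(argv, limit_bytes):
    """Run argv as a child whose virtual address space is capped.

    Hypothetical helper for illustration only. The limit is set in the
    child between fork() and exec(), so the parent is unaffected and any
    process the child later starts inherits the same per-process cap.
    """
    def apply_limit():
        # RLIMIT_AS bounds the total address space (brk, mmap, stack),
        # so a large malloc/mmap fails with ENOMEM once it is exceeded.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.run(
        argv,
        preexec_fn=apply_limit,
        capture_output=True,
        text=True,
    )


def try_allocation(alloc_bytes, limit_bytes):
    """Start a Python child that allocates alloc_bytes under limit_bytes."""
    code = "bytearray({})".format(alloc_bytes)
    return run_with_address_space_limit([sys.executable, "-c", code], limit_bytes)
```

Assuming a 64-bit Linux host, try_allocation(2 * 1024**3, 1024**3) should end with a MemoryError in the child, while a 16 MiB allocation under the same limit should succeed. Since the cap is on address space rather than resident memory, programs that map far more than they touch can hit it earlier than expected.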
In the long term I suspect the jobacct_gather/cgroup plugin will give
better numbers once it's had more work.
All the best,
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545