> There is an MCA param that tells the orted to set its usage limits to the hard limit:
> MCA opal: parameter "opal_set_max_sys_limits" (current value:<0>, data source: default value)
> Set to non-zero to automatically set any system-imposed limits to the maximum allowed
> The orted could be used to set the soft limit down from that value on a per-job basis, but we didn't provide a mechanism for specifying it. Would be relatively easy to do, though.
> What version are you using? If I create a patch, would you be willing to test it?
1.4.2, with 1.4.1 available, and 1.4.3 waiting in the wings.
I would love to test any patch you could come up with.
The ability to set any valid limit to any valid value,
applied equally to all processes, would go a long way in
making our environment more stable. Thanks!
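Until such a mechanism exists, one possible stopgap is to have mpirun launch a small wrapper that sets the soft limit and then execs the real program, so the limit is applied on every node rather than only on the head node. The sketch below is illustrative only and makes several assumptions: the script name limit_wrap.py and the helper names are hypothetical, the 20GB usable-memory figure is taken from the numbers quoted below, and RLIMIT_AS is assumed to be the limit that csh calls vmemoryuse on these nodes.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: set an address-space soft limit, then exec the real program."""
import os
import resource
import sys

GIB = 1024 ** 3
# Assumption from the thread: 24GB physical minus ~4GB for kernel and root fs.
USABLE_BYTES_PER_NODE = 20 * GIB


def per_task_limit(tasks_per_node, usable=USABLE_BYTES_PER_NODE):
    """Split the usable node memory evenly among the tasks on one node."""
    if tasks_per_node < 1:
        raise ValueError("tasks_per_node must be at least 1")
    return usable // tasks_per_node


def apply_soft_limit(limit_bytes):
    """Set the RLIMIT_AS soft limit, clamped to the current hard limit."""
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        limit_bytes = min(limit_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
    return limit_bytes


def main(argv):
    # argv[1] is the number of tasks per node; argv[2:] is the real command.
    apply_soft_limit(per_task_limit(int(argv[1])))
    # Replace this process with the application; it inherits the new limit.
    os.execvp(argv[2], argv[2:])


# Only act when given a task count and a program to run.
if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv)
```

It would be invoked along the lines of "mpirun -np 4 ./limit_wrap.py 4 ./my_app", where my_app stands in for the real executable. Because the wrapper can only lower or raise the soft limit up to the hard limit that orted already has, the global hard limit would still need to be at least as large as the biggest per-task value.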
>> We would like to set process memory limits (vmemoryuse, in csh
>> terms) on remote processes. Our batch system is torque/moab.
>> The nodes of our cluster each have 24GB of physical memory, of
>> which 4GB is taken up by the kernel and the root file system.
>> Note that these are diskless nodes, so no swap either.
>> We can globally set the per-process limit to 2.5GB. This works
>> fine if applications run "packed": 8 MPI tasks running on each
>> 8-core node, for an aggregate limit of 20GB. However, if a job
>> only wants to run 4 tasks, the soft limit can safely be raised
>> to 5GB. 2 tasks, 10GB. 1 task, the full 20GB.
>> Upping the soft limit in the batch script itself only affects
>> the "head node" of the job. Since limits are not part of the
>> "environment", I can find no way to propagate them to remote nodes.
>> If I understand how this all works, the remote processes are
>> started by orted, and therefore inherit its limits. Is there
>> any sort of orted configuration that can help here? Any other
>> thoughts about how to approach this?
>> Best regards,
>> David Turner
>> User Services Group email: dpturner_at_[hidden]
>> NERSC Division phone: (510) 486-4027
>> Lawrence Berkeley Lab fax: (510) 486-4316