Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] memory limits on remote nodes
From: Reuti (reuti_at_[hidden])
Date: 2010-10-08 04:45:32


On 08.10.2010 at 00:40, Ralph Castain wrote:

>
> On Oct 7, 2010, at 2:55 AM, Reuti wrote:
>
>> On 07.10.2010 at 01:55, David Turner wrote:
>>
>>> Hi,
>>>
>>> We would like to set process memory limits (vmemoryuse, in csh
>>> terms) on remote processes. Our batch system is torque/moab.
>>
>> Isn't it possible to set this up in torque/moab directly? In SGE I would simply define h_vmem, which then applies per slot; with a tight integration, all Open MPI processes are children of sge_execd, so the limit is enforced.
>
> I could be wrong, but I -think- the issue here is that the soft limits need to be set on a per-job basis.

I thought so too, and `qsub -l h_vmem=4G ...` should do it. The limit can be requested on a per-job basis (with further limits at the queue level if necessary).
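
To make the per-slot arithmetic concrete, here is a minimal Python sketch using the numbers from David's message quoted below (20GB usable per 8-core node). The function names are hypothetical, and the formatted `-l h_vmem=...` string assumes an SGE-style resource request; torque/moab would need its own equivalent.

```python
# Sketch: per-task memory limit for a node with a fixed usable-memory budget.
# The 20 GB budget comes from the quoted message; the function names
# are hypothetical and exist only for this illustration.


def per_task_limit_gb(usable_gb: float, tasks_per_node: int) -> float:
    """Split the usable memory of one node evenly across its tasks."""
    if tasks_per_node < 1:
        raise ValueError("tasks_per_node must be at least 1")
    return usable_gb / tasks_per_node


def h_vmem_request(usable_gb: float, tasks_per_node: int) -> str:
    """Format the per-task limit as an SGE-style request, in megabytes."""
    limit_mb = int(per_task_limit_gb(usable_gb, tasks_per_node) * 1024)
    return f"-l h_vmem={limit_mb}M"


if __name__ == "__main__":
    # Packed (8 tasks) down to a single task per node.
    for tasks in (8, 4, 2, 1):
        print(tasks, per_task_limit_gb(20, tasks), h_vmem_request(20, tasks))
```

With 4 tasks per node this gives 5GB per task, matching the figures in the original question.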

-- Reuti

>>
>> -- Reuti
>>
>>
>>> The nodes of our cluster each have 24GB of physical memory, of
>>> which 4GB is taken up by the kernel and the root file system.
>>> Note that these are diskless nodes, so no swap either.
>>>
>>> We can globally set the per-process limit to 2.5GB. This works
>>> fine if applications run "packed": 8 MPI tasks running on each
>>> 8-core node, for an aggregate limit of 20GB. However, if a job
>>> only wants to run 4 tasks, the soft limit can safely be raised
>>> to 5GB. 2 tasks, 10GB. 1 task, the full 20GB.
>>>
>>> Upping the soft limit in the batch script itself only affects
>>> the "head node" of the job. Since limits are not part of the
>>> "environment", I can find no way to propagate them to remote nodes.
>>>
>>> If I understand how this all works, the remote processes are
>>> started by orted, and therefore inherit its limits. Is there
>>> any sort of orted configuration that can help here? Any other
>>> thoughts about how to approach this?
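
One possible approach, sketched here under assumptions rather than as an Open MPI feature: launch a small wrapper through mpirun on every node, let it raise its own soft limit, and then exec the real application. The sketch assumes that vmemoryuse corresponds to RLIMIT_AS (as on Linux) and that the hard limit on the remote nodes is high enough. The script name limit_wrapper.py and its arguments are hypothetical.

```python
#!/usr/bin/env python3
# Sketch of a wrapper that raises the soft address-space limit and then
# replaces itself with the real program. The script name and command-line
# layout are hypothetical; RLIMIT_AS is assumed to match csh's vmemoryuse.
import os
import resource
import sys


def raise_soft_limit(target_bytes: int) -> int:
    """Set the soft RLIMIT_AS to target_bytes, capped at the hard limit.

    Returns the soft limit that was actually applied.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        # An unprivileged process cannot exceed the hard limit.
        target_bytes = min(target_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (target_bytes, hard))
    return target_bytes


def main(argv):
    if len(argv) < 3:
        print("usage: limit_wrapper.py <limit_gb> <program> [args...]",
              file=sys.stderr)
        return
    limit_bytes = int(float(argv[1]) * 1024**3)
    raise_soft_limit(limit_bytes)
    # The exec'd program inherits the limits set above.
    os.execvp(argv[2], argv[2:])


if __name__ == "__main__":
    main(sys.argv)
```

It might be invoked as `mpirun -np 4 ./limit_wrapper.py 5 ./my_app`, where `./my_app` stands in for the real executable. Because the wrapper runs on each node as a child of orted, the limit is raised where the MPI process actually starts, not just on the head node.
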
>>>
>>> Thanks!
>>>
>>> --
>>> Best regards,
>>>
>>> David Turner
>>> User Services Group email: dpturner_at_[hidden]
>>> NERSC Division phone: (510) 486-4027
>>> Lawrence Berkeley Lab fax: (510) 486-4316
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users