Open MPI Development Mailing List Archives

From: Pak Lui (Pak.Lui_at_[hidden])
Date: 2007-06-22 10:00:17


Jeff Squyres wrote:
>>> 2. I know little/nothing about SGE, but I'm assuming that you need to
>>> have SGE pass the proper memory lock limits to new processes. In an
>>> interactive login, you showed that the max limit is "8162952" -- you
>>> might just want to make it unlimited, unless you have a reason for
>>> limiting it. See http://www.open-mpi.org/faq/?
>> Yes, I already read the FAQ, and even setting them to unlimited has
>> not worked. In SGE, one can specify the limits for SGE jobs with,
>> e.g., the qmon tool (configuring queues > select queue > modify >
>> limits), but everything there is set to infinity. (Besides that,
>> the job is running with a static machinefile; is this a
>> "noninteractive" job?) How could I test the ulimits of interactive and
>> noninteractive jobs?
>
> Launch an SGE job that calls the shell command "limit" (if you run C-
> shell variants) or "ulimit -l" (if you run Bourne shell variants).
> Ensure that the output is "unlimited".
>
> What are the limits of the user that launches the SGE daemons? I.e.,
> did the SGE daemons get started with proper "unlimited" limits? If
> not, that could hamper SGE's ability to set the limits that you told
> it to via qmon (remember my disclaimer: I know nothing about SGE, so
> this is speculation).
>

I am assuming you have tried launching your job without SGE (e.g., via
ssh) and that it works correctly? If so, you should compare the output
of "limit" (or "ulimit -l"), as Jeff suggested, to see whether there is
any difference between the two cases (with and without SGE).
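
To make that comparison less error-prone than eyeballing shell output,
here is a minimal sketch that prints the locked-memory and file
descriptor limits as seen by the process itself. It is only an
illustration, not something shipped with SGE or Open MPI: the function
names are made up for this example, and it assumes a Unix system where
Python's standard "resource" module exposes RLIMIT_MEMLOCK.

```python
import resource


def format_limit(value):
    """Render a raw rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def report_limits():
    """Return one line per limit, showing the soft and hard values.

    The choice of limits (locked memory, open files) is an assumption
    based on the two problems discussed in this thread.
    """
    limits = {
        "memlock (bytes)": resource.RLIMIT_MEMLOCK,
        "nofile": resource.RLIMIT_NOFILE,
    }
    lines = []
    for label, res in limits.items():
        soft, hard = resource.getrlimit(res)
        lines.append(
            f"{label}: soft={format_limit(soft)} hard={format_limit(hard)}"
        )
    return lines


if __name__ == "__main__":
    for line in report_limits():
        print(line)
```

Run it once from an ssh login and once from inside an SGE job on the
same node, then compare the two outputs; any line that differs points
at a limit that is not being passed through. Note that these values are
in bytes, whereas bash's "ulimit -l" reports kilobytes.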

I know of a similar limitation in SGE: it cannot set the file
descriptor limit for user processes (and I believe the SGE folks are
aware of the problem). The workaround was to put the setting into
~/.tcshrc. So if SGE is not setting other resource limits correctly, or
doesn't provide the option, you may have to work around it in ~/.tcshrc
or a similar startup file for your shell. Otherwise, the limit will
probably fall back to the system default.

-- 
- Pak Lui
pak.lui_at_[hidden]