Open MPI Development Mailing List Archives

From: sadfub_at_[hidden]
Date: 2007-06-22 10:43:42


Hi Pak,

> Jeff Squyres wrote:
>>>> 2. I know little/nothing about SGE, but I'm assuming that you need to
>>>> have SGE pass the proper memory lock limits to new processes. In an
>>>> interactive login, you showed that the max limit is "8162952" -- you
>>>> might just want to make it unlimited, unless you have a reason for
>>>> limiting it. See http://www.open-mpi.org/faq/?
>>> Yes, I already read the FAQ, and even setting them to unlimited did
>>> not work. In SGE one can specify the limits for SGE jobs with e.g.
>>> the qmon tool (configuring queues > select queue > modify > limits),
>>> but everything there is set to infinity. (Besides that, the job is
>>> running with a static machinefile; is this a "noninteractive" job?)
>>> How could I test the ulimits of interactive and noninteractive jobs?
>> Launch an SGE job that calls the shell command "limit" (if you run
>> C-shell variants) or "ulimit -l" (if you run Bourne shell variants).
>> Ensure that the output is "unlimited".
>>
>> What are the limits of the user that launches the SGE daemons? I.e.,
>> did the SGE daemons get started with proper "unlimited" limits? If
>> not, that could hamper SGE's ability to set the limits that you told
>> it to via qmon (remember my disclaimer: I know nothing about SGE, so
>> this is speculation).
>>
>
> I am assuming you have tried launching your job without SGE (e.g. via
> ssh or others) and that works correctly? If yes, then you should
> compare the outputs of limit as Jeff suggested to see if there is any
> difference between the two (with and without using SGE).

Yes, without SGE everything works. With SGE it also works if I use a
static machinefile (see initial post), and -H h1,...,hn works too. Only
with the SGE-generated $TMPDIR/machines file (which is itself valid; I
checked this) does the job fail to run. And the ulimits are unlimited in
all three cases, every time:

pos1: pdsh -R ssh -w node[XX-YY] ulimit -a => unlimited

(loosely coupled)
pos2: qsub jobscript, where jobscript just calls the command as in pos1

(tightly coupled?)
pos3: qsub jobscript, where jobscript calls another script (containing
the same command as in pos1) and additionally passes $TMPDIR/machines
as an argument to it.
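
For reference, a minimal sketch of the kind of job script that could be submitted with qsub to check the locked-memory limit from inside the job, as suggested above. The function name report_limits is made up for illustration, and the script assumes a Bourne-style shell and that SGE sets $TMPDIR and writes the machines file there; it is not the script actually used in pos2 or pos3.

```shell
#!/bin/bash
# Sketch only: report the locked-memory limit as seen inside the job.
# report_limits is a hypothetical helper name, not part of SGE or Open MPI.
report_limits() {
    # "ulimit -l" prints the max locked memory in kbytes, or "unlimited".
    echo "host=$(uname -n) memlock=$(ulimit -l)"
}

report_limits

# Assumption: under a parallel environment SGE provides $TMPDIR/machines.
# If the file is present, show how many slots each host was given.
if [ -n "${TMPDIR:-}" ] && [ -r "$TMPDIR/machines" ]; then
    echo "machines file: $TMPDIR/machines"
    sort "$TMPDIR/machines" | uniq -c
fi
```

Running the same script once directly over ssh and once through qsub gives two outputs that can be compared line by line.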

Thanks for your help.