Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Trouble with SGE integration
From: Reuti (reuti_at_[hidden])
Date: 2009-12-01 05:47:51


Hi,

On 01.12.2009 at 10:00, Ondrej Glembek wrote:

> Reuti wrote:
>>>
>>> ./configure --prefix=/homes/kazi/glembek/share/openmpi-1.3.3-64
>>> --with-sge --enable-shared --enable-static --host=x86_64-linux
>>> --build=x86_64-linux NM=x86_64-linux-nm
>>
>> Is there any list of valid values for --host, --build and NM - and what
>> is NM for? From the ./configure --help I would "assume" that one can
>> tell Open MPI to prepare to BUILD on a PPC platform, although I'm
>> issuing the command on a x86, and the result of the PPC compile should
>> be to run on x86_64. Maybe you can leave it out, as it's the same in
>> your case?
>
> This is not the problem... We have both 32bit and 64bit machines and the
> problem occurs on both (i.e. omitting the --host --build, etc)...
>
>>
>>> Is there any way to force the ssh before the (...) term???
>>
>> Using SSH directly would bypass SGE's startup. What are your entries for
>> qrsh_daemon and so on in SGE's configuration? Which version of SGE?
>
> qstat reports version number as "GE 6.2u4"... Below is qconf -sconf dump.
>
>>
>> But I think the real problem is, that Open MPI assumes you are outside
>> of SGE and so uses a different startup. Are you resetting any of SGE's
>> environment variables in your custom starter method (like $JOB_ID)?
>
> I don't think that openmpi doesn't know about SGE when it calls the
> starter.sh...
>
>
> The starter.sh looks like this:
>
> $$$
> #!/bin/sh
>
> ulimit -S -c 0
> ulimit -S -t unlimited

What about setting this (the core size) in the queue definition? The
runtime will be limited if you request -l s_rt=... in SGE (or define
a maximum in the queue definition), in addition to h_rt.
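
As a rough sketch of how these limits could be checked: `qconf -sq <queue>` prints a queue's configuration, where s_core/h_core and s_rt/h_rt are the soft and hard core-size and runtime limits. The helper name filter_limits and the queue name all.q below are placeholders for illustration, not something taken from this thread.

```sh
#!/bin/sh
# Keep only the core-size and runtime limit lines from the output of
# `qconf -sq <queue>` (read on stdin). filter_limits is a hypothetical helper.
filter_limits() {
    grep -E '^(s_core|h_core|s_rt|h_rt)[[:space:]]'
}

# On a submit or admin host one might run (all.q is a placeholder queue name):
#   qconf -sq all.q | filter_limits
#   qconf -mq all.q      # opens the queue definition in an editor
```

If the limits live in the queue definition, the starter script no longer needs its own ulimit calls.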

> #echo "$@" >>/pub/tmp/starter.log
>
> # start the job in this shell
> exec "$@"
>
>
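
A starter method that logs what it sees may help answer whether SGE's variables survive until the job starts. This is only a minimal sketch, assuming Open MPI detects SGE through SGE_ROOT, ARC, PE_HOSTFILE and JOB_ID (as its FAQ describes, to my recollection); the function name log_sge_env is hypothetical, and the output goes to stderr rather than to a fixed log file.

```sh
#!/bin/sh
# Diagnostic starter method sketch (hypothetical, not the script from this thread).

# Print each SGE-related variable, marking those that are unset or empty.
log_sge_env() {
    for name in SGE_ROOT ARC PE_HOSTFILE JOB_ID; do
        eval "value=\${$name-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s is unset or empty\n' "$name"
        fi
    done
}

# When invoked as a starter method, SGE passes the job command as arguments:
# log the environment to stderr, then replace this shell with the job.
if [ "$#" -gt 0 ]; then
    log_sge_env >&2
    exec "$@"
fi
```

If any of the four lines reports a missing variable in the job's error file, the environment was changed before the job started.
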
> loglevel log_warning

loglevel log_info

will often give more info (not in this case, but in case of some
other issues).
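
To see which level is currently active, the global configuration can be inspected. A minimal sketch, assuming `qconf -sconf` prints a single `loglevel <value>` line as in the dump quoted above; the helper name current_loglevel is made up for illustration.

```sh
#!/bin/sh
# Print the value of the loglevel entry from `qconf -sconf` output on stdin.
# current_loglevel is a hypothetical helper name.
current_loglevel() {
    awk '$1 == "loglevel" { print $2 }'
}

# Typical use on an admin host:
#   qconf -sconf | current_loglevel
#   qconf -mconf          # edit the global configuration, e.g. to set log_info
```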

-- Reuti