Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] Trouble with SGE integration
From: Ondrej Glembek (glembek_at_[hidden])
Date: 2009-12-01 04:00:36


Hi

Reuti wrote:
>>
>> ./configure --prefix=/homes/kazi/glembek/share/openmpi-1.3.3-64
>> --with-sge --enable-shared --enable-static --host=x86_64-linux
>> --build=x86_64-linux NM=x86_64-linux-nm
>
> Is there any list of valid values for --host, --build and NM - and what
> is NM for? From the ./configure --help I would "assume" that one can
> tell Open MPI to prepare to BUILD on a PPC platform, although I'm
> issuing the command on a x86, and the result of the PPC compile should
> be to run on x86_64. Maybe you can leave it out, as it's the same in
> your case?

This is not the problem... We have both 32-bit and 64-bit machines and the
problem occurs on both (i.e. also when omitting --host, --build, etc.)...

>
>> Is there any way to force the ssh before the (...) term???
>
> Using SSH directly would bypass SGE's startup. What are your entries for
> qrsh_daemon and so on in SGE's configuration? Which version of SGE?

qstat reports version number as "GE 6.2u4"... Below is qconf -sconf dump.

>
> But I think the real problem is, that Open MPI assumes you are outside
> of SGE and so uses a different startup. Are you resetting any of SGE's
> environment variables in your custom starter method (like $JOB_ID)?
I don't think Open MPI is unaware of SGE when it calls
starter.sh...
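
One way to check this from inside the starter method would be to log which SGE variables are actually set right before the exec. The following is only a minimal sketch: the function name `sge_missing_vars` is made up for illustration, and the list of variables (SGE_ROOT, ARC, PE_HOSTFILE, JOB_ID) is an assumption about what Open MPI looks at to decide that it is running under SGE, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the names of SGE variables that are unset or
# empty in the current environment. The variable list is an assumption.
sge_missing_vars() {
    missing=""
    for name in SGE_ROOT ARC PE_HOSTFILE JOB_ID; do
        # POSIX sh has no indirect expansion, so eval is used to read $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    # strip the leading space before printing
    echo "${missing# }"
}
```

In the starter it could be called just before the exec, for example `echo "missing: $(sge_missing_vars)" >>/pub/tmp/starter.log`. An empty list would suggest that the environment reaches the starter intact.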

The starter.sh looks like this:

$$$
#!/bin/sh

ulimit -S -c 0
ulimit -S -t unlimited

#echo "$@" >>/pub/tmp/starter.log

# start the job in this shell
exec "$@"

So there is no resetting of any kind. Also, the ompi_info output looks OK:

$$$
ompi_info | grep gridengine
                 MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3.3)
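
Coming back to the starter method itself: since `exec "$@"` cannot run a command line that starts with `(`, one possible workaround is to let the starter fall back to a shell whenever its first argument is not a runnable command. This is a sketch under assumptions, not a confirmed fix. The names `launch_mode` and `run_job` are hypothetical, and it assumes that the compound command reaches the starter with its quoting still embedded in the text, so that `sh -c` can re-parse it.

```shell
#!/bin/sh
# Hypothetical helper: decide how the starter should launch its arguments.
# Prints "exec" if the first argument resolves to a command, else "shell".
launch_mode() {
    if command -v "$1" >/dev/null 2>&1; then
        echo exec
    else
        echo shell
    fi
}

# Hypothetical replacement for the bare exec "$@" line in a starter method.
run_job() {
    case "$(launch_mode "$@")" in
        # ordinary case: job script or plain command plus options
        exec)  exec "$@" ;;
        # compound command such as "( test ! -r ./.profile || ... )":
        # join the words and let a shell interpret the line
        shell) exec /bin/sh -c "$*" ;;
    esac
}
```

The last line of the starter would then be `run_job "$@"`. Joining the words with `"$*"` discards the original word boundaries, so this only behaves well when the command line was written to be parsed by a shell in the first place.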

$$$
qconf -sconf:
qconf -sconf
#global:
execd_spool_dir /usr/local/share/SGE/default/spool
mailer /bin/mail
xterm /usr/bin/xterm
load_sensor /usr/local/share/SGE/util/disk.sh
prolog none
epilog none
shell_start_mode posix_compliant
login_shells sh,ksh,csh,tcsh,bash
min_uid 0
min_gid 0
user_lists none
xuser_lists none
projects none
xprojects none
enforce_project false
enforce_user auto
load_report_time 00:00:30
max_unheard 00:05:00
reschedule_unknown 00:00:00
loglevel log_warning
administrator_mail linux_at_[hidden]
set_token_cmd none
pag_cmd none
token_extend_time none
shepherd_cmd none
qmaster_params none
reporting_params accounting=true reporting=false \
                             flush_time=00:00:15 joblog=false \
                             sharelog=00:00:00
finished_jobs 20
gid_range 20000-20100
qlogin_command builtin
qlogin_daemon builtin
rlogin_daemon builtin
max_aj_instances 2000
max_aj_tasks 90000
max_u_jobs 0
max_jobs 0
auto_user_oticket 0
auto_user_fshare 0
auto_user_default_project STD
auto_user_delete_time 0
delegated_file_staging false
rsh_daemon builtin
rsh_command builtin
rlogin_command builtin
reprioritize 0
jsv_url none
jsv_allowed_mod ac,h,i,e,o,j,M,N,p,w
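
To pick single entries such as rsh_command or rsh_daemon out of a dump like the one above without reading it by eye, a small filter can help. This is a minimal sketch: `sconf_value` is a hypothetical name, and it assumes the "key value" layout with backslash-continued lines that `qconf -sconf` prints.

```shell
#!/bin/sh
# Hypothetical helper: read `qconf -sconf` style output on stdin and print
# the value of the key given as $1, joining backslash-continued lines.
sconf_value() {
    awk -v key="$1" '
        # a trailing backslash means the logical line continues: buffer it
        /\\$/ { sub(/[ \t]*\\$/, ""); buf = buf $0 " "; next }
        {
            # rebuild the full logical line and let awk re-split the fields
            $0 = buf $0
            buf = ""
            if ($1 == key) {
                $1 = ""
                sub(/^[ \t]+/, "")
                print
            }
        }
    '
}
```

For example, `qconf -sconf | sconf_value rsh_command` would print `builtin` for the configuration above.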

Thanx

>
> -- Reuti
>
>
>>
>> Thanx
>> Ondrej
>>
>>
>> Reuti wrote:
>>> On 30.11.2009 at 18:46, Ondrej Glembek wrote:
>>>> Hi, thanx for reply...
>>>>
>>>> I tried to dump the $@ before calling the exec and here it is:
>>>>
>>>>
>>>> ( test ! -r ./.profile || . ./.profile;
>>>> PATH=/homes/kazi/glembek/share/openmpi-1.3.3-64/bin:$PATH ; export
>>>> PATH ;
>>>> LD_LIBRARY_PATH=/homes/kazi/glembek/share/openmpi-1.3.3-64/lib:$LD_LIBRARY_PATH
>>>> ; export LD_LIBRARY_PATH ;
>>>> /homes/kazi/glembek/share/openmpi-1.3.3-64/bin/orted -mca ess env
>>>> -mca orte_ess_jobid 3870359552 -mca orte_ess_vpid 1 -mca
>>>> orte_ess_num_procs 2 --hnp-uri
>>>> "3870359552.0;tcp://147.229.8.134:53727" --mca
>>>> pls_gridengine_verbose 1 --output-filename mpi.log )
>>>>
>>>>
>>>> It looks like the line gets constructed in
>>>> orte/mca/plm/rsh/plm_rsh_module.c and depends on the shell...
>>>>
>>>> Still I wonder, why mpiexec calls the starter.sh... I thought the
>>>> starter was supposed to call the script which wraps a call to
>>>> mpiexec...
>>> Correct. This will happen for the master node of this job, i.e. where
>>> the jobscript is executed. But it will also be used for the qrsh
>>> -inherit calls. I wonder about one thing: I see only a call to
>>> "orted" and not the above sub-shell on my machines. Did you compile
>>> Open MPI with --with-sge?
>>> The original call above would be "ssh node_xy ( test ! ....)" which
>>> seems to work for ssh and rsh.
>>> Just one note: with the starter script you will lose the set PATH and
>>> LD_LIBRARY_PATH, as a new shell is created. It might be necessary to
>>> set it again in your starter method.
>>> -- Reuti
>>>>
>>>> Am I not right???
>>>> Ondrej
>>>>
>>>>
>>>> Reuti wrote:
>>>>> Hi,
>>>>> On 30.11.2009 at 16:33, Ondrej Glembek wrote:
>>>>>> we are using a custom starter method in our SGE to launch our
>>>>>> jobs... It looks something like this:
>>>>>>
>>>>>> #!/bin/sh
>>>>>>
>>>>>> # ... we do whole bunch of stuff here
>>>>>>
>>>>>> # start the job in this shell
>>>>>> exec "$@"
>>>>> the "$@" should be replaced by the path to the jobscript (qsub) or
>>>>> command (qrsh) plus the given options.
>>>>> For the spread tasks to other nodes I get as argument: " orted -mca
>>>>> ess env -mca orte_ess_jobid ...". Also no . ./.profile.
>>>>> So I wonder, where the . ./.profile is coming from. Can you put a
>>>>> `sleep 60` or alike before the `exec ...` and grep the built line
>>>>> from `ps -e f` before it crashes?
>>>>> -- Reuti
>>>>>> The trouble is that mpiexec passes a command which looks like this:
>>>>>>
>>>>>> ( . ./.profile ..... )
>>>>>>
>>>>>> which, however, is not a valid exec argument...
>>>>>>
>>>>>> Is there any way to tell mpiexec to run it in a separate script???
>>>>>> Any idea how to solve this???
>>>>>>
>>>>>> Thanx
>>>>>> Ondrej Glembek
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Ondrej Glembek, PhD student E-mail: glembek_at_[hidden]
>>>>>> UPGM FIT VUT Brno, L226 Web:
>>>>>> http://www.fit.vutbr.cz/~glembek
>>>>>> Bozetechova 2, 612 66 Phone: +420 54114-1292
>>>>>> Brno, Czech Republic Fax: +420 54114-1290
>>>>>>
>>>>>> ICQ: 93233896
>>>>>> GPG: C050 A6DC 7291 6776 9B69 BB11 C033 D756 6F33 DE3C
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
  Ondrej Glembek, PhD student  E-mail: glembek_at_[hidden]
  UPGM FIT VUT Brno, L226      Web:    http://www.fit.vutbr.cz/~glembek
  Bozetechova 2, 612 66        Phone:  +420 54114-1292
  Brno, Czech Republic         Fax:    +420 54114-1290
  ICQ: 93233896
  GPG: C050 A6DC 7291 6776 9B69 BB11 C033 D756 6F33 DE3C