Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi+sge
From: Rolf Vandevaart (Rolf.Vandevaart_at_[hidden])
Date: 2008-10-02 11:35:42


On 10/02/08 11:18, Reuti wrote:
> On 02.10.2008 at 16:51, Jaime Perea wrote:
>
>> Hi
>>
>> builtin. Do I have to change them to ssh and sshd, as in SGE 6.1?
>
> I always used only rsh, as ssh doesn't provide a Tight Integration with
> correct accounting (unless you compiled SGE with -tight-ssh on your own).
>
> But it would be worth a try with either the rsh or ssh stuff, as the
> builtin starter is a new feature of SGE 6.2.
>
> -- Reuti

As was mentioned, SGE 6.2 has a new Integrated Job Starter, so rsh
and ssh no longer need to be used to start jobs on remote nodes. This is
the recommended way of starting jobs, as it is faster than ssh and more
scalable than rsh. You also do not need any hacks to get proper job
accounting, as was needed with ssh.
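
To see which startup method a cluster is actually using, the global
configuration can be inspected. The following is only a minimal sketch: it
assumes the relevant keys are rsh_command and rsh_daemon (the settings asked
about earlier in this thread) and that `qconf -sconf` prints one
"key value" pair per line. The helper name show_rsh_startup is made up for
illustration.

```shell
# Print the rsh startup settings found in `qconf -sconf`-style text on stdin.
# Assumption: keys are named rsh_command and rsh_daemon, one pair per line.
show_rsh_startup() {
    awk '$1 == "rsh_command" || $1 == "rsh_daemon" { print $1 "=" $2 }'
}

# Only query the real configuration when qconf is available on this host.
if command -v qconf >/dev/null 2>&1; then
    qconf -sconf | show_rsh_startup
fi
```

If both values come back as "builtin", the new SGE 6.2 starter should be the
one in use; values pointing at rsh or ssh binaries would indicate the former
per-command settings.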

Under the covers, Open MPI uses qrsh to start the MPI jobs on all the
nodes.
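
Open MPI can only do this if it was built with SGE support (the --with-sge
switch mentioned in this thread). A rough way to check, assuming the SGE
components show up under the name "gridengine" in the ompi_info output, is
sketched below. The helper name has_gridengine is made up for illustration.

```shell
# Succeed if `ompi_info`-style text on stdin lists a gridengine component.
# Assumption: SGE support appears under the component name "gridengine".
has_gridengine() {
    grep -q 'gridengine'
}

# Only query the real installation when ompi_info is on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_gridengine; then
        echo "gridengine support found"
    else
        echo "no gridengine component listed"
    fi
fi
```

If no gridengine component is listed, mpirun would not be expected to pick up
the SGE allocation or to use qrsh for the remote start.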

Not sure if that helps, but just wanted to mention that information.

Rolf

>
>
>>
>> Thanks again
>>
>> --
>> Jaime Perea
>>
>>
>> On Thursday, 2 October 2008, Reuti wrote:
>>> On 02.10.2008 at 16:12, Jaime Perea wrote:
>>>> Hi again, thanks for the answer
>>>>
>>>> Actually I took the definition of the pe from the openmpi
>>>> webpage, in my case
>>>>
>>>> qconf -sp orte
>>>> pe_name            orte
>>>> slots              24
>>>> user_lists         NONE
>>>> xuser_lists        NONE
>>>> start_proc_args    /bin/true
>>>> stop_proc_args     /bin/true
>>>> allocation_rule    $round_robin
>>>> control_slaves     TRUE
>>>> job_is_first_task  TRUE
>>>> urgency_slots      min
>>>> accounting_summary FALSE
>>>>
>>>> Our sge is version 6.2 and openmpi was configured with
>>>> the --with-sge switch of course.
>>>
>>> In SGE 6.2 two types of remote startup are implemented. Which one are
>>> you using (builtin or the former settings for each command) in the
>>> SGE configuration?
>>>
>>> -- Reuti
>>>
>>>> Regards
>>>>
>>>> --
>>>> Jaime Perea
>>>>
>>>> On Thursday, 2 October 2008, Reuti wrote:
>>>>> Hi,
>>>>>
>>>>> On 02.10.2008 at 15:37, Jaime Perea wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I am having some problems with a combination of openmpi+sge6.2
>>>>>>
>>>>>> Currently I'm working with the 1.3a1r19666 openmpi release and the
>>>>>
>>>>> AFAIK, you have to enable SGE support in Open MPI 1.3 during its
>>>>> compilation.
>>>>>
>>>>>> myrinet gm libraries (2.1.19) but the problem was the same with the
>>>>>> prior 1.3 version. In short, I'm able to send jobs to a queue via
>>>>>> qrsh, more or less this way,
>>>>>>
>>>>>> qrsh -cwd -V -q para -pe orte 6 mpirun -np 6 ctiming
>>>>>
>>>>> It should also work without specifying the number of slots a second
>>>>> time, i.e.:
>>>>>
>>>>> qrsh -cwd -V -q para -pe orte 6 mpirun ctiming
>>>>>
>>>>>> ctiming is a small test program and in this way it works, but if I
>>>>>> try to send the same task by using qsub on a script like this one
>>>>>>
>>>>>> #!/bin/sh
>>>>>> #$ -pe orte 6
>>>>>
>>>>> This PE has just /bin/true for start-/stop_proc_args?
>>>>>
>>>>>> #$ -q para
>>>>>> #$ -cwd
>>>>>> #
>>>>>> mpirun -np $NSLOTS /model/jaime/ctiming
>>>>>
>>>>> mpirun /model/jaime/ctiming
>>>>>
>>>>>> It fails with a message like this,
>>>>>> ..............
>>>>>>
>>>>>> error reading job context from "qlogin_starter"
>>>>>
>>>>> qlogin_starter should of course only be started with a qlogin command
>>>>> in SGE.
>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> A daemon (pid 11207) died unexpectedly with status 1 while attempting
>>>>>> to launch so we are aborting.
>>>>>>
>>>>>> There may be more information reported by the environment (see above).
>>>>>>
>>>>>> This may be because the daemon was unable to find all the needed shared
>>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>>>>>> location of the shared libraries on the remote nodes and this will
>>>>>> automatically be forwarded to the remote nodes.
>>>>>>
>>>>>> .............
>>>>>>
>>>>>> I know that LD_LIBRARY_PATH is not the problem, since I checked that
>>>>>> the whole environment is present. Any idea?
>>>>>>
>>>>>> For previous releases of SGE and Open MPI I was able to make them
>>>>>> work together with a few wrappers,
>>>>>
>>>>> Which version of SGE are you using?
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>> but now the integration looks much better!
>>>>>> This happens only when sending Open MPI jobs.
>>>>>>
>>>>>> Thanks and all the best
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> Jaime D. Perea Duarte. <jaime at iaa dot es>
>>>>>> Linux registered user #10472
>>>>>>
>>>>>> Dep. Astrofisica Extragalactica.
>>>>>> Instituto de Astrofisica de Andalucia (CSIC)
>>>>>> Apdo. 3004, 18080 Granada, Spain.
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>
>>
>>
>
>

-- 
=========================
rolf.vandevaart_at_[hidden]
781-442-3043
=========================