Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi+sge
From: Reuti (reuti_at_[hidden])
Date: 2008-10-02 11:18:26


On 02.10.2008, at 16:51, Jaime Perea wrote:

> Hi
>
> Builtin. Do I have to change them to ssh and sshd, as in SGE 6.1?

I have only ever used rsh, as ssh doesn't provide a Tight Integration
with correct accounting (unless you compiled SGE with -tight-ssh
yourself).

But it would be worth trying either the rsh or the ssh settings, as
the builtin starter is a new feature of SGE 6.2.
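
A quick way to see which startup method the cluster configuration
selects is to filter the remote-startup entries out of `qconf -sconf`.
This is a minimal sketch, assuming a POSIX shell with `grep -E`; the
helper name `show_remote_startup` is hypothetical. A value of `builtin`
means the SGE 6.2 builtin starter is in use, while paths to rsh/rshd or
ssh/sshd indicate the per-command settings.

```sh
#!/bin/sh
# Hypothetical helper: keep only the qlogin_*, rlogin_* and rsh_*
# command/daemon entries from SGE configuration text read on stdin.
show_remote_startup() {
    grep -E '^(qlogin|rlogin|rsh)_(command|daemon)[[:space:]]'
}

# Usage on a cluster with the SGE tools in PATH:
#   qconf -sconf | show_remote_startup
```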

-- Reuti

>
> Thanks again
>
> --
> Jaime Perea
>
>
> On Thursday, 2 October 2008, Reuti wrote:
>> On 02.10.2008, at 16:12, Jaime Perea wrote:
>>> Hi again, thanks for the answer
>>>
>>> Actually, I took the PE definition from the Open MPI web page; in
>>> my case:
>>>
>>> qconf -sp orte
>>> pe_name orte
>>> slots 24
>>> user_lists NONE
>>> xuser_lists NONE
>>> start_proc_args /bin/true
>>> stop_proc_args /bin/true
>>> allocation_rule $round_robin
>>> control_slaves TRUE
>>> job_is_first_task TRUE
>>> urgency_slots min
>>> accounting_summary FALSE
>>>
>>> Our SGE is version 6.2, and Open MPI was of course configured with
>>> the --with-sge switch.
>>
>> In SGE 6.2 two types of remote startup are implemented. Which one are
>> you using (builtin or the former settings for each command) in the
>> SGE configuration?
>>
>> -- Reuti
>>
>>> Regards
>>>
>>> --
>>> Jaime Perea
>>>
>>> On Thursday, 2 October 2008, Reuti wrote:
>>>> Hi,
>>>>
>>>> On 02.10.2008, at 15:37, Jaime Perea wrote:
>>>>> Hello,
>>>>>
>>>>> I am having some problems with a combination of openmpi+sge6.2
>>>>>
>>>>> Currently I'm working with the 1.3a1r19666 openmpi release and the
>>>>
>>>> AFAIK, you have to enable SGE support in Open MPI 1.3 during its
>>>> compilation.
>>>>
>>>>> Myrinet GM libraries (2.1.19), but the problem was the same with
>>>>> the prior 1.3 version. In short, I'm able to send jobs to a queue
>>>>> via qrsh, more or less this way:
>>>>>
>>>>> qrsh -cwd -V -q para -pe orte 6 mpirun -np 6 ctiming
>>>>
>>>> It should also work without specifying the number of slots a second
>>>> time, i.e.:
>>>>
>>>> qrsh -cwd -V -q para -pe orte 6 mpirun ctiming
>>>>
>>>>> ctiming is a small test program, and this way it works, but if I
>>>>> try to submit the same task with qsub using a script like this one:
>>>>>
>>>>> #!/bin/sh
>>>>> #$ -pe orte 6
>>>>
>>>> This PE has just /bin/true for start-/stop_proc_args?
>>>>
>>>>> #$ -q para
>>>>> #$ -cwd
>>>>> #
>>>>> mpirun -np $NSLOTS /model/jaime/ctiming
>>>>
>>>> mpirun /model/jaime/ctiming
>>>>
>>>>> It fails with a message like this:
>>>>> ..............
>>>>>
>>>>> error reading job context from "qlogin_starter"
>>>>
>>>> qlogin_starter should, of course, only be started by a qlogin
>>>> command in SGE.
>>>>
>>>>> ----------------------------------------------------------------------
>>>>> A daemon (pid 11207) died unexpectedly with status 1 while attempting
>>>>> to launch so we are aborting.
>>>>>
>>>>> There may be more information reported by the environment (see above).
>>>>>
>>>>> This may be because the daemon was unable to find all the needed shared
>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have
>>>>> the location of the shared libraries on the remote nodes and this will
>>>>> automatically be forwarded to the remote nodes.
>>>>>
>>>>> .............
>>>>>
>>>>> I know that LD_LIBRARY_PATH is not the problem, since I checked
>>>>> that the whole environment is present... Any idea?
>>>>>
>>>>> With previous releases of SGE and Open MPI I was able to make them
>>>>> work together with a few wrappers,
>>>>
>>>> Which version of SGE are you using?
>>>>
>>>> -- Reuti
>>>>
>>>>> but now the integration looks much better!
>>>>> This happens only when submitting Open MPI jobs.
>>>>>
>>>>> Thanks and all the best
>>>>>
>>>>> ---
>>>>>
>>>>> Jaime D. Perea Duarte. <jaime at iaa dot es>
>>>>> Linux registered user #10472
>>>>>
>>>>> Dep. Astrofisica Extragalactica.
>>>>> Instituto de Astrofisica de Andalucia (CSIC)
>>>>> Apdo. 3004, 18080 Granada, Spain.
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>
>
>
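
Regarding the advice above to drop `-np $NSLOTS`: an Open MPI built
with SGE support takes the host list and slot counts from the file SGE
names in `$PE_HOSTFILE`, where each line holds a host name, its number
of slots, a queue and a processor range. To check inside a job what was
actually granted, one could sum the slot column. A minimal sketch,
assuming a POSIX shell with awk; the helper name `count_pe_slots` is
hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: sum the slot column (field 2) of an SGE PE
# hostfile; lines look like "host slots queue processor_range".
# Prints 0 for an empty file.
count_pe_slots() {
    awk '{ total += $2 } END { print total + 0 }' "$1"
}

# Possible use in the job script, before the mpirun line:
#   echo "NSLOTS=$NSLOTS granted=$(count_pe_slots "$PE_HOSTFILE")"
#   mpirun /model/jaime/ctiming
```

For a job submitted with `-pe orte 6`, both numbers should normally be 6.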