
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] openmpi+sge
From: Jaime Perea (jaime.perea_at_[hidden])
Date: 2008-10-02 11:25:40


Hi

Well, let's try. I downloaded the binaries for SGE.

I was thinking of rsh. I'm going to try it after
the old ssh/sshd settings, and before trying
to compile SGE myself... which I guess is not an easy
task.
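
Before switching between builtin, rsh and ssh it may help to see what the cluster configuration currently holds. The following is only a sketch: the helper name show_remote_startup is made up for illustration, and it assumes the six usual remote-startup parameters of the SGE configuration (qlogin_command, qlogin_daemon, rlogin_command, rlogin_daemon, rsh_command, rsh_daemon) as printed by `qconf -sconf`.

```shell
#!/bin/sh
# Hypothetical helper: keep only the remote-startup parameters from
# `qconf -sconf`-style text read on stdin.
show_remote_startup() {
    grep -E '^(qlogin|rlogin|rsh)_(command|daemon)[[:space:]]'
}

# Query the cluster only when qconf is actually on the PATH.
if command -v qconf >/dev/null 2>&1; then
    qconf -sconf | show_remote_startup
fi
```

On a fresh SGE 6.2 installation these entries would presumably all read "builtin"; with the older style each one names a command or daemon path instead.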

Regards

--
Jaime Perea
On Thursday, 2 October 2008, Reuti wrote:
> On 02.10.2008 at 16:51, Jaime Perea wrote:
> > Hi
> >
> > Builtin. Do I have to change them to ssh and sshd, as in SGE 6.1?
>
> I always used only rsh, as ssh doesn't provide a Tight Integration
> with correct accounting (unless you compiled SGE with -tight-ssh on
> your own).
>
> But it would be worth a try with either the rsh or ssh stuff, as the
> builtin starter is a new feature of SGE 6.2.
>
> -- Reuti
>
> > Thanks again
> >
> > --
> > Jaime Perea
> >
> > On Thursday, 2 October 2008, Reuti wrote:
> >> On 02.10.2008 at 16:12, Jaime Perea wrote:
> >>> Hi again, thanks for the answer
> >>>
> >>> Actually I took the definition of the pe from the openmpi
> >>> webpage, in my case
> >>>
> >>> qconf -sp orte
> >>> pe_name            orte
> >>> slots              24
> >>> user_lists         NONE
> >>> xuser_lists        NONE
> >>> start_proc_args    /bin/true
> >>> stop_proc_args     /bin/true
> >>> allocation_rule    $round_robin
> >>> control_slaves     TRUE
> >>> job_is_first_task  TRUE
> >>> urgency_slots      min
> >>> accounting_summary FALSE
> >>>
> >>> Our sge is version 6.2 and openmpi was configured with
> >>> the --with-sge switch of course.
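
A quick way to double-check a PE definition like the one above is to pull single attributes out of the `qconf -sp` output. This is a minimal sketch, not part of the original exchange: the helper name pe_attr is hypothetical, and the check on control_slaves rests on the assumption that a tightly integrated mpirun needs it set to TRUE so that it may start its daemons on the slave nodes.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one attribute from
# `qconf -sp <pe>`-style text read on stdin.
pe_attr() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Query the cluster only when qconf is available.
if command -v qconf >/dev/null 2>&1; then
    # "orte" is the PE name used in this thread.
    [ "$(qconf -sp orte | pe_attr control_slaves)" = "TRUE" ] \
        || echo "control_slaves is not TRUE" >&2
fi
```

The same helper can be pointed at slots or allocation_rule to compare the live PE with the values one expects.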
> >>
> >> In SGE 6.2 two types of remote startup are implemented. Which one are
> >> you using (builtin or the former settings for each command) in the
> >> SGE configuration?
> >>
> >> -- Reuti
> >>
> >>> Regards
> >>>
> >>> --
> >>> Jaime Perea
> >>>
> >>> On Thursday, 2 October 2008, Reuti wrote:
> >>>> Hi,
> >>>>
> >>>> On 02.10.2008 at 15:37, Jaime Perea wrote:
> >>>>> Hello,
> >>>>>
> >>>>> I am having some problems with a combination of openmpi+sge6.2
> >>>>>
> >>>>> Currently I'm working with the 1.3a1r19666 openmpi release and the
> >>>>
> >>>> AFAIK, you have to enable SGE support in Open MPI 1.3 during its
> >>>> compilation.
> >>>>
> >>>>> myrinet gm libraries (2.1.19), but the problem was the same with the
> >>>>> prior 1.3 version. In short, I'm able to send jobs to a queue via qrsh,
> >>>>> more or less this way,
> >>>>>
> >>>>> qrsh -cwd -V -q para -pe orte 6 mpirun -np 6 ctiming
> >>>>
> >>>> It should also work without specifying the number of slots a second
> >>>> time, i.e.:
> >>>>
> >>>> qrsh -cwd -V -q para -pe orte 6 mpirun ctiming
> >>>>
> >>>>> ctiming is a small test program, and in this way it works. But if I try
> >>>>> to send the same task by using qsub on a script like this one
> >>>>>
> >>>>> #!/bin/sh
> >>>>> #$ -pe orte 6
> >>>>
> >>>> This PE has just /bin/true for start-/stop_proc_args?
> >>>>
> >>>>> #$ -q para
> >>>>> #$ -cwd
> >>>>> #
> >>>>> mpirun -np $NSLOTS  /model/jaime/ctiming
> >>>>
> >>>> mpirun /model/jaime/ctiming
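
Dropping -np relies on mpirun taking the slot count from the SGE allocation. To see what was actually granted, a few lines like the following could go into the job script ahead of mpirun. It is a sketch under assumptions: the helper name total_slots is hypothetical, and it assumes the usual PE hostfile layout of one line per host with the slot count in the second column.

```shell
#!/bin/sh
# Hypothetical helper: sum the slot counts (second column) of the
# SGE PE hostfile whose path is given as $1.
total_slots() {
    awk '{ n += $2 } END { print n + 0 }' "$1"
}

# Inside a parallel job SGE sets PE_HOSTFILE and NSLOTS;
# outside of one this block is simply skipped.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    echo "granted slots: $(total_slots "$PE_HOSTFILE") (NSLOTS=${NSLOTS:-unset})"
fi
```

If the sum and NSLOTS agree with the "-pe orte 6" request, the allocation itself is fine and the failure lies in the remote startup.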
> >>>>
> >>>>> It fails with a message like this,
> >>>>> ..............
> >>>>>
> >>>>> error reading job context from "qlogin_starter"
> >>>>
> >>>> qlogin_starter should of course only be started with a qlogin command
> >>>> in SGE.
> >>>>
> >>>>> --------------------------------------------------------------------------
> >>>>> A daemon (pid 11207) died unexpectedly with status 1 while attempting
> >>>>> to launch so we are aborting.
> >>>>>
> >>>>> There may be more information reported by the environment (see above).
> >>>>>
> >>>>> This may be because the daemon was unable to find all the needed shared
> >>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> >>>>> location of the shared libraries on the remote nodes and this will
> >>>>> automatically be forwarded to the remote nodes.
> >>>>>
> >>>>> .............
> >>>>>
> >>>>> I know that LD_LIBRARY_PATH is not the problem, since I checked that all
> >>>>> the environment is present... any idea?
> >>>>>
> >>>>> With previous releases of SGE and openmpi I was able to make them work
> >>>>> together with a few wrappers,
> >>>>
> >>>> Which version of SGE are you using?
> >>>>
> >>>> -- Reuti
> >>>>
> >>>>> but now the integration looks much better!
> >>>>> This happens only when sending openmpi jobs.
> >>>>>
> >>>>> Thanks and all the best
> >>>>>
> >>>>> ---
> >>>>>
> >>>>>            Jaime D. Perea Duarte. <jaime at iaa dot es>
> >>>>>              Linux registered user #10472
> >>>>>
> >>>>>            Dep. Astrofisica Extragalactica.
> >>>>>            Instituto de Astrofisica de Andalucia (CSIC)
> >>>>>            Apdo. 3004, 18080 Granada, Spain.
> >>>>> _______________________________________________
> >>>>> users mailing list
> >>>>> users_at_[hidden]
> >>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users