Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] openmpi+sge
From: Jaime Perea (jaime.perea_at_[hidden])
Date: 2008-10-02 10:12:25


Hi again, and thanks for the answer.

Actually, I took the PE definition from the Open MPI
web page. In my case:

qconf -sp orte
pe_name orte
slots 24
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $round_robin
control_slaves TRUE
job_is_first_task TRUE
urgency_slots min
accounting_summary FALSE
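
For readers who want to check a PE definition like the one above, here is a minimal sketch. It assumes the `qconf -sp` output has the "key value" layout shown above; `pe_field` and `sample_pe` are hypothetical names made up for this illustration, not part of SGE. The sample text is a subset of the definition above, so the sketch runs without a cluster.

```shell
#!/bin/sh
# pe_field KEY: read `qconf -sp <pe>` style output on stdin and print the
# value of the given key (hypothetical helper, not part of SGE).
pe_field() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Sample input taken from the PE definition above, so the sketch runs
# without an SGE installation.
sample_pe='pe_name orte
slots 24
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $round_robin
control_slaves TRUE
job_is_first_task TRUE'

# Tight integration needs control_slaves TRUE so that slave tasks can be
# started through SGE on the other nodes.
if [ "$(printf '%s\n' "$sample_pe" | pe_field control_slaves)" = "TRUE" ]; then
    echo "control_slaves is TRUE"
else
    echo "control_slaves is not TRUE"
fi
```

On a live cluster the same helper could be fed directly, for example `qconf -sp orte | pe_field control_slaves`.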

Our SGE is version 6.2, and Open MPI was configured with
the --with-sge switch, of course.
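
One way to double-check that the --with-sge switch took effect is to look for gridengine components in the `ompi_info` output. The following is a minimal sketch of that check. `has_gridengine` is a hypothetical helper name, and the sample line only imitates the general shape of `ompi_info` output; it is not copied from a real run.

```shell
#!/bin/sh
# has_gridengine: read `ompi_info` output on stdin; succeed if any line
# mentions a gridengine component (hypothetical helper name).
has_gridengine() {
    grep -q 'gridengine'
}

# On a real installation one would run:  ompi_info | has_gridengine
# Here an illustrative line (format assumed, not taken from a real run)
# stands in so the sketch runs without Open MPI installed.
if printf 'MCA ras: gridengine\n' | has_gridengine; then
    echo "SGE support appears to be compiled in"
else
    echo "no gridengine component found"
fi
```

If nothing matches on a real installation, the build most likely lacks SGE support and would need to be reconfigured.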

Regards

--
Jaime Perea
On Thursday, 2 October 2008, Reuti wrote:
> Hi,
>
> On 02.10.2008 at 15:37, Jaime Perea wrote:
> > Hello,
> >
> > I am having some problems with a combination of openmpi+sge6.2
> >
> > Currently I'm working with the 1.3a1r19666 openmpi release and the
>
> AFAIK, you have to enable SGE support in Open MPI 1.3 during its
> compilation.
>
> > myrinet gm libraries (2.1.19)  but the problem was the same with the
> > prior 1.3 version. In short, I'm able to send jobs to a queue via qrsh,
> > more or less this way,
> >
> > qrsh -cwd -V -q para -pe orte 6 mpirun -np 6 ctiming
>
> It should also work without specifying the number of slots a second
> time, i.e.:
>
> qrsh -cwd -V -q para -pe orte 6 mpirun ctiming
>
> > ctiming is a small test program and in this way it works, but if I try to
> > send the same task by using qsub on a script like this one
> >
> > #!/bin/sh
> > #$ -pe orte 6
>
> This PE has just /bin/true for start-/stop_proc_args?
>
> > #$ -q para
> > #$ -cwd
> > #
> > mpirun -np $NSLOTS  /model/jaime/ctiming
>
> mpirun /model/jaime/ctiming
>
> > It fails with a message like this,
> > ..............
> >
> > error reading job context from "qlogin_starter"
>
> qlogin_starter should of course only be started with a qlogin command
> in SGE.
>
> > --------------------------------------------------------------------------
> > A daemon (pid 11207) died unexpectedly with status 1 while attempting
> > to launch so we are aborting.
> >
> > There may be more information reported by the environment (see above).
> >
> > This may be because the daemon was unable to find all the needed shared
> > libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> > location of the shared libraries on the remote nodes and this will
> > automatically be forwarded to the remote nodes.
> >
> > .............
> >
> > I know that LD_LIBRARY_PATH is not the problem, since I checked that the
> > whole environment is present. Any idea?
> >
> > With previous releases of SGE and Open MPI I was able to make them work
> > together with a few wrappers,
>
> Which version of SGE are you using?
>
> -- Reuti
>
> > but now the integration looks much better!
> > This happens only when sending Open MPI jobs.
> >
> > Thanks and all the best
> >
> > ---
> >
> >            Jaime D. Perea Duarte. <jaime at iaa dot es>
> >              Linux registered user #10472
> >
> >            Dep. Astrofisica Extragalactica.
> >            Instituto de Astrofisica de Andalucia (CSIC)
> >            Apdo. 3004, 18080 Granada, Spain.
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users