
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] openmpi+sge
From: Reuti (reuti_at_[hidden])
Date: 2008-10-02 10:03:18


Hi,

On 02.10.2008 at 15:37, Jaime Perea wrote:

> Hello,
>
> I am having some problems with a combination of openmpi+sge6.2
>
> Currently I'm working with the 1.3a1r19666 openmpi release and the

AFAIK, you have to enable SGE support in Open MPI 1.3 during its
compilation.
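
One way to check whether an existing build already has this support is to look for the gridengine components in the output of ompi_info. The sketch below assumes that ompi_info from the build in question is first in PATH; the helper name has_sge_support is made up for illustration. If nothing is listed, rebuilding with "./configure --with-sge" is the usual remedy in the 1.3 series.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the text on stdin (ompi_info output)
# mentions a gridengine component.
has_sge_support() {
    grep -qi 'gridengine'
}

# Only query ompi_info when it is actually installed.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_sge_support; then
        echo "gridengine components present"
    else
        echo "no gridengine components; rebuild with ./configure --with-sge"
    fi
else
    echo "ompi_info not found in PATH"
fi
```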

> myrinet gm libraries (2.1.19) but the problem was the same with the
> prior 1.3 version. In short, I'm able to send jobs to a queue via qrsh,
> more or less this way,
>
> qrsh -cwd -V -q para -pe orte 6 mpirun -np 6 ctiming

It should also work without specifying the number of slots a second
time, i.e.:

qrsh -cwd -V -q para -pe orte 6 mpirun ctiming

> ctiming is a small test program, and this way it works, but if I try to
> send the same task by using qsub on a script like this one
>
> #!/bin/sh
> #$ -pe orte 6

Does this PE have just /bin/true for start_proc_args and stop_proc_args?
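
The PE definition can be inspected with "qconf -sp orte" on a submit or admin host. As a minimal sketch, the filter below keeps only the entries that matter for this question; the function name pe_settings is hypothetical. For a tightly integrated Open MPI setup one would typically expect /bin/true for both start_proc_args and stop_proc_args and control_slaves set to TRUE, but treat those values as an assumption to verify against your own installation.

```shell
#!/bin/sh
# Hypothetical helper: reduce "qconf -sp <pe>" output on stdin to the
# settings relevant for tight integration.
pe_settings() {
    grep -E '^(start_proc_args|stop_proc_args|control_slaves|job_is_first_task)[[:space:]]'
}

# Only query SGE when qconf is actually available.
if command -v qconf >/dev/null 2>&1; then
    qconf -sp orte | pe_settings
else
    echo "qconf not found in PATH"
fi
```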

> #$ -q para
> #$ -cwd
> #
> mpirun -np $NSLOTS /model/jaime/ctiming

mpirun /model/jaime/ctiming
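
Taken together, the job script with this change would look roughly like the sketch below. With SGE support compiled in, mpirun takes the slot count and host list from the scheduler, so -np $NSLOTS should not be needed. The file name ctiming.sh is a placeholder chosen here, not something from the original script.

```shell
#!/bin/sh
# Write the job script to a file; the quoted 'EOF' keeps the "#$"
# directives and the path literal.
cat > ctiming.sh <<'EOF'
#!/bin/sh
#$ -pe orte 6
#$ -q para
#$ -cwd
#
mpirun /model/jaime/ctiming
EOF

# Submit it on a host where SGE is available:
#   qsub ctiming.sh
```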

> It fails with a message like this,
> ..............
>
> error reading job context from "qlogin_starter"

qlogin_starter should of course only be started with a qlogin command
in SGE.
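
If qlogin_starter shows up for a batch job, it may be worth looking at how remote startup is configured in the cluster configuration. The sketch below filters "qconf -sconf" output for the qlogin, rlogin and rsh entries; the function name remote_startup_settings is hypothetical, and whether these settings are involved in this particular failure is only an assumption.

```shell
#!/bin/sh
# Hypothetical helper: keep only the remote-startup entries from
# "qconf -sconf" output supplied on stdin.
remote_startup_settings() {
    grep -E '^(qlogin|rlogin|rsh)_(command|daemon)[[:space:]]'
}

# Only query SGE when qconf is actually available.
if command -v qconf >/dev/null 2>&1; then
    qconf -sconf | remote_startup_settings
else
    echo "qconf not found in PATH"
fi
```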

> --------------------------------------------------------------------------
> A daemon (pid 11207) died unexpectedly with status 1 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
>
> .............
>
> I know that LD_LIBRARY_PATH is not the problem, since I checked that the
> whole environment is present. Any idea?
>
> With previous releases of SGE and Open MPI I was able to make them work
> together with a few wrappers,

Which version of SGE are you using?

-- Reuti

> but now the integration looks much better!
> This happens only when sending openmpi jobs.
>
> Thanks and all the best
>
> ---
>
> Jaime D. Perea Duarte. <jaime at iaa dot es>
> Linux registered user #10472
>
> Dep. Astrofisica Extragalactica.
> Instituto de Astrofisica de Andalucia (CSIC)
> Apdo. 3004, 18080 Granada, Spain.