Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem launching jobs in SGE (with loose integration), OpenMPI 1.3.3
From: Rolf Vandevaart (Rolf.Vandevaart_at_[hidden])
Date: 2009-07-23 10:50:46


I think what you are looking for is this:

--mca plm_rsh_disable_qrsh 1

This disables the use of qrsh, so the daemons are launched with rsh or
ssh instead.

The "--mca pls ^sge" option does not work anymore for two reasons. First,
the "pls" framework was renamed "plm". Second, the gridengine plm was
folded into the rsh/ssh one.
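
As a rough sketch, the command line from the original message would then
look something like the following, with "--mca pls ^sge" swapped for the
new parameter. The default values for OMPI and MACHINE_FILE are
placeholders I am assuming for illustration (the prefix is taken from the
configure line, the machine file name is hypothetical), and the script
only prints the command unless the last line is uncommented.

```shell
# Sketch only: the defaults below are assumed placeholders, adjust per site.
OMPI=${OMPI:-/opt/openmpi/1.3.3-pgi}      # install prefix from the configure line
MACHINE_FILE=${MACHINE_FILE:-./machines}  # hypothetical host list file

# Same options as the original command, except that "--mca pls ^sge" is
# replaced by "--mca plm_rsh_disable_qrsh 1" so the rsh/ssh launcher
# does not use qrsh.
CMD="$OMPI/bin/mpirun -mca btl openib,sm,self --mca plm_rsh_disable_qrsh 1 \
-machinefile $MACHINE_FILE -x LD_LIBRARY_PATH -np 16 ./xhpl"

echo "$CMD"     # dry run: print the command that would be executed
# eval "$CMD"   # uncomment inside the SGE job script to actually launch
```

Any MCA parameter can also be set through the environment, so
"export OMPI_MCA_plm_rsh_disable_qrsh=1" before calling mpirun should
have the same effect as the command-line option.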

A few more details at
http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge

Rolf

On 07/23/09 10:34, Craig Tierney wrote:
> I have built OpenMPI 1.3.3 without support for SGE.
> I just want to launch jobs with loose integration right
> now.
>
> Here is how I configured it:
>
> ./configure CC=pgcc CXX=pgCC F77=pgf90 F90=pgf90 FC=pgf90
> --prefix=/opt/openmpi/1.3.3-pgi --without-sge
> --enable-io-romio --with-openib=/opt/hjet/ofed/1.4.1
> --with-io-romio-flags=--with-file-system=lustre
> --enable-orterun-prefix-by-default
>
> I can start jobs from the command line just fine. When
> I try to do the same thing inside an SGE job, I get
> errors like the following:
>
>
> error: executing task of job 5041155 failed:
> --------------------------------------------------------------------------
> A daemon (pid 13324) died unexpectedly with status 1 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
>
>
> I am starting mpirun with the following options:
>
> $OMPI/bin/mpirun -mca btl openib,sm,self --mca pls ^sge \
> -machinefile $MACHINE_FILE -x LD_LIBRARY_PATH -np 16 ./xhpl
>
> The options are to ensure I am using IB, that SGE is not used, and that
> the LD_LIBRARY_PATH is sent along to ensure dynamic linking is done
> correctly.
>
> This worked with 1.2.7 (except setting the pls option as gridengine
> instead of sge), but I can't get it to work with 1.3.3.
>
> Am I missing something obvious for getting jobs with loose integration
> started?
>
> Thanks,
> Craig
>

-- 
=========================
rolf.vandevaart_at_[hidden]
781-442-3043
=========================