Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem launching jobs in SGE (with loose integration), OpenMPI 1.3.3
From: Craig Tierney (Craig.Tierney_at_[hidden])
Date: 2009-07-23 12:07:30


Rolf Vandevaart wrote:
> I think what you are looking for is this:
>
> --mca plm_rsh_disable_qrsh 1
>
> This means we will disable the use of qrsh and use rsh or ssh instead.
>
> The --mca pls ^sge does not work anymore for two reasons. First, the
> "pls" framework was renamed "plm". Second, the gridengine plm was
> folded into the rsh/ssh one.
>

Rolf,

Thanks for the quick reply. That solved the problem.
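
For anyone finding this thread later, here is a rough sketch of the adjusted launch line. It assumes every other option from my original command stays the same; the fallback values for OMPI and MACHINE_FILE are placeholders for illustration only, since the real values come from the job script. As written it only prints the command (a dry run) so it can be checked before use.

```shell
#!/bin/sh
# Placeholder defaults (assumptions); a real SGE job script sets these itself.
OMPI="${OMPI:-/opt/openmpi/1.3.3-pgi}"
MACHINE_FILE="${MACHINE_FILE:-./machines}"

# Same options as the original command, except that "--mca pls ^sge"
# is replaced by the 1.3-series parameter that disables qrsh.
set -- "$OMPI/bin/mpirun" \
    --mca btl openib,sm,self \
    --mca plm_rsh_disable_qrsh 1 \
    -machinefile "$MACHINE_FILE" \
    -x LD_LIBRARY_PATH -np 16 ./xhpl

# Dry run: print the assembled command. To launch, run "$@" instead.
LAUNCH_CMD="$*"
echo "$LAUNCH_CMD"
```

The same MCA parameter can also be set through the environment (export OMPI_MCA_plm_rsh_disable_qrsh=1) instead of on the mpirun command line.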

Craig

> A few more details at
> http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
>
> Rolf
>
> On 07/23/09 10:34, Craig Tierney wrote:
>> I have built OpenMPI 1.3.3 without support for SGE.
>> I just want to launch jobs with loose integration right
>> now.
>>
>> Here is how I configured it:
>>
>> ./configure CC=pgcc CXX=pgCC F77=pgf90 F90=pgf90 FC=pgf90 \
>> --prefix=/opt/openmpi/1.3.3-pgi --without-sge \
>> --enable-io-romio --with-openib=/opt/hjet/ofed/1.4.1 \
>> --with-io-romio-flags=--with-file-system=lustre \
>> --enable-orterun-prefix-by-default
>>
>> I can start jobs from the commandline just fine. When
>> I try to do the same thing inside an SGE job, I get
>> errors like the following:
>>
>>
>> error: executing task of job 5041155 failed:
>> --------------------------------------------------------------------------
>>
>> A daemon (pid 13324) died unexpectedly with status 1 while attempting
>> to launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have
>> the location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>>
>> --------------------------------------------------------------------------
>>
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>>
>> mpirun: clean termination accomplished
>>
>>
>> I am starting mpirun with the following options:
>>
>> $OMPI/bin/mpirun -mca btl openib,sm,self --mca pls ^sge \
>> -machinefile $MACHINE_FILE -x LD_LIBRARY_PATH -np 16 ./xhpl
>>
>> The options are to ensure I am using IB, that SGE is not used, and that
>> the LD_LIBRARY_PATH is sent along to ensure dynamic linking is done
>> correctly.
>>
>> This worked with 1.2.7 (except setting the pls option as gridengine
>> instead of sge), but I can't get it to work with 1.3.3.
>>
>> Am I missing something obvious for getting jobs with loose integration
>> started?
>>
>> Thanks,
>> Craig
>>
>
>

-- 
Craig Tierney (craig.tierney_at_[hidden])