Open MPI User's Mailing List Archives

Subject: [OMPI users] openmpi+sge
From: Jaime Perea (jaime.perea_at_[hidden])
Date: 2008-10-02 09:37:16


Hello,

I am having some problems with the combination of Open MPI and SGE 6.2.

Currently I'm working with the 1.3a1r19666 Open MPI release and the
Myrinet GM libraries (2.1.19), but the problem was the same with the
prior 1.3 version. In short, I'm able to submit jobs to a queue via qrsh,
more or less this way:

qrsh -cwd -V -q para -pe orte 6 mpirun -np 6 ctiming

ctiming is a small test program, and run this way it works. But if I try
to submit the same task with qsub, using a script like this one,

#!/bin/sh
#$ -pe orte 6
#$ -q para
#$ -cwd
#
mpirun -np $NSLOTS /model/jaime/ctiming

it fails with a message like this:
..............

error reading job context from "qlogin_starter"
--------------------------------------------------------------------------
A daemon (pid 11207) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.

.............

I know that LD_LIBRARY_PATH is not the problem, since I checked that the
whole environment is present. Any ideas?
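
For reference, here is a minimal sketch of how such an environment check can be done from inside the batch job, so that the output can be compared with the same check run under qrsh (where -V was passed, which the qsub script above does not request). The function name report_env is hypothetical and not part of SGE or Open MPI; NSLOTS and PE_HOSTFILE are variables SGE sets for parallel-environment jobs, and UNSET is only a placeholder printed for variables that are missing.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE or Open MPI): print the variables
# that matter for an SGE + Open MPI launch, one NAME=value pair per line.
report_env() {
    for name in NSLOTS PE_HOSTFILE PATH LD_LIBRARY_PATH; do
        # eval gives a portable indirect expansion in plain sh;
        # a variable that is not set at all is reported as UNSET.
        eval "value=\${$name-UNSET}"
        printf '%s=%s\n' "$name" "$value"
    done
}

report_env
```

Placed before the mpirun line of the job script, this writes the values to the job's output file; running the same lines under qrsh and comparing the two outputs shows whether the environments really match.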

With previous releases of SGE and Open MPI I was able to make them work
together with a few wrappers, but now the integration looks much better!
This happens only when submitting Open MPI jobs.

Thanks and all the best
 

---
           Jaime D. Perea Duarte. <jaime at iaa dot es>
             Linux registered user #10472
           Dep. Astrofisica Extragalactica.
           Instituto de Astrofisica de Andalucia (CSIC)
           Apdo. 3004, 18080 Granada, Spain.