Open MPI User's Mailing List Archives

Subject: [OMPI users] Problem launching jobs in SGE (with loose integration), OpenMPI 1.3.3
From: Craig Tierney (craig.tierney_at_[hidden])
Date: 2009-07-23 10:34:52

I have built OpenMPI 1.3.3 without support for SGE.
I just want to launch jobs with loose integration right now.

Here is how I configured it:

./configure CC=pgcc CXX=pgCC F77=pgf90 F90=pgf90 FC=pgf90 \
    --prefix=/opt/openmpi/1.3.3-pgi --without-sge \
    --enable-io-romio --with-openib=/opt/hjet/ofed/1.4.1

I can start jobs from the command line just fine. When
I try to do the same thing inside an SGE job, I get
errors like the following:

error: executing task of job 5041155 failed:
A daemon (pid 13324) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
mpirun: clean termination accomplished
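
The shared-library hint in the output above can be checked directly on a compute node. The following is only a minimal sketch: the helper name missing_libs is hypothetical, and the usage line assumes passwordless ssh to a node of your choosing and that orted sits under the install prefix from the configure line.

```shell
# Hypothetical helper: read ldd output on stdin and print only the
# libraries that the dynamic linker could not resolve.
missing_libs() {
    # grep exits non-zero when nothing matches; "|| true" keeps that
    # from being treated as a failure when every library is found.
    grep 'not found' || true
}

# Example usage (assumptions: passwordless ssh, $NODE set by you):
#   ssh "$NODE" "ldd /opt/openmpi/1.3.3-pgi/bin/orted" | missing_libs
```

An empty result suggests the daemon can resolve its libraries on that node, in which case the failure probably lies elsewhere.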

I am starting mpirun with the following options:

$OMPI/bin/mpirun -mca btl openib,sm,self --mca pls ^sge \
     -machinefile $MACHINE_FILE -x LD_LIBRARY_PATH -np 16 ./xhpl

These options are meant to ensure that IB is used, that SGE is not used, and
that LD_LIBRARY_PATH is sent along so that dynamic linking is done correctly.
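
For context, here is a minimal sketch of one way a machine file such as $MACHINE_FILE might be produced inside an SGE job with loose integration. This is an assumption for illustration, not necessarily how the file above was created: it relies on SGE's $PE_HOSTFILE (one line per host: name, slot count, queue, processor range) and on Open MPI's "host slots=N" hostfile syntax. The helper name pe_to_machinefile is hypothetical.

```shell
# Hypothetical helper: convert an SGE PE hostfile into an Open MPI
# machine file. Only the first two fields (host, slot count) are used.
pe_to_machinefile() {
    awk '{ print $1 " slots=" $2 }' "$1"
}

# Example usage inside a job script (assumes a parallel environment
# was requested, so SGE sets $PE_HOSTFILE and $TMPDIR):
#   MACHINE_FILE=$TMPDIR/machines
#   pe_to_machinefile "$PE_HOSTFILE" > "$MACHINE_FILE"
```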

This worked with 1.2.7 (the only difference being that the pls option named
gridengine instead of sge), but I can't get it to work with 1.3.3.

Am I missing something obvious for getting jobs with loose integration to work?