Open MPI User's Mailing List Archives

Subject: [OMPI users] mpirun does not propagate environment from master node to slave nodes
From: yanyg_at_[hidden]
Date: 2011-06-28 11:05:51


Hello All,

I installed Open MPI 1.4.3 on our new HPC blades, which use an InfiniBand
interconnect.

My system environment is as follows:

1) uname -a output:
Linux gulftown 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

2) /home is mounted on all nodes, and mpirun is started from within
/home/...

Open MPI and the application codes are compiled with the Intel(R)
compilers, version 11. The InfiniBand stack is Mellanox OFED 1.5.2.

I have two questions about mpirun:

a) How can I find out which network interconnect protocol the MPI
application actually uses?

I pass "--mca btl openib,self,sm,tcp" to mpirun, but I want to
make sure it really uses the InfiniBand interconnect.
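One way to check this, sketched in Python under some assumptions: leave tcp out of the BTL list so that Open MPI has no TCP fallback and should abort with an error if openib cannot reach the remote ranks, and optionally raise the btl_base_verbose MCA parameter to get more diagnostic output from the BTL components. The function name strict_btl_argv, the hostfile "hosts" and the program "./my_app" are hypothetical placeholders, and the script only prints the command line rather than running it.

```python
import shlex


def strict_btl_argv(nprocs, hostfile, program, verbose=False):
    """Build an mpirun command line restricted to the openib, sm and self BTLs.

    With tcp left out of the list there is no TCP fallback, so the job
    should fail with an error if openib cannot reach the remote ranks.
    """
    argv = [
        "mpirun",
        "-np", str(nprocs),
        "--hostfile", hostfile,
        "--mca", "btl", "openib,sm,self",
    ]
    if verbose:
        # Raise the BTL framework's verbosity for more diagnostic output.
        argv += ["--mca", "btl_base_verbose", "30"]
    argv.append(program)
    return argv


if __name__ == "__main__":
    # Hypothetical hostfile and program names; the command is printed, not run.
    print(shlex.join(strict_btl_argv(24, "hosts", "./my_app", verbose=True)))
```

Note that this only exercises InfiniBand when ranks are placed on different nodes; ranks on the same node talk over sm.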

b) When I run mpirun, I get the following messages:
====== Quote begin
bash: orted: command not found
bash: orted: command not found
bash: orted: command not found
--------------------------------------------------------------------------
A daemon (pid 15120) died unexpectedly with status 127 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
        ibnode001 - daemon did not report back when launched
        ibnode002 - daemon did not report back when launched
        ibnode003 - daemon did not report back when launched

====== Quote end
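Exit status 127 is what a POSIX shell returns when it cannot find a command, which matches the "bash: orted: command not found" lines. A quick diagnostic is to ask a non-interactive ssh session on each node whether it can locate orted, since that is roughly how mpirun's ssh-based launcher starts the daemon. The Python sketch below assumes passwordless ssh to the nodes and bash as the remote shell; orted_check_argv and check_hosts are hypothetical helper names, and the runner parameter exists only so the logic can be exercised without a cluster.

```python
import subprocess


def orted_check_argv(host):
    # "command -v orted" prints the path bash resolves for orted and exits
    # non-zero if orted is not on the PATH of the remote non-interactive shell.
    return ["ssh", host, "command -v orted"]


def check_hosts(hosts, runner=subprocess.run):
    """Map each host to the orted path its remote shell finds, or None.

    An ssh connection failure is also reported as None.
    """
    found = {}
    for host in hosts:
        result = runner(orted_check_argv(host), capture_output=True, text=True)
        path = result.stdout.strip()
        found[host] = path if result.returncode == 0 and path else None
    return found
```

For example, check_hosts(["ibnode001", "ibnode002", "ibnode003"]) would be expected to map each node to None while the error above persists, and to the orted path once PATH is set correctly on the remote side.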

It seems orted cannot be found on the slave nodes. I have tried to make
orted and its shared libraries available there by setting PATH and
LD_LIBRARY_PATH through mpirun's --prefix, --path, or -x options, but
none of these works as the mpirun manual page describes. The only
configuration that works is setting PATH and LD_LIBRARY_PATH in
~/.bashrc, which the slave nodes also source when their login shell
starts. I would rather not set PATH and LD_LIBRARY_PATH in ~/.bashrc,
and instead pass these settings to mpirun directly.
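For comparison, the mpirun manual page describes --prefix as setting PATH and LD_LIBRARY_PATH on each remote node before the Open MPI daemon is started there, while --path only affects where the application executable is looked up and -x exports variables for the launched program. As far as I can tell, that leaves --prefix as the option aimed at orted itself, and it only works if Open MPI is installed under the same absolute path on every node. The manual page also says that invoking mpirun by its absolute path is equivalent to passing --prefix. The Python sketch below just builds such a command line; prefixed_mpirun_argv, the install path "/home/apps/openmpi-1.4.3", the hostfile and the program name are all hypothetical.

```python
import os
import shlex


def prefixed_mpirun_argv(prefix, nprocs, hostfile, program, export=()):
    """Build an mpirun command line that passes the Open MPI install prefix.

    --prefix asks mpirun to set PATH and LD_LIBRARY_PATH from <prefix> on
    each remote node before orted is started there, so ~/.bashrc does not
    need to do it. The prefix must be valid on every node.
    """
    if not os.path.isabs(prefix):
        raise ValueError("--prefix must be an absolute path valid on every node")
    argv = ["mpirun", "--prefix", prefix, "-np", str(nprocs), "--hostfile", hostfile]
    for name in export:
        # -x forwards a variable from the local environment to the launched
        # program; as far as I know it does not affect how orted is found.
        argv += ["-x", name]
    argv.append(program)
    return argv


if __name__ == "__main__":
    # Hypothetical install prefix, hostfile and program; printed, not run.
    print(shlex.join(prefixed_mpirun_argv(
        "/home/apps/openmpi-1.4.3", 24, "hosts", "./my_app",
        export=["LD_LIBRARY_PATH"])))
```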

Thanks,
Yiguang