Open MPI User's Mailing List Archives

Subject: [OMPI users] mpirun example program fail on multiple nodes - unable to launch specified application on client node
From: qing pang (qing.pang_at_[hidden])
Date: 2009-11-05 15:19:12


Dear Sir/Madam,

I'm having a problem running the example program. Please kindly help; I've
been fooling with it for days and am getting lost.

---------------------------------------------------------------------------------
MPIRUN fails on example hello program
-unable to launch the specified application on client node
---------------------------------------------------------------------------------

1) I'm trying to run Open MPI with the following setup:

1 PC (as master node) and 1 notebook (as client node), each connected to an
Ethernet router by Ethernet cable. Both run Ubuntu 8.10.
There are no other connections. Is this setup OK for running Open MPI?

2) Prerequisites

SSH has been set up so that the master node can access the client node
through passwordless ssh. I do notice that it takes 10~15 seconds
between entering '>ssh <slave ip address>' and getting a prompt on
the client node.
--- Could this be too slow for Open MPI to run properly?

I do not have things like a network file system, network time protocol,
resource manager, or scheduler installed.
--- Does Open MPI need any prerequisites other than passwordless ssh?

3) Open MPI is installed on both nodes: downloaded from open-mpi.org
and built with configure and make all using the default settings.

4) PATH and LD_LIBRARY_PATH
On both nodes,
PATH is
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games,
which is the default setting in Ubuntu.
LD_LIBRARY_PATH is set in ~/.bashrc - I added one line at the end of the
file, 'export LD_LIBRARY_PATH=usr/local/lib:usr/lib'
So when I echo them on both nodes, I get:
>echo $PATH
>/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
>echo $LD_LIBRARY_PATH
>usr/local/lib:usr/lib

But, if I do
>ssh <client_ip> 'echo $LD_LIBRARY_PATH'
nothing comes back.

while
>ssh <client_ip> 'echo $PATH'
comes back with the right path.

Is that a problem?
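
A side note on the value echoed above: neither entry begins with a slash,
so the dynamic linker would normally resolve them relative to the current
directory rather than as /usr/local/lib and /usr/lib. A minimal sketch of a
check for this follows; relative_entries is a hypothetical helper name, not
part of Open MPI or Ubuntu.

```shell
# Hypothetical helper: print every entry of a colon-separated search path
# that does not start with "/" (i.e. is not an absolute directory).
# The body runs in a subshell so IFS and "set -f" do not leak to the caller.
relative_entries() (
    set -f          # do not expand wildcards while splitting
    IFS=':'         # split the argument on colons
    for entry in $1; do
        case "$entry" in
            /*) ;;                          # absolute entry: fine
            *)  printf '%s\n' "$entry" ;;   # relative entry: report it
        esac
    done
)

# The value reported in this post:
relative_entries 'usr/local/lib:usr/lib'
```

For the value in this post it prints both entries; for
'/usr/local/lib:/usr/lib' it prints nothing.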

4) Problem:
I compiled the example hello_c using
>mpicc hello_c.c -o hello_c.out
and ran it locally on each node; everything works fine.

But when I tried to run it on 2 nodes (-np 2)
>mpirun -machinefile machine.linux -np 2 $(pwd)/hello_c.out
I got the following error:

----------------------------------------------------------------------------
gordon@gordon-desktop:~/Desktop/openmpi-1.3.3/examples$ mpirun --machinefile machine.linux -np 2 $(pwd)/hello_c.out
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not access
or execute an executable:

Executable: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
Node: 192.168.0.194

while attempting to start process rank 1.
--------------------------------------------------------------------------

Sometimes I get another error message after that:
--------------------------------------------------------------------------
[gordon-desktop:30748] [[25975,0],0]-[[25975,1],0] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
------------------------------------------------------------------------------
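
Since no network file system is in use, the executable named in the first
error would need to exist at the same path on the client node. A rough
sketch for checking that follows. It assumes the machinefile uses the usual
Open MPI hostfile layout (one host per line, optional slots=N, # for
comments), because the attached machine.linux is not reproduced here.
check_commands is a hypothetical helper that only prints the ssh commands,
so they can be reviewed before being run.

```shell
# Hypothetical helper: for each host in a machinefile, print an ssh command
# that tests whether the given executable exists and is executable there.
# Assumes host names and the path contain no spaces or shell metacharacters.
check_commands() {
    machinefile=$1
    exe=$2
    awk -v exe="$exe" '
        /^[ \t]*(#|$)/ { next }                       # skip comments and blank lines
        { printf "ssh %s test -x %s\n", $1, exe }     # first field is the host
    ' "$machinefile"
}

# Example (not run here):
#   check_commands machine.linux "$(pwd)/hello_c.out"
```

Each printed command exits with status 0 on a node where the file is
present and executable, and non-zero otherwise.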

5) Information attached:
ifconfig_masternode - output of ifconfig on masternode
ifconfig_slavenode - output of ifconfig on slavenode
ompi_info.txt - output of ompi_info -all
config.log - OpenMPI logfile
machine.linux - the machinefile used in mpirun command

-- 
Sincerely,
Qing Pang
(601) 979 0270

