Open MPI User's Mailing List Archives

Subject: [OMPI users] Program hangs when run in the remote host ...
From: souvik bhattacherjee (souvik99_at_[hidden])
Date: 2009-09-18 13:17:12


Dear all,

I am quite new to Open MPI. Recently, I installed openmpi-1.3.3
separately on two of my machines, ict1 and ict2. These machines are
dual-socket quad-core (Intel Xeon E5410), i.e. each has 8 cores, and
are connected by a Gigabit Ethernet switch. As a prerequisite, I can ssh
between them without a password or passphrase (I did not set a
passphrase at all). I then built Open MPI as follows:

$ cd openmpi-1.3.3
$ mkdir build
$ cd build
$ ../configure --prefix=/usr/local/openmpi-1.3.3/

Then, as root:

# make all install

I also added the following lines to .bash_profile and .bashrc:

PATH=$PATH:/usr/local/openmpi-1.3.3/bin/
LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib/
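
One way to check whether these two settings are actually visible in a given environment is sketched below. This is only a rough illustration: `path_contains` is a hypothetical helper written for this post, not part of Open MPI, and the ssh commands in the comments assume that the shell ssh starts for a remote command may not read the same startup files as a login shell.

```shell
# Hypothetical helper (not part of Open MPI): succeed if a colon-separated
# search path contains the given directory, ignoring one trailing slash.
path_contains() {
    # $1 = colon-separated list (e.g. the value of PATH), $2 = directory
    dir="${2%/}"
    case ":$1:" in
        *":$dir:"*|*":$dir/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage on the local machine:
#   path_contains "$PATH" /usr/local/openmpi-1.3.3/bin && echo "bin found"
#   path_contains "$LD_LIBRARY_PATH" /usr/local/openmpi-1.3.3/lib && echo "lib found"
#
# Possible usage against the other host (assumes passwordless ssh works;
# the single quotes make the variable expand on the remote side):
#   path_contains "$(ssh ict2 'echo $PATH')" /usr/local/openmpi-1.3.3/bin
#   path_contains "$(ssh ict2 'echo $LD_LIBRARY_PATH')" /usr/local/openmpi-1.3.3/lib
```

If either remote check fails while the local one succeeds, the settings are probably not reaching the shell that ssh starts on the other machine.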

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

$ cd ../examples/
$ make
$ mpirun -np 2 --host ict1 hello_c
   hello_c: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory
   hello_c: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1 hello_c
   Hello, world, I am 1 of 2
   Hello, world, I am 0 of 2

But the program hangs when I run it across both hosts:

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1,ict2 hello_c

This command does not produce any output. Running top on either machine
does not show any hello_c process. However, when I press Ctrl+C, the
following output appears:

^Cmpirun: killing job...

--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
        ict2 - daemon did not report back when launched

$

The same thing happens when hello_c is launched from ict2. Since the
program does not report any error before I interrupt it, it is difficult
to tell where I might have gone wrong.

Has anyone encountered this problem or anything similar? Any help
would be much appreciated.

Thanks,

-- 
Souvik