Open MPI User's Mailing List Archives

Subject: [OMPI users] Program hangs when run in the remote host ...
From: souvik bhattacherjee (souvik99_at_[hidden])
Date: 2009-09-18 13:17:12


Dear all,

I am quite new to Open MPI. I recently installed openmpi-1.3.3
separately on two of my machines, ict1 and ict2. Both are dual-socket
quad-core machines (Intel Xeon E5410), i.e. 8 cores each, connected through
a Gigabit Ethernet switch. As a prerequisite, I can ssh between them without
being prompted for a password or passphrase (I did not set a passphrase at
all). I then built Open MPI as follows:

$ cd openmpi-1.3.3
$ mkdir build
$ cd build
$ ../configure --prefix=/usr/local/openmpi-1.3.3/

Then, as root:

# make all install

I also added the following lines to both .bash_profile and .bashrc:

PATH=$PATH:/usr/local/openmpi-1.3.3/bin/
LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib/
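One thing worth checking with these two lines: as written they are plain assignments, and a variable that is assigned without `export` is only passed on to child processes if it was already in the environment. The following is a minimal sketch of an exported variant, not a confirmed fix for the hang described below. `OMPI_PREFIX` and `in_list` are hypothetical names introduced only for this illustration; the prefix is the one given to configure above.

```shell
# Assumed ~/.bashrc fragment. OMPI_PREFIX is a hypothetical helper variable.
OMPI_PREFIX=/usr/local/openmpi-1.3.3

# export makes both variables visible to child processes (mpirun, hello_c).
export PATH="$PATH:$OMPI_PREFIX/bin"
# Prepend the Open MPI lib dir, keeping any existing value.
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# in_list DIR LIST: succeed if DIR is an entry of the colon-separated LIST.
# (Hypothetical helper, only used to verify the settings above.)
in_list() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

in_list "$OMPI_PREFIX/bin" "$PATH" && echo "bin dir is on PATH"
in_list "$OMPI_PREFIX/lib" "$LD_LIBRARY_PATH" && echo "lib dir is on LD_LIBRARY_PATH"
```

Note that a command started through ssh runs in a non-interactive shell, which may not read the same startup files as a login session. Running `ssh ict2 'echo $LD_LIBRARY_PATH'` from ict1 would show what such a shell on ict2 actually sees.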

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

$ cd ../examples/
$ make
$ mpirun -np 2 --host ict1 hello_c
   hello_c: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory
   hello_c: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1 hello_c
   Hello, world, I am 1 of 2
   Hello, world, I am 0 of 2

But the program hangs when I add the second host:

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1,ict2 hello_c

This command produces no output. Running top on either machine does not
show any hello_c process. However, when I press Ctrl+C, the following
output appears:

^Cmpirun: killing job...

--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
        ict2 - daemon did not report back when launched

$

The same thing happens when hello_c is launched from ict2. Since the
program does not report any error, it is difficult to tell where I
might have gone wrong.
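Because the message above says the daemon on ict2 did not report back, one place to start narrowing this down is whether Open MPI's daemon program, orted, can be found by the shell that ssh starts on the remote node. The sketch below is only one possible check. `has_program` is a hypothetical helper written for this illustration, and it does not cover other possible causes such as a firewall blocking TCP connections between the two nodes.

```shell
# has_program PROG RUNNER...: succeed if PROG is found on the PATH of the
# shell that RUNNER starts. RUNNER is any command that accepts a command
# string as its last argument, e.g. "ssh ict2" or, for a local dry run,
# "sh -c". (Hypothetical helper, not part of Open MPI.)
has_program() {
    prog=$1
    shift
    "$@" "command -v $prog >/dev/null 2>&1"
}

# Local dry run that needs no network access.
if has_program sh sh -c; then
    echo "sh found by the local shell"
fi

# On the cluster, the same helper would be pointed at the remote node:
#   has_program orted ssh ict2 || echo "orted not on PATH for ssh sessions on ict2"
```

If orted is found on both nodes, the remote environment is probably not the problem and the network path between ict1 and ict2 would be the next thing to examine.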

Has anyone encountered this problem or anything similar? Any help
would be much appreciated.

Thanks,

-- 
Souvik