Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Program hangs when run in the remote host ...
From: Gus Correa (gus_at_[hidden])
Date: 2009-09-18 14:11:25


Hi Souvik

I would guess you installed OpenMPI only on ict1, not on ict2.
If that is the case, you won't have the required OpenMPI libraries
on ict2:/usr/local, and the job won't run on ict2.

I am guessing this because you used a prefix under /usr/local,
which tends to be a "per machine" directory,
not a typical name for an NFS-mounted directory.
Installing into an NFS-mounted directory is another way to make
OpenMPI visible to all nodes.
See this FAQ:
http://www.open-mpi.org/faq/?category=building#where-to-install
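
One way to check the guess above is to look for the installation on each
node over ssh. The following is only a minimal sketch, not an official
Open MPI tool: the prefix and host names are taken from the original post,
check_node and the REMOTE variable are made-up names for illustration, and
it assumes that a working install has bin/orted (the daemon that mpirun
starts on remote nodes) and lib/libmpi.so.0 (the library named in the
error message) under the prefix.

```shell
#!/bin/sh
# Sketch: report whether Open MPI is visible under PREFIX on each node.
# PREFIX default comes from the original post; REMOTE is a hypothetical
# knob so that something other than ssh can be substituted.
PREFIX=${PREFIX:-/usr/local/openmpi-1.3.3}
REMOTE=${REMOTE:-ssh}

check_node() {
    # $1 is the host name. Absolute paths are used so the result does not
    # depend on PATH or LD_LIBRARY_PATH in the non-interactive remote shell.
    if $REMOTE "$1" "test -x $PREFIX/bin/orted && test -e $PREFIX/lib/libmpi.so.0"; then
        echo "$1: found Open MPI under $PREFIX"
    else
        echo "$1: Open MPI NOT found under $PREFIX"
        return 1
    fi
}

# Check every host given on the command line, e.g.: sh check.sh ict1 ict2
for h in "$@"; do
    check_node "$h"
done
```

If a node reports "NOT found", either install OpenMPI there with the same
prefix or move the installation to a directory that all nodes share.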

I hope this helps,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

souvik bhattacherjee wrote:
> Dear all,
>
> I am quite new to Open MPI. Recently, I installed openmpi-1.3.3
> separately on two of my machines, ict1 and ict2. These machines are
> dual-socket quad-core (Intel Xeon E5410), i.e. each has 8 processors,
> and they are connected by a Gigabit Ethernet switch. As a prerequisite, I can
> ssh between them without a password or passphrase (I did not supply a
> passphrase at all). Thereafter,
>
> $ cd openmpi-1.3.3
> $ mkdir build
> $ cd build
> $ ../configure --prefix=/usr/local/openmpi-1.3.3/
>
> Then as a root user,
>
> # make all install
>
> Also .bash_profile and .bashrc had the following lines written into them:
>
> PATH=$PATH:/usr/local/openmpi-1.3.3/bin/
> LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib/
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
> $ cd ../examples/
> $ make
> $ mpirun -np 2 --host ict1 hello_c
> hello_c: error while loading shared libraries: libmpi.so.0: cannot
> open shared object file: No such file or directory
> hello_c: error while loading shared libraries: libmpi.so.0: cannot
> open shared object file: No such file or directory
>
> $ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1 hello_c
> Hello, world, I am 1 of 2
> Hello, world, I am 0 of 2
>
> But the program hangs when ....
>
> $ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1,ict2 hello_c
>
> This command does not produce any output. Running top on either machine
> does not show any hello_c running. However, when I press Ctrl+C the
> following output appears
>
> ^Cmpirun: killing job...
>
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons on the nodes shown
> below. Additional manual cleanup may be required - please refer to
> the "orte-clean" tool for assistance.
> --------------------------------------------------------------------------
> ict2 - daemon did not report back when launched
>
> $
>
> The same thing happens when hello_c is run from ict2. Since the
> program does not produce any error, it is difficult to locate where
> I might have gone wrong.
>
> Has any of you encountered this problem or anything similar? Any help
> would be much appreciated.
>
> Thanks,
>
> --
>
> Souvik