
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Program hangs when run in the remote host ...
From: Gus Correa (gus_at_[hidden])
Date: 2009-09-18 14:22:25


Hi Souvik

Also worth checking (quick test commands are sketched below):

1) Whether you can ssh passwordless from ict1 to ict2 *and* vice versa.
2) Whether the /etc/hosts file on *both* machines lists ict1 and ict2
and their IP addresses.
3) In case each machine has its own /home directory (i.e. /home is
not NFS mounted), whether the .bashrc files on *both* machines set PATH
and LD_LIBRARY_PATH to point to the Open MPI directory.
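
For instance, a quick way to test all three (a minimal sketch, using
your host names and install prefix):

  ssh ict2 hostname       # from ict1: should print "ict2", no password prompt
  ssh ict1 hostname       # from ict2: likewise in the other direction
  grep ict /etc/hosts     # on both machines: should show ict1 and ict2
  ssh ict2 'echo $PATH; echo $LD_LIBRARY_PATH'
                          # the non-interactive shell on ict2 should already
                          # show /usr/local/openmpi-1.3.3/bin and .../lib

If that last check comes up empty, the daemon mpirun launches on ict2
over ssh cannot find the Open MPI libraries, which can produce the kind
of hang you describe. Running mpirun with --debug-daemons may also show
why the daemon on ict2 does not report back.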

Gus Correa

Gus Correa wrote:
> Hi Souvik
>
> I would guess you installed OpenMPI only on ict1, not on ict2.
> If that is the case you won't have the required OpenMPI libraries
> in ict2:/usr/local, and the job won't run on ict2.
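>
> A quick way to verify (a sketch, using your install prefix):
>
>   ssh ict2 ls /usr/local/openmpi-1.3.3/lib
>
> If that directory is missing, or does not contain libmpi.so.0,
> the libraries are indeed not installed on ict2.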
>
> I am guessing this because you used a prefix under /usr/local,
> which tends to be a "per machine" directory,
> not a typical name for an NFS-mounted directory.
> Using an NFS mounted directory is another way to make
> OpenMPI visible to all nodes.
> See this FAQ:
> http://www.open-mpi.org/faq/?category=building#where-to-install
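>
> For example (a sketch; /shared stands for whatever directory is
> NFS-mounted on both nodes):
>
>   ../configure --prefix=/shared/openmpi-1.3.3
>   make all install
>
> Installed that way once, the same tree is visible on ict1 and ict2.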
>
> I hope this helps,
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
>
> souvik bhattacherjee wrote:
>> Dear all,
>>
>> I am quite new to Open MPI. Recently, I installed openmpi-1.3.3
>> separately on two of my machines, ict1 and ict2. These machines are
>> dual-socket quad-core (Intel Xeon E5410), i.e. each has 8 processors,
>> and they are connected by a Gigabit Ethernet switch. As a prerequisite,
>> I can ssh between them without a password or passphrase (I did not
>> supply a passphrase at all). Thereafter,
>>
>> $ cd openmpi-1.3.3
>> $ mkdir build
>> $ cd build
>> $ ../configure --prefix=/usr/local/openmpi-1.3.3/
>>
>> Then as a root user,
>>
>> # make all install
>>
>> Also .bash_profile and .bashrc had the following lines written into them:
>>
>> PATH=$PATH:/usr/local/openmpi-1.3.3/bin/
>> LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib/
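>>
> (As an aside, a minimal sketch and not necessarily your problem: for
> these settings to reach processes launched over ssh, they usually need
> to be exported, e.g.
>
>   export PATH=/usr/local/openmpi-1.3.3/bin:$PATH
>   export LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib:$LD_LIBRARY_PATH
>
> in the .bashrc on both machines.)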
>>
>> ----------------------------------------------------------------------
>>
>> $ cd ../examples/
>> $ make
>> $ mpirun -np 2 --host ict1 hello_c
>> hello_c: error while loading shared libraries: libmpi.so.0: cannot
>> open shared object file: No such file or directory
>> hello_c: error while loading shared libraries: libmpi.so.0: cannot
>> open shared object file: No such file or directory
>>
>> $ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1 hello_c
>> Hello, world, I am 1 of 2
>> Hello, world, I am 0 of 2
>>
>> But the program hangs when ....
>>
>> $ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1,ict2
>> hello_c
>>
>> This command does not produce any output. Running top on either
>> machine does not show any hello_c process. However, when I press
>> Ctrl+C the following output appears:
>>
>> ^Cmpirun: killing job...
>>
>> --------------------------------------------------------------------------
>>
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>>
>> --------------------------------------------------------------------------
>>
>> mpirun was unable to cleanly terminate the daemons on the nodes shown
>> below. Additional manual cleanup may be required - please refer to
>> the "orte-clean" tool for assistance.
>> --------------------------------------------------------------------------
>>
>> ict2 - daemon did not report back when launched
>>
>> $
>>
>> The same thing happens when hello_c is run from ict2. Since the
>> program does not produce any error message, it is difficult to
>> locate where I might have gone wrong.
>>
>> Did any of you encounter this problem or anything similar? Any
>> help would be much appreciated.
>>
>> Thanks,
>>
>> --
>>
>> Souvik
>>
>>
>