
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Program hangs when run in the remote host ...
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-09-19 10:43:09


One thing here catches my attention: in your PATH definition, you put
$PATH ahead of your OMPI 1.3.3 installation. Thus, if there are any
system-supplied versions of OMPI hanging around (and there often are),
they will be executed instead of your new installation.

You might try reversing that order.
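
For illustration, "reversing that order" in .bashrc might look like the
two lines below -- a minimal sketch using the paths from your report,
not the exact lines from either machine:

export PATH=/usr/local/openmpi-1.3.3/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib:$LD_LIBRARY_PATH

With that in place, "which mpirun" on each host should point at the
1.3.3 installation rather than any distribution-supplied copy.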

On Sep 19, 2009, at 7:33 AM, souvik bhattacherjee wrote:

> Hi Gus (and all OpenMPI users),
>
> Thanks for your interest in my problem. However, it seems to me that
> I had already taken care of the points you raised earlier in your
> mails. I have listed them below point by point, quoting your comments
> and following each with my reply.
>
> 1) As you mentioned: "I would guess you installed OpenMPI only on
> ict1, not on ict2". However, I had mentioned initially: "I
> had installed openmpi-1.3.3 separately on two of my machines ict1
> and ict2".
>
> 2) Next you said: "I am guessing this, because you used a prefix
> under /usr/local". However, this is how I installed it:
> $ mkdir build
> $ cd build
> $ ../configure --prefix=/usr/local/openmpi-1.3.3/
> # make all install
>
> 3) Next as you pointed out: " ...not a typical name of an NFS
> mounted directory. Using an NFS mounted directory is another way to
> make OpenMPI visible to all nodes ".
> Let me say once again that I am not going for an NFS installation,
> as the first point in this list makes clear.
>
> 4) In your next mail: " If you can ssh passwordless from ict1 to
> ict2 *and* vice versa ". Again as I had mentioned earlier " As a
> prerequisite, I can ssh between them without a password or
> passphrase ( I did not supply the passphrase at all ). "
>
> 5) Further as you said: " If your /etc/hosts file on *both* machines
> list ict1 and ict2
> and their IP addresses ". Let me mention here that these things are
> already very well taken care of.
>
> 6) Finally as you said: " In case you have a /home directory on each
> machine (i.e. /home is not NFS mounted) if your .bashrc files on
> *both* machines set the PATH
> and LD_LIBRARY_PATH to point to the OpenMPI directory. "
>
> Again, as I had mentioned previously, both .bash_profile and .bashrc
> have the following lines written into them (a quick check of what each
> host then picks up is sketched after the separator below):
>
> PATH=$PATH:/usr/local/openmpi-1.3.3/bin/
> LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib/
> ***************************************************************************************************************
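>
> For illustration only, here is a quick way to confirm which
> installation each host actually picks up (generic checks, not commands
> from this thread; note that ssh runs a non-interactive shell, so the
> last line also shows whether your PATH setting is seen when jobs are
> launched remotely):
>
> $ which mpirun             # should be /usr/local/openmpi-1.3.3/bin/mpirun
> $ ompi_info | head -3      # the first lines report the Open MPI version
> $ ssh ict2 'which mpirun'  # what a non-interactive shell on the remote host finds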
>
> As an additional bit of information (which might assist you in the
> investigation), I am using Mandriva 2009.1 on all of my systems.
>
> I hope this helps. Eagerly awaiting a response.
>
> Thanks,
>
> On 9/18/09, Gus Correa <gus_at_[hidden]> wrote:
> Hi Souvik
>
> Also worth checking:
>
> 1) If you can ssh passwordless from ict1 to ict2 *and* vice versa.
> 2) If your /etc/hosts file on *both* machines lists ict1 and ict2
> and their IP addresses.
> 3) In case you have a /home directory on each machine (i.e. /home is
> not NFS mounted), whether your .bashrc files on *both* machines set the
> PATH and LD_LIBRARY_PATH to point to the OpenMPI directory.
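>
> To make items 1 and 2 concrete, one common way to set them up is
> sketched below -- the hostnames come from this thread, but the
> addresses are placeholders only:
>
> $ ssh-keygen -t rsa        # accept the defaults; leave the passphrase empty
> $ ssh-copy-id ict2         # or append ~/.ssh/id_rsa.pub to ict2's
>                            # ~/.ssh/authorized_keys by hand
> $ ssh ict2 hostname        # should return with no password prompt;
>                            # repeat all of this from ict2 towards ict1
>
> # /etc/hosts on *both* machines (placeholder addresses)
> 192.168.1.1   ict1
> 192.168.1.2   ict2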
>
> Gus Correa
>
>
> Gus Correa wrote:
> Hi Souvik
>
> I would guess you installed OpenMPI only on ict1, not on ict2.
> If that is the case you won't have the required OpenMPI libraries
> in ict2:/usr/local, and the job won't run on ict2.
>
> I am guessing this because you used a prefix under /usr/local,
> which tends to be a "per machine" directory,
> not a typical name of an NFS
> mounted directory.
> Using an NFS mounted directory is another way to make
> OpenMPI visible to all nodes.
> See this FAQ:
> http://www.open-mpi.org/faq/?category=building#where-to-install
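>
> As an illustration of that alternative (not something done in this
> thread), sharing the install directory over NFS might look roughly
> like this -- hostnames from the thread, export and mount options are
> just typical defaults:
>
> # on ict1: /etc/exports (then run "exportfs -ra")
> /usr/local/openmpi-1.3.3   ict2(ro,sync,no_subtree_check)
>
> # on ict2: /etc/fstab
> ict1:/usr/local/openmpi-1.3.3  /usr/local/openmpi-1.3.3  nfs  defaults  0 0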
>
> I hope this helps,
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
>
> souvik bhattacherjee wrote:
> Dear all,
>
> I am quite new to Open MPI. Recently, I installed openmpi-1.3.3
> separately on two of my machines, ict1 and ict2. These machines are
> dual-socket quad-core (Intel Xeon E5410), i.e. each has 8 cores, and
> they are connected by a Gigabit Ethernet switch. As a prerequisite, I
> can ssh between them without a password or passphrase (I did not
> supply a passphrase at all). Thereafter,
>
> $ cd openmpi-1.3.3
> $ mkdir build
> $ cd build
> $ ../configure --prefix=/usr/local/openmpi-1.3.3/
>
> Then as a root user,
>
> # make all install
>
> Also, .bash_profile and .bashrc have the following lines written into
> them:
>
> PATH=$PATH:/usr/local/openmpi-1.3.3/bin/
> LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib/
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
> $ cd ../examples/
> $ make
> $ mpirun -np 2 --host ict1 hello_c
> hello_c: error while loading shared libraries: libmpi.so.0: cannot
> open shared object file: No such file or directory
> hello_c: error while loading shared libraries: libmpi.so.0: cannot
> open shared object file: No such file or directory
>
> $ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1 hello_c
> Hello, world, I am 1 of 2
> Hello, world, I am 0 of 2
>
> But the program hangs when ....
>
> $ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1,ict2
> hello_c
> This command does not produce any output. Running top on either
> machine does not show any hello_c process. However, when I press
> Ctrl+C the following output appears:
>
> ^Cmpirun: killing job...
>
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons on the nodes shown
> below. Additional manual cleanup may be required - please refer to
> the "orte-clean" tool for assistance.
> --------------------------------------------------------------------------
> ict2 - daemon did not report back when launched
>
> $
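>
> (Two generic checks that often help narrow down a hang like this --
> neither is taken from this thread: launch a non-MPI program so that
> only the remote-daemon start-up path is exercised, and run the
> orte-clean tool that the error message mentions.)
>
> $ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1,ict2 hostname
>     # if even this hangs, the problem is in starting the remote daemon
>     # (ssh environment, PATH, or a firewall), not in the MPI program itself
> $ orte-clean
>     # run on each host to clear out stale daemons and session files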
>
> The same hang occurs when hello_c is run from ict2. Since the program
> does not produce any error, it is difficult to locate where I might
> have gone wrong.
>
> Did any of you encounter this problem or anything similar? Any help
> would be much appreciated.
>
> Thanks,
>
> --
>
> Souvik
>
>
>
>
>
> --
> Souvik
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users