
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Program hangs when run in the remote host ...
From: souvik bhattacherjee (souvik99_at_[hidden])
Date: 2009-10-06 02:52:47


Finally, it seems I'm able to run my program on a remote host.

The problem was due to firewall settings. Resetting the firewall to its
default ACCEPT policy, as shown below, did the trick.

# /etc/init.d/ip6tables stop
Resetting built-in chains to the default ACCEPT policy: [ OK ]
# /etc/init.d/iptables stop
Resetting built-in chains to the default ACCEPT policy: [ OK ]
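(For reference: stopping iptables removes all filtering. A narrower
alternative, sketched below with a hypothetical address for ict1, is to
accept traffic from the peer node only and leave the rest of the firewall
in place.)

```shell
# Sketch: allow all traffic from the peer MPI node instead of
# disabling the firewall entirely. 192.168.1.2 is a hypothetical
# address for ict1 -- substitute the real one.
iptables -A INPUT -s 192.168.1.2 -j ACCEPT

# Persist the rule across restarts (Red Hat/Mandriva-style initscripts):
service iptables save
```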

Another related query:

Let me mention once again that I had installed openmpi-1.3.3 separately on
two of my machines, ict1 and ict2. Now, when I issue the following command:

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 4 --host ict2,ict1 hello_c
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not find
an executable:

Executable: hello_c
Node: ict1

while attempting to start process rank 1.
--------------------------------------------------------------------------

So, I did a *make* in the examples directory on ict1 to generate the
executable. (One could also copy the executable from ict2 to ict1 into the
same directory.)

Now, it seems to run fine.

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 4 --host ict2,ict1 hello_c
Hello, world, I am 0 of 8
Hello, world, I am 2 of 8
Hello, world, I am 4 of 8
Hello, world, I am 6 of 8
Hello, world, I am 5 of 8
Hello, world, I am 3 of 8
Hello, world, I am 7 of 8
Hello, world, I am 1 of 8
$
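(Aside: to avoid typing --prefix on every run, Open MPI can be configured so
that mpirun always behaves as if --prefix were given. A sketch, reusing the
build paths from earlier in this thread:)

```shell
# Rebuild with --enable-mpirun-prefix-by-default so remote orted
# daemons are found without PATH/LD_LIBRARY_PATH tweaks on
# non-interactive ssh logins.
cd ~/software/openmpi-1.3.3/build
../configure --prefix=/usr/local/openmpi-1.3.3/ \
             --enable-mpirun-prefix-by-default
make all install   # as root
```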

This suggests that one has to copy the executable to the remote host each
time one wants to run a program different from the previous one.

Is that correct, or is there some way around it?
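(One common workaround, sketched below: push the binary to the same path on
the remote host before launching, e.g. with scp or rsync. NFS-mounting the
working directory avoids the copy altogether, per the FAQ link cited later
in this thread.)

```shell
# Sketch: copy the freshly built executable to the identical path
# on the remote host, then launch as usual.
scp hello_c ict1:/home/souvik/software/openmpi-1.3.3/examples/
mpirun --prefix /usr/local/openmpi-1.3.3/ -np 4 --host ict2,ict1 hello_c
```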

Thanks,

On Mon, Sep 21, 2009 at 1:54 PM, souvik bhattacherjee <souvik99_at_[hidden]> wrote:

> As Ralph suggested, I *reversed the order of my PATH settings*:
>
> This is what it shows:
>
> $ echo $PATH
>
> /usr/local/openmpi-1.3.3/bin/:/usr/bin:/bin:/usr/local/bin:/usr/X11R6/bin/:/usr/games:/usr/lib/qt4/bin:/usr/bin:/opt/kde3/bin
>
> $ echo $LD_LIBRARY_PATH
> /usr/local/openmpi-1.3.3/lib/
>
> Moreover, I checked that there were *NO* system-supplied versions of OMPI
> previously installed. (I did install MPICH2 earlier, but I had removed the
> binaries and the related files.) This is because
>
> $ locate mpicc
>
> /home/souvik/software/openmpi-1.3.3/build/ompi/contrib/vt/wrappers/mpicc-vt-wrapper-data.txt
>
> /home/souvik/software/openmpi-1.3.3/build/ompi/tools/wrappers/mpicc-wrapper-data.txt
> /home/souvik/software/openmpi-1.3.3/build/ompi/tools/wrappers/mpicc.1
>
> /home/souvik/software/openmpi-1.3.3/contrib/platform/win32/ConfigFiles/mpicc-wrapper-data.txt.cmake
>
> /home/souvik/software/openmpi-1.3.3/ompi/contrib/vt/wrappers/mpicc-vt-wrapper-data.txt
> /home/souvik/software/openmpi-1.3.3/ompi/contrib/vt/wrappers/mpicc-vt-wrapper-data.txt.in
>
> /home/souvik/software/openmpi-1.3.3/ompi/tools/wrappers/mpicc-wrapper-data.txt
> /home/souvik/software/openmpi-1.3.3/ompi/tools/wrappers/mpicc-wrapper-data.txt.in
> /usr/local/openmpi-1.3.3/bin/mpicc
> /usr/local/openmpi-1.3.3/bin/mpicc-vt
> /usr/local/openmpi-1.3.3/share/man/man1/mpicc.1
> /usr/local/openmpi-1.3.3/share/openmpi/mpicc-vt-wrapper-data.txt
> /usr/local/openmpi-1.3.3/share/openmpi/mpicc-wrapper-data.txt
>
> does not show the occurrence of mpicc in any directory related to MPICH2.
>
> The results are the same with mpirun:
>
> $ locate mpirun
> /home/souvik/software/openmpi-1.3.3/build/ompi/tools/ortetools/mpirun.1
> /home/souvik/software/openmpi-1.3.3/ompi/runtime/mpiruntime.h
> /usr/local/openmpi-1.3.3/bin/mpirun
> /usr/local/openmpi-1.3.3/share/man/man1/mpirun.1
>
> *These tests were done both on ict1 and ict2*.
>
> I performed another test, which probably shows whether the executable finds
> the required files on the remote host. The program was run from ict2.
>
> $ cd /home/souvik/software/openmpi-1.3.3/examples/
>
> $ mpirun -np 4 --host ict2,ict1 hello_c
> bash: orted: command not found
> --------------------------------------------------------------------------
> A daemon (pid 28023) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
>
> $ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 4 --host ict2,ict1 hello_c
>
> *This command as usual does not produce any output. On
> pressing Ctrl+C, the following output appears*
>
> ^Cmpirun: killing job...
>
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons on the nodes shown
> below. Additional manual cleanup may be required - please refer to
> the "orte-clean" tool for assistance.
> --------------------------------------------------------------------------
> ict1 - daemon did not report back when launched
>
> $
>
> Also, doing *top* does not show any *mpirun* or *hello_c* process running
> on either host. However, running hello_c on a single host, say ict2, does
> show *mpirun* and *hello_c* in the process list.
>
>
>
>
>
>
> On Sat, Sep 19, 2009 at 8:13 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> One thing that flags my attention. In your PATH definition, you put $PATH
>> ahead of your OMPI 1.3.3 installation. Thus, if there are any system
>> supplied versions of OMPI hanging around (and there often are), they will be
>> executed instead of your new installation.
>> You might try reversing that order.
>>
>> On Sep 19, 2009, at 7:33 AM, souvik bhattacherjee wrote:
>>
>> Hi Gus (and all OpenMPI users),
>>
>> Thanks for your interest in my problem. However, it seems to me that I had
>> already taken care of the points you raised earlier in your mails. I have
>> listed them below, point by point. Your comments are rewritten in *RED* and
>> my replies in *BLACK.*
>>
>> 1) As you have mentioned: "*I would guess you only installed OpenMPI only
>> on ict1, not on ict2*". However, I had mentioned initially: "*I had
>> installed openmpi-1.3.3 separately on two of my machines ict1 and ict2*".
>>
>> 2) Next you said: "*I am guessing this, because you used a prefix under
>> /usr/local*". However, I had installed them under:
>> *$ mkdir build
>> $ cd build
>> $ ../configure --prefix=/usr/local/openmpi-1.3.3/
>> # make all install*
>>
>> 3) Next as you pointed out: "* ...not a typical name of an NFS mounted
>> directory. Using an NFS mounted directory is another way to make OpenMPI
>> visible to all nodes *".
>> Let me tell you once again, that I am not going for an NFS installation as
>> the first point in this list makes it clear.
>>
>> 4) In your next mail: " *If you can ssh passwordless from ict1 to ict2
>> *and* vice versa *". Again as I had mentioned earlier " *As a
>> prerequisite, I can ssh between them without a password or passphrase ( I
>> did not supply the passphrase at all ).* "
>>
>> 5) Further as you said: " *If your /etc/hosts file on *both* machines
>> list ict1 and ict2
>> and their IP addresses *". Let me mention here that, these things are
>> already very well taken care of.
>>
>> 6) Finally as you said: " *In case you have a /home directory on each
>> machine (i.e. /home is not NFS mounted) if your .bashrc files on *both*
>> machines set the PATH
>> and LD_LIBRARY_PATH to point to the OpenMPI directory. *"
>>
>> Again as I had mentioned previously, *Also .bash_profile and .bashrc had
>> the following lines written into them:
>>
>> PATH=$PATH:/usr/local/openmpi-1.3.3/bin/
>> LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib/*
>> As an additional bit of information, (which might assist you in the
>> investigation) I had used *Mandriva 2009.1* on all of my systems.
>>
>> Hope this helps. Eagerly awaiting a response.
>>
>> Thanks,
>>
>> On 9/18/09, Gus Correa <gus_at_[hidden]> wrote:
>>>
>>> Hi Souvik
>>>
>>> Also worth checking:
>>>
>>> 1) If you can ssh passwordless from ict1 to ict2 *and* vice versa.
>>> 2) If your /etc/hosts file on *both* machines list ict1 and ict2
>>> and their IP addresses.
>>> 3) In case you have a /home directory on each machine (i.e. /home is
>>> not NFS mounted) if your .bashrc files on *both* machines set the PATH
>>> and LD_LIBRARY_PATH to point to the OpenMPI directory.
>>>
>>> Gus Correa
>>>
>>> Gus Correa wrote:
>>>
>>>> Hi Souvik
>>>>
>>>> I would guess you only installed OpenMPI only on ict1, not on ict2.
>>>> If that is the case you won't have the required OpenMPI libraries
>>>> on ict:/usr/local, and the job won't run on ict2.
>>>>
>>>> I am guessing this, because you used a prefix under /usr/local,
>>>> which tends to be a "per machine" directory,
>>>> not a typical name of an NFS
>>>> mounted directory.
>>>> Using an NFS mounted directory is another way to make
>>>> OpenMPI visible to all nodes.
>>>> See this FAQ:
>>>> http://www.open-mpi.org/faq/?category=building#where-to-install
>>>>
>>>> I hope this helps,
>>>> Gus Correa
>>>> ---------------------------------------------------------------------
>>>> Gustavo Correa
>>>> Lamont-Doherty Earth Observatory - Columbia University
>>>> Palisades, NY, 10964-8000 - USA
>>>> ---------------------------------------------------------------------
>>>>
>>>>
>>>> souvik bhattacherjee wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> I'm quite new to Open MPI. Recently, I installed openmpi-1.3.3
>>>>> separately on two of my machines, ict1 and ict2. These machines are
>>>>> dual-socket quad-core (Intel Xeon E5410), i.e. each has 8 cores, and they
>>>>> are connected by a Gigabit Ethernet switch. As a prerequisite, I can ssh
>>>>> between them without a password or passphrase (I did not supply the
>>>>> passphrase at all). Thereafter,
>>>>>
>>>>> $ cd openmpi-1.3.3
>>>>> $ mkdir build
>>>>> $ cd build
>>>>> $ ../configure --prefix=/usr/local/openmpi-1.3.3/
>>>>>
>>>>> Then as a root user,
>>>>>
>>>>> # make all install
>>>>>
>>>>> Also .bash_profile and .bashrc had the following lines written into
>>>>> them:
>>>>>
>>>>> PATH=$PATH:/usr/local/openmpi-1.3.3/bin/
>>>>> LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib/
>>>>>
>>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>>
>>>>>
>>>>> $ cd ../examples/
>>>>> $ make
>>>>> $ mpirun -np 2 --host ict1 hello_c
>>>>> hello_c: error while loading shared libraries: libmpi.so.0: cannot
>>>>> open shared object file: No such file or directory
>>>>> hello_c: error while loading shared libraries: libmpi.so.0: cannot
>>>>> open shared object file: No such file or directory
>>>>>
>>>>> $ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1 hello_c
>>>>> Hello, world, I am 1 of 2
>>>>> Hello, world, I am 0 of 2
>>>>>
>>>>> But the program hangs when ....
>>>>>
>>>>> $ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1,ict2
>>>>> hello_c
>>>>> This statement does not produce any output. Doing top on either
>>>>> machine does not show any hello_c running. However, when I press Ctrl+C,
>>>>> the following output appears:
>>>>>
>>>>> ^Cmpirun: killing job...
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>>> that caused that situation.
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> mpirun was unable to cleanly terminate the daemons on the nodes shown
>>>>> below. Additional manual cleanup may be required - please refer to
>>>>> the "orte-clean" tool for assistance.
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> ict2 - daemon did not report back when launched
>>>>>
>>>>> $
>>>>>
>>>>> The same thing repeats itself when hello_c is run from ict2. Since the
>>>>> program does not produce any error, it is difficult to locate where I
>>>>> might have gone wrong.
>>>>>
>>>>> Did anyone of you encounter this problem or anything similar ? Any help
>>>>> would be much appreciated.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --
>>>>>
>>>>> Souvik
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Souvik
>>
>>
>>
>>
>
>
>
> --
> Souvik Bhattacherjee
>
>

-- 
Souvik Bhattacherjee