Hi Gus (and all OpenMPI users),
 
Thanks for your interest in my problem. However, it seems to me that I had already taken care of the points you raised in your earlier mails. I have listed them below point by point. Your comments are in quotes, followed by my replies.
 
1) As you mentioned: "I would guess you only installed OpenMPI on ict1, not on ict2". However, I had stated initially: "I had installed openmpi-1.3.3 separately on two of my machines ict1 and ict2".
 
2) Next you said: "I am guessing this, because you used a prefix under /usr/local". However, I installed it on each machine as follows:
$ mkdir build
$ cd build
$ ../configure --prefix=/usr/local/openmpi-1.3.3/
# make all install
 
3) Next, as you pointed out: "...not a typical name of an NFS mounted directory. Using an NFS mounted directory is another way to make OpenMPI visible to all nodes".
Let me say once again that I am not using an NFS installation, as the first point in this list makes clear.
 
4) In your next mail: "If you can ssh passwordless from ict1 to ict2 *and* vice versa". Again, as I had mentioned earlier: "As a prerequisite, I can ssh between them without a password or passphrase ( I did not supply the passphrase at all )."
 
5) Further, as you said: "If your /etc/hosts file on *both* machines list ict1 and ict2 and their IP addresses". Let me mention here that these are already taken care of.
 
6) Finally, as you said: "In case you have a /home directory on each machine (i.e. /home is not NFS mounted) if your .bashrc files on *both* machines set the PATH and LD_LIBRARY_PATH to point to the OpenMPI directory."
 
Again, as I had mentioned previously, .bash_profile and .bashrc have the following lines written into them:

PATH=$PATH:/usr/local/openmpi-1.3.3/bin/
LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib/
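
Since mpirun starts the remote daemon through a non-interactive ssh session, what matters is the PATH that such a session sees on the other node, and that can differ from what a login shell shows. A minimal sketch of that check follows. The name list_contains_dir is a hypothetical helper, not part of Open MPI, and the ssh lines in the comment are only an illustration that assumes the hostnames and prefix used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the colon-separated list $1 contains
# directory $2 (a single trailing slash on either side is ignored).
# The body runs in a subshell so the IFS and globbing changes stay local.
list_contains_dir() (
    want=${2%/}
    set -f          # do not glob-expand path entries
    IFS=:
    for entry in $1; do
        if [ "${entry%/}" = "$want" ]; then
            exit 0
        fi
    done
    exit 1
)

# Illustration only (needs the real hosts, so it is left as a comment):
#   remote_path=$(ssh ict2 'echo "$PATH"')
#   list_contains_dir "$remote_path" /usr/local/openmpi-1.3.3/bin \
#       || echo "Open MPI bin directory is not in the non-interactive PATH on ict2"
```

The same comparison could be made for LD_LIBRARY_PATH and the lib directory, and in the other direction from ict2 to ict1.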
***************************************************************************************************************
 
As an additional bit of information (which might assist you in the investigation), I am using Mandriva 2009.1 on all of my systems.
 
I hope this helps. Eagerly awaiting a response.
 
Thanks,
 
On 9/18/09, Gus Correa <gus@ldeo.columbia.edu> wrote:
Hi Souvik

Also worth checking:

1) If you can ssh passwordless from ict1 to ict2 *and* vice versa.
2) If your /etc/hosts file on *both* machines list ict1 and ict2
and their IP addresses.
3) In case you have a /home directory on each machine (i.e. /home is
not NFS mounted) if your .bashrc files on *both* machines set the PATH
and LD_LIBRARY_PATH to point to the OpenMPI directory.
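
The /etc/hosts check in item 2 can be sketched roughly as below, to be run on each machine. This is only an illustration: hosts_file_lists is a hypothetical helper name, and it only confirms that a hostname appears on some non-comment line, not that the address next to it is correct.

```shell
#!/bin/sh
# Hypothetical helper: succeed if hosts-format file $1 has a non-comment
# entry whose hostname or alias fields include $2.
hosts_file_lists() {
    awk -v h="$2" '
        { sub(/#.*/, "") }                                   # strip comments
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 } # field 1 is the IP
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Report on the two hostnames used in this thread.
for host in ict1 ict2; do
    if hosts_file_lists /etc/hosts "$host"; then
        echo "$host: listed in /etc/hosts"
    else
        echo "$host: not found in /etc/hosts"
    fi
done
```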

Gus Correa


Gus Correa wrote:
Hi Souvik

I would guess you installed OpenMPI only on ict1, not on ict2.
If that is the case you won't have the required OpenMPI libraries
on ict2:/usr/local, and the job won't run on ict2.

I am guessing this, because you used a prefix under /usr/local,
which tends to be a "per machine" directory,
not a typical name of an NFS mounted directory.
Using an NFS mounted directory is another way to make
OpenMPI visible to all nodes.
See this FAQ:
http://www.open-mpi.org/faq/?category=building#where-to-install

I hope this helps,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------


souvik bhattacherjee wrote:
Dear all,

I am quite new to Open MPI. Recently, I installed openmpi-1.3.3 separately on two of my machines, ict1 and ict2. These machines are dual-socket quad-core (Intel Xeon E5410), i.e. each has 8 cores, and they are connected by a Gigabit Ethernet switch. As a prerequisite, I can ssh between them without a password or passphrase (I did not supply a passphrase at all). Thereafter,

$ cd openmpi-1.3.3
$ mkdir build
$ cd build
$ ../configure --prefix=/usr/local/openmpi-1.3.3/

Then as a root user,

# make all install

Also, .bash_profile and .bashrc have the following lines written into them:

PATH=$PATH:/usr/local/openmpi-1.3.3/bin/
LD_LIBRARY_PATH=/usr/local/openmpi-1.3.3/lib/

----------------------------------------------------------------------------------------------------------------------------------------------------------------------


$ cd ../examples/
$ make
$ mpirun -np 2 --host ict1 hello_c
  hello_c: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory
  hello_c: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1 hello_c
  Hello, world, I am 1 of 2
  Hello, world, I am 0 of 2
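
The first run cannot find libmpi.so.0 while the --prefix run succeeds, which suggests that LD_LIBRARY_PATH is not reaching the launched processes. One possible cause, offered only as an assumption since the startup files are not shown in full, is that the variable is assigned without export: a plain assignment of a variable that is not already in the environment is not passed on to child processes. The sketch below demonstrates that behaviour. is_exported and DEMO_LIB_DIR are hypothetical names chosen for the demonstration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the variable named $1 is visible
# in the environment handed to a child process.
is_exported() {
    env | grep "^$1=" > /dev/null
}

# Plain assignment: the variable exists in this shell only.
DEMO_LIB_DIR=/usr/local/openmpi-1.3.3/lib/
if is_exported DEMO_LIB_DIR; then
    echo "before export: visible to child processes"
else
    echo "before export: not visible to child processes"
fi

# After export the same value is inherited by child processes.
export DEMO_LIB_DIR
if is_exported DEMO_LIB_DIR; then
    echo "after export: visible to child processes"
else
    echo "after export: not visible to child processes"
fi
```

If this turns out to be the cause here, adding export to the LD_LIBRARY_PATH line in the startup files on both machines would be the usual remedy.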

But the program hangs when I run it across both hosts:

$ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict1,ict2 hello_c
This command does not produce any output. Running top on either machine does not show any hello_c process. However, when I press Ctrl+C, the following output appears:

^Cmpirun: killing job...

--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
       ict2 - daemon did not report back when launched

$

The same thing happens when hello_c is run from ict2. Since the program does not produce any error, it is difficult to locate where I might have gone wrong.

Has any of you encountered this problem or anything similar? Any help would be much appreciated.

Thanks,

--

Souvik


------------------------------------------------------------------------

_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Souvik