Open MPI Development Mailing List Archives


From: Gleb Natapov (glebn_at_[hidden])
Date: 2007-07-19 09:45:05

On Wed, Jul 18, 2007 at 09:08:38PM +0300, Gleb Natapov wrote:
> On Wed, Jul 18, 2007 at 09:08:47AM -0600, Ralph H Castain wrote:
> > But this will lockup:
> >
> > pn1180961:~/openmpi/trunk rhc$ mpirun -n 1 -host pn1180961 printenv | grep
> > LD
> >
> > The reason is that the hostname in this last command doesn't match the
> > hostname I get when I query my interfaces, so mpirun thinks it must be a
> > remote host - and so we stick in ssh until that times out. Which could be
> > quick on your machine, but takes awhile for me.
> >
> That is not my case: mpirun resolves the hostname and runs env, but
> LD_LIBRARY_PATH is not there. If I use the full name, like this:
> # /home/glebn/openmpi/bin/mpirun -np 1 -H env | grep LD_LIBRARY_PATH
> LD_LIBRARY_PATH=/home/glebn/openmpi/lib
> everything is OK.
More info: if I provide mpirun the hostname exactly as returned by the
"hostname" command, LD_LIBRARY_PATH is not set:
# /home/glebn/openmpi/bin/mpirun -np 1 -H `hostname` env | grep LD

but if I provide any other name that resolves to the same IP,
LD_LIBRARY_PATH is set:
# /home/glebn/openmpi/bin/mpirun -np 1 -H localhost env | grep LD

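(For illustration only: the behavior above suggests the local-node test keys off the literal name string rather than only the address it resolves to. Below is a minimal sketch of that kind of check, in Python; all names and the `local_names`/`local_ips` parameters are hypothetical and are not taken from the Open MPI sources.)

```python
import socket

def resolves_to_local(name, local_names=("localhost",), local_ips=("127.0.0.1",)):
    """Hypothetical sketch: treat a node name as 'local' if it string-matches
    a known local name, or if it resolves to an address we consider local."""
    if name in local_names:          # string comparison happens first
        return True
    try:
        ip = socket.gethostbyname(name)
    except socket.gaierror:          # unresolvable -> assume remote
        return False
    return ip in local_ips

# `hostname` output and "localhost" can resolve to the same IP yet hit
# different branches if a string comparison (e.g. against the nodename
# mpirun recorded for itself) is consulted before resolution.
```

If the real check compares the supplied name against mpirun's own recorded nodename before resolving it, that would explain why `hostname` and localhost take different code paths even though they map to the same IP.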
Here is the debug output of the "bad" run:
/home/glebn/openmpi/bin/mpirun -np 1 -H `hostname` -mca pls_rsh_debug 1 echo
[elfit1:14730] pls:rsh: launching job 1
[elfit1:14730] pls:rsh: no new daemons to launch

Here is the good one:
/home/glebn/openmpi/bin/mpirun -np 1 -H localhost -mca pls_rsh_debug 1 echo
[elfit1:14752] pls:rsh: launching job 1
[elfit1:14752] pls:rsh: local csh: 0, local sh: 1
[elfit1:14752] pls:rsh: assuming same remote shell as local shell
[elfit1:14752] pls:rsh: remote csh: 0, remote sh: 1
[elfit1:14752] pls:rsh: final template argv:
[elfit1:14752] pls:rsh: /usr/bin/ssh <template> orted --name <template> --num_procs 1 --vpid_start 0 --nodename <template> --universe root_at_elfit1:default-universe-14752 --nsreplica "0.0.0;tcp://;tcp://" --gprreplica "0.0.0;tcp://;tcp://" -mca mca_base_param_file_path /home/glebn/openmpi//share/openmpi/amca-param-sets:/home/USERS/glebn/openmpiwd -mca mca_base_param_file_path_force /home/USERS/glebn/openmpiwd
[elfit1:14752] pls:rsh: launching on node localhost
[elfit1:14752] pls:rsh: localhost is a LOCAL node
[elfit1:14752] pls:rsh: reset PATH: /home/glebn/openmpi/bin:/home/USERS/lenny/MPI/mpi/bin:/opt/vltmpi/OPENIB/mpi/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
[elfit1:14752] pls:rsh: reset LD_LIBRARY_PATH: /home/glebn/openmpi/lib
[elfit1:14752] pls:rsh: changing to directory /root
[elfit1:14752] pls:rsh: executing: (/home/glebn/openmpi/bin/orted) [orted --name 0.0.1 --num_procs 1 --vpid_start 0 --nodename localhost --universe root_at_elfit1:default-universe-14752 --nsreplica "0.0.0;tcp://;tcp://" --gprreplica "0.0.0;tcp://;tcp://" -mca mca_base_param_file_path /home/glebn/openmpi//share/openmpi/amca-param-sets:/home/USERS/glebn/openmpiwd -mca mca_base_param_file_path_force /home/USERS/glebn/openmpiwd --set-sid]
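(The "reset PATH" / "reset LD_LIBRARY_PATH" lines in the good run show the launcher fixing up the child environment before exec'ing orted; in the bad run no daemon is launched, so that fixup never happens. A rough sketch of what such a fixup might look like, with hypothetical function names, not the actual pls:rsh code:)

```python
import os

def prepend(entry, existing):
    """Prepend entry to a path-list variable, avoiding a trailing
    separator when the variable was unset or empty."""
    return entry if not existing else entry + os.pathsep + existing

def build_child_env(prefix, base_env):
    """Hypothetical sketch: put the install prefix's bin/ and lib/ in
    front of PATH and LD_LIBRARY_PATH for the launched daemon, as the
    debug output suggests."""
    env = dict(base_env)
    env["PATH"] = prepend(os.path.join(prefix, "bin"),
                          env.get("PATH", ""))
    env["LD_LIBRARY_PATH"] = prepend(os.path.join(prefix, "lib"),
                                     env.get("LD_LIBRARY_PATH", ""))
    return env

env = build_child_env("/home/glebn/openmpi", {"PATH": "/usr/bin:/bin"})
```

Under that assumption, processes launched through the daemon inherit the corrected LD_LIBRARY_PATH, while a run that skips daemon launch (the "no new daemons to launch" case) keeps whatever environment mpirun itself started with.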