Open MPI User's Mailing List Archives

From: Tim Prins (tprins_at_[hidden])
Date: 2007-08-14 12:09:51


Jody,

jody wrote:
> Hi Tim,
> thanks for the suggestions.
>
> I now set both paths in .zshenv, but it seems that LD_LIBRARY_PATH
> still does not get set.
> The ldd experiment shows that none of the openmpi libraries are found,
> and indeed printenv shows that PATH is set but LD_LIBRARY_PATH is
> not.
Are you setting LD_LIBRARY_PATH anywhere else in your scripts? I have,
on more than one occasion, forgotten that I needed to do:
export LD_LIBRARY_PATH="/foo:$LD_LIBRARY_PATH"

Instead of just:
export LD_LIBRARY_PATH="/foo"
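For example, a ~/.zshenv that prepends instead of overwriting might look like the sketch below. It assumes Open MPI is installed under /opt/openmpi, as elsewhere in this thread. zsh reads .zshenv for every shell, including the non-interactive one that ssh starts on a remote node, so these settings should also reach the Open MPI daemons.

```shell
# ~/.zshenv (sketch): read by every zsh, interactive or not.
# Assumes the /opt/openmpi install prefix used in this thread.
export PATH="/opt/openmpi/bin:$PATH"
# Prepend rather than overwrite. ${LD_LIBRARY_PATH:+:...} adds the ':' separator
# only when the variable is already non-empty, so no empty entry is left behind.
export LD_LIBRARY_PATH="/opt/openmpi/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Running e.g. `ssh nano_02 printenv LD_LIBRARY_PATH` from plankton then shows what a non-interactive remote shell actually sees, independent of mpirun.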

>
> It is rather unclear why this happens...
>
> As to the second problem:
> $ mpirun --debug-daemons -np 2 --prefix /opt/openmpi --host nano_02 ./MPI2Test2
> [aim-nano_02:05455] [0,0,1]-[0,0,0] mca_oob_tcp_peer_try_connect: connect to 130.60.49.134:40618 failed: (103)
> [aim-nano_02:05455] [0,0,1]-[0,0,0] mca_oob_tcp_peer_try_connect: connect to 130.60.49.134:40618 failed, connecting over all interfaces failed!
> [aim-nano_02:05455] OOB: Connection to HNP lost
> [aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
> [aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1164
> [aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
> [aim-plankton.unizh.ch:24222] ERROR: A daemon on node nano_02 failed to start as expected.
> [aim-plankton.unizh.ch:24222] ERROR: There may be more information available from
> [aim-plankton.unizh.ch:24222] ERROR: the remote shell (see above).
> [aim-plankton.unizh.ch:24222] ERROR: The daemon exited unexpectedly with status 1.
> [aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
> [aim-plankton.unizh.ch:24222] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1196
>
> The strange thing is that nano_02's address is 130.60.49.130
> and plankton's (the caller) is 130.60.49.134.
> I also made sure that nano_02 can ssh to plankton without a password, but
> that didn't change the output.

What is happening here is that the daemon launched on nano_02 is trying
to contact mpirun on plankton, and is failing for some reason.

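Do you have any firewalls/port filtering enabled on plankton or nano_02?
The daemon on nano_02 has to open a TCP connection back to mpirun on
plankton (130.60.49.134:40618 in the log above), so filtering on either
machine can block it. Open MPI generally cannot be run when firewalls
filter the traffic between the machines being used.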
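One way to check for such filtering is to try the same connection by hand from nano_02 while mpirun on plankton is still waiting. The helper below is a hypothetical sketch, not part of Open MPI; it assumes bash (for its /dev/tcp redirection) and the coreutils `timeout` command. The port (40618 above) is chosen dynamically and changes from run to run, so take it from the current --debug-daemons output. If iptables is the firewall in use, `iptables -L -n` run as root on each machine also lists the active rules.

```shell
#!/bin/bash
# Hypothetical helper: try to open a TCP connection to HOST:PORT within 3 s.
# Requires bash (for /dev/tcp) and the coreutils `timeout` command.
probe_tcp() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
  case $? in
    0)   echo "open: $host:$port" ;;
    124) echo "no reply, possibly dropped by a firewall: $host:$port"; return 1 ;;
    *)   echo "connection failed (closed port or rejected): $host:$port"; return 1 ;;
  esac
}

# Example, run on nano_02; address and port are taken from the log above.
# probe_tcp 130.60.49.134 40618
```

A timeout (no reply at all) is typical of a firewall that silently drops packets, while an immediate failure can mean either a closed port or a firewall that rejects the connection.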

Hope this helps,

Tim

>
> Does this message give any hints as to the problem?
>
> Jody
>
>
> On 8/14/07, Tim Prins <tprins_at_[hidden]> wrote:
>
> Hi Jody,
>
> jody wrote:
> > Hi
> > I installed Open MPI 1.2.2 on a quad-core Intel machine running
> > Fedora 6 (hostname plankton).
> > I set PATH and LD_LIBRARY_PATH in the .zshrc file:
> Note that .zshrc is only read by interactive shells. You need to set up
> your system so that LD_LIBRARY_PATH and PATH are also set for
> non-interactive logins. See this zsh FAQ entry for which files you need
> to modify:
> http://zsh.sourceforge.net/FAQ/zshfaq03.html#l19
>
> (BTW: I do not use zsh, but my assumption is that the file you want to
> set the PATH and LD_LIBRARY_PATH in is .zshenv)
> > $ echo $PATH
> > /opt/openmpi/bin:/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/jody/bin
> >
> > $ echo $LD_LIBRARY_PATH
> > /opt/openmpi/lib:
> >
> > When I run
> > $ mpirun -np 2 ./MPI2Test2
> > I get the message
> > ./MPI2Test2: error while loading shared libraries: libmpi_cxx.so.0:
> > cannot open shared object file: No such file or directory
> > ./MPI2Test2: error while loading shared libraries: libmpi_cxx.so.0:
> > cannot open shared object file: No such file or directory
> >
> > However
> > $ mpirun -np 2 --prefix /opt/openmpi ./MPI2Test2
> > works. Any explanation?
> Yes, the LD_LIBRARY_PATH is probably not set correctly. Try running:
> mpirun -np 2 ldd ./MPI2Test2
>
> This should show what libraries your executable is using. Make sure all
> of the libraries are resolved.
>
> Also, try running:
> mpirun -np 1 printenv | grep LD_LIBRARY_PATH
> to see what LD_LIBRARY_PATH is for your executables. Note that you
> can NOT simply run mpirun echo $LD_LIBRARY_PATH, as the variable
> would be expanded by your local shell before mpirun ever runs.
>
> >
> > Second problem:
> > I have also installed Open MPI 1.2.2 on an AMD machine running Gentoo
> > Linux (hostname nano_02).
> > Here as well, PATH and LD_LIBRARY_PATH are set correctly, and
> > $ mpirun -np 2 ./MPI2Test2
> > works locally on nano_02.
> >
> > If, however, from plankton I call
> > $ mpirun -np 2 --prefix /opt/openmpi --host nano_02 ./MPI2Test2
> > the call hangs with no output whatsoever.
> > Any pointers on how to solve this problem?
> Try running:
> mpirun --debug-daemons -np 2 --prefix /opt/openmpi --host nano_02 ./MPI2Test2
>
> This should give some more output as to what is happening.
>
> Hope this helps,
>
> Tim
>
> >
> > Thank You
> > Jody
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users