Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-08-16 08:50:55

On Aug 16, 2007, at 5:34 AM, jody wrote:

> Just a quick update about my ssh/LD_LIBRARY_PATH problem.
> Apparently on my system the sshd was configured not to permit
> user-defined environment variables (security reasons?).
> To fix that I had to change the file
> /etc/ssh/sshd_config
> By changing the entry
> #PermitUserEnvironment no
> to
> PermitUserEnvironment yes
> and adding these lines to the file ~/.ssh/environment
> PATH=/opt/openmpi/bin:/usr/local/bin:/bin:/usr/bin
> LD_LIBRARY_PATH=/opt/openmpi/lib
> Maybe it is overkill, but at least ssh now makes the two
> variables available,
> and simple openmpi test applications run.

That is one option; another option, which does not require root-level
changes, is simply to set PATH and LD_LIBRARY_PATH in your shell
startup files. The FAQ describes which files to modify for each shell.
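
As a minimal sketch of the startup-file approach, assuming a Bourne-style
shell such as bash and the /opt/openmpi prefix from the quoted message
(the OMPI_PREFIX variable name is only for illustration), the lines could
look something like this:

```shell
# Hypothetical fragment for a startup file such as ~/.bashrc.
# /opt/openmpi is the install prefix taken from the quoted message;
# adjust it to wherever Open MPI actually lives on each machine.
OMPI_PREFIX=/opt/openmpi

# Put the Open MPI tools first on the search path.
export PATH="$OMPI_PREFIX/bin:$PATH"

# Prepend the Open MPI libraries; the ${VAR:+...} form avoids a
# trailing ':' when LD_LIBRARY_PATH was previously empty or unset.
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Some distributions ship a startup file that returns early for
non-interactive shells; in that case these lines probably need to sit
above that check so that shells started by ssh also pick them up.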

> I have made these fixes on all my 7 gentoo machines (nano_00 -
> nano_06),
> and simple openmpi test applications run with any number of processes.
> But the fedora machine (plankton) still has problems in some cases.
> In the test application I use, process #0 broadcasts a number to all
> other processes.
> This works in the following cases always calling from nano_02:
> mpirun -np 3 --host nano_00 ./MPITest
> mpirun -np 3 --host plankton ./MPITest
> mpirun -np 3 --host plankton,nano_00 ./MPITest
> But it doesn't work like this:
> mpirun -np 4 --host nano_00,plankton ./MPITest
> as soon as the MPI_Broadcast statement is reached,
> I get an error message:
> [nano_00][0,1,0][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113

You are now technically running in a heterogeneous scenario. You
will likely need to have OMPI and your MPITest executable compiled
separately for each OS (gentoo and fedora). Differences in libc
(etc.) can make a single executable not work properly across both,
and sometimes the problems can be quite subtle / difficult to
diagnose. The easier solution is not to try having a single
executable, but rather to have a separate installation for each OS.

Once you have that set up, you can either rely on the PATH to find the
MPITest that is appropriate for each OS (if you set that up
properly), or you can be explicit with something like the following
(assuming that you have previously created MPITest.gentoo for gentoo
and MPITest.fedora for fedora):

     mpirun -np 1 -host gentoo_host MPITest.gentoo : \
            -np 1 -host fedora_host MPITest.fedora

Note that we do not actively test such heterogeneous scenarios, but
it should/could/might work... (read: it worked at one time, but I'm
not sure if any of us have tested it in a long time)
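
For the PATH-based route, one possible arrangement (a sketch of my own,
not something Open MPI provides) is a tiny wrapper script that each host
finds under the same name and that picks the right build locally. It
assumes MPITest.gentoo and MPITest.fedora sit next to the wrapper and
that the hosts can be told apart by /etc/gentoo-release and
/etc/fedora-release; the os_suffix function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical wrapper sketch: choose the per-OS build of MPITest
# on whichever host this script happens to run.

os_suffix() {
    # $1 is the directory holding the release files (normally /etc).
    if [ -f "$1/gentoo-release" ]; then
        echo gentoo
    elif [ -f "$1/fedora-release" ]; then
        echo fedora
    else
        echo unknown
    fi
}

# In a real wrapper the last line would hand over to the matching
# binary; it is left commented out in this sketch:
# exec "$(dirname "$0")/MPITest.$(os_suffix /etc)" "$@"
```

With something like that installed as MPITest on every node, a plain
"mpirun --host gentoo_host,fedora_host MPITest" would start the wrapper
everywhere and each node would run its own build. The same caveat about
untested heterogeneous runs applies.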

Jeff Squyres
Cisco Systems