Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] orted daemon not found! --- environment not passed to slave nodes
From: Jeffrey Squyres (jsquyres_at_[hidden])
Date: 2012-03-02 12:50:04


On Mar 2, 2012, at 9:48 AM, Yiguang Yan wrote:

> (All with the same test script test.bash I post in my previous emails, so run with app file fed to mpirun command.)
>
> (1) If I put the --prefix in the app file, on each line of it, it works fine as Jeff said.
>
> (2) Since in the manual, it is said that the full path of mpirun is the same as setting "--prefix". However, with app file,
> this is not the case. Without "--prefix" on each line of the app file, the full path of mpirun does not work.

Ralph and I just had a phone conversation about this. We consider it a bug: you shouldn't need to put --prefix in the app file. Right now, the prefix given to mpirun (explicitly via --prefix, or implicitly via mpirun's full path) is ignored when you use an app file, which is why you currently have to repeat --prefix in the app file. We're going to fix that.
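
Until that is fixed, the workaround of repeating --prefix on every app-file line can be scripted. What follows is only a sketch: the helper name make_appfile, the install prefix, the host names, and the one-process-per-host layout are all assumptions made for illustration, and none of them come from the original test.bash.

```shell
#!/bin/sh
# make_appfile PREFIX PROGRAM HOST...
# Hypothetical helper: print one app-file line per host, each carrying
# its own --prefix so that every remote node gets PREFIX/bin and
# PREFIX/lib added to its environment.
make_appfile() {
    mk_prefix=$1
    mk_program=$2
    shift 2
    for mk_host in "$@"; do
        printf '%s\n' "-np 1 --host $mk_host --prefix $mk_prefix $mk_program"
    done
}

# Example call with made-up values; prints two app-file lines.
make_appfile /opt/openmpi ./a.out node01 node02
```

Redirecting the output to a file and passing that file to mpirun with --app would then give each line its own --prefix.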

> (3) With "--prefix $adinahome" set on each line of the app file, it is exclusively put, on each node, the
> $adinahome/bin into the PATH, and $adinahome/lib into the LD_LIBRARY_PATH(not the $adinahome/lib64 as said
> in mpirun manual(v1.4.x)).

Correct.

> The envars $PATH and $LD_LIBRARY_PATH set in test.bash script only affect the
> envars on the submitting node(gulftown in my case). No $PATH or $LD_LIBRARY_PATH is passed to slave nodes
> even if I use "-x PATH -x LD_LIBRARY_PATH", either fed to mpirun or put on each line of the app file. I am not sure
> if this is intended, since "--prefix" overwrite the effect of "-x" option, this is different from what I see from the mpirun
> man page.

Hmm. Let's do a simple test here...

-----
[9:38] svbu-mpi:~ % cat foo
#!/bin/bash

echo test_env_var: $test_env_var
[9:38] svbu-mpi:~ % ./foo
test_env_var:
[9:38] svbu-mpi:~ % mpirun --host svbu-mpi001,svbu-mpi002 ~/foo
test_env_var:
test_env_var:
[9:38] svbu-mpi:~ % setenv test_env_var THIS-IS-TEST-ENV-VAR
[9:39] svbu-mpi:~ % ./foo
test_env_var: THIS-IS-TEST-ENV-VAR
[9:39] svbu-mpi:~ % mpirun --host svbu-mpi001,svbu-mpi002 ~/foo
test_env_var:
test_env_var:
[9:39] svbu-mpi:~ % mpirun --host svbu-mpi001,svbu-mpi002 -x test_env_var ~/foo
test_env_var: THIS-IS-TEST-ENV-VAR
test_env_var: THIS-IS-TEST-ENV-VAR
[9:39] svbu-mpi:~ %
-----

So that appears to work. Let's try with PATH.

-----
[9:41] svbu-mpi:~ % cat foo
#!/bin/bash -f

echo PATH: $PATH
[9:41] svbu-mpi:~ % ./foo
PATH: /home/jsquyres/bogus/bin:/users/jsquyres/local/bin:/home/jsquyres/bogus/bin:/users/jsquyres/local/bin:/var/opt/intel/composerxe-2011.1.107/bin:/opt/autotools/ac268-am1113-lt242/bin:/cm/shared/apps/valgrind/3.7.0/bin:/cm/shared/apps/mercurial/2.0.2/bin:/cm/shared/apps/gcc/4.4.6/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/sbin:/usr/sbin:/cm/shared/apps/slurm/2.2.4/bin:/cm/shared/apps/slurm/2.2.4/sbin:/cm/shared/apps/proxy/bin:/cm/shared/apps/subversion/1.7.2/bin:/sbin:/usr/sbin

# That's ok. Now let's try with mpirun.

[9:41] svbu-mpi:~ % mpirun --host svbu-mpi001,svbu-mpi002 ~/foo
PATH: /home/jsquyres/bogus/bin:/home/jsquyres/bogus/bin:/home/jsquyres/bogus/bin:/users/jsquyres/local/bin:/var/opt/intel/composerxe-2011.1.107/bin:/opt/autotools/ac268-am1113-lt242/bin:/cm/shared/apps/valgrind/3.7.0/bin:/cm/shared/apps/mercurial/2.0.2/bin:/cm/shared/apps/gcc/4.4.6/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/sbin:/usr/sbin:/cm/shared/apps/slurm/2.2.4/bin:/cm/shared/apps/slurm/2.2.4/sbin:/cm/shared/apps/proxy/bin:/cm/shared/apps/subversion/1.7.2/bin
PATH: /home/jsquyres/bogus/bin:/home/jsquyres/bogus/bin:/home/jsquyres/bogus/bin:/users/jsquyres/local/bin:/var/opt/intel/composerxe-2011.1.107/bin:/opt/autotools/ac268-am1113-lt242/bin:/cm/shared/apps/valgrind/3.7.0/bin:/cm/shared/apps/mercurial/2.0.2/bin:/cm/shared/apps/gcc/4.4.6/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/sbin:/usr/sbin:/cm/shared/apps/slurm/2.2.4/bin:/cm/shared/apps/slurm/2.2.4/sbin:/cm/shared/apps/proxy/bin:/cm/shared/apps/subversion/1.7.2/bin

# These look ok (my remote path is a bit longer than my local path)
# Now let's add a bogus entry to the local path

[9:41] svbu-mpi:~ % set path = ($path /this/is/a/fake/path)
[9:41] svbu-mpi:~ % ./foo
PATH: /home/jsquyres/bogus/bin:/users/jsquyres/local/bin:/home/jsquyres/bogus/bin:/users/jsquyres/local/bin:/var/opt/intel/composerxe-2011.1.107/bin:/opt/autotools/ac268-am1113-lt242/bin:/cm/shared/apps/valgrind/3.7.0/bin:/cm/shared/apps/mercurial/2.0.2/bin:/cm/shared/apps/gcc/4.4.6/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/sbin:/usr/sbin:/cm/shared/apps/slurm/2.2.4/bin:/cm/shared/apps/slurm/2.2.4/sbin:/cm/shared/apps/proxy/bin:/cm/shared/apps/subversion/1.7.2/bin:/sbin:/usr/sbin:/this/is/a/fake/path

# Good; the bogus entry is there. Now try mpirun

[9:41] svbu-mpi:~ % mpirun --host svbu-mpi001,svbu-mpi002 ~/foo
PATH: /home/jsquyres/bogus/bin:/home/jsquyres/bogus/bin:/home/jsquyres/bogus/bin:/users/jsquyres/local/bin:/var/opt/intel/composerxe-2011.1.107/bin:/opt/autotools/ac268-am1113-lt242/bin:/cm/shared/apps/valgrind/3.7.0/bin:/cm/shared/apps/mercurial/2.0.2/bin:/cm/shared/apps/gcc/4.4.6/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/sbin:/usr/sbin:/cm/shared/apps/slurm/2.2.4/bin:/cm/shared/apps/slurm/2.2.4/sbin:/cm/shared/apps/proxy/bin:/cm/shared/apps/subversion/1.7.2/bin
PATH: /home/jsquyres/bogus/bin:/home/jsquyres/bogus/bin:/home/jsquyres/bogus/bin:/users/jsquyres/local/bin:/var/opt/intel/composerxe-2011.1.107/bin:/opt/autotools/ac268-am1113-lt242/bin:/cm/shared/apps/valgrind/3.7.0/bin:/cm/shared/apps/mercurial/2.0.2/bin:/cm/shared/apps/gcc/4.4.6/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/sbin:/usr/sbin:/cm/shared/apps/slurm/2.2.4/bin:/cm/shared/apps/slurm/2.2.4/sbin:/cm/shared/apps/proxy/bin:/cm/shared/apps/subversion/1.7.2/bin

# Good -- it's not there. Now -x PATH

[9:41] svbu-mpi:~ % mpirun --host svbu-mpi001,svbu-mpi002 -x PATH ~/foo
PATH: /home/jsquyres/bogus/bin:/home/jsquyres/bogus/bin:/users/jsquyres/local/bin:/home/jsquyres/bogus/bin:/users/jsquyres/local/bin:/var/opt/intel/composerxe-2011.1.107/bin:/opt/autotools/ac268-am1113-lt242/bin:/cm/shared/apps/valgrind/3.7.0/bin:/cm/shared/apps/mercurial/2.0.2/bin:/cm/shared/apps/gcc/4.4.6/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/sbin:/usr/sbin:/cm/shared/apps/slurm/2.2.4/bin:/cm/shared/apps/slurm/2.2.4/sbin:/cm/shared/apps/proxy/bin:/cm/shared/apps/subversion/1.7.2/bin:/sbin:/usr/sbin:/this/is/a/fake/path
PATH: /home/jsquyres/bogus/bin:/home/jsquyres/bogus/bin:/users/jsquyres/local/bin:/home/jsquyres/bogus/bin:/users/jsquyres/local/bin:/var/opt/intel/composerxe-2011.1.107/bin:/opt/autotools/ac268-am1113-lt242/bin:/cm/shared/apps/valgrind/3.7.0/bin:/cm/shared/apps/mercurial/2.0.2/bin:/cm/shared/apps/gcc/4.4.6/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/sbin:/usr/sbin:/cm/shared/apps/slurm/2.2.4/bin:/cm/shared/apps/slurm/2.2.4/sbin:/cm/shared/apps/proxy/bin:/cm/shared/apps/subversion/1.7.2/bin:/sbin:/usr/sbin:/this/is/a/fake/path

# Good -- the entry is now there on the remote nodes.
# Now let's try with --prefix and -x PATH

[9:44] svbu-mpi:~ % mpirun --prefix /home/jsquyres/bogus --host svbu-mpi001,svbu-mpi002 -x PATH ~/foo
PATH: /home/jsquyres/bogus/bin:/home/jsquyres/bogus/bin:/users/jsquyres/local/bin:/home/jsquyres/bogus/bin:/users/jsquyres/local/bin:/var/opt/intel/composerxe-2011.1.107/bin:/opt/autotools/ac268-am1113-lt242/bin:/cm/shared/apps/valgrind/3.7.0/bin:/cm/shared/apps/mercurial/2.0.2/bin:/cm/shared/apps/gcc/4.4.6/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/sbin:/usr/sbin:/cm/shared/apps/slurm/2.2.4/bin:/cm/shared/apps/slurm/2.2.4/sbin:/cm/shared/apps/proxy/bin:/cm/shared/apps/subversion/1.7.2/bin:/sbin:/usr/sbin:/this/is/a/fake/path
PATH: /home/jsquyres/bogus/bin:/home/jsquyres/bogus/bin:/users/jsquyres/local/bin:/home/jsquyres/bogus/bin:/users/jsquyres/local/bin:/var/opt/intel/composerxe-2011.1.107/bin:/opt/autotools/ac268-am1113-lt242/bin:/cm/shared/apps/valgrind/3.7.0/bin:/cm/shared/apps/mercurial/2.0.2/bin:/cm/shared/apps/gcc/4.4.6/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/sbin:/usr/sbin:/cm/shared/apps/slurm/2.2.4/bin:/cm/shared/apps/slurm/2.2.4/sbin:/cm/shared/apps/proxy/bin:/cm/shared/apps/subversion/1.7.2/bin:/sbin:/usr/sbin:/this/is/a/fake/path
[9:45] svbu-mpi:~ %
-----

So it seems to be working for me. Can you do a few manual tests like this and see if there's some combination that's not working properly for you?
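
Because the PATH lines in such tests are long, a small helper can make the result easier to read than comparing the output by eye. This is a sketch only: the function name path_has_entry is made up for illustration, and /this/is/a/fake/path is simply the marker entry used in the session above.

```shell
#!/bin/sh
# path_has_entry LIST ENTRY
# Succeed if ENTRY is exactly one of the colon-separated components of LIST.
path_has_entry() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether the marker entry reached this process's PATH.
if path_has_entry "$PATH" /this/is/a/fake/path; then
    echo "$(uname -n): marker entry present"
else
    echo "$(uname -n): marker entry absent"
fi
```

Launching a script like this through mpirun once without and once with "-x PATH" should show on which nodes the locally added entry arrives.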

> I have another question about the btl used for communication. I noticed that rsh is using the tcp for connection, I
> understand that tcp may be used for initial connection, but how can I know that openib(infiniband) btl is used for my
> data communication? Any explicit way?

At the moment, there are implicit ways.

TCP is used for MPI bootstrapping. The transport used for MPI traffic, however, is selected by the "btl" (byte transfer layer) MCA parameter, as Ralph said. You can *force* the OpenFabrics BTL to be used with something like this:

    mpirun --mca btl openib,sm,self ...

The "openib" is the OpenFabrics BTL (OpenFabrics used to be called OpenIB, and we're kinda stuck with the plugin name now). "sm" is shared memory, and "self" is process loopback. So with this command line, you'll *only* use these three BTLs for MPI communication.
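
One way to double-check the selection is sketched below, under some assumptions. btl_base_verbose is an MCA parameter that makes the BTL framework log which components it opens and selects (the exact output differs between Open MPI versions), and ompi_info lists the BTL components built into an installation. The helper name btl_cmd and the application name ./my_mpi_app are hypothetical; the function only prints the command line so that it can be inspected before being run.

```shell
#!/bin/sh
# To list the BTL components compiled into an installation, run by hand:
#   ompi_info | grep btl

# btl_cmd BTL_LIST PROGRAM [ARGS...]
# Hypothetical helper: print an mpirun command line that restricts MPI
# traffic to BTL_LIST and turns on verbose BTL selection output.
btl_cmd() {
    btl_list=$1
    shift
    printf '%s\n' "mpirun --mca btl $btl_list --mca btl_base_verbose 30 $*"
}

# Example with a made-up application name; prints the command line only.
btl_cmd openib,sm,self ./my_mpi_app
```

Since tcp is absent from the list, a multi-node job that runs to completion with this setting should not have carried its MPI traffic over the TCP BTL.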

Make sense?

-- 
Jeff Squyres
jsquyres_at_[hidden]