Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] orted daemon not found! --- environment not passed to slave nodes
From: Yiguang Yan (yanyg_at_[hidden])
Date: 2012-03-02 17:23:47


It turns out that when an app file is used, the "-x" option must be put on each line of the app file.

OK, here are the test results on our cluster, in case they are useful to other Open MPI users (Open MPI 1.4.3 is
used on my system):

(1) If I run the mpirun command from the command line as in Jeff's foo test, everything works fine, the same as in
Jeff's test.

(2) Now let me start mpirun from a shell script:

First, the foo script contains:
>>>
#!/bin/sh -f

echo $HOSTNAME: PATH : $PATH
echo $HOSTNAME: LD_LIBRARY_PATH : $LD_LIBRARY_PATH
<<<

The testenvars.bash script contains:
>>>
#!/bin/sh -f
#nohup
#
# >-------------------------------------------------------------------------------------------<
adinahome=/home/yiguang/testdmp881
mpirunfile=$adinahome/bin/mpirun
#
# Set envars for mpirun and orted
#
export PATH=/this/is/a/fake/path:$adinahome/bin:$adinahome/tools:$PATH
export LD_LIBRARY_PATH=/this/is/a/fake/libdir:$adinahome/lib:$LD_LIBRARY_PATH
#
#
# run DMP problem
#
mcaprefix="--prefix $adinahome"
mcaenvars="-x PATH -x LD_LIBRARY_PATH"
mcabtlconn="--mca btl openib,sm,self"
#mcaplmbase="--mca plm_base_verbose 100"

# mpirun is under $adinahome/bin

$mpirunfile --host gulftown,ibnode001 foo
<<<

Now if I run testenvars.bash from the command line:
>>>
[yiguang_at_gulftown testdmp]$ ./testenvars.bash
gulftown: PATH : /home/yiguang/testdmp881/bin:/home/yiguang/testdmp881/bin:/this/is/a/fake/path:/home/yiguang/testdmp881/bin:/home/yiguang/testdmp881/tools:/usr/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/adina/system8.8/tools:/usr/adina/system8.7/tools:/usr/adina/system8.6/tools:/usr/adina/system8.5/tools:/home/yiguang/bin
gulftown: LD_LIBRARY_PATH : /home/yiguang/testdmp881/lib:/home/yiguang/testdmp881/lib:/this/is/a/fake/libdir:/home/yiguang/testdmp881/lib:
ibnode001: PATH : /home/yiguang/testdmp881/bin:/home/yiguang/testdmp881/bin:/usr/bin:/usr/lib64/qt-3.3/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin
ibnode001: LD_LIBRARY_PATH : /home/yiguang/testdmp881/lib:/home/yiguang/testdmp881/lib:
<<<

If, in the testenvars.bash script, I change the line
$mpirunfile --host gulftown,ibnode001 foo
-->
mpirun --prefix $adinahome --host gulftown,ibnode001 foo

then I get the same output as above. As expected, giving the full path of mpirun and giving --prefix have the same
effect. The unexpected part is that /home/yiguang/testdmp881/bin and /home/yiguang/testdmp881/lib appear twice at
the front of PATH and LD_LIBRARY_PATH here. Why?
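
To spot such repeats without reading the long lines by eye, a small helper like the one below could be called from
foo. This is only a sketch: count_entry and dedupe_list are hypothetical names, not part of Open MPI, and the helper
only reports or removes repeats. It does not explain why the directories are prepended twice.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting colon-separated lists such as PATH.

# count_entry LIST DIR: print how many times DIR occurs as a whole entry in LIST.
count_entry() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -c -x -F -- "$2"
}

# dedupe_list LIST: print LIST with repeated entries removed, keeping the
# first occurrence of each. Empty entries (e.g. from a trailing ':') are
# dropped as well, which is an assumption of this sketch.
dedupe_list() {
    printf '%s\n' "$1" | tr ':' '\n' | awk 'NF && !seen[$0]++' | paste -s -d ':' -
}
```

For example, adding `echo $HOSTNAME: $(count_entry "$PATH" /home/yiguang/testdmp881/bin)` to foo would print the
repeat count per node.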

Now if I change, in the above testenvars.bash script, the line

$mpirunfile --host gulftown,ibnode001 foo
-->
mpirun --prefix $adinahome $mcaenvars --host gulftown,ibnode001 foo

and then run the script, I get:
>>>
[yiguang_at_gulftown testdmp]$ ./testenvars.bash
gulftown: PATH : /home/yiguang/testdmp881/bin:/this/is/a/fake/path:/home/yiguang/testdmp881/bin:/home/yiguang/testdmp881/tools:/usr/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/adina/system8.8/tools:/usr/adina/system8.7/tools:/usr/adina/system8.6/tools:/usr/adina/system8.5/tools:/home/yiguang/bin
gulftown: LD_LIBRARY_PATH : /home/yiguang/testdmp881/lib:/this/is/a/fake/libdir:/home/yiguang/testdmp881/lib:
ibnode001: PATH : /home/yiguang/testdmp881/bin:/this/is/a/fake/path:/home/yiguang/testdmp881/bin:/home/yiguang/testdmp881/tools:/usr/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/adina/system8.8/tools:/usr/adina/system8.7/tools:/usr/adina/system8.6/tools:/usr/adina/system8.5/tools:/home/yiguang/bin
ibnode001: LD_LIBRARY_PATH : /home/yiguang/testdmp881/lib:/this/is/a/fake/libdir:/home/yiguang/testdmp881/lib:
<<<
This time, PATH and LD_LIBRARY_PATH are passed to the slave node, and /home/yiguang/testdmp881/bin and
/home/yiguang/testdmp881/lib are prepended only once, unlike in the previous test.

So far so good, except for these minor things.
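
Since /this/is/a/fake/path serves as a marker for whether the exported PATH reached a node, the check could also be
done by the script instead of by eye. A minimal sketch, assuming the helper names has_entry and report_entry (both
hypothetical, not from Open MPI):

```shell
#!/bin/sh
# has_entry LIST DIR: succeed if DIR is one of the colon-separated entries of LIST.
has_entry() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# report_entry LIST DIR: print "present" or "missing" for DIR in LIST.
report_entry() {
    if has_entry "$1" "$2"; then
        echo "present"
    else
        echo "missing"
    fi
}
```

In foo, a line such as `echo $HOSTNAME: marker $(report_entry "$PATH" /this/is/a/fake/path)` would then show
directly which nodes received the exported PATH.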

(3) Now I change to using an app file:

First the scripts: the foo script is as above, and the testenvars-app.bash script contains:
>>>
[yiguang_at_gulftown testdmp]$ cat testenvars-app.bash
#!/bin/sh -f
#nohup
#
# >-------------------------------------------------------------------------------------------<
adinahome=/home/yiguang/testdmp881
mpirunfile=$adinahome/bin/mpirun
#
# Set envars for mpirun and orted
#
export PATH=/this/is/a/fake/path:$adinahome/bin:$adinahome/tools:$PATH
export LD_LIBRARY_PATH=/this/is/a/fake/libdir:$adinahome/lib:$LD_LIBRARY_PATH
#
#
# run DMP problem
#
#mcaprefix="--prefix $adinahome"
mcaenvars="-x PATH -x LD_LIBRARY_PATH"
mcabtlconn="--mca btl openib,sm,self"
#mcaplmbase="--mca plm_base_verbose 100"

$mpirunfile $mcabltconn --app addmpw-foo-nox
#$mpirunfile $mcaenvars $mcabltconn --app addmpw-foo-nox
#$mpirunfile $mcabltconn --app addmpw-foo
<<<

The addmpw-foo-nox app file is:
>>>
[yiguang_at_gulftown testdmp]$ cat addmpw-foo-nox
--prefix /home/yiguang/testdmp881 -n 1 -host gulftown foo
--prefix /home/yiguang/testdmp881 -n 1 -host ibnode001 foo
<<<
The addmpw-foo app file is:
>>>
[yiguang_at_gulftown testdmp]$ cat addmpw-foo
--prefix /home/yiguang/testdmp881 -x PATH -x LD_LIBRARY_PATH -n 1 -host gulftown foo
--prefix /home/yiguang/testdmp881 -x PATH -x LD_LIBRARY_PATH -n 1 -host ibnode001 foo
<<<

(a) If I run testenvars-app.bash with the first of its last three lines active:

$mpirunfile $mcabltconn --app addmpw-foo-nox

then the output is:
>>>
[yiguang_at_gulftown testdmp]$ ./testenvars-app.bash
gulftown: PATH : /home/yiguang/testdmp881/bin:/home/yiguang/testdmp881/bin:/this/is/a/fake/path:/home/yiguang/testdmp881/bin:/home/yiguang/testdmp881/tools:/usr/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/adina/system8.8/tools:/usr/adina/system8.7/tools:/usr/adina/system8.6/tools:/usr/adina/system8.5/tools:/home/yiguang/bin
gulftown: LD_LIBRARY_PATH : /home/yiguang/testdmp881/lib:/home/yiguang/testdmp881/lib:/this/is/a/fake/libdir:/home/yiguang/testdmp881/lib:
ibnode001: PATH : /home/yiguang/testdmp881/bin:/home/yiguang/testdmp881/bin:/usr/bin:/usr/lib64/qt-3.3/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin
ibnode001: LD_LIBRARY_PATH : /home/yiguang/testdmp881/lib:/home/yiguang/testdmp881/lib:
<<<

(b) If I choose the second of the last three lines of the testenvars-app.bash script, that is, uncomment the line
$mpirunfile $mcaenvars $mcabltconn --app addmpw-foo-nox
and comment out the other two lines, the output is:
>>>
[yiguang_at_gulftown testdmp]$ ./testenvars-app.bash
gulftown: PATH : /home/yiguang/testdmp881/bin:/home/yiguang/testdmp881/bin:/this/is/a/fake/path:/home/yiguang/testdmp881/bin:/home/yiguang/testdmp881/tools:/usr/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/adina/system8.8/tools:/usr/adina/system8.7/tools:/usr/adina/system8.6/tools:/usr/adina/system8.5/tools:/home/yiguang/bin
gulftown: LD_LIBRARY_PATH : /home/yiguang/testdmp881/lib:/home/yiguang/testdmp881/lib:/this/is/a/fake/libdir:/home/yiguang/testdmp881/lib:
ibnode001: PATH : /home/yiguang/testdmp881/bin:/home/yiguang/testdmp881/bin:/usr/bin:/usr/lib64/qt-3.3/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin
ibnode001: LD_LIBRARY_PATH : /home/yiguang/testdmp881/lib:/home/yiguang/testdmp881/lib:
<<<

(c) Now if I uncomment the last line and comment out the other two of the last three lines, so that the script runs
$mpirunfile $mcabltconn --app addmpw-foo

then the output is:
>>>
[yiguang_at_gulftown testdmp]$ ./testenvars-app.bash
gulftown: PATH : /home/yiguang/testdmp881/bin:/this/is/a/fake/path:/home/yiguang/testdmp881/bin:/home/yiguang/testdmp881/tools:/usr/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/adina/system8.8/tools:/usr/adina/system8.7/tools:/usr/adina/system8.6/tools:/usr/adina/system8.5/tools:/home/yiguang/bin
gulftown: LD_LIBRARY_PATH : /home/yiguang/testdmp881/lib:/this/is/a/fake/libdir:/home/yiguang/testdmp881/lib:
ibnode001: PATH : /home/yiguang/testdmp881/bin:/this/is/a/fake/path:/home/yiguang/testdmp881/bin:/home/yiguang/testdmp881/tools:/usr/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/adina/system8.8/tools:/usr/adina/system8.7/tools:/usr/adina/system8.6/tools:/usr/adina/system8.5/tools:/home/yiguang/bin
ibnode001: LD_LIBRARY_PATH : /home/yiguang/testdmp881/lib:/this/is/a/fake/libdir:/home/yiguang/testdmp881/lib:
<<<

So from tests (a), (b), and (c): when I use an app file, PATH and LD_LIBRARY_PATH are passed to the slave node only
when "-x" is set on each line of the app file, just as with the "--prefix" option.
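
Given that behavior on 1.4.3, one workaround is to generate the app file so that the per-line options cannot be
forgotten. The following is a minimal sketch under that assumption; make_appfile is a hypothetical helper name, and
the options it writes are exactly those used in the addmpw-foo file above.

```shell
#!/bin/sh
# make_appfile PREFIX PROGRAM HOST...: print one app-file line per host,
# repeating --prefix and both -x options on every line.
make_appfile() {
    prefix=$1
    program=$2
    shift 2
    for host in "$@"; do
        printf '%s\n' "--prefix $prefix -x PATH -x LD_LIBRARY_PATH -n 1 -host $host $program"
    done
}
```

Running `make_appfile /home/yiguang/testdmp881 foo gulftown ibnode001 > addmpw-foo` would reproduce the addmpw-foo
file used in test (c).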

Any conclusion? If the "--prefix" behavior is accepted as a bug to be fixed, I would think this is another bug, this
time for the "-x" option.

Thanks,
Yiguang