Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] orted daemon no found! --- environment not passed to slave nodes
From: Yiguang Yan (yanyg_at_[hidden])
Date: 2012-02-29 09:43:25


Hi Jeff,

Thanks.

I tried what you suggested. Here is the output:

>>>
[yiguang@gulftown testdmp]$ ./test.bash
[gulftown:25052] mca: base: components_open: Looking for plm components
[gulftown:25052] mca: base: components_open: opening plm components
[gulftown:25052] mca: base: components_open: found loaded component rsh
[gulftown:25052] mca: base: components_open: component rsh has no register function
[gulftown:25052] mca: base: components_open: component rsh open function successful
[gulftown:25052] mca: base: components_open: found loaded component slurm
[gulftown:25052] mca: base: components_open: component slurm has no register function
[gulftown:25052] mca: base: components_open: component slurm open function successful
[gulftown:25052] mca: base: components_open: found loaded component tm
[gulftown:25052] mca: base: components_open: component tm has no register function
[gulftown:25052] mca: base: components_open: component tm open function successful
[gulftown:25052] mca:base:select: Auto-selecting plm components
[gulftown:25052] mca:base:select:( plm) Querying component [rsh]
[gulftown:25052] mca:base:select:( plm) Query of component [rsh] set priority to 10
[gulftown:25052] mca:base:select:( plm) Querying component [slurm]
[gulftown:25052] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module
[gulftown:25052] mca:base:select:( plm) Querying component [tm]
[gulftown:25052] mca:base:select:( plm) Skipping component [tm]. Query failed to return a module
[gulftown:25052] mca:base:select:( plm) Selected component [rsh]
[gulftown:25052] mca: base: close: component slurm closed
[gulftown:25052] mca: base: close: unloading component slurm
[gulftown:25052] mca: base: close: component tm closed
[gulftown:25052] mca: base: close: unloading component tm
bash: orted: command not found
bash: orted: command not found
bash: orted: command not found
<<<
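
The three "orted: command not found" lines suggest that the non-interactive shell started by rsh/ssh on each remote node does not have $adinahome/bin on its PATH. A minimal sketch for checking that from the head node follows. The function name check_remote_orted is hypothetical, and the sketch assumes the agent can be invoked as "AGENT HOST COMMAND" (as ssh and rsh can) and that the remote login shell understands "command -v".

```shell
#!/bin/sh
# check_remote_orted AGENT HOST...
# Hypothetical helper: ask each host, through a non-interactive remote shell,
# whether orted can be resolved on the PATH that shell actually gets.
check_remote_orted() {
    agent=$1
    shift
    status=0
    for host in "$@"; do
        # 'command -v orted' prints the resolved path, or fails if not found.
        if found=$("$agent" "$host" 'command -v orted' 2>/dev/null) && [ -n "$found" ]; then
            printf '%s: orted found at %s\n' "$host" "$found"
        else
            printf '%s: orted NOT on non-interactive PATH\n' "$host"
            status=1
        fi
    done
    return $status
}
```

For example: check_remote_orted ssh ibnode001 ibnode002 ibnode003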

The following is the content of test.bash:
>>>
[yiguang@gulftown testdmp]$ cat test.bash
#!/bin/sh -f
#nohup
#
# >-------------------------------------------------------------------------------------------<
adinahome=/usr/adina/system8.8dmp
mpirunfile=$adinahome/bin/mpirun
#
# Set envars for mpirun and orted
#
export PATH=$adinahome/bin:$adinahome/tools:$PATH
export LD_LIBRARY_PATH=$adinahome/lib:$LD_LIBRARY_PATH
#
#
# run DMP problem
#
mcaprefix="--prefix $adinahome"
mcarshagent="--mca plm_rsh_agent rsh:ssh"
mcatmpdir="--mca orte_tmpdir_base /tmp"
mcaopenibmsg="--mca btl_openib_warn_default_gid_prefix 0"
mcaenvars="-x PATH -x LD_LIBRARY_PATH"
mcabtlconn="--mca btl openib,sm,self"
mcaplmbase="--mca plm_base_verbose 100"

mcaparams="$mcaprefix $mcaenvars $mcarshagent $mcaopenibmsg $mcabtlconn $mcatmpdir $mcaplmbase"

$mpirunfile $mcaparams --app addmpw-hostname
<<<

The content of addmpw-hostname is:
>>>
-n 1 -host gulftown hostname
-n 1 -host ibnode001 hostname
-n 1 -host ibnode002 hostname
-n 1 -host ibnode003 hostname
<<<
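
An app file like this has one line per host, so it can be generated rather than typed by hand, which avoids per-line typos. A minimal sketch, where make_appfile is a hypothetical helper and the line format is simply copied from the file above:

```shell
#!/bin/sh
# make_appfile COMMAND HOST...
# Hypothetical helper: print one app-file line per host, each starting
# a single copy of COMMAND on that host.
make_appfile() {
    cmd=$1
    shift
    for host in "$@"; do
        printf '%s\n' "-n 1 -host $host $cmd"
    done
}
```

For example: make_appfile hostname gulftown ibnode001 ibnode002 ibnode003 > addmpw-hostname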

After this, I also tried pointing mpirun at orted explicitly with:

--mca orte_launch_agent $adinahome/bin/orted

With that setting, orted is found on the slave nodes, but the shared libraries
in $adinahome/lib are still not on LD_LIBRARY_PATH there.
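
One possible workaround for the missing library path is a small wrapper that sets the environment and then hands over to the real orted. The sketch below only generates such a wrapper. The name write_orted_wrapper and the wrapper itself are hypothetical, and whether mpirun accepts a wrapper script as orte_launch_agent is an assumption that has not been verified here.

```shell
#!/bin/sh
# write_orted_wrapper PREFIX OUTFILE
# Hypothetical helper: write an executable wrapper script that puts
# PREFIX/bin, PREFIX/tools and PREFIX/lib on the search paths and then
# replaces itself with PREFIX/bin/orted, passing all arguments through.
write_orted_wrapper() {
    prefix=$1
    outfile=$2
    cat > "$outfile" <<EOF
#!/bin/sh
# Generated wrapper (assumption: used as the launch agent on each node).
PATH="$prefix/bin:$prefix/tools:\$PATH"
LD_LIBRARY_PATH="$prefix/lib\${LD_LIBRARY_PATH:+:\$LD_LIBRARY_PATH}"
export PATH LD_LIBRARY_PATH
exec "$prefix/bin/orted" "\$@"
EOF
    chmod +x "$outfile"
}
```

The generated file would have to exist at the same path on every node, and would be passed in place of $adinahome/bin/orted in the orte_launch_agent setting above.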

Any comments?

Thanks,
Yiguang