Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] orted daemon not found! --- environment not passed to slave nodes
From: Yiguang Yan (yanyg_at_[hidden])
Date: 2012-03-01 14:59:51


Hi Ralph,

Thanks, here is what I did as suggested by Jeff:

> What did this command line look like? Can you provide the configure line as well?

As in my previous post, the script and its output are as follows:

(1) debug messages:
>>>
[yiguang_at_gulftown testdmp]$ ./test.bash
[gulftown:28340] mca: base: components_open: Looking for plm components
[gulftown:28340] mca: base: components_open: opening plm components
[gulftown:28340] mca: base: components_open: found loaded component rsh
[gulftown:28340] mca: base: components_open: component rsh has no register function
[gulftown:28340] mca: base: components_open: component rsh open function successful
[gulftown:28340] mca: base: components_open: found loaded component slurm
[gulftown:28340] mca: base: components_open: component slurm has no register function
[gulftown:28340] mca: base: components_open: component slurm open function successful
[gulftown:28340] mca: base: components_open: found loaded component tm
[gulftown:28340] mca: base: components_open: component tm has no register function
[gulftown:28340] mca: base: components_open: component tm open function successful
[gulftown:28340] mca:base:select: Auto-selecting plm components
[gulftown:28340] mca:base:select:( plm) Querying component [rsh]
[gulftown:28340] mca:base:select:( plm) Query of component [rsh] set priority to 10
[gulftown:28340] mca:base:select:( plm) Querying component [slurm]
[gulftown:28340] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module
[gulftown:28340] mca:base:select:( plm) Querying component [tm]
[gulftown:28340] mca:base:select:( plm) Skipping component [tm]. Query failed to return a module
[gulftown:28340] mca:base:select:( plm) Selected component [rsh]
[gulftown:28340] mca: base: close: component slurm closed
[gulftown:28340] mca: base: close: unloading component slurm
[gulftown:28340] mca: base: close: component tm closed
[gulftown:28340] mca: base: close: unloading component tm
[gulftown:28340] plm:base:set_hnp_name: initial bias 28340 nodename hash 3546479048
[gulftown:28340] plm:base:set_hnp_name: final jobfam 17438
[gulftown:28340] [[17438,0],0] plm:base:receive start comm
[gulftown:28340] [[17438,0],0] plm:rsh: setting up job [17438,1]
[gulftown:28340] [[17438,0],0] plm:base:setup_job for job [17438,1]
[gulftown:28340] [[17438,0],0] plm:rsh: local shell: 0 (bash)
[gulftown:28340] [[17438,0],0] plm:rsh: assuming same remote shell as local shell
[gulftown:28340] [[17438,0],0] plm:rsh: remote shell: 0 (bash)
[gulftown:28340] [[17438,0],0] plm:rsh: final template argv:
        /usr/bin/rsh <template> orted --daemonize -mca ess env -mca orte_ess_jobid 1142816768 -mca orte_ess_vpid <template> -mca orte_ess_num_procs 4 --hnp-uri "1142816768.0;tcp://198.177.146.70:43159;tcp://10.10.10.4:43159;tcp://172.23.10.1:43159;tcp://172.33.10.1:43159" --mca plm_rsh_agent rsh:ssh --mca btl_openib_warn_default_gid_prefix 0 --mca btl openib,sm,self --mca orte_tmpdir_base /tmp --mca plm_base_verbose 100
[gulftown:28340] [[17438,0],0] plm:rsh:launch daemon already exists on node gulftown
[gulftown:28340] [[17438,0],0] plm:rsh: launching on node ibnode001
[gulftown:28340] [[17438,0],0] plm:rsh: recording launch of daemon [[17438,0],1]
[gulftown:28340] [[17438,0],0] plm:rsh: executing: (//usr/bin/rsh) [/usr/bin/rsh ibnode001 orted --daemonize -mca ess env -mca orte_ess_jobid 1142816768 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 4 --hnp-uri "1142816768.0;tcp://198.177.146.70:43159;tcp://10.10.10.4:43159;tcp://172.23.10.1:43159;tcp://172.33.10.1:43159" --mca plm_rsh_agent rsh:ssh --mca btl_openib_warn_default_gid_prefix 0 --mca btl openib,sm,self --mca orte_tmpdir_base /tmp --mca plm_base_verbose 100]
bash: orted: command not found
[gulftown:28340] [[17438,0],0] plm:rsh: launching on node ibnode002
[gulftown:28340] [[17438,0],0] plm:rsh: recording launch of daemon [[17438,0],2]
[gulftown:28340] [[17438,0],0] plm:rsh: executing: (//usr/bin/rsh) [/usr/bin/rsh ibnode002 orted --daemonize -mca ess env -mca orte_ess_jobid 1142816768 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 4 --hnp-uri "1142816768.0;tcp://198.177.146.70:43159;tcp://10.10.10.4:43159;tcp://172.23.10.1:43159;tcp://172.33.10.1:43159" --mca plm_rsh_agent rsh:ssh --mca btl_openib_warn_default_gid_prefix 0 --mca btl openib,sm,self --mca orte_tmpdir_base /tmp --mca plm_base_verbose 100]
bash: orted: command not found
[gulftown:28340] [[17438,0],0] plm:rsh: launching on node ibnode003
[gulftown:28340] [[17438,0],0] plm:rsh: executing: (//usr/bin/rsh) [/usr/bin/rsh ibnode003 orted --daemonize -mca ess env -mca orte_ess_jobid 1142816768 -mca orte_ess_vpid 3 -mca orte_ess_num_procs 4 --hnp-uri "1142816768.0;tcp://198.177.146.70:43159;tcp://10.10.10.4:43159;tcp://172.23.10.1:43159;tcp://172.33.10.1:43159" --mca plm_rsh_agent rsh:ssh --mca btl_openib_warn_default_gid_prefix 0 --mca btl openib,sm,self --mca orte_tmpdir_base /tmp --mca plm_base_verbose 100]
[gulftown:28340] [[17438,0],0] plm:rsh: recording launch of daemon [[17438,0],3]
bash: orted: command not found
[gulftown:28340] [[17438,0],0] plm:base:daemon_callback
<<<
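From the "bash: orted: command not found" lines above, it looks to me like the non-interactive shell that rsh spawns on each slave node never sees the PATH I export in my script. I can roughly mimic that locally (just a sketch; `env -i` stands in for the near-empty environment the remote shell starts with):

```shell
# env -i clears the environment, so the shell falls back to a default
# PATH that does not include /usr/adina/system8.8dmp/bin -- roughly the
# situation orted is in on the slave nodes when rsh launches it.
env -i /bin/sh -c 'command -v orted || echo "orted: command not found"'
```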

(2) test.bash script:
>>>
#!/bin/sh -f
#nohup
#
# >-------------------------------------------------------------------------------------------<
adinahome=/usr/adina/system8.8dmp
mpirunfile=$adinahome/bin/mpirun
#
# Set envars for mpirun and orted
#
export PATH=$adinahome/bin:$adinahome/tools:$PATH
export LD_LIBRARY_PATH=$adinahome/lib:$LD_LIBRARY_PATH
#
#
# run DMP problem
#
mcaprefix="--prefix $adinahome"
mcarshagent="--mca plm_rsh_agent rsh:ssh"
mcatmpdir="--mca orte_tmpdir_base /tmp"
mcaopenibmsg="--mca btl_openib_warn_default_gid_prefix 0"
mcaenvars="-x PATH -x LD_LIBRARY_PATH"
mcabtlconn="--mca btl openib,sm,self"
mcaplmbase="--mca plm_base_verbose 100"

mcaparams="$mcaprefix $mcaenvars $mcarshagent $mcaopenibmsg $mcabtlconn $mcatmpdir $mcaplmbase"

$mpirunfile $mcaparams --app addmpw-hostname
<<<
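One workaround I am considering (just a sketch, assuming the same install path /usr/adina/system8.8dmp exists on every node): since `-x PATH` only forwards the variable to the MPI processes after the daemons are up, while orted itself must already be findable by the non-interactive shell that rsh spawns, I could put the exports where a non-interactive bash will read them, i.e. near the top of ~/.bashrc on each slave node rather than only in the launch script:

```shell
# ~/.bashrc on each slave node (before any "return if not interactive"
# guard), so the shell that rsh spawns can locate orted and its libraries:
export PATH=/usr/adina/system8.8dmp/bin:$PATH
export LD_LIBRARY_PATH=/usr/adina/system8.8dmp/lib:$LD_LIBRARY_PATH
```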

(3) the content of the app file addmpw-hostname:
>>>
-n 1 -host gulftown hostname
-n 1 -host ibnode001 hostname
-n 1 -host ibnode002 hostname
-n 1 -host ibnode003 hostname
<<<

Any comments?

Thanks,
Yiguang