Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] orted daemon not found! --- environment not passed to slave nodes
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-03-01 14:51:01


What did this command line look like? Can you provide the configure line as well?
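
In the meantime, a quick check of what the remote non-interactive shell actually sees usually pins this down. The commands below are only a sketch, using the same rsh agent and node names that appear in your verbose output -- adjust them to match your setup:

   # Can the non-interactive shell on a compute node find orted at all?
   /usr/bin/rsh ibnode001 which orted

   # What PATH and LD_LIBRARY_PATH does that shell get? rsh runs the command
   # through a non-interactive, non-login shell, so settings made only in
   # interactive or login startup files may not appear here.
   /usr/bin/rsh ibnode001 'echo $PATH; echo $LD_LIBRARY_PATH'

If orted is not found there, the usual culprits are startup files that only set PATH for interactive logins, or an Open MPI installation that is not visible at the same location on the compute nodes.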

On Mar 1, 2012, at 12:46 PM, Yiguang Yan wrote:

> Hi Jeff,
>
> I made a developer build and then got the following output with
> plm_base_verbose enabled:
>
>>>>
> [gulftown:28340] mca: base: components_open: Looking for plm components
> [gulftown:28340] mca: base: components_open: opening plm components
> [gulftown:28340] mca: base: components_open: found loaded component rsh
> [gulftown:28340] mca: base: components_open: component rsh has no register function
> [gulftown:28340] mca: base: components_open: component rsh open function successful
> [gulftown:28340] mca: base: components_open: found loaded component slurm
> [gulftown:28340] mca: base: components_open: component slurm has no register function
> [gulftown:28340] mca: base: components_open: component slurm open function successful
> [gulftown:28340] mca: base: components_open: found loaded component tm
> [gulftown:28340] mca: base: components_open: component tm has no register function
> [gulftown:28340] mca: base: components_open: component tm open function successful
> [gulftown:28340] mca:base:select: Auto-selecting plm components
> [gulftown:28340] mca:base:select:( plm) Querying component [rsh]
> [gulftown:28340] mca:base:select:( plm) Query of component [rsh] set priority to 10
> [gulftown:28340] mca:base:select:( plm) Querying component [slurm]
> [gulftown:28340] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module
> [gulftown:28340] mca:base:select:( plm) Querying component [tm]
> [gulftown:28340] mca:base:select:( plm) Skipping component [tm]. Query failed to return a module
> [gulftown:28340] mca:base:select:( plm) Selected component [rsh]
> [gulftown:28340] mca: base: close: component slurm closed
> [gulftown:28340] mca: base: close: unloading component slurm
> [gulftown:28340] mca: base: close: component tm closed
> [gulftown:28340] mca: base: close: unloading component tm
> [gulftown:28340] plm:base:set_hnp_name: initial bias 28340 nodename hash 3546479048
> [gulftown:28340] plm:base:set_hnp_name: final jobfam 17438
> [gulftown:28340] [[17438,0],0] plm:base:receive start comm
> [gulftown:28340] [[17438,0],0] plm:rsh: setting up job [17438,1]
> [gulftown:28340] [[17438,0],0] plm:base:setup_job for job [17438,1]
> [gulftown:28340] [[17438,0],0] plm:rsh: local shell: 0 (bash)
> [gulftown:28340] [[17438,0],0] plm:rsh: assuming same remote shell as local shell
> [gulftown:28340] [[17438,0],0] plm:rsh: remote shell: 0 (bash)
> [gulftown:28340] [[17438,0],0] plm:rsh: final template argv:
> /usr/bin/rsh <template> orted --daemonize -mca ess env -mca orte_ess_jobid 1142816768 -mca orte_ess_vpid <template> -mca orte_ess_num_procs 4 --hnp-uri "1142816768.0;tcp://198.177.146.70:43159;tcp://10.10.10.4:43159;tcp://172.23.10.1:43159;tcp://172.33.10.1:43159" --mca plm_rsh_agent rsh:ssh --mca btl_openib_warn_default_gid_prefix 0 --mca btl openib,sm,self --mca orte_tmpdir_base /tmp --mca plm_base_verbose 100
> [gulftown:28340] [[17438,0],0] plm:rsh:launch daemon already exists on node gulftown
> [gulftown:28340] [[17438,0],0] plm:rsh: launching on node ibnode001
> [gulftown:28340] [[17438,0],0] plm:rsh: recording launch of daemon [[17438,0],1]
> [gulftown:28340] [[17438,0],0] plm:rsh: executing: (//usr/bin/rsh) [/usr/bin/rsh ibnode001 orted --daemonize -mca ess env -mca orte_ess_jobid 1142816768 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 4 --hnp-uri "1142816768.0;tcp://198.177.146.70:43159;tcp://10.10.10.4:43159;tcp://172.23.10.1:43159;tcp://172.33.10.1:43159" --mca plm_rsh_agent rsh:ssh --mca btl_openib_warn_default_gid_prefix 0 --mca btl openib,sm,self --mca orte_tmpdir_base /tmp --mca plm_base_verbose 100]
> bash: orted: command not found
> [gulftown:28340] [[17438,0],0] plm:rsh: launching on node ibnode002
> [gulftown:28340] [[17438,0],0] plm:rsh: recording launch of daemon [[17438,0],2]
> [gulftown:28340] [[17438,0],0] plm:rsh: executing: (//usr/bin/rsh) [/usr/bin/rsh ibnode002 orted --daemonize -mca ess env -mca orte_ess_jobid 1142816768 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 4 --hnp-uri "1142816768.0;tcp://198.177.146.70:43159;tcp://10.10.10.4:43159;tcp://172.23.10.1:43159;tcp://172.33.10.1:43159" --mca plm_rsh_agent rsh:ssh --mca btl_openib_warn_default_gid_prefix 0 --mca btl openib,sm,self --mca orte_tmpdir_base /tmp --mca plm_base_verbose 100]
> bash: orted: command not found
> [gulftown:28340] [[17438,0],0] plm:rsh: launching on node ibnode003
> [gulftown:28340] [[17438,0],0] plm:rsh: executing: (//usr/bin/rsh) [/usr/bin/rsh ibnode003 orted --daemonize -mca ess env -mca orte_ess_jobid 1142816768 -mca orte_ess_vpid 3 -mca orte_ess_num_procs 4 --hnp-uri "1142816768.0;tcp://198.177.146.70:43159;tcp://10.10.10.4:43159;tcp://172.23.10.1:43159;tcp://172.33.10.1:43159" --mca plm_rsh_agent rsh:ssh --mca btl_openib_warn_default_gid_prefix 0 --mca btl openib,sm,self --mca orte_tmpdir_base /tmp --mca plm_base_verbose 100]
> [gulftown:28340] [[17438,0],0] plm:rsh: recording launch of daemon [[17438,0],3]
> bash: orted: command not found
> [gulftown:28340] [[17438,0],0] plm:base:daemon_callback
> <<<
>
>
> It seems that the shell environment is not being passed through rsh
> to the remote nodes, though I don't know why.
>
> Any thoughts?
>
> Thanks,
> Yiguang
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
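
If the checks above show that the remote PATH is missing your Open MPI bin directory, the usual workaround (besides fixing the startup files or the install location on the compute nodes) is mpirun's --prefix option, which sets PATH and LD_LIBRARY_PATH on the remote nodes before launching orted. A minimal sketch, assuming a hypothetical install prefix of /opt/openmpi and a hostfile hosts.txt listing your nodes -- substitute your real prefix and application:

   /opt/openmpi/bin/mpirun --prefix /opt/openmpi \
       -np 4 -machinefile hosts.txt ./your_app

Invoking mpirun by its absolute path generally has the same effect, and building Open MPI with the --enable-orterun-prefix-by-default configure option makes this the default so --prefix is not needed on every run.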