Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] rankfiles in openmpi-1.7.4
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2014-02-09 15:08:02


Hi Ralph,

Thank you very much for your reply. I have changed my rankfile as follows.

rank 0=rs0 slot=0:0-1
rank 1=rs0 slot=1
rank 2=rs1 slot=0
rank 3=rs1 slot=1

Now I get the following output.

rs0 openmpi_1.7.x_or_newer 108 mpiexec --report-bindings \
  --use-hwthread-cpus -np 4 -rf rf_rs0_rs1 hostname
--------------------------------------------------------------------------
Open MPI tried to bind a new process, but something went wrong. The
process was killed without launching the target application. Your job
will now abort.

  Local host: rs0
  Application name: /usr/local/bin/hostname
  Error message: hwloc indicates cpu binding cannot be enforced
  Location: ../../../../../openmpi-1.7.4/orte/mca/odls/default/odls_default_module.c:499
--------------------------------------------------------------------------
rs0 openmpi_1.7.x_or_newer 109

Kind regards

Siegmar

> > today I tested rankfiles once more. The good news first: openmpi-1.7.4
> > now supports my Sun M4000 server with Sparc VII processors on the
> > command line.
> >
> > rs0 openmpi_1.7.x_or_newer 104 mpiexec --report-bindings -np 4 \
> > --bind-to hwthread hostname
> > [rs0.informatik.hs-fulda.de:06051] MCW rank 1 bound to
> > socket 0[core 1[hwt 0]]: [../B./../..][../../../..]
> > [rs0.informatik.hs-fulda.de:06051] MCW rank 2 bound to
> > socket 1[core 4[hwt 0]]: [../../../..][B./../../..]
> > [rs0.informatik.hs-fulda.de:06051] MCW rank 3 bound to
> > socket 1[core 5[hwt 0]]: [../../../..][../B./../..]
> > [rs0.informatik.hs-fulda.de:06051] MCW rank 0 bound to
> > socket 0[core 0[hwt 0]]: [B./../../..][../../../..]
> > rs0.informatik.hs-fulda.de
> > rs0.informatik.hs-fulda.de
> > rs0.informatik.hs-fulda.de
> > rs0.informatik.hs-fulda.de
> > rs0 openmpi_1.7.x_or_newer 105
> >
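The binding maps printed by --report-bindings above can be read mechanically: each bracket group is one socket, cores are separated by "/", each character stands for one hardware thread, and "B" marks a thread the rank is bound to. The following is a minimal Python sketch of that reading; the function name is illustrative and not part of Open MPI. Note that the core index it returns is local to the socket, whereas the log text ("core 4") numbers cores across the whole node.

```python
import re


def parse_binding_map(bitmap):
    """Return (socket, core, hwthread) triples marked 'B' in a map such as
    '[../B./../..][../../../..]'. Core indices are local to each socket."""
    bound = []
    # One bracket group per socket.
    sockets = re.findall(r"\[([^\]]*)\]", bitmap)
    for s, socket in enumerate(sockets):
        # Cores are separated by '/', one character per hardware thread.
        for c, core in enumerate(socket.split("/")):
            for t, flag in enumerate(core):
                if flag == "B":
                    bound.append((s, c, t))
    return bound


if __name__ == "__main__":
    # Maps copied from the MCW rank 1 and rank 2 lines above.
    print(parse_binding_map("[../B./../..][../../../..]"))
    print(parse_binding_map("[../../../..][B./../../..]"))
```

With four cores per socket, socket-local core 0 on socket 1 corresponds to the "core 4" shown for rank 2.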
> > Thank you very much for solving this problem. Unfortunately I still
> > have a problem with a rankfile. Contents of my rankfile:
> >
> > rank 0=rs0 slot=0:0-7
> > rank 1=rs0 slot=1
> > rank 2=rs1 slot=0
> > rank 3=rs1 slot=1
> >
>
>
> Here's your problem: you told us socket 0, cores 0-7. However, if
> you look at your topology, you only have *4* cores in socket 0.
>
>
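A mismatch like this can be caught before launching. The following is a minimal Python sketch, not Open MPI's own rankfile parser: it assumes the "slot=<socket>:<cores>" form used in the rankfile above and a hypothetical, hand-written topology table (CORES_PER_SOCKET) taken from the topology dump in this thread (2 sockets with 4 cores each on rs0 and rs1). All names are illustrative. Entries without a colon, such as "slot=1", are skipped because their interpretation is outside the scope of the sketch.

```python
import re

# Hypothetical topology table, copied by hand from the hwloc dump in this
# thread: each host has 2 sockets with 4 cores each.
CORES_PER_SOCKET = {"rs0": [4, 4], "rs1": [4, 4]}

LINE_RE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\S+)$")


def expand(spec):
    """Expand a list such as '0-1,3' into [0, 1, 3]."""
    ids = []
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        ids.extend(range(int(lo), int(hi or lo) + 1))
    return ids


def check_rankfile(text, topo=CORES_PER_SOCKET):
    """Return a list of problems found; an empty list means none."""
    problems = []
    for line in text.strip().splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            problems.append(f"unparsable line: {line!r}")
            continue
        rank, host, slot = m.groups()
        if host not in topo:
            problems.append(f"rank {rank}: unknown host {host}")
            continue
        if ":" not in slot:
            continue  # form not covered by this sketch
        socket_spec, core_spec = slot.split(":", 1)
        for s in expand(socket_spec):
            if s >= len(topo[host]):
                problems.append(f"rank {rank}: {host} has no socket {s}")
                continue
            for c in expand(core_spec):
                if c >= topo[host][s]:
                    problems.append(
                        f"rank {rank}: socket {s} on {host} has no core {c}"
                    )
    return problems


if __name__ == "__main__":
    old = "rank 0=rs0 slot=0:0-7\nrank 1=rs0 slot=1"
    for problem in check_rankfile(old):
        print(problem)
```

For the rankfile quoted above, the sketch reports cores 4 through 7 on socket 0 of rs0 as missing, which matches the diagnosis.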
> >
> > rs0 openmpi_1.7.x_or_newer 105 mpiexec --report-bindings \
> > --use-hwthread-cpus -np 4 -rf rf_rs0_rs1 hostname
> > [rs0.informatik.hs-fulda.de:06060] [[7659,0],0] ORTE_ERROR_LOG: Not
> > found in file
> > .../openmpi-1.7.4/orte/mca/rmaps/rank_file/rmaps_rank_file.c
> > at line 283
> > [rs0.informatik.hs-fulda.de:06060] [[7659,0],0] ORTE_ERROR_LOG: Not
> > found in file
> > .../openmpi-1.7.4/orte/mca/rmaps/base/rmaps_base_map_job.c
> > at line 284
> > rs0 openmpi_1.7.x_or_newer 106
> >
> >
> > rs0 openmpi_1.7.x_or_newer 110 mpiexec --report-bindings \
> > --display-allocation --mca rmaps_base_verbose_100 \
> > --use-hwthread-cpus -np 4 -rf rf_rs0_rs1 hostname
> >
> > ====================== ALLOCATED NODES ======================
> > rs0: slots=2 max_slots=0 slots_inuse=0
> > rs1: slots=2 max_slots=0 slots_inuse=0
> > =================================================================
> > [rs0.informatik.hs-fulda.de:06074] [[7677,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-1.7.4/orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 283
> > [rs0.informatik.hs-fulda.de:06074] [[7677,0],0] ORTE_ERROR_LOG: Not found in file ../../../../openmpi-1.7.4/orte/mca/rmaps/base/rmaps_base_map_job.c at line 284
> > rs0 openmpi_1.7.x_or_newer 111
> >
> >
> > rs0 openmpi_1.7.x_or_newer 111 mpiexec --report-bindings --display-allocation --mca ess_base_verbose 5 --use-hwthread-cpus -np 4 -rf rf_rs0_rs1 hostname
> > [rs0.informatik.hs-fulda.de:06078] mca:base:select:( ess) Querying component [env]
> > [rs0.informatik.hs-fulda.de:06078] mca:base:select:( ess) Skipping component [env]. Query failed to return a module
> > [rs0.informatik.hs-fulda.de:06078] mca:base:select:( ess) Querying component [hnp]
> > [rs0.informatik.hs-fulda.de:06078] mca:base:select:( ess) Query of component [hnp] set priority to 100
> > [rs0.informatik.hs-fulda.de:06078] mca:base:select:( ess) Querying component [singleton]
> > [rs0.informatik.hs-fulda.de:06078] mca:base:select:( ess) Skipping component [singleton]. Query failed to return a module
> > [rs0.informatik.hs-fulda.de:06078] mca:base:select:( ess) Querying component [tool]
> > [rs0.informatik.hs-fulda.de:06078] mca:base:select:( ess) Skipping component [tool]. Query failed to return a module
> > [rs0.informatik.hs-fulda.de:06078] mca:base:select:( ess) Selected component [hnp]
> > [rs0.informatik.hs-fulda.de:06078] [[INVALID],INVALID] Topology Info:
> > [rs0.informatik.hs-fulda.de:06078] Type: Machine Number of child objects: 1
> > Name=NULL
> > total=33554432KB
> > Backend=Solaris
> > OSName=SunOS
> > OSRelease=5.10
> > OSVersion=Generic_150400-04
> > Architecture=sun4u
> > Cpuset: 0x0000ffff
> > Online: 0x0000ffff
> > Allowed: 0x0000ffff
> > Bind CPU proc: TRUE
> > Bind CPU thread: TRUE
> > Bind MEM proc: TRUE
> > Bind MEM thread: TRUE
> > Type: NUMANode Number of child objects: 2
> > Name=NULL
> > local=33554432KB
> > total=33554432KB
> > Cpuset: 0x0000ffff
> > Online: 0x0000ffff
> > Allowed: 0x0000ffff
> > Type: Socket Number of child objects: 4
> > Name=NULL
> > CPUType=sparcv9
> > CPUModel=SPARC64_VII
> > Cpuset: 0x000000ff
> > Online: 0x000000ff
> > Allowed: 0x000000ff
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x00000003
> > Online: 0x00000003
> > Allowed: 0x00000003
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000001
> > Online: 0x00000001
> > Allowed: 0x00000001
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000002
> > Online: 0x00000002
> > Allowed: 0x00000002
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x0000000c
> > Online: 0x0000000c
> > Allowed: 0x0000000c
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000004
> > Online: 0x00000004
> > Allowed: 0x00000004
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000008
> > Online: 0x00000008
> > Allowed: 0x00000008
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x00000030
> > Online: 0x00000030
> > Allowed: 0x00000030
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000010
> > Online: 0x00000010
> > Allowed: 0x00000010
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000020
> > Online: 0x00000020
> > Allowed: 0x00000020
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x000000c0
> > Online: 0x000000c0
> > Allowed: 0x000000c0
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000040
> > Online: 0x00000040
> > Allowed: 0x00000040
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000080
> > Online: 0x00000080
> > Allowed: 0x00000080
> > Type: Socket Number of child objects: 4
> > Name=NULL
> > CPUType=sparcv9
> > CPUModel=SPARC64_VII
> > Cpuset: 0x0000ff00
> > Online: 0x0000ff00
> > Allowed: 0x0000ff00
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x00000300
> > Online: 0x00000300
> > Allowed: 0x00000300
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000100
> > Online: 0x00000100
> > Allowed: 0x00000100
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000200
> > Online: 0x00000200
> > Allowed: 0x00000200
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x00000c00
> > Online: 0x00000c00
> > Allowed: 0x00000c00
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000400
> > Online: 0x00000400
> > Allowed: 0x00000400
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000800
> > Online: 0x00000800
> > Allowed: 0x00000800
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x00003000
> > Online: 0x00003000
> > Allowed: 0x00003000
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00001000
> > Online: 0x00001000
> > Allowed: 0x00001000
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00002000
> > Online: 0x00002000
> > Allowed: 0x00002000
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x0000c000
> > Online: 0x0000c000
> > Allowed: 0x0000c000
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00004000
> > Online: 0x00004000
> > Allowed: 0x00004000
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00008000
> > Online: 0x00008000
> > Allowed: 0x00008000
> > [rs1.informatik.hs-fulda.de:09657] mca:base:select:( ess) Querying component [env]
> > [rs1.informatik.hs-fulda.de:09657] mca:base:select:( ess) Query of component [env] set priority to 20
> > [rs1.informatik.hs-fulda.de:09657] mca:base:select:( ess) Selected component [env]
> > [rs1.informatik.hs-fulda.de:09657] ess:env set name to [[7673,0],1]
> > [rs1.informatik.hs-fulda.de:09657] [[7673,0],1] Topology Info:
> > [rs1.informatik.hs-fulda.de:09657] Type: Machine Number of child objects: 1
> > Name=NULL
> > total=33554432KB
> > Backend=Solaris
> > OSName=SunOS
> > OSRelease=5.10
> > OSVersion=Generic_150400-04
> > Architecture=sun4u
> > Cpuset: 0x0000ffff
> > Online: 0x0000ffff
> > Allowed: 0x0000ffff
> > Bind CPU proc: TRUE
> > Bind CPU thread: TRUE
> > Bind MEM proc: TRUE
> > Bind MEM thread: TRUE
> > Type: NUMANode Number of child objects: 2
> > Name=NULL
> > local=33554432KB
> > total=33554432KB
> > Cpuset: 0x0000ffff
> > Online: 0x0000ffff
> > Allowed: 0x0000ffff
> > Type: Socket Number of child objects: 4
> > Name=NULL
> > CPUType=sparcv9
> > CPUModel=SPARC64_VII
> > Cpuset: 0x000000ff
> > Online: 0x000000ff
> > Allowed: 0x000000ff
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x00000003
> > Online: 0x00000003
> > Allowed: 0x00000003
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000001
> > Online: 0x00000001
> > Allowed: 0x00000001
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000002
> > Online: 0x00000002
> > Allowed: 0x00000002
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x0000000c
> > Online: 0x0000000c
> > Allowed: 0x0000000c
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000004
> > Online: 0x00000004
> > Allowed: 0x00000004
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000008
> > Online: 0x00000008
> > Allowed: 0x00000008
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x00000030
> > Online: 0x00000030
> > Allowed: 0x00000030
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000010
> > Online: 0x00000010
> > Allowed: 0x00000010
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000020
> > Online: 0x00000020
> > Allowed: 0x00000020
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x000000c0
> > Online: 0x000000c0
> > Allowed: 0x000000c0
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000040
> > Online: 0x00000040
> > Allowed: 0x00000040
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000080
> > Online: 0x00000080
> > Allowed: 0x00000080
> > Type: Socket Number of child objects: 4
> > Name=NULL
> > CPUType=sparcv9
> > CPUModel=SPARC64_VII
> > Cpuset: 0x0000ff00
> > Online: 0x0000ff00
> > Allowed: 0x0000ff00
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x00000300
> > Online: 0x00000300
> > Allowed: 0x00000300
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000100
> > Online: 0x00000100
> > Allowed: 0x00000100
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000200
> > Online: 0x00000200
> > Allowed: 0x00000200
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x00000c00
> > Online: 0x00000c00
> > Allowed: 0x00000c00
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000400
> > Online: 0x00000400
> > Allowed: 0x00000400
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000800
> > Online: 0x00000800
> > Allowed: 0x00000800
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x00003000
> > Online: 0x00003000
> > Allowed: 0x00003000
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00001000
> > Online: 0x00001000
> > Allowed: 0x00001000
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00002000
> > Online: 0x00002000
> > Allowed: 0x00002000
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x0000c000
> > Online: 0x0000c000
> > Allowed: 0x0000c000
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00004000
> > Online: 0x00004000
> > Allowed: 0x00004000
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00008000
> > Online: 0x00008000
> > Allowed: 0x00008000
> >
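The Cpuset, Online, and Allowed fields in the topology dump above are hexadecimal bitmasks over processing units (PUs): bit i is set when PU i belongs to the object. The following small Python sketch decodes such a mask into a list of PU indices; the helper name is illustrative and not an hwloc or Open MPI function.

```python
def cpuset_to_pus(mask):
    """Decode a hex cpuset string such as '0x000000ff' into PU indices."""
    value = int(mask, 16)  # base 16 accepts the '0x' prefix
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


if __name__ == "__main__":
    # Values copied from the dump above: first socket, one of its cores,
    # and the last core of the second socket.
    for mask in ("0x000000ff", "0x00000003", "0x0000c000"):
        print(mask, cpuset_to_pus(mask))
```

Read this way, the first socket (0x000000ff) covers PUs 0 to 7, which is four cores with two hardware threads each, and the second socket (0x0000ff00) covers PUs 8 to 15.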
> > ====================== ALLOCATED NODES ======================
> > rs0: slots=2 max_slots=0 slots_inuse=0
> > rs1: slots=2 max_slots=0 slots_inuse=0
> > =================================================================
> > [rs0.informatik.hs-fulda.de:06078] [[7673,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-1.7.4/orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 283
> > [rs0.informatik.hs-fulda.de:06078] [[7673,0],0] ORTE_ERROR_LOG: Not found in file ../../../../openmpi-1.7.4/orte/mca/rmaps/base/rmaps_base_map_job.c at line 284
> > [rs1.informatik.hs-fulda.de:09657] [[7673,0],1] setting up session dir with
> > tmpdir: UNDEF
> > host rs1
> > rs0 openmpi_1.7.x_or_newer 112
> >
> >
> >
> >
> > rs0 openmpi_1.7.x_or_newer 113 mpiexec --report-bindings --display-allocation --mca plm_base_verbose 100 --use-hwthread-cpus -np 4 -rf rf_rs0_rs1 hostname
> > [rs0.informatik.hs-fulda.de:06088] mca: base: components_register: registering plm components
> > [rs0.informatik.hs-fulda.de:06088] mca: base: components_register: found loaded component rsh
> > [rs0.informatik.hs-fulda.de:06088] mca: base: components_register: component rsh register function successful
> > [rs0.informatik.hs-fulda.de:06088] mca: base: components_open: opening plm components
> > [rs0.informatik.hs-fulda.de:06088] mca: base: components_open: found loaded component rsh
> > [rs0.informatik.hs-fulda.de:06088] mca: base: components_open: component rsh open function successful
> > [rs0.informatik.hs-fulda.de:06088] mca:base:select: Auto-selecting plm components
> > [rs0.informatik.hs-fulda.de:06088] mca:base:select:( plm) Querying component [rsh]
> > [rs0.informatik.hs-fulda.de:06088] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path NULL
> > [rs0.informatik.hs-fulda.de:06088] mca:base:select:( plm) Query of component [rsh] set priority to 10
> > [rs0.informatik.hs-fulda.de:06088] mca:base:select:( plm) Selected component [rsh]
> > [rs0.informatik.hs-fulda.de:06088] plm:base:set_hnp_name: initial bias 6088 nodename hash 3909477186
> > [rs0.informatik.hs-fulda.de:06088] plm:base:set_hnp_name: final jobfam 7567
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh_setup on agent ssh : rsh path NULL
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:receive start comm
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:setup_job
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:setup_vm
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:setup_vm creating map
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] setup:vm: working unmanaged allocation
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] using rankfile rf_rs0_rs1
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] checking node rs0
> >
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] ignoring myself
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] checking node rs1
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:setup_vm add new daemon [[7567,0],1]
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:setup_vm assigning new daemon [[7567,0],1] to node rs1
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh: launching vm
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh: local shell: 2 (tcsh)
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh: assuming same remote shell as local shell
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh: remote shell: 2 (tcsh)
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh: final template argv:
> > /usr/local/bin/ssh <template> orted -mca orte_report_bindings 1 -mca ess env -mca orte_ess_jobid 495910912 -mca orte_ess_vpid <template> -mca orte_ess_num_procs 2 -mca orte_hnp_uri "495910912.0;tcp://193.174.26.198,192.168.128.1,10.1.1.2:43810" --tree-spawn --mca plm_base_verbose 100 -mca plm rsh -mca orte_rankfile rf_rs0_rs1 -mca hwloc_base_use_hwthreads_as_cpus 1 -mca orte_display_alloc 1 -mca hwloc_base_report_bindings 1
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh:launch daemon 0 not a child of mine
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh: adding node rs1 to launch list
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh: activating launch event
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh: recording launch of daemon [[7567,0],1]
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh: executing: (/usr/local/bin/ssh) [/usr/local/bin/ssh rs1 orted -mca orte_report_bindings 1 -mca ess env -mca orte_ess_jobid 495910912 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 -mca orte_hnp_uri "495910912.0;tcp://193.174.26.198,192.168.128.1,10.1.1.2:43810" --tree-spawn --mca plm_base_verbose 100 -mca plm rsh -mca orte_rankfile rf_rs0_rs1 -mca hwloc_base_use_hwthreads_as_cpus 1 -mca orte_display_alloc 1 -mca hwloc_base_report_bindings 1]
> > Warning: untrusted X11 forwarding setup failed: xauth key data not generated
> > Warning: No xauth data; using fake authentication data for X11 forwarding.
> > [rs1.informatik.hs-fulda.de:09721] mca: base: components_register: registering plm components
> > [rs1.informatik.hs-fulda.de:09721] mca: base: components_register: found loaded component rsh
> > [rs1.informatik.hs-fulda.de:09721] mca: base: components_register: component rsh register function successful
> > [rs1.informatik.hs-fulda.de:09721] mca: base: components_open: opening plm components
> > [rs1.informatik.hs-fulda.de:09721] mca: base: components_open: found loaded component rsh
> > [rs1.informatik.hs-fulda.de:09721] mca: base: components_open: component rsh open function successful
> > [rs1.informatik.hs-fulda.de:09721] mca:base:select: Auto-selecting plm components
> > [rs1.informatik.hs-fulda.de:09721] mca:base:select:( plm) Querying component [rsh]
> > [rs1.informatik.hs-fulda.de:09721] [[7567,0],1] plm:rsh_lookup on agent ssh : rsh path NULL
> > [rs1.informatik.hs-fulda.de:09721] mca:base:select:( plm) Query of component [rsh] set priority to 10
> > [rs1.informatik.hs-fulda.de:09721] mca:base:select:( plm) Selected component [rsh]
> > [rs1.informatik.hs-fulda.de:09721] [[7567,0],1] plm:rsh_setup on agent ssh : rsh path NULL
> > [rs1.informatik.hs-fulda.de:09721] [[7567,0],1] plm:base:receive start comm
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:orted_report_launch from daemon [[7567,0],1]
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:orted_report_launch from daemon [[7567,0],1] on node rs1
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] RECEIVED TOPOLOGY FROM NODE rs1
> > [rs0.informatik.hs-fulda.de:06088] Type: Machine Number of child objects: 1
> > Name=NULL
> > total=33554432KB
> > Backend=Solaris
> > OSName=SunOS
> > OSRelease=5.10
> > OSVersion=Generic_150400-04
> > Architecture=sun4u
> > Cpuset: 0x0000ffff
> > Online: 0x0000ffff
> > Allowed: 0x0000ffff
> > Bind CPU proc: TRUE
> > Bind CPU thread: TRUE
> > Bind MEM proc: TRUE
> > Bind MEM thread: TRUE
> > Type: NUMANode Number of child objects: 2
> > Name=NULL
> > local=33554432KB
> > total=33554432KB
> > Cpuset: 0x0000ffff
> > Online: 0x0000ffff
> > Allowed: 0x0000ffff
> > Type: Socket Number of child objects: 4
> > Name=NULL
> > CPUType=sparcv9
> > CPUModel=SPARC64_VII
> > Cpuset: 0x000000ff
> > Online: 0x000000ff
> > Allowed: 0x000000ff
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x00000003
> > Online: 0x00000003
> > Allowed: 0x00000003
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000001
> > Online: 0x00000001
> > Allowed: 0x00000001
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000002
> > Online: 0x00000002
> > Allowed: 0x00000002
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x0000000c
> > Online: 0x0000000c
> > Allowed: 0x0000000c
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000004
> > Online: 0x00000004
> > Allowed: 0x00000004
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000008
> > Online: 0x00000008
> > Allowed: 0x00000008
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x00000030
> > Online: 0x00000030
> > Allowed: 0x00000030
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000010
> > Online: 0x00000010
> > Allowed: 0x00000010
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000020
> > Online: 0x00000020
> > Allowed: 0x00000020
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x000000c0
> > Online: 0x000000c0
> > Allowed: 0x000000c0
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000040
> > Online: 0x00000040
> > Allowed: 0x00000040
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000080
> > Online: 0x00000080
> > Allowed: 0x00000080
> > Type: Socket Number of child objects: 4
> > Name=NULL
> > CPUType=sparcv9
> > CPUModel=SPARC64_VII
> > Cpuset: 0x0000ff00
> > Online: 0x0000ff00
> > Allowed: 0x0000ff00
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x00000300
> > Online: 0x00000300
> > Allowed: 0x00000300
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000100
> > Online: 0x00000100
> > Allowed: 0x00000100
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000200
> > Online: 0x00000200
> > Allowed: 0x00000200
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x00000c00
> > Online: 0x00000c00
> > Allowed: 0x00000c00
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000400
> > Online: 0x00000400
> > Allowed: 0x00000400
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000800
> > Online: 0x00000800
> > Allowed: 0x00000800
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x00003000
> > Online: 0x00003000
> > Allowed: 0x00003000
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00001000
> > Online: 0x00001000
> > Allowed: 0x00001000
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00002000
> > Online: 0x00002000
> > Allowed: 0x00002000
> > Type: Core Number of child objects: 2
> > Name=NULL
> > Cpuset: 0x0000c000
> > Online: 0x0000c000
> > Allowed: 0x0000c000
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00004000
> > Online: 0x00004000
> > Allowed: 0x00004000
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00008000
> > Online: 0x00008000
> > Allowed: 0x00008000
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] TOPOLOGY MATCHES - DISCARDING
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:orted_report_launch completed for daemon [[7567,0],1] at contact 495910912.1;tcp://193.174.26.199,192.168.128.2,10.1.1.2:37231
> >
> > ====================== ALLOCATED NODES ======================
> > rs0: slots=2 max_slots=0 slots_inuse=0
> > rs1: slots=2 max_slots=0 slots_inuse=0
> > =================================================================
> > [rs1.informatik.hs-fulda.de:09721] [[7567,0],1] plm:rsh: remote spawn called
> > [rs1.informatik.hs-fulda.de:09721] [[7567,0],1] plm:rsh: remote spawn - have no children!
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-1.7.4/orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 283
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] ORTE_ERROR_LOG: Not found in file ../../../../openmpi-1.7.4/orte/mca/rmaps/base/rmaps_base_map_job.c at line 284
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:orted_cmd sending orted_exit commands
> > [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:receive stop comm
> > [rs0.informatik.hs-fulda.de:06088] mca: base: close: component rsh closed
> > [rs0.informatik.hs-fulda.de:06088] mca: base: close: unloading component rsh
> > [rs1.informatik.hs-fulda.de:09721] [[7567,0],1] plm:base:receive stop comm
> > [rs1.informatik.hs-fulda.de:09721] mca: base: close: component rsh closed
> > [rs1.informatik.hs-fulda.de:09721] mca: base: close: unloading component rsh
> > rs0 openmpi_1.7.x_or_newer 114
> >
> >
> >
> >
> > I still have the problem that I get no output if I mix little-endian
> > and big-endian machines, although this works with openmpi-1.6.x.
> >
> > linpc1 openmpi_1.7.x_or_newer 112 mpiexec -report-bindings -np 4 \
> > -rf rf_linpc_sunpc_tyr hostname
> > linpc1 openmpi_1.7.x_or_newer 113
> >
> >
> >
> > linpc1 openmpi_1.7.x_or_newer 188 mpiexec -report-bindings --display-allocation --mca plm_base_verbose 100 -np 1 -rf rf_linpc_sunpc_tyr hostname
> > [linpc1:20650] mca: base: components_register: registering plm components
> > [linpc1:20650] mca: base: components_register: found loaded component rsh
> > [linpc1:20650] mca: base: components_register: component rsh register function successful
> > [linpc1:20650] mca: base: components_register: found loaded component slurm
> > [linpc1:20650] mca: base: components_register: component slurm register function successful
> > [linpc1:20650] mca: base: components_open: opening plm components
> > [linpc1:20650] mca: base: components_open: found loaded component rsh
> > [linpc1:20650] mca: base: components_open: component rsh open function successful
> > [linpc1:20650] mca: base: components_open: found loaded component slurm
> > [linpc1:20650] mca: base: components_open: component slurm open function successful
> > [linpc1:20650] mca:base:select: Auto-selecting plm components
> > [linpc1:20650] mca:base:select:( plm) Querying component [rsh]
> > [linpc1:20650] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path NULL
> > [linpc1:20650] mca:base:select:( plm) Query of component [rsh] set priority to 10
> > [linpc1:20650] mca:base:select:( plm) Querying component [slurm]
> > [linpc1:20650] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module
> > [linpc1:20650] mca:base:select:( plm) Selected component [rsh]
> > [linpc1:20650] mca: base: close: component slurm closed
> > [linpc1:20650] mca: base: close: unloading component slurm
> > [linpc1:20650] plm:base:set_hnp_name: initial bias 20650 nodename hash 3902177415
> > [linpc1:20650] plm:base:set_hnp_name: final jobfam 14523
> > [linpc1:20650] [[14523,0],0] plm:rsh_setup on agent ssh : rsh path NULL
> > [linpc1:20650] [[14523,0],0] plm:base:receive start comm
> > [linpc1:20650] [[14523,0],0] plm:base:setup_job
> > [linpc1:20650] [[14523,0],0] plm:base:setup_vm
> > [linpc1:20650] [[14523,0],0] plm:base:setup_vm creating map
> > [linpc1:20650] [[14523,0],0] setup:vm: working unmanaged allocation
> > [linpc1:20650] [[14523,0],0] using rankfile rf_linpc_sunpc_tyr
> > [linpc1:20650] [[14523,0],0] checking node linpc0
> > [linpc1:20650] [[14523,0],0] checking node linpc1
> > [linpc1:20650] [[14523,0],0] ignoring myself
> > [linpc1:20650] [[14523,0],0] checking node sunpc1
> > [linpc1:20650] [[14523,0],0] checking node tyr
> > [linpc1:20650] [[14523,0],0] plm:base:setup_vm add new daemon [[14523,0],1]
> > [linpc1:20650] [[14523,0],0] plm:base:setup_vm assigning new daemon [[14523,0],1] to node linpc0
> > [linpc1:20650] [[14523,0],0] plm:base:setup_vm add new daemon [[14523,0],2]
> > [linpc1:20650] [[14523,0],0] plm:base:setup_vm assigning new daemon [[14523,0],2] to node sunpc1
> > [linpc1:20650] [[14523,0],0] plm:base:setup_vm add new daemon [[14523,0],3]
> > [linpc1:20650] [[14523,0],0] plm:base:setup_vm assigning new daemon [[14523,0],3] to node tyr
> > [linpc1:20650] [[14523,0],0] plm:rsh: launching vm
> > [linpc1:20650] [[14523,0],0] plm:rsh: local shell: 2 (tcsh)
> > [linpc1:20650] [[14523,0],0] plm:rsh: assuming same remote shell as local shell
> > [linpc1:20650] [[14523,0],0] plm:rsh: remote shell: 2 (tcsh)
> > [linpc1:20650] [[14523,0],0] plm:rsh: final template argv:
> > /usr/local/bin/ssh <template> orted -mca orte_report_bindings 1 -mca ess env -mca orte_ess_jobid 951779328 -mca orte_ess_vpid <template> -mca orte_ess_num_procs 4 -mca orte_hnp_uri "951779328.0;tcp://193.174.26.208:46876" --tree-spawn --mca plm_base_verbose 100 -mca plm rsh -mca hwloc_base_report_bindings 1 -mca orte_display_alloc 1 -mca orte_rankfile rf_linpc_sunpc_tyr
> > [linpc1:20650] [[14523,0],0] plm:rsh:launch daemon 0 not a child of mine
> > [linpc1:20650] [[14523,0],0] plm:rsh: adding node linpc0 to launch list
> > [linpc1:20650] [[14523,0],0] plm:rsh: adding node sunpc1 to launch list
> > [linpc1:20650] [[14523,0],0] plm:rsh:launch daemon 3 not a child of mine
> > [linpc1:20650] [[14523,0],0] plm:rsh: activating launch event
> > [linpc1:20650] [[14523,0],0] plm:rsh: recording launch of daemon [[14523,0],1]
> > [linpc1:20650] [[14523,0],0] plm:rsh: recording launch of daemon [[14523,0],2]
> > [linpc1:20650] [[14523,0],0] plm:rsh: executing: (/usr/local/bin/ssh) [/usr/local/bin/ssh sunpc1 orted -mca orte_report_bindings 1 -mca ess env -mca orte_ess_jobid 951779328 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 4 -mca orte_hnp_uri "951779328.0;tcp://193.174.26.208:46876" --tree-spawn --mca plm_base_verbose 100 -mca plm rsh -mca hwloc_base_report_bindings 1 -mca orte_display_alloc 1 -mca orte_rankfile rf_linpc_sunpc_tyr]
> > [linpc1:20650] [[14523,0],0] plm:rsh: executing: (/usr/local/bin/ssh) [/usr/local/bin/ssh linpc0 orted -mca orte_report_bindings 1 -mca ess env -mca orte_ess_jobid 951779328 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 4 -mca orte_hnp_uri "951779328.0;tcp://193.174.26.208:46876" --tree-spawn --mca plm_base_verbose 100 -mca plm rsh -mca hwloc_base_report_bindings 1 -mca orte_display_alloc 1 -mca orte_rankfile rf_linpc_sunpc_tyr]
> > Warning: untrusted X11 forwarding setup failed: xauth key data not generated
> > Warning: No xauth data; using fake authentication data for X11 forwarding.
> > X11 forwarding request failed on channel 0
> > Warning: untrusted X11 forwarding setup failed: xauth key data not generated
> > Warning: No xauth data; using fake authentication data for X11 forwarding.
> > [sunpc1:09408] mca: base: components_register: registering plm components
> > [sunpc1:09408] mca: base: components_register: found loaded component rsh
> > [sunpc1:09408] mca: base: components_register: component rsh register
function successful
> > [sunpc1:09408] mca: base: components_open: opening plm components
> > [sunpc1:09408] mca: base: components_open: found loaded component rsh
> > [sunpc1:09408] mca: base: components_open: component rsh open function
successful
> > [sunpc1:09408] mca:base:select: Auto-selecting plm components
> > [sunpc1:09408] mca:base:select:( plm) Querying component [rsh]
> > [sunpc1:09408] [[14523,0],2] plm:rsh_lookup on agent ssh : rsh path NULL
> > [sunpc1:09408] mca:base:select:( plm) Query of component [rsh] set priority
to 10
> > [sunpc1:09408] mca:base:select:( plm) Selected component [rsh]
> > [sunpc1:09408] [[14523,0],2] plm:rsh_setup on agent ssh : rsh path NULL
> > [sunpc1:09408] [[14523,0],2] plm:base:receive start comm
> > [linpc1:20650] [[14523,0],0] plm:base:orted_report_launch from daemon [[14523,0],2]
> > [linpc1:20650] [[14523,0],0] plm:base:orted_report_launch from daemon [[14523,0],2] on node sunpc1
> > [linpc1:20650] [[14523,0],0] plm:base:orted_report_launch completed for daemon [[14523,0],2] at contact 951779328.2;tcp://193.174.26.210:33215
> > [sunpc1:09408] [[14523,0],2] plm:rsh: remote spawn called
> > [sunpc1:09408] [[14523,0],2] plm:rsh: remote spawn - have no children!
> > [linpc0:32306] mca: base: components_register: registering plm components
> > [linpc0:32306] mca: base: components_register: found loaded component rsh
> > [linpc0:32306] mca: base: components_register: component rsh register function successful
> > [linpc0:32306] mca: base: components_open: opening plm components
> > [linpc0:32306] mca: base: components_open: found loaded component rsh
> > [linpc0:32306] mca: base: components_open: component rsh open function successful
> > [linpc0:32306] mca:base:select: Auto-selecting plm components
> > [linpc0:32306] mca:base:select:( plm) Querying component [rsh]
> > [linpc0:32306] [[14523,0],1] plm:rsh_lookup on agent ssh : rsh path NULL
> > [linpc0:32306] mca:base:select:( plm) Query of component [rsh] set priority to 10
> > [linpc0:32306] mca:base:select:( plm) Selected component [rsh]
> > [linpc0:32306] [[14523,0],1] plm:rsh_setup on agent ssh : rsh path NULL
> > [linpc0:32306] [[14523,0],1] plm:base:receive start comm
> > [linpc1:20650] [[14523,0],0] plm:base:orted_report_launch from daemon [[14523,0],1]
> > [linpc1:20650] [[14523,0],0] plm:base:orted_report_launch from daemon [[14523,0],1] on node linpc0
> > [linpc1:20650] [[14523,0],0] RECEIVED TOPOLOGY FROM NODE linpc0
> > [linpc1:20650] Type: Machine Number of child objects: 2
> > Name=NULL
> > total=8387048KB
> > DMIProductName="Sun Ultra 40 Workstation"
> > DMIProductVersion=11
> > DMIBoardVendor="Sun Microsystems"
> > DMIBoardName="Sun Ultra 40 Workstation"
> > DMIBoardVersion=50
> > DMIBoardAssetTag=
> > DMIChassisVendor="Sun Microsystems"
> > DMIChassisType=17
> > DMIChassisVersion=01
> > DMIChassisAssetTag=
> > DMIBIOSVendor="Phoenix Technologies Ltd."
> > DMIBIOSVersion="1.70 "
> > DMIBIOSDate=02/15/2008
> > DMISysVendor="Sun Microsystems"
> > Backend=Linux
> > OSName=Linux
> > OSRelease=3.1.10-1.16-desktop
> > OSVersion="#1 SMP PREEMPT Wed Jun 27 05:21:40 UTC 2012 (d016078)"
> > Architecture=x86_64
> > Cpuset: 0x0000000f
> > Online: 0x0000000f
> > Allowed: 0x0000000f
> > Bind CPU proc: TRUE
> > Bind CPU thread: TRUE
> > Bind MEM proc: FALSE
> > Bind MEM thread: TRUE
> > Type: NUMANode Number of child objects: 2
> > Name=NULL
> > local=4192744KB
> > total=4192744KB
> > Cpuset: 0x00000003
> > Online: 0x00000003
> > Allowed: 0x00000003
> > Type: Socket Number of child objects: 2
> > Name=NULL
> > CPUModel="Dual Core AMD Opteron(tm) Processor 280"
> > Cpuset: 0x00000003
> > Online: 0x00000003
> > Allowed: 0x00000003
> > Type: L2Cache Number of child objects: 1
> > Name=NULL
> > size=1024KB
> > linesize=64
> > ways=16
> > Cpuset: 0x00000001
> > Online: 0x00000001
> > Allowed: 0x00000001
> > Type: L1dCache Number of child objects: 1
> > Name=NULL
> > size=64KB
> > linesize=64
> > ways=2
> > Cpuset: 0x00000001
> > Online: 0x00000001
> > Allowed: 0x00000001
> > Type: Core Number of child objects: 1
> > Name=NULL
> > Cpuset: 0x00000001
> > Online: 0x00000001
> > Allowed: 0x00000001
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000001
> > Online: 0x00000001
> > Allowed: 0x00000001
> > Type: L2Cache Number of child objects: 1
> > Name=NULL
> > size=1024KB
> > linesize=64
> > ways=16
> > Cpuset: 0x00000002
> > Online: 0x00000002
> > Allowed: 0x00000002
> > Type: L1dCache Number of child objects: 1
> > Name=NULL
> > size=64KB
> > linesize=64
> > ways=2
> > Cpuset: 0x00000002
> > Online: 0x00000002
> > Allowed: 0x00000002
> > Type: Core Number of child objects: 1
> > Name=NULL
> > Cpuset: 0x00000002
> > Online: 0x00000002
> > Allowed: 0x00000002
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000002
> > Online: 0x00000002
> > Allowed: 0x00000002
> > Type: Bridge Host->PCI Number of child objects: 4
> > Name=NULL
> > buses=0000:[00-03]
> > Type: PCI 10de:0053 Number of child objects: 1
> > Name=nVidia Corporation CK804 IDE
> > busid=0000:00:06.0
> > class=0101(IDE)
> > PCIVendor="nVidia Corporation"
> > PCIDevice="CK804 IDE"
> > Type: Block Number of child objects: 0
> > Name=sr0
> > Type: PCI 10de:0055 Number of child objects: 1
> > Name=nVidia Corporation CK804 Serial ATA Controller
> > busid=0000:00:07.0
> > class=0101(IDE)
> > PCIVendor="nVidia Corporation"
> > PCIDevice="CK804 Serial ATA Controller"
> > Type: Block Number of child objects: 0
> > Name=sda
> > Type: PCI 10de:0054 Number of child objects: 0
> > Name=nVidia Corporation CK804 Serial ATA Controller
> > busid=0000:00:08.0
> > class=0101(IDE)
> > PCIVendor="nVidia Corporation"
> > PCIDevice="CK804 Serial ATA Controller"
> > Type: PCI 10de:029d Number of child objects: 2
> > Name=nVidia Corporation G71GL [Quadro FX 3500]
> > busid=0000:03:00.0
> > class=0300(VGA)
> > PCIVendor="nVidia Corporation"
> > PCIDevice="G71GL [Quadro FX 3500]"
> > Type: GPU Number of child objects: 0
> > Name=controlD64
> > Type: GPU Number of child objects: 0
> > Name=card0
> > Type: NUMANode Number of child objects: 2
> > Name=NULL
> > local=4194304KB
> > total=4194304KB
> > Cpuset: 0x0000000c
> > Online: 0x0000000c
> > Allowed: 0x0000000c
> > Type: Socket Number of child objects: 2
> > Name=NULL
> > CPUModel="Dual Core AMD Opteron(tm) Processor 280"
> > Cpuset: 0x0000000c
> > Online: 0x0000000c
> > Allowed: 0x0000000c
> > Type: L2Cache Number of child objects: 1
> > Name=NULL
> > size=1024KB
> > linesize=64
> > ways=16
> > Cpuset: 0x00000004
> > Online: 0x00000004
> > Allowed: 0x00000004
> > Type: L1dCache Number of child objects: 1
> > Name=NULL
> > size=64KB
> > linesize=64
> > ways=2
> > Cpuset: 0x00000004
> > Online: 0x00000004
> > Allowed: 0x00000004
> > Type: Core Number of child objects: 1
> > Name=NULL
> > Cpuset: 0x00000004
> > Online: 0x00000004
> > Allowed: 0x00000004
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000004
> > Online: 0x00000004
> > Allowed: 0x00000004
> > Type: L2Cache Number of child objects: 1
> > Name=NULL
> > size=1024KB
> > linesize=64
> > ways=16
> > Cpuset: 0x00000008
> > Online: 0x00000008
> > Allowed: 0x00000008
> > Type: L1dCache Number of child objects: 1
> > Name=NULL
> > size=64KB
> > linesize=64
> > ways=2
> > Cpuset: 0x00000008
> > Online: 0x00000008
> > Allowed: 0x00000008
> > Type: Core Number of child objects: 1
> > Name=NULL
> > Cpuset: 0x00000008
> > Online: 0x00000008
> > Allowed: 0x00000008
> > Type: PU Number of child objects: 0
> > Name=NULL
> > Cpuset: 0x00000008
> > Online: 0x00000008
> > Allowed: 0x00000008
> > Type: Bridge Host->PCI Number of child objects: 2
> > Name=NULL
> > buses=0000:[80-82]
> > Type: PCI 10de:0054 Number of child objects: 0
> > Name=nVidia Corporation CK804 Serial ATA Controller
> > busid=0000:80:07.0
> > class=0101(IDE)
> > PCIVendor="nVidia Corporation"
> > PCIDevice="CK804 Serial ATA Controller"
> > Type: PCI 10de:0055 Number of child objects: 0
> > Name=nVidia Corporation CK804 Serial ATA Controller
> > busid=0000:80:08.0
> > class=0101(IDE)
> > PCIVendor="nVidia Corporation"
> > PCIDevice="CK804 Serial ATA Controller"
> > [linpc1:20650] [[14523,0],0] NEW TOPOLOGY - ADDING
> > [linpc1:20650] [[14523,0],0] plm:base:orted_report_launch completed for daemon [[14523,0],1] at contact 951779328.1;tcp://193.174.26.214,192.168.1.1:57891
> > [linpc0:32306] [[14523,0],1] plm:rsh: remote spawn called
> > [linpc0:32306] [[14523,0],1] plm:rsh: local shell: 2 (tcsh)
> > [linpc0:32306] [[14523,0],1] plm:rsh: assuming same remote shell as local shell
> > [linpc0:32306] [[14523,0],1] plm:rsh: remote shell: 2 (tcsh)
> > [linpc0:32306] [[14523,0],1] plm:rsh: final template argv:
> > /usr/local/bin/ssh <template> orted -mca orte_report_bindings 1 -mca ess env -mca orte_ess_jobid 951779328 -mca
> > orte_ess_vpid <template> -mca orte_ess_num_procs 4 -mca orte_parent_uri "951779328.1;tcp://193.174.26.214,192.168.1.1:57891"
> > -mca orte_hnp_uri "951779328.0;tcp://193.174.26.208:46876" --mca plm_base_verbose 100 -mca hwloc_base_report_bindings 1 -mca
> > orte_display_alloc 1 -mca orte_rankfile rf_linpc_sunpc_tyr -mca plm rsh
> > [linpc0:32306] [[14523,0],1] plm:rsh: activating launch event
> > [linpc0:32306] [[14523,0],1] plm:rsh: recording launch of daemon [[14523,0],3]
> > [linpc0:32306] [[14523,0],1] plm:rsh: executing: (/usr/local/bin/ssh) [/usr/local/bin/ssh tyr orted -mca orte_report_bindings
> > 1 -mca ess env -mca orte_ess_jobid 951779328 -mca orte_ess_vpid 3 -mca orte_ess_num_procs 4 -mca orte_parent_uri
> > "951779328.1;tcp://193.174.26.214,192.168.1.1:57891" -mca orte_hnp_uri "951779328.0;tcp://193.174.26.208:46876" --mca
> > plm_base_verbose 100 -mca hwloc_base_report_bindings 1 -mca orte_display_alloc 1 -mca orte_rankfile rf_linpc_sunpc_tyr -mca
> > plm rsh --tree-spawn]
> > Warning: untrusted X11 forwarding setup failed: xauth key data not generated
> > Warning: No xauth data; using fake authentication data for X11 forwarding.
> > [tyr.informatik.hs-fulda.de:23227] mca: base: components_register: registering plm components
> > [tyr.informatik.hs-fulda.de:23227] mca: base: components_register: found loaded component rsh
> > [tyr.informatik.hs-fulda.de:23227] mca: base: components_register: component rsh register function successful
> > [tyr.informatik.hs-fulda.de:23227] mca: base: components_open: opening plm components
> > [tyr.informatik.hs-fulda.de:23227] mca: base: components_open: found loaded component rsh
> > [tyr.informatik.hs-fulda.de:23227] mca: base: components_open: component rsh open function successful
> > [tyr.informatik.hs-fulda.de:23227] mca:base:select: Auto-selecting plm components
> > [tyr.informatik.hs-fulda.de:23227] mca:base:select:( plm) Querying component [rsh]
> > [tyr.informatik.hs-fulda.de:23227] [[14523,0],3] plm:rsh_lookup on agent ssh : rsh path NULL
> > [tyr.informatik.hs-fulda.de:23227] mca:base:select:( plm) Query of component [rsh] set priority to 10
> > [tyr.informatik.hs-fulda.de:23227] mca:base:select:( plm) Selected component [rsh]
> > [tyr.informatik.hs-fulda.de:23227] [[14523,0],3] plm:rsh_setup on agent ssh : rsh path NULL
> > [tyr.informatik.hs-fulda.de:23227] [[14523,0],3] plm:base:receive start comm
> > [tyr.informatik.hs-fulda.de:23227] [[14523,0],3] plm:base:receive stop comm
> > [tyr.informatik.hs-fulda.de:23227] mca: base: close: component rsh closed
> > [tyr.informatik.hs-fulda.de:23227] mca: base: close: unloading component rsh
> > [linpc0:32306] [[14523,0],1] daemon 3 failed with status 1
> > [linpc1:20650] [[14523,0],0] plm:base:orted_cmd sending orted_exit commands
> > [linpc1:20650] [[14523,0],0] plm:base:receive stop comm
> > [linpc1:20650] mca: base: close: component rsh closed
> > [linpc1:20650] mca: base: close: unloading component rsh
> > linpc1 openmpi_1.7.x_or_newer 189 [sunpc1:09408] [[14523,0],2] plm:base:receive stop comm
> > [sunpc1:09408] mca: base: close: component rsh closed
> > [sunpc1:09408] mca: base: close: unloading component rsh
> > [linpc0:32306] [[14523,0],1] plm:base:receive stop comm
> > [linpc0:32306] mca: base: close: component rsh closed
> > [linpc0:32306] mca: base: close: unloading component rsh
> >
> > linpc1 openmpi_1.7.x_or_newer 189
> >
> >
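The Cpuset, Online, and Allowed values in the linpc0 topology dump above are hexadecimal bitmasks in which bit i stands for the processing unit (PU) with index i. The following is a minimal sketch for decoding them; the helper name cpuset_to_pus is made up for illustration and is not part of hwloc or Open MPI.

```python
def cpuset_to_pus(mask_hex):
    """Return the indices of the bits set in a hexadecimal cpuset mask."""
    mask = int(mask_hex, 16)  # int() accepts the leading "0x" when base is 16
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Cpuset values copied from the linpc0 topology dump.
for label, mask in [
    ("Machine", "0x0000000f"),
    ("first NUMANode", "0x00000003"),
    ("second NUMANode", "0x0000000c"),
]:
    print(f"{label}: {mask} -> PUs {cpuset_to_pus(mask)}")
```

Read this way, the dump describes four PUs in total, with PUs 0 and 1 under the first NUMANode and PUs 2 and 3 under the second.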
> >
> > linpc1 openmpi_1.7.x_or_newer 189 mpiexec -report-bindings --display-allocation --mca rmaps_base_verbose_100 -np 1 -rf rf_linpc_sunpc_tyr hostname
> >
> > ====================== ALLOCATED NODES ======================
> > linpc1: slots=1 max_slots=0 slots_inuse=0
> > =================================================================
> > --------------------------------------------------------------------------
> > mpiexec was unable to find the specified executable file, and therefore
> > did not launch the job. This error was first reported for process
> > rank 0; it may have occurred for other processes as well.
> >
> > NOTE: A common cause for this error is misspelling a mpiexec command
> > line parameter option (remember that mpiexec interprets the first
> > unrecognized command line token as the executable).
> >
> > Node: linpc1
> > Executable: 1
> > --------------------------------------------------------------------------
> > linpc1 openmpi_1.7.x_or_newer 190
> >
> >
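The "Executable: 1" line above is consistent with the NOTE in the error text: --mca expects a key and a value, so the single token rmaps_base_verbose_100 is taken as the key, -np is swallowed as its value, and 1 becomes the first unrecognized token. The sketch below is a deliberately simplified, hypothetical model of that behaviour, not Open MPI's actual parser; the option tables cover only the options used in this command, and the "intended" spelling (rmaps_base_verbose 100) is an assumption about what was meant.

```python
# Hypothetical, simplified model of option parsing (not Open MPI's real parser).
FLAGS = {"-report-bindings", "--display-allocation"}  # options without arguments
ONE_ARG = {"-np", "-rf"}                              # options with one argument
TWO_ARGS = {"--mca", "-mca"}                          # options taking <key> <value>


def find_executable(tokens):
    """Return the first token that is neither an option nor an option argument."""
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in FLAGS:
            i += 1
        elif tok in ONE_ARG:
            i += 2
        elif tok in TWO_ARGS:
            i += 3
        else:
            return tok
    return None


# Arguments as typed in the failing command.
failing = ("-report-bindings --display-allocation --mca rmaps_base_verbose_100 "
           "-np 1 -rf rf_linpc_sunpc_tyr hostname").split()
# Assumed intended form, with key and value as separate tokens.
intended = ("-report-bindings --display-allocation --mca rmaps_base_verbose 100 "
            "-np 1 -rf rf_linpc_sunpc_tyr hostname").split()

print(find_executable(failing))
print(find_executable(intended))
```

Under this model the first call returns "1" and the second returns "hostname".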
> >
> >
> > Kind regards
> >
> > Siegmar
> >
>
>