Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Hetero apps just hang
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2014-01-02 05:55:37


Hi,

> We shouldn't just hang - that isn't right. Can you configure
> OMPI with --enable-debug and then add "-mca plm_base_verbose 5
> -mca state_base_verbose 5" to your cmd line so we can see where
> it is hanging?

The program doesn't hang. It terminates without any output and
with exit status 1.
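Verbose output like the transcript below is long, but the failure shows up in only a few lines. A small filter makes those lines stand out. This is a hypothetical helper, not part of Open MPI: the patterns are strings copied from the output in this thread, and other failure modes may log different messages.

```python
import re

# Hypothetical helper: these substrings were copied from the
# plm_base_verbose / state_base_verbose output in this thread; they are
# not an official or complete list of ORTE failure messages.
FAILURE_PATTERNS = [
    re.compile(r"STATE UNABLE TO SEND MSG"),
    re.compile(r"STATE FAILED TO START"),
    re.compile(r"FORCE-TERMINATE"),
    re.compile(r"daemon \d+ failed with status \d+"),
]

# The "[host:pid] [[jobfam,job],vpid]" prefix of a verbose log line.
PREFIX = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\] "
    r"\[\[(?P<jobfam>\d+),(?P<job>\d+)\],(?P<vpid>\d+)\]"
)


def failure_events(lines):
    """Return (host, vpid, line) for each log line matching a failure pattern."""
    events = []
    for line in lines:
        # Each state change is logged twice ("ACTIVATE ... AT file:line",
        # then "ACTIVATING ... PRI n"); keep only the first form.
        if "ACTIVATING" in line:
            continue
        if not any(p.search(line) for p in FAILURE_PATTERNS):
            continue
        m = PREFIX.search(line)
        host = m.group("host") if m else None
        vpid = int(m.group("vpid")) if m else None
        events.append((host, vpid, line.strip()))
    return events
```

Applied to the first run, it picks out the daemons on sunpc1 ([[38447,0],2]) and rs0 ([[38447,0],1]) reporting that they could not send to [[38447,0],0], the mpiexec process on tyr, and then force-terminating, followed by mpiexec noticing that daemon 2 failed with status 1.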

tyr small_prog 55 mpiexec -np 3 -host rs0,sunpc1,linpc1 \
  -mca plm_base_verbose 5 -mca state_base_verbose 5 rank_size
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Querying component [app]
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Skipping component [app]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Querying component [hnp]
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Query of component [hnp] set priority to 60
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Querying component [novm]
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Skipping component [novm]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Querying component [orted]
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Skipping component [orted]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Querying component [staged_hnp]
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Skipping component [staged_hnp]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Querying component [staged_orted]
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Skipping component [staged_orted]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Querying component [tool]
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Skipping component [tool]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Selected component [hnp]
[tyr.informatik.hs-fulda.de:12297] mca:base:select:( plm) Querying component [rsh]
[tyr.informatik.hs-fulda.de:12297] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path NULL
[tyr.informatik.hs-fulda.de:12297] mca:base:select:( plm) Query of component [rsh] set priority to 10
[tyr.informatik.hs-fulda.de:12297] mca:base:select:( plm) Selected component [rsh]
[tyr.informatik.hs-fulda.de:12297] plm:base:set_hnp_name: initial bias 12297 nodename hash 339128848
[tyr.informatik.hs-fulda.de:12297] plm:base:set_hnp_name: final jobfam 38447
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh_setup on agent ssh : rsh path NULL
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:receive start comm
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATE JOB [INVALID] STATE PENDING INIT AT ../../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/rsh/plm_rsh_module.c:900
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATING JOB [INVALID] STATE PENDING INIT PRI 4
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:setup_job
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATE JOB [38447,1] STATE INIT_COMPLETE AT ../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/base/plm_base_launch_support.c:317
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATING JOB [38447,1] STATE INIT_COMPLETE PRI 4
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATE JOB [38447,1] STATE PENDING ALLOCATION AT ../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/base/plm_base_launch_support.c:328
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATING JOB [38447,1] STATE PENDING ALLOCATION PRI 4
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATE JOB [38447,1] STATE ALLOCATION COMPLETE AT ../../../../openmpi-1.7.4rc2r30094/orte/mca/ras/base/ras_base_allocate.c:423
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATING JOB [38447,1] STATE ALLOCATION COMPLETE PRI 4
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATE JOB [38447,1] STATE PENDING DAEMON LAUNCH AT ../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/base/plm_base_launch_support.c:184
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATING JOB [38447,1] STATE PENDING DAEMON LAUNCH PRI 4
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:setup_vm
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:setup_vm creating map
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] setup:vm: working unmanaged allocation
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] using dash_host
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] checking node rs0
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] checking node sunpc1
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] checking node linpc1
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:setup_vm add new daemon [[38447,0],1]
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:setup_vm assigning new daemon [[38447,0],1] to node rs0
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:setup_vm add new daemon [[38447,0],2]
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:setup_vm assigning new daemon [[38447,0],2] to node sunpc1
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:setup_vm add new daemon [[38447,0],3]
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:setup_vm assigning new daemon [[38447,0],3] to node linpc1
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: launching vm
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: local shell: 2 (tcsh)
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: assuming same remote shell as local shell
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: remote shell: 2 (tcsh)
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: final template argv:
        /usr/local/bin/ssh <template> orted -mca ess env -mca orte_ess_jobid 2519662592 -mca orte_ess_vpid <template> -mca orte_ess_num_procs 4 -mca orte_hnp_uri "2519662592.0;tcp://193.174.24.39:59753" --tree-spawn -mca plm_base_verbose 5 -mca state_base_verbose 5 -mca plm rsh
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh:launch daemon 0 not a child of mine
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: adding node rs0 to launch list
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: adding node sunpc1 to launch list
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh:launch daemon 3 not a child of mine
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: activating launch event
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: recording launch of daemon [[38447,0],1]
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: executing: (/usr/local/bin/ssh) [/usr/local/bin/ssh rs0 orted -mca ess env -mca orte_ess_jobid 2519662592 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 4 -mca orte_hnp_uri "2519662592.0;tcp://193.174.24.39:59753" --tree-spawn -mca plm_base_verbose 5 -mca state_base_verbose 5 -mca plm rsh]
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: executing: (/usr/local/bin/ssh) [/usr/local/bin/ssh sunpc1 orted -mca ess env -mca orte_ess_jobid 2519662592 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 4 -mca orte_hnp_uri "2519662592.0;tcp://193.174.24.39:59753" --tree-spawn -mca plm_base_verbose 5 -mca state_base_verbose 5 -mca plm rsh]
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: recording launch of daemon [[38447,0],2]
X11 forwarding request failed on channel 0
[sunpc1:22290] mca:base:select:(state) Querying component [app]
[sunpc1:22290] mca:base:select:(state) Skipping component [app]. Query failed to return a module
[sunpc1:22290] mca:base:select:(state) Querying component [hnp]
[sunpc1:22290] mca:base:select:(state) Skipping component [hnp]. Query failed to return a module
[sunpc1:22290] mca:base:select:(state) Querying component [novm]
[sunpc1:22290] mca:base:select:(state) Skipping component [novm]. Query failed to return a module
[sunpc1:22290] mca:base:select:(state) Querying component [orted]
[sunpc1:22290] mca:base:select:(state) Query of component [orted] set priority to 100
[sunpc1:22290] mca:base:select:(state) Querying component [staged_hnp]
[sunpc1:22290] mca:base:select:(state) Skipping component [staged_hnp]. Query failed to return a module
[sunpc1:22290] mca:base:select:(state) Querying component [staged_orted]
[sunpc1:22290] mca:base:select:(state) Skipping component [staged_orted]. Query failed to return a module
[sunpc1:22290] mca:base:select:(state) Querying component [tool]
[sunpc1:22290] mca:base:select:(state) Skipping component [tool]. Query failed to return a module
[sunpc1:22290] mca:base:select:(state) Selected component [orted]
[sunpc1:22290] mca:base:select:( plm) Querying component [rsh]
[sunpc1:22290] [[38447,0],2] plm:rsh_lookup on agent ssh : rsh path NULL
[sunpc1:22290] mca:base:select:( plm) Query of component [rsh] set priority to 10
[sunpc1:22290] mca:base:select:( plm) Selected component [rsh]
[sunpc1:22290] [[38447,0],2] plm:rsh_setup on agent ssh : rsh path NULL
[sunpc1:22290] [[38447,0],2] plm:base:receive start comm
[sunpc1:22290] [[38447,0],2] ACTIVATE PROC [[38447,0],0] STATE UNABLE TO SEND MSG AT ../../../../openmpi-1.9a1r30100/orte/mca/rml/base/rml_base_frame.c:205
[sunpc1:22290] [[38447,0],2] ACTIVATING PROC [[38447,0],0] STATE UNABLE TO SEND MSG PRI 0
[sunpc1:22290] [[38447,0],2] FORCE-TERMINATE AT ../../../../../openmpi-1.9a1r30100/orte/mca/errmgr/default_orted/errmgr_default_orted.c:259
[sunpc1:22290] [[38447,0],2] ACTIVATE JOB NULL STATE FORCED EXIT AT ../../../../../openmpi-1.9a1r30100/orte/mca/errmgr/default_orted/errmgr_default_orted.c:259
[sunpc1:22290] [[38447,0],2] ACTIVATING JOB NULL STATE FORCED EXIT PRI 0
[sunpc1:22290] [[38447,0],2] plm:base:receive stop comm
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] daemon 2 failed with status 1
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATE PROC [[38447,0],2] STATE FAILED TO START AT ../../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/rsh/plm_rsh_module.c:304
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATING PROC [[38447,0],2] STATE FAILED TO START PRI 0
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:orted_cmd sending orted_exit commands
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATE JOB NULL STATE DAEMONS TERMINATED AT ../../openmpi-1.7.4rc2r30094/orte/orted/orted_comm.c:465
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATING JOB NULL STATE DAEMONS TERMINATED PRI 0
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:receive stop comm
tyr small_prog 56 [rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Querying component [app]
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Skipping component [app]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Querying component [hnp]
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Skipping component [hnp]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Querying component [novm]
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Skipping component [novm]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Querying component [orted]
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Query of component [orted] set priority to 100
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Querying component [staged_hnp]
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Skipping component [staged_hnp]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Querying component [staged_orted]
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Skipping component [staged_orted]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Querying component [tool]
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Skipping component [tool]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Selected component [orted]
[rs0.informatik.hs-fulda.de:03686] mca:base:select:( plm) Querying component [rsh]
[rs0.informatik.hs-fulda.de:03686] [[38447,0],1] plm:rsh_lookup on agent ssh : rsh path NULL
[rs0.informatik.hs-fulda.de:03686] mca:base:select:( plm) Query of component [rsh] set priority to 10
[rs0.informatik.hs-fulda.de:03686] mca:base:select:( plm) Selected component [rsh]
[rs0.informatik.hs-fulda.de:03686] [[38447,0],1] plm:rsh_setup on agent ssh : rsh path NULL
[rs0.informatik.hs-fulda.de:03686] [[38447,0],1] plm:base:receive start comm
[rs0.informatik.hs-fulda.de:03686] [[38447,0],1] ACTIVATE PROC [[38447,0],0] STATE UNABLE TO SEND MSG AT ../../../../openmpi-1.9a1r30100/orte/mca/rml/base/rml_base_frame.c:205
[rs0.informatik.hs-fulda.de:03686] [[38447,0],1] ACTIVATING PROC [[38447,0],0] STATE UNABLE TO SEND MSG PRI 0
[rs0.informatik.hs-fulda.de:03686] [[38447,0],1] FORCE-TERMINATE AT ../../../../../openmpi-1.9a1r30100/orte/mca/errmgr/default_orted/errmgr_default_orted.c:259
[rs0.informatik.hs-fulda.de:03686] [[38447,0],1] ACTIVATE JOB NULL STATE FORCED EXIT AT ../../../../../openmpi-1.9a1r30100/orte/mca/errmgr/default_orted/errmgr_default_orted.c:259
[rs0.informatik.hs-fulda.de:03686] [[38447,0],1] ACTIVATING JOB NULL STATE FORCED EXIT PRI 0
[rs0.informatik.hs-fulda.de:03686] [[38447,0],1] plm:base:receive stop comm

tyr small_prog 56 echo $status
1
tyr small_prog 57

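The identifiers in the verbose output are related to each other. The job family 38447 printed by `set_hnp_name` can be reproduced from the nodename hash (339128848) and the mpiexec PID (12297), and the `orte_ess_jobid` value 2519662592 handed to the remote `orted` equals 38447 × 65536, i.e. the job family in the upper 16 bits and local job 0 in the lower 16 bits. The folding function below is an assumption, reverse-engineered so that it matches both runs in this thread (PID 12297 gives 38447, PID 12313 gives 38463); it is not code copied from ORTE.

```python
def orte_jobfam(nodename_hash: int, pid: int) -> int:
    """Fold a 32-bit nodename hash and a PID into a 16-bit job family.

    Assumed reconstruction: XOR the PID into the hash, then XOR the upper
    and lower 16-bit halves together. It matches the values in this log.
    """
    h = (nodename_hash ^ pid) & 0xFFFFFFFF
    return ((h >> 16) ^ h) & 0xFFFF


def orte_jobid(jobfam: int, local_job: int = 0) -> int:
    """Pack the job family (upper 16 bits) and local job number (lower 16 bits)."""
    return ((jobfam & 0xFFFF) << 16) | (local_job & 0xFFFF)


def parse_hnp_uri(uri: str):
    """Split an orte_hnp_uri such as '2519662592.0;tcp://193.174.24.39:59753'
    into (jobid, vpid, contact address)."""
    name, contact = uri.split(";", 1)
    jobid, vpid = (int(part) for part in name.split("."))
    return jobid, vpid, contact


fam = orte_jobfam(339128848, 12297)
print(fam, orte_jobid(fam))  # 38447 2519662592
print(parse_hnp_uri("2519662592.0;tcp://193.174.24.39:59753"))
```

Read this way, a name such as [[38447,0],2] is job family 38447, local job 0 (the daemons), vpid 2, while the application itself runs as job [38447,1].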
tyr small_prog 57 mpiexec -np 3 -host rs0,sunpc1,linpc1 -mca plm_base_verbose 5 \
  -mca state_base_verbose 5 --hetero-nodes --hetero-apps rank_size
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Querying component [app]
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Skipping component [app]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Querying component [hnp]
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Query of component [hnp] set priority to 60
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Querying component [novm]
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Skipping component [novm]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Querying component [orted]
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Skipping component [orted]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Querying component [staged_hnp]
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Skipping component [staged_hnp]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Querying component [staged_orted]
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Skipping component [staged_orted]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Querying component [tool]
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Skipping component [tool]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Selected component [hnp]
[tyr.informatik.hs-fulda.de:12313] mca:base:select:( plm) Querying component [rsh]
[tyr.informatik.hs-fulda.de:12313] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path NULL
[tyr.informatik.hs-fulda.de:12313] mca:base:select:( plm) Query of component [rsh] set priority to 10
[tyr.informatik.hs-fulda.de:12313] mca:base:select:( plm) Selected component [rsh]
[tyr.informatik.hs-fulda.de:12313] plm:base:set_hnp_name: initial bias 12313 nodename hash 339128848
[tyr.informatik.hs-fulda.de:12313] plm:base:set_hnp_name: final jobfam 38463
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh_setup on agent ssh : rsh path NULL
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:receive start comm
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATE JOB [INVALID] STATE PENDING INIT AT ../../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/rsh/plm_rsh_module.c:900
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATING JOB [INVALID] STATE PENDING INIT PRI 4
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:setup_job
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATE JOB [38463,1] STATE INIT_COMPLETE AT ../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/base/plm_base_launch_support.c:317
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATING JOB [38463,1] STATE INIT_COMPLETE PRI 4
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATE JOB [38463,1] STATE PENDING ALLOCATION AT ../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/base/plm_base_launch_support.c:328
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATING JOB [38463,1] STATE PENDING ALLOCATION PRI 4
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATE JOB [38463,1] STATE ALLOCATION COMPLETE AT ../../../../openmpi-1.7.4rc2r30094/orte/mca/ras/base/ras_base_allocate.c:423
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATING JOB [38463,1] STATE ALLOCATION COMPLETE PRI 4
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATE JOB [38463,1] STATE PENDING DAEMON LAUNCH AT ../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/base/plm_base_launch_support.c:184
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATING JOB [38463,1] STATE PENDING DAEMON LAUNCH PRI 4
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:setup_vm
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:setup_vm creating map
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] setup:vm: working unmanaged allocation
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] using dash_host
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] checking node rs0
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] checking node sunpc1
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] checking node linpc1
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:setup_vm add new daemon [[38463,0],1]
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:setup_vm assigning new daemon [[38463,0],1] to node rs0
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:setup_vm add new daemon [[38463,0],2]
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:setup_vm assigning new daemon [[38463,0],2] to node sunpc1
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:setup_vm add new daemon [[38463,0],3]
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:setup_vm assigning new daemon [[38463,0],3] to node linpc1
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: launching vm
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: local shell: 2 (tcsh)
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: assuming same remote shell as local shell
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: remote shell: 2 (tcsh)
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: final template argv:
        /usr/local/bin/ssh <template> orted -mca orte_hetero_nodes 1 -mca ess env -mca orte_ess_jobid 2520711168 -mca orte_ess_vpid <template> -mca orte_ess_num_procs 4 -mca orte_hnp_uri "2520711168.0;tcp://193.174.24.39:59756" --tree-spawn -mca plm_base_verbose 5 -mca state_base_verbose 5 -mca plm rsh -mca orte_hetero_apps 1
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh:launch daemon 0 not a child of mine
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: adding node rs0 to launch list
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: adding node sunpc1 to launch list
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh:launch daemon 3 not a child of mine
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: activating launch event
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: recording launch of daemon [[38463,0],1]
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: executing: (/usr/local/bin/ssh) [/usr/local/bin/ssh rs0 orted -mca orte_hetero_nodes 1 -mca ess env -mca orte_ess_jobid 2520711168 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 4 -mca orte_hnp_uri "2520711168.0;tcp://193.174.24.39:59756" --tree-spawn -mca plm_base_verbose 5 -mca state_base_verbose 5 -mca plm rsh -mca orte_hetero_apps 1]
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: executing: (/usr/local/bin/ssh) [/usr/local/bin/ssh sunpc1 orted -mca orte_hetero_nodes 1 -mca ess env -mca orte_ess_jobid 2520711168 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 4 -mca orte_hnp_uri "2520711168.0;tcp://193.174.24.39:59756" --tree-spawn -mca plm_base_verbose 5 -mca state_base_verbose 5 -mca plm rsh -mca orte_hetero_apps 1]
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: recording launch of daemon [[38463,0],2]
Warning: No xauth data; using fake authentication data for X11 forwarding.
X11 forwarding request failed on channel 0
[sunpc1:22320] mca:base:select:(state) Querying component [app]
[sunpc1:22320] mca:base:select:(state) Skipping component [app]. Query failed to return a module
[sunpc1:22320] mca:base:select:(state) Querying component [hnp]
[sunpc1:22320] mca:base:select:(state) Skipping component [hnp]. Query failed to return a module
[sunpc1:22320] mca:base:select:(state) Querying component [novm]
[sunpc1:22320] mca:base:select:(state) Skipping component [novm]. Query failed to return a module
[sunpc1:22320] mca:base:select:(state) Querying component [orted]
[sunpc1:22320] mca:base:select:(state) Query of component [orted] set priority to 100
[sunpc1:22320] mca:base:select:(state) Querying component [staged_hnp]
[sunpc1:22320] mca:base:select:(state) Skipping component [staged_hnp]. Query failed to return a module
[sunpc1:22320] mca:base:select:(state) Querying component [staged_orted]
[sunpc1:22320] mca:base:select:(state) Skipping component [staged_orted]. Query failed to return a module
[sunpc1:22320] mca:base:select:(state) Querying component [tool]
[sunpc1:22320] mca:base:select:(state) Skipping component [tool]. Query failed to return a module
[sunpc1:22320] mca:base:select:(state) Selected component [orted]
[sunpc1:22320] mca:base:select:( plm) Querying component [rsh]
[sunpc1:22320] [[38463,0],2] plm:rsh_lookup on agent ssh : rsh path NULL
[sunpc1:22320] mca:base:select:( plm) Query of component [rsh] set priority to 10
[sunpc1:22320] mca:base:select:( plm) Selected component [rsh]
[sunpc1:22320] [[38463,0],2] plm:rsh_setup on agent ssh : rsh path NULL
[sunpc1:22320] [[38463,0],2] plm:base:receive start comm
[sunpc1:22320] [[38463,0],2] ACTIVATE PROC [[38463,0],0] STATE UNABLE TO SEND MSG AT ../../../../openmpi-1.9a1r30100/orte/mca/rml/base/rml_base_frame.c:205
[sunpc1:22320] [[38463,0],2] ACTIVATING PROC [[38463,0],0] STATE UNABLE TO SEND MSG PRI 0
[sunpc1:22320] [[38463,0],2] FORCE-TERMINATE AT ../../../../../openmpi-1.9a1r30100/orte/mca/errmgr/default_orted/errmgr_default_orted.c:259
[sunpc1:22320] [[38463,0],2] ACTIVATE JOB NULL STATE FORCED EXIT AT ../../../../../openmpi-1.9a1r30100/orte/mca/errmgr/default_orted/errmgr_default_orted.c:259
[sunpc1:22320] [[38463,0],2] ACTIVATING JOB NULL STATE FORCED EXIT PRI 0
[sunpc1:22320] [[38463,0],2] plm:base:receive stop comm
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] daemon 2 failed with status 1
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATE PROC [[38463,0],2] STATE FAILED TO START AT ../../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/rsh/plm_rsh_module.c:304
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATING PROC [[38463,0],2] STATE FAILED TO START PRI 0
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:orted_cmd sending orted_exit commands
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATE JOB NULL STATE DAEMONS TERMINATED AT ../../openmpi-1.7.4rc2r30094/orte/orted/orted_comm.c:465
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATING JOB NULL STATE DAEMONS TERMINATED PRI 0
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:receive stop comm
tyr small_prog 58 [rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Querying component [app]
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Skipping component [app]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Querying component [hnp]
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Skipping component [hnp]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Querying component [novm]
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Skipping component [novm]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Querying component [orted]
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Query of component [orted] set priority to 100
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Querying component [staged_hnp]
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Skipping component [staged_hnp]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Querying component [staged_orted]
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Skipping component [staged_orted]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Querying component [tool]
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Skipping component [tool]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Selected component [orted]
[rs0.informatik.hs-fulda.de:03718] mca:base:select:( plm) Querying component [rsh]
[rs0.informatik.hs-fulda.de:03718] [[38463,0],1] plm:rsh_lookup on agent ssh : rsh path NULL
[rs0.informatik.hs-fulda.de:03718] mca:base:select:( plm) Query of component [rsh] set priority to 10
[rs0.informatik.hs-fulda.de:03718] mca:base:select:( plm) Selected component [rsh]
[rs0.informatik.hs-fulda.de:03718] [[38463,0],1] plm:rsh_setup on agent ssh : rsh path NULL
[rs0.informatik.hs-fulda.de:03718] [[38463,0],1] plm:base:receive start comm
[rs0.informatik.hs-fulda.de:03718] [[38463,0],1] ACTIVATE PROC [[38463,0],0] STATE UNABLE TO SEND MSG AT ../../../../openmpi-1.9a1r30100/orte/mca/rml/base/rml_base_frame.c:205
[rs0.informatik.hs-fulda.de:03718] [[38463,0],1] ACTIVATING PROC [[38463,0],0] STATE UNABLE TO SEND MSG PRI 0
[rs0.informatik.hs-fulda.de:03718] [[38463,0],1] FORCE-TERMINATE AT ../../../../../openmpi-1.9a1r30100/orte/mca/errmgr/default_orted/errmgr_default_orted.c:259
[rs0.informatik.hs-fulda.de:03718] [[38463,0],1] ACTIVATE JOB NULL STATE FORCED EXIT AT ../../../../../openmpi-1.9a1r30100/orte/mca/errmgr/default_orted/errmgr_default_orted.c:259
[rs0.informatik.hs-fulda.de:03718] [[38463,0],1] ACTIVATING JOB NULL STATE FORCED EXIT PRI 0
[rs0.informatik.hs-fulda.de:03718] [[38463,0],1] plm:base:receive stop comm

tyr small_prog 58 echo $status
1
tyr small_prog 59

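In both runs mpiexec started `orted` itself only on rs0 and sunpc1 and logged "daemon 3 not a child of mine" for linpc1, which suggests that with `--tree-spawn` daemon 3 was meant to be started by another daemon. A standard binomial tree over the four daemons (0 = mpiexec on tyr, 1 = rs0, 2 = sunpc1, 3 = linpc1) reproduces exactly this split. It is an assumption that ORTE's routing component builds this particular tree; the sketch only shows a tree that is consistent with the log.

```python
def binomial_children(rank: int, size: int) -> list:
    """Children of `rank` in a binomial tree rooted at 0 over ranks 0..size-1.

    Rank r adopts r + 2**k for every power of two 2**k greater than r,
    as long as the result is still a valid rank. Every rank > 0 therefore
    has exactly one parent: itself minus its highest set bit.
    """
    children = []
    step = 1
    while step <= rank:  # skip powers of two that are <= rank
        step <<= 1
    while rank + step < size:
        children.append(rank + step)
        step <<= 1
    return children


for r in range(4):
    print(r, binomial_children(r, 4))
# 0 [1, 2]
# 1 [3]
# 2 []
# 3 []
```

Under this tree the daemon on linpc1 would be launched by the daemon on rs0, which in both runs gave up with "UNABLE TO SEND MSG" first; that would be consistent with linpc1 producing no output at all.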
Kind regards

Siegmar

> On Jan 1, 2014, at 1:48 AM, Siegmar Gross
> <Siegmar.Gross_at_[hidden]> wrote:
>
> > In the past I could run a small program on a truly heterogeneous
> > system with little-endian (sunpc1, linpc1) and big-endian (rs0, tyr)
> > machines.
> >
> > tyr small_prog 101 ompi_info | grep MPI:
> > Open MPI: 1.6.6a1r29175
> > tyr small_prog 102 mpiexec -np 3 -host rs0,sunpc1,linpc1 rank_size
> > I'm process 1 of 3 available processes running on sunpc1.
> > MPI standard 2.1 is supported.
> > I'm process 0 of 3 available processes running on rs0.informatik.hs-fulda.de.
> > MPI standard 2.1 is supported.
> > I'm process 2 of 3 available processes running on linpc1.
> > MPI standard 2.1 is supported.
> > tyr small_prog 103
> >
> >
> > Now I get no output at all.
> >
> > tyr small_prog 130 ompi_info | grep MPI:
> > Open MPI: 1.9a1r30100
> > tyr small_prog 131 mpiexec -np 3 -host rs0,sunpc1,linpc1 rank_size
> > tyr small_prog 132 mpiexec -np 3 -host rs0,sunpc1,linpc1 \
> > --hetero-nodes --hetero-apps rank_size
> > tyr small_prog 133
> >
> >
> > Perhaps this behaviour is intended, because Open MPI doesn't
> > support little-endian and big-endian machines in the same cluster
> > or virtual computer (the only implementation I know of that works
> > in such an environment is LAM-MPI). On the other hand, does it make
> > sense for the command to finish without any output when it doesn't
> > work (even though "mpiexec" returns "1")?
>
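The cluster above mixes little-endian and big-endian nodes. Such a setup needs explicit support in an MPI library because the same integer has a different byte layout on each side, so raw bytes must be converted when they cross between them. A minimal illustration using only Python's standard `struct` module; it says nothing about how Open MPI itself performs the conversion, and the helper names are hypothetical.

```python
import struct
import sys


def byte_layouts(value: int):
    """Return the 4-byte little-endian and big-endian encodings of an unsigned int."""
    return struct.pack("<I", value), struct.pack(">I", value)


def decode_from_sender(raw: bytes, sender_is_little: bool) -> int:
    """Decode an unsigned 32-bit int using the byte order of the node that sent it."""
    fmt = "<I" if sender_is_little else ">I"
    return struct.unpack(fmt, raw)[0]


little, big = byte_layouts(1)
print(sys.byteorder)            # "little" or "big", depending on the host
print(little.hex(), big.hex())  # 01000000 00000001
# Reading little-endian bytes as if they were big-endian gives a wrong value:
print(struct.unpack(">I", little)[0])  # 16777216
```

Decoding with the sender's byte order, as `decode_from_sender` does, recovers the original value regardless of which kind of node receives the bytes.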