Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] torque pbs behaviour...
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-08-11 11:15:06


Well, it now is launching just fine, so that's one thing! :-)

Afraid I'll have to let the TCP btl guys take over from here. It looks like
everything is up and running, but something strange is going on in the MPI
comm layer.
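
For anyone who wants to confirm that from the quoted output rather than by eye, here is a minimal Python sketch. It assumes the "app_report_launched" line format seen in this thread (produced with plm_base_verbose 5) and that the mail client wrapped long lines and added ">" quote markers. The names launched_ranks and sample are hypothetical and not part of Open MPI.

```python
import re
from collections import defaultdict

# Matches the "app_report_launched" debug lines as they appear in the quoted
# output. The format is inferred from this thread only (an assumption).
LAUNCHED = re.compile(
    r"app_report_launched for proc\s+\[\[\d+,\d+\],(\d+)\]\s+"
    r"from daemon\s+\[\[\d+,\d+\],(\d+)\]:\s+"
    r"pid\s+(\d+)\s+state\s+(\d+)\s+exit\s+(\d+)"
)


def launched_ranks(log_text):
    """Return {daemon_vpid: sorted ranks} for procs reported as launched."""
    # Drop mail quote markers, then undo the mail client's line wrapping.
    lines = [re.sub(r"^(>\s?)+", "", line) for line in log_text.splitlines()]
    flat = " ".join(lines)
    by_daemon = defaultdict(list)
    for rank, daemon, _pid, _state, _exit in LAUNCHED.findall(flat):
        by_daemon[int(daemon)].append(int(rank))
    return {daemon: sorted(ranks) for daemon, ranks in by_daemon.items()}


# Four lines copied from the quoted output (two wrapped log records).
sample = """\
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launched for proc
> [[24811,1],0] from daemon [[24811,0],1]: pid 42523 state 2 exit 0
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launched for proc
> [[24811,1],1] from daemon [[24811,0],2]: pid 42184 state 2 exit 0
"""

print(launched_ranks(sample))  # {1: [0], 2: [1]}
```

Run against the full output, each of the two remote daemons should list eight ranks for this 16-process job.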

You can turn off those mca params I gave you as you are now past that point.
I know there are others who can help debug that TCP btl error, so I'll leave
that to them.

Ralph

On Tue, Aug 11, 2009 at 8:54 AM, Klymak Jody <jklymak_at_[hidden]> wrote:

>
> On 11-Aug-09, at 6:16 AM, Jeff Squyres wrote:
>
>> This means that OMPI is finding an mca_iof_proxy.la file at run time from
>> a prior version of Open MPI. You might want to use "find" or "locate" to
>> search your nodes and find it. I suspect that you somehow have an OMPI
>> 1.3.x install that overlaid an installation of a prior OMPI version.
>>
>
>
> OK, right you were - the old file was in my new install directory. I
> didn't erase /usr/local/openmpi before re-running the install...
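
A minimal Python sketch of that kind of search, as an alternative to "find" or "locate". The helper name find_stale_components and the lib/openmpi layout in the demo are assumptions for illustration. On a real node the root would be the install prefix (/usr/local/openmpi in this thread), and the search would have to be repeated on every node.

```python
import os
import tempfile


def find_stale_components(root, name_prefix="mca_iof_proxy"):
    """Yield paths under root whose file name starts with name_prefix."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.startswith(name_prefix):
                yield os.path.join(dirpath, filename)


# Self-contained demo on a throwaway directory tree (hypothetical layout),
# so nothing on the real system is read or changed.
with tempfile.TemporaryDirectory() as prefix:
    plugin_dir = os.path.join(prefix, "lib", "openmpi")
    os.makedirs(plugin_dir)
    for name in ("mca_iof_proxy.la", "mca_plm_rsh.la"):
        open(os.path.join(plugin_dir, name), "w").close()
    hits = [os.path.relpath(p, prefix) for p in find_stale_components(prefix)]

print(hits)
```

Any hit inside a freshly installed 1.3.x prefix would point to leftovers from the earlier version, which is the situation described above.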
>
> However, after reinstalling on the nodes (but not cleaning out /usr/lib on
> all the nodes) I still have the following:
>
> Thanks, Jody
>
>
> [saturna.cluster:17660] mca:base:select:( plm) Querying component [rsh]
> [saturna.cluster:17660] mca:base:select:( plm) Query of component [rsh]
> set priority to 10
> [saturna.cluster:17660] mca:base:select:( plm) Querying component [slurm]
> [saturna.cluster:17660] mca:base:select:( plm) Skipping component [slurm].
> Query failed to return a module
> [saturna.cluster:17660] mca:base:select:( plm) Querying component [tm]
> [saturna.cluster:17660] mca:base:select:( plm) Skipping component [tm].
> Query failed to return a module
> [saturna.cluster:17660] mca:base:select:( plm) Querying component [xgrid]
> [saturna.cluster:17660] mca:base:select:( plm) Skipping component [xgrid].
> Query failed to return a module
> [saturna.cluster:17660] mca:base:select:( plm) Selected component [rsh]
> [saturna.cluster:17660] plm:base:set_hnp_name: initial bias 17660 nodename
> hash 1656374957
> [saturna.cluster:17660] plm:base:set_hnp_name: final jobfam 24811
> [saturna.cluster:17660] [[24811,0],0] plm:base:receive start comm
> [saturna.cluster:17660] mca:base:select:( odls) Querying component
> [default]
> [saturna.cluster:17660] mca:base:select:( odls) Query of component
> [default] set priority to 1
> [saturna.cluster:17660] mca:base:select:( odls) Selected component
> [default]
> [saturna.cluster:17660] [[24811,0],0] plm:rsh: setting up job [24811,1]
> [saturna.cluster:17660] [[24811,0],0] plm:base:setup_job for job [24811,1]
> [saturna.cluster:17660] [[24811,0],0] plm:rsh: local shell: 0 (bash)
> [saturna.cluster:17660] [[24811,0],0] plm:rsh: assuming same remote shell
> as local shell
> [saturna.cluster:17660] [[24811,0],0] plm:rsh: remote shell: 0 (bash)
> [saturna.cluster:17660] [[24811,0],0] plm:rsh: final template argv:
> /usr/bin/ssh <template> PATH=/usr/local/openmpi/bin:$PATH ; export
> PATH ; LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH ; export
> LD_LIBRARY_PATH ; /usr/local/openmpi/bin/orted --debug-daemons -mca ess env
> -mca orte_ess_jobid 1626013696 -mca orte_ess_vpid <template> -mca
> orte_ess_num_procs 3 --hnp-uri "1626013696.0;tcp://142.104.154.96:49710
> ;tcp://192.168.2.254:49710" -mca plm_base_verbose 5 -mca odls_base_verbose
> 5
> [saturna.cluster:17660] [[24811,0],0] plm:rsh: launching on node xserve01
> [saturna.cluster:17660] [[24811,0],0] plm:rsh: recording launch of daemon
> [[24811,0],1]
> [saturna.cluster:17660] [[24811,0],0] plm:rsh: executing: (//usr/bin/ssh)
> [/usr/bin/ssh xserve01 PATH=/usr/local/openmpi/bin:$PATH ; export PATH ;
> LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH ; export
> LD_LIBRARY_PATH ; /usr/local/openmpi/bin/orted --debug-daemons -mca ess env
> -mca orte_ess_jobid 1626013696 -mca orte_ess_vpid 1 -mca orte_ess_num_procs
> 3 --hnp-uri "1626013696.0;tcp://142.104.154.96:49710;tcp://
> 192.168.2.254:49710" -mca plm_base_verbose 5 -mca odls_base_verbose 5]
> Daemon was launched on xserve01.cluster - beginning to initialize
> [xserve01.cluster:42519] mca:base:select:( odls) Querying component
> [default]
> [xserve01.cluster:42519] mca:base:select:( odls) Query of component
> [default] set priority to 1
> [xserve01.cluster:42519] mca:base:select:( odls) Selected component
> [default]
> Daemon [[24811,0],1] checking in as pid 42519 on host xserve01.cluster
> Daemon [[24811,0],1] not using static ports
> [saturna.cluster:17660] [[24811,0],0] plm:rsh: launching on node xserve02
> [saturna.cluster:17660] [[24811,0],0] plm:rsh: recording launch of daemon
> [[24811,0],2]
> [saturna.cluster:17660] [[24811,0],0] plm:rsh: executing: (//usr/bin/ssh)
> [/usr/bin/ssh xserve02 PATH=/usr/local/openmpi/bin:$PATH ; export PATH ;
> LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH ; export
> LD_LIBRARY_PATH ; /usr/local/openmpi/bin/orted --debug-daemons -mca ess env
> -mca orte_ess_jobid 1626013696 -mca orte_ess_vpid 2 -mca orte_ess_num_procs
> 3 --hnp-uri "1626013696.0;tcp://142.104.154.96:49710;tcp://
> 192.168.2.254:49710" -mca plm_base_verbose 5 -mca odls_base_verbose 5]
> Daemon was launched on xserve02.local - beginning to initialize
> [xserve02.local:42180] mca:base:select:( odls) Querying component [default]
> [xserve02.local:42180] mca:base:select:( odls) Query of component [default]
> set priority to 1
> [xserve02.local:42180] mca:base:select:( odls) Selected component [default]
> Daemon [[24811,0],2] checking in as pid 42180 on host xserve02.local
> Daemon [[24811,0],2] not using static ports
> [saturna.cluster:17660] [[24811,0],0] plm:base:daemon_callback
> [saturna.cluster:17660] progressed_wait: base/plm_base_launch_support.c 459
> [saturna.cluster:17660] defining message event:
> base/plm_base_launch_support.c 423
> [saturna.cluster:17660] [[24811,0],0] plm:base:orted_report_launch from
> daemon [[24811,0],1]
> [saturna.cluster:17660] [[24811,0],0] plm:base:orted_report_launch
> completed for daemon [[24811,0],1]
> [saturna.cluster:17660] defining message event:
> base/plm_base_launch_support.c 423
> [saturna.cluster:17660] [[24811,0],0] plm:base:orted_report_launch from
> daemon [[24811,0],2]
> [xserve01.cluster:42519] [[24811,0],1] orted: up and running - waiting for
> commands!
> [saturna.cluster:17660] [[24811,0],0] plm:base:orted_report_launch
> completed for daemon [[24811,0],2]
> [saturna.cluster:17660] [[24811,0],0] plm:base:daemon_callback completed
> [saturna.cluster:17660] [[24811,0],0] plm:base:launch_apps for job
> [24811,1]
> [xserve02.local:42180] [[24811,0],2] orted: up and running - waiting for
> commands!
> [saturna.cluster:17660] defining message event: grpcomm_bad_module.c 183
> [saturna.cluster:17660] [[24811,0],0] plm:base:report_launched for job
> [24811,1]
> [saturna.cluster:17660] progressed_wait: base/plm_base_launch_support.c 712
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:cmd:processor called by
> [[24811,0],0] for tag 1
> [saturna.cluster:17660] [[24811,0],0] node[0].name saturna daemon 0 arch
> ffc90200
> [saturna.cluster:17660] [[24811,0],0] node[1].name xserve01 daemon 1 arch
> ffc90200
> [saturna.cluster:17660] [[24811,0],0] node[2].name xserve02 daemon 2 arch
> ffc90200
> [saturna.cluster:17660] [[24811,0],0] orted_cmd: received add_local_procs
> [saturna.cluster:17660] [[24811,0],0] odls:constructing child list
> [saturna.cluster:17660] [[24811,0],0] odls:construct_child_list unpacking
> data to launch job [24811,1]
> [saturna.cluster:17660] [[24811,0],0] odls:construct_child_list adding new
> jobdat for job [24811,1]
> [saturna.cluster:17660] [[24811,0],0] odls:construct_child_list unpacking 1
> app_contexts
> [saturna.cluster:17660] [[24811,0],0] odls:constructing child list -
> checking proc 0 on node 1 with daemon 1
> [saturna.cluster:17660] [[24811,0],0] odls:constructing child list -
> checking proc 1 on node 2 with daemon 2
> [saturna.cluster:17660] [[24811,0],0] odls:constructing child list -
> checking proc 2 on node 1 with daemon 1
> [saturna.cluster:17660] [[24811,0],0] odls:constructing child list -
> checking proc 3 on node 2 with daemon 2
> [saturna.cluster:17660] [[24811,0],0] odls:constructing child list -
> checking proc 4 on node 1 with daemon 1
> [saturna.cluster:17660] [[24811,0],0] odls:constructing child list -
> checking proc 5 on node 2 with daemon 2
> [saturna.cluster:17660] [[24811,0],0] odls:constructing child list -
> checking proc 6 on node 1 with daemon 1
> [saturna.cluster:17660] [[24811,0],0] odls:constructing child list -
> checking proc 7 on node 2 with daemon 2
> [saturna.cluster:17660] [[24811,0],0] odls:constructing child list -
> checking proc 8 on node 1 with daemon 1
> [saturna.cluster:17660] [[24811,0],0] odls:constructing child list -
> checking proc 9 on node 2 with daemon 2
> [saturna.cluster:17660] [[24811,0],0] odls:constructing child list -
> checking proc 10 on node 1 with daemon 1
> [saturna.cluster:17660] [[24811,0],0] odls:constructing child list -
> checking proc 11 on node 2 with daemon 2
> [saturna.cluster:17660] [[24811,0],0] odls:constructing child list -
> checking proc 12 on node 1 with daemon 1
> [saturna.cluster:17660] [[24811,0],0] odls:constructing child list -
> checking proc 13 on node 2 with daemon 2
> [saturna.cluster:17660] [[24811,0],0] odls:constructing child list -
> checking proc 14 on node 1 with daemon 1
> [saturna.cluster:17660] [[24811,0],0] odls:constructing child list -
> checking proc 15 on node 2 with daemon 2
> [saturna.cluster:17660] [[24811,0],0] odls:construct:child:
> num_participating 2
> [saturna.cluster:17660] [[24811,0],0] odls:launch found 4 processors for 0
> children and set oversubscribed to false
> [saturna.cluster:17660] [[24811,0],0] odls:launch reporting job [24811,1]
> launch status
> [saturna.cluster:17660] defining message event:
> base/odls_base_default_fns.c 1219
> [saturna.cluster:17660] [[24811,0],0] odls:launch setting waitpids
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:send_relay
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:send_relay sending relay
> msg to 1
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:send_relay sending relay
> msg to 2
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launch from
> daemon [[24811,0],0]
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launch completed
> processing
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,0],0]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,0],0] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] node[0].name saturna daemon 0 arch
> ffc90200
> [xserve01.cluster:42519] [[24811,0],1] node[1].name xserve01 daemon 1 arch
> ffc90200
> [xserve01.cluster:42519] [[24811,0],1] node[2].name xserve02 daemon 2 arch
> ffc90200
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received add_local_procs
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list
> [xserve01.cluster:42519] [[24811,0],1] odls:construct_child_list unpacking
> data to launch job [24811,1]
> [xserve01.cluster:42519] [[24811,0],1] odls:construct_child_list adding new
> jobdat for job [24811,1]
> [xserve01.cluster:42519] [[24811,0],1] odls:construct_child_list unpacking
> 1 app_contexts
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,0],0]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,0],0] for tag 1
> [xserve02.local:42180] [[24811,0],2] node[0].name saturna daemon 0 arch
> ffc90200
> [xserve02.local:42180] [[24811,0],2] node[1].name xserve01 daemon 1 arch
> ffc90200
> [xserve02.local:42180] [[24811,0],2] node[2].name xserve02 daemon 2 arch
> ffc90200
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received add_local_procs
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list
> [xserve02.local:42180] [[24811,0],2] odls:construct_child_list unpacking
> data to launch job [24811,1]
> [xserve02.local:42180] [[24811,0],2] odls:construct_child_list adding new
> jobdat for job [24811,1]
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list -
> checking proc 0 on node 1 with daemon 1
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list - found
> proc 0 for me!
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list -
> checking proc 1 on node 2 with daemon 2
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list -
> checking proc 2 on node 1 with daemon 1
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list - found
> proc 2 for me!
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list -
> checking proc 3 on node 2 with daemon 2
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list -
> checking proc 4 on node 1 with daemon 1
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list - found
> proc 4 for me!
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list -
> checking proc 5 on node 2 with daemon 2
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list -
> checking proc 6 on node 1 with daemon 1
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list - found
> proc 6 for me!
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list -
> checking proc 7 on node 2 with daemon 2
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list -
> checking proc 8 on node 1 with daemon 1
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list - found
> proc 8 for me!
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list -
> checking proc 9 on node 2 with daemon 2
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list -
> checking proc 10 on node 1 with daemon 1
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list - found
> proc 10 for me!
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list -
> checking proc 11 on node 2 with daemon 2
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list -
> checking proc 12 on node 1 with daemon 1
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list - found
> proc 12 for me!
> [xserve02.local:42180] [[24811,0],2] odls:construct_child_list unpacking 1
> app_contexts
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list -
> checking proc 13 on node 2 with daemon 2
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list -
> checking proc 14 on node 1 with daemon 1
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list - found
> proc 14 for me!
> [xserve01.cluster:42519] [[24811,0],1] odls:constructing child list -
> checking proc 15 on node 2 with daemon 2
> [xserve01.cluster:42519] [[24811,0],1] odls:construct:child:
> num_participating 1
> [xserve01.cluster:42519] [[24811,0],1] odls:launch found 16 processors for
> 8 children and set oversubscribed to false
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list -
> checking proc 0 on node 1 with daemon 1
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list -
> checking proc 1 on node 2 with daemon 2
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list - found
> proc 1 for me!
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list -
> checking proc 2 on node 1 with daemon 1
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list -
> checking proc 3 on node 2 with daemon 2
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list - found
> proc 3 for me!
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list -
> checking proc 4 on node 1 with daemon 1
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list -
> checking proc 5 on node 2 with daemon 2
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list - found
> proc 5 for me!
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list -
> checking proc 6 on node 1 with daemon 1
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list -
> checking proc 7 on node 2 with daemon 2
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list - found
> proc 7 for me!
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list -
> checking proc 8 on node 1 with daemon 1
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list -
> checking proc 9 on node 2 with daemon 2
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list - found
> proc 9 for me!
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list -
> checking proc 10 on node 1 with daemon 1
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list -
> checking proc 11 on node 2 with daemon 2
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list - found
> proc 11 for me!
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list -
> checking proc 12 on node 1 with daemon 1
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list -
> checking proc 13 on node 2 with daemon 2
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list - found
> proc 13 for me!
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list -
> checking proc 14 on node 1 with daemon 1
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list -
> checking proc 15 on node 2 with daemon 2
> [xserve02.local:42180] [[24811,0],2] odls:constructing child list - found
> proc 15 for me!
> [xserve02.local:42180] [[24811,0],2] odls:construct:child:
> num_participating 1
> [xserve02.local:42180] [[24811,0],2] odls:launch found 16 processors for 8
> children and set oversubscribed to false
> [xserve01.cluster:42519] [[24811,0],1] odls:launch reporting job [24811,1]
> launch status
> [saturna.cluster:17660] defining message event:
> base/plm_base_launch_support.c 668
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launch reissuing
> non-blocking recv
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launch from
> daemon [[24811,0],1]
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launched for proc
> [[24811,1],0] from daemon [[24811,0],1]: pid 42523 state 2 exit 0
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launched for proc
> [[24811,1],2] from daemon [[24811,0],1]: pid 42524 state 2 exit 0
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launched for proc
> [[24811,1],4] from daemon [[24811,0],1]: pid 42525 state 2 exit 0
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launched for proc
> [[24811,1],6] from daemon [[24811,0],1]: pid 42526 state 2 exit 0
> [xserve01.cluster:42519] [[24811,0],1] odls:launch setting waitpids
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:send_relay
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:send_relay - recipient
> list is empty!
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launched for proc
> [[24811,1],8] from daemon [[24811,0],1]: pid 42527 state 2 exit 0
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launched for proc
> [[24811,1],10] from daemon [[24811,0],1]: pid 42528 state 2 exit 0
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launched for proc
> [[24811,1],12] from daemon [[24811,0],1]: pid 42529 state 2 exit 0
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launched for proc
> [[24811,1],14] from daemon [[24811,0],1]: pid 42530 state 2 exit 0
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launch completed
> processing
> [xserve02.local:42180] [[24811,0],2] odls:launch reporting job [24811,1]
> launch status
> [saturna.cluster:17660] defining message event:
> base/plm_base_launch_support.c 668
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launch reissuing
> non-blocking recv
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launch from
> daemon [[24811,0],2]
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launched for proc
> [[24811,1],1] from daemon [[24811,0],2]: pid 42184 state 2 exit 0
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launched for proc
> [[24811,1],3] from daemon [[24811,0],2]: pid 42185 state 2 exit 0
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launched for proc
> [[24811,1],5] from daemon [[24811,0],2]: pid 42186 state 2 exit 0
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launched for proc
> [[24811,1],7] from daemon [[24811,0],2]: pid 42187 state 2 exit 0
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launched for proc
> [[24811,1],9] from daemon [[24811,0],2]: pid 42188 state 2 exit 0
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launched for proc
> [[24811,1],11] from daemon [[24811,0],2]: pid 42189 state 2 exit 0
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launched for proc
> [[24811,1],13] from daemon [[24811,0],2]: pid 42190 state 2 exit 0
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launched for proc
> [[24811,1],15] from daemon [[24811,0],2]: pid 42191 state 2 exit 0
> [saturna.cluster:17660] [[24811,0],0] plm:base:app_report_launch completed
> processing
> [saturna.cluster:17660] [[24811,0],0] plm:base:report_launched all apps
> reported
> [saturna.cluster:17660] [[24811,0],0] plm:base:launch wiring up iof
> [xserve02.local:42180] [[24811,0],2] odls:launch setting waitpids
> [xserve02.local:42180] [[24811,0],2] orte:daemon:send_relay
> [xserve02.local:42180] [[24811,0],2] orte:daemon:send_relay - recipient
> list is empty!
> [saturna.cluster:17660] [[24811,0],0] plm:base:launch completed for job
> [24811,1]
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],0]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],0] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_recv: received sync+nidmap
> from local proc [[24811,1],0]
> [xserve01.cluster:42519] [[24811,0],1] odls: registering sync on child
> [[24811,1],0]
> [xserve01.cluster:42519] [[24811,0],1] odls:sync nidmap requested for job
> [24811,1]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending sync ack to child
> [[24811,1],0] with 307 bytes of data
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],4]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],4] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_recv: received sync+nidmap
> from local proc [[24811,1],4]
> [xserve01.cluster:42519] [[24811,0],1] odls: registering sync on child
> [[24811,1],4]
> [xserve01.cluster:42519] [[24811,0],1] odls:sync nidmap requested for job
> [24811,1]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending sync ack to child
> [[24811,1],4] with 307 bytes of data
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],2]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],2] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_recv: received sync+nidmap
> from local proc [[24811,1],2]
> [xserve01.cluster:42519] [[24811,0],1] odls: registering sync on child
> [[24811,1],2]
> [xserve01.cluster:42519] [[24811,0],1] odls:sync nidmap requested for job
> [24811,1]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending sync ack to child
> [[24811,1],2] with 307 bytes of data
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],6]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],6] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_recv: received sync+nidmap
> from local proc [[24811,1],6]
> [xserve01.cluster:42519] [[24811,0],1] odls: registering sync on child
> [[24811,1],6]
> [xserve01.cluster:42519] [[24811,0],1] odls:sync nidmap requested for job
> [24811,1]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending sync ack to child
> [[24811,1],6] with 307 bytes of data
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],10]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],10] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_recv: received sync+nidmap
> from local proc [[24811,1],10]
> [xserve01.cluster:42519] [[24811,0],1] odls: registering sync on child
> [[24811,1],10]
> [xserve01.cluster:42519] [[24811,0],1] odls:sync nidmap requested for job
> [24811,1]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending sync ack to child
> [[24811,1],10] with 307 bytes of data
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],8]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],8] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_recv: received sync+nidmap
> from local proc [[24811,1],8]
> [xserve01.cluster:42519] [[24811,0],1] odls: registering sync on child
> [[24811,1],8]
> [xserve01.cluster:42519] [[24811,0],1] odls:sync nidmap requested for job
> [24811,1]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending sync ack to child
> [[24811,1],8] with 307 bytes of data
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],5]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],5] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_recv: received sync+nidmap from
> local proc [[24811,1],5]
> [xserve02.local:42180] [[24811,0],2] odls: registering sync on child
> [[24811,1],5]
> [xserve02.local:42180] [[24811,0],2] odls:sync nidmap requested for job
> [24811,1]
> [xserve02.local:42180] [[24811,0],2] odls: sending sync ack to child
> [[24811,1],5] with 307 bytes of data
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],1]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],1] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_recv: received sync+nidmap from
> local proc [[24811,1],1]
> [xserve02.local:42180] [[24811,0],2] odls: registering sync on child
> [[24811,1],1]
> [xserve02.local:42180] [[24811,0],2] odls:sync nidmap requested for job
> [24811,1]
> [xserve02.local:42180] [[24811,0],2] odls: sending sync ack to child
> [[24811,1],1] with 307 bytes of data
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],3]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],3] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_recv: received sync+nidmap from
> local proc [[24811,1],3]
> [xserve02.local:42180] [[24811,0],2] odls: registering sync on child
> [[24811,1],3]
> [xserve02.local:42180] [[24811,0],2] odls:sync nidmap requested for job
> [24811,1]
> [xserve02.local:42180] [[24811,0],2] odls: sending sync ack to child
> [[24811,1],3] with 307 bytes of data
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],12]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],12] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_recv: received sync+nidmap
> from local proc [[24811,1],12]
> [xserve01.cluster:42519] [[24811,0],1] odls: registering sync on child
> [[24811,1],12]
> [xserve01.cluster:42519] [[24811,0],1] odls:sync nidmap requested for job
> [24811,1]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending sync ack to child
> [[24811,1],12] with 307 bytes of data
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],14]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],14] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_recv: received sync+nidmap
> from local proc [[24811,1],14]
> [xserve01.cluster:42519] [[24811,0],1] odls: registering sync on child
> [[24811,1],14]
> [xserve01.cluster:42519] [[24811,0],1] odls:sync nidmap requested for job
> [24811,1]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending sync ack to child
> [[24811,1],14] with 307 bytes of data
> [xserve01.cluster:42519] [[24811,0],1] odls: sending contact info to HNP
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [saturna.cluster:17660] defining message event: base/routed_base_receive.c
> 153
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],11]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],11] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_recv: received sync+nidmap from
> local proc [[24811,1],11]
> [xserve02.local:42180] [[24811,0],2] odls: registering sync on child
> [[24811,1],11]
> [xserve02.local:42180] [[24811,0],2] odls:sync nidmap requested for job
> [24811,1]
> [xserve02.local:42180] [[24811,0],2] odls: sending sync ack to child
> [[24811,1],11] with 307 bytes of data
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],2]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],7]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],7] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_recv: received sync+nidmap from
> local proc [[24811,1],7]
> [xserve02.local:42180] [[24811,0],2] odls: registering sync on child
> [[24811,1],7]
> [xserve02.local:42180] [[24811,0],2] odls:sync nidmap requested for job
> [24811,1]
> [xserve02.local:42180] [[24811,0],2] odls: sending sync ack to child
> [[24811,1],7] with 307 bytes of data
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],2] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received collective data
> cmd
> [xserve01.cluster:42519] [[24811,0],1] odls: collecting data from child
> [[24811,1],2]
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],9]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],9] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_recv: received sync+nidmap from
> local proc [[24811,1],9]
> [xserve02.local:42180] [[24811,0],2] odls: registering sync on child
> [[24811,1],9]
> [xserve02.local:42180] [[24811,0],2] odls:sync nidmap requested for job
> [24811,1]
> [xserve02.local:42180] [[24811,0],2] odls: sending sync ack to child
> [[24811,1],9] with 307 bytes of data
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],13]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],13] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_recv: received sync+nidmap from
> local proc [[24811,1],13]
> [xserve02.local:42180] [[24811,0],2] odls: registering sync on child
> [[24811,1],13]
> [xserve02.local:42180] [[24811,0],2] odls:sync nidmap requested for job
> [24811,1]
> [xserve02.local:42180] [[24811,0],2] odls: sending sync ack to child
> [[24811,1],13] with 307 bytes of data
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],0]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],0] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received collective data
> cmd
> [xserve01.cluster:42519] [[24811,0],1] odls: collecting data from child
> [[24811,1],0]
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],15]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],15] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_recv: received sync+nidmap from
> local proc [[24811,1],15]
> [xserve02.local:42180] [[24811,0],2] odls: registering sync on child
> [[24811,1],15]
> [xserve02.local:42180] [[24811,0],2] odls:sync nidmap requested for job
> [24811,1]
> [xserve02.local:42180] [[24811,0],2] odls: sending sync ack to child
> [[24811,1],15] with 307 bytes of data
> [xserve02.local:42180] [[24811,0],2] odls: sending contact info to HNP
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [saturna.cluster:17660] defining message event: base/routed_base_receive.c
> 153
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],4]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],4] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received collective data
> cmd
> [xserve01.cluster:42519] [[24811,0],1] odls: collecting data from child
> [[24811,1],4]
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],6]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],6] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received collective data
> cmd
> [xserve01.cluster:42519] [[24811,0],1] odls: collecting data from child
> [[24811,1],6]
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],10]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],10] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received collective data
> cmd
> [xserve01.cluster:42519] [[24811,0],1] odls: collecting data from child
> [[24811,1],10]
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],8]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],8] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received collective data
> cmd
> [xserve01.cluster:42519] [[24811,0],1] odls: collecting data from child
> [[24811,1],8]
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],5]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],5] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received collective data
> cmd
> [xserve02.local:42180] [[24811,0],2] odls: collecting data from child
> [[24811,1],5]
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],3]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],3] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received collective data
> cmd
> [xserve02.local:42180] [[24811,0],2] odls: collecting data from child
> [[24811,1],3]
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],12]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],12] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received collective data
> cmd
> [xserve01.cluster:42519] [[24811,0],1] odls: collecting data from child
> [[24811,1],12]
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],1]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],1] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received collective data
> cmd
> [xserve02.local:42180] [[24811,0],2] odls: collecting data from child
> [[24811,1],1]
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [saturna.cluster:17660] [[24811,0],0] orted_recv_cmd: received message from
> [[24811,0],1]
> [saturna.cluster:17660] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],14]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],14] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received collective data
> cmd
> [xserve01.cluster:42519] [[24811,0],1] odls: collecting data from child
> [[24811,1],14]
> [xserve01.cluster:42519] [[24811,0],1] odls: executing collective
> [xserve01.cluster:42519] [[24811,0],1] odls: daemon collective called
> [xserve01.cluster:42519] [[24811,0],1] odls: daemon collective for job
> [24811,1] from [[24811,0],1] type 2 num_collected 1 num_participating 1
> num_contributors 8
> [xserve01.cluster:42519] [[24811,0],1] odls: daemon collective not the HNP
> - sending to parent [[24811,0],0]
> [saturna.cluster:17660] [[24811,0],0] orted_recv_cmd: reissued recv
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:cmd:processor called by
> [[24811,0],1] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] odls: collective completed
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [saturna.cluster:17660] [[24811,0],0] orted_cmd: received collective data
> cmd
> [saturna.cluster:17660] [[24811,0],0] odls: daemon collective called
> [saturna.cluster:17660] [[24811,0],0] odls: daemon collective for job
> [24811,1] from [[24811,0],1] type 2 num_collected 1 num_participating 2
> num_contributors 8
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:cmd:processor: processing
> commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],9]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],9] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received collective data
> cmd
> [xserve02.local:42180] [[24811,0],2] odls: collecting data from child
> [[24811,1],9]
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],13]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],13] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received collective data
> cmd
> [xserve02.local:42180] [[24811,0],2] odls: collecting data from child
> [[24811,1],13]
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],7]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],7] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received collective data
> cmd
> [xserve02.local:42180] [[24811,0],2] odls: collecting data from child
> [[24811,1],7]
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],11]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],11] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received collective data
> cmd
> [xserve02.local:42180] [[24811,0],2] odls: collecting data from child
> [[24811,1],11]
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],15]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],15] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received collective data
> cmd
> [xserve02.local:42180] [[24811,0],2] odls: collecting data from child
> [[24811,1],15]
> [xserve02.local:42180] [[24811,0],2] odls: executing collective
> [xserve02.local:42180] [[24811,0],2] odls: daemon collective called
> [xserve02.local:42180] [[24811,0],2] odls: daemon collective for job
> [24811,1] from [[24811,0],2] type 2 num_collected 1 num_participating 1
> num_contributors 8
> [xserve02.local:42180] [[24811,0],2] odls: daemon collective not the HNP -
> sending to parent [[24811,0],0]
> [xserve02.local:42180] [[24811,0],2] odls: collective completed
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [saturna.cluster:17660] [[24811,0],0] orted_recv_cmd: received message from
> [[24811,0],2]
> [saturna.cluster:17660] defining message event: orted/orted_comm.c 159
> [saturna.cluster:17660] [[24811,0],0] orted_recv_cmd: reissued recv
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:cmd:processor called by
> [[24811,0],2] for tag 1
> [saturna.cluster:17660] [[24811,0],0] orted_cmd: received collective data
> cmd
> [saturna.cluster:17660] [[24811,0],0] odls: daemon collective called
> [saturna.cluster:17660] [[24811,0],0] odls: daemon collective for job
> [24811,1] from [[24811,0],2] type 2 num_collected 2 num_participating 2
> num_contributors 16
> [saturna.cluster:17660] [[24811,0],0] odls: daemon collective HNP -
> xcasting to job [24811,1]
> [saturna.cluster:17660] defining message event: grpcomm_bad_module.c 183
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:cmd:processor: processing
> commands completed
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:cmd:processor called by
> [[24811,0],0] for tag 1
> [saturna.cluster:17660] [[24811,0],0] orted_cmd: received
> message_local_procs
> [saturna.cluster:17660] [[24811,0],0] orted:comm:message_local_procs
> delivering message to job [24811,1] tag 15
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:send_relay
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:send_relay sending relay
> msg to 1
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:send_relay sending relay
> msg to 2
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,0],0]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,0],0] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received
> message_local_procs
> [xserve02.local:42180] [[24811,0],2] orted:comm:message_local_procs
> delivering message to job [24811,1] tag 15
> [xserve02.local:42180] [[24811,0],2] odls: sending message to tag 15 on
> child [[24811,1],1]
> [xserve02.local:42180] [[24811,0],2] odls: sending message to tag 15 on
> child [[24811,1],3]
> [xserve02.local:42180] [[24811,0],2] odls: sending message to tag 15 on
> child [[24811,1],5]
> [xserve02.local:42180] [[24811,0],2] odls: sending message to tag 15 on
> child [[24811,1],7]
> [xserve02.local:42180] [[24811,0],2] odls: sending message to tag 15 on
> child [[24811,1],9]
> [xserve02.local:42180] [[24811,0],2] odls: sending message to tag 15 on
> child [[24811,1],11]
> [xserve02.local:42180] [[24811,0],2] odls: sending message to tag 15 on
> child [[24811,1],13]
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,0],0]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,0],0] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received
> message_local_procs
> [xserve01.cluster:42519] [[24811,0],1] orted:comm:message_local_procs
> delivering message to job [24811,1] tag 15
> [xserve01.cluster:42519] [[24811,0],1] odls: sending message to tag 15 on
> child [[24811,1],0]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending message to tag 15 on
> child [[24811,1],2]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending message to tag 15 on
> child [[24811,1],4]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending message to tag 15 on
> child [[24811,1],6]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending message to tag 15 on
> child [[24811,1],8]
> [xserve02.local:42180] [[24811,0],2] odls: sending message to tag 15 on
> child [[24811,1],15]
> [xserve02.local:42180] [[24811,0],2] orte:daemon:send_relay
> [xserve02.local:42180] [[24811,0],2] orte:daemon:send_relay - recipient
> list is empty!
> [xserve01.cluster:42519] [[24811,0],1] odls: sending message to tag 15 on
> child [[24811,1],10]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending message to tag 15 on
> child [[24811,1],12]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending message to tag 15 on
> child [[24811,1],14]
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:send_relay
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:send_relay - recipient
> list is empty!
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],5]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],5] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received collective data
> cmd
> [xserve02.local:42180] [[24811,0],2] odls: collecting data from child
> [[24811,1],5]
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],13]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],13] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received collective data
> cmd
> [xserve02.local:42180] [[24811,0],2] odls: collecting data from child
> [[24811,1],13]
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],7]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],7] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received collective data
> cmd
> [xserve02.local:42180] [[24811,0],2] odls: collecting data from child
> [[24811,1],7]
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],9]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],9] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received collective data
> cmd
> [xserve02.local:42180] [[24811,0],2] odls: collecting data from child
> [[24811,1],9]
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],12]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],12] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received collective data
> cmd
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve01.cluster:42519] [[24811,0],1] odls: collecting data from child
> [[24811,1],12]
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],10]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],10] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received collective data
> cmd
> [xserve01.cluster:42519] [[24811,0],1] odls: collecting data from child
> [[24811,1],10]
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],11]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],11] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received collective data
> cmd
> [xserve02.local:42180] [[24811,0],2] odls: collecting data from child
> [[24811,1],11]
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],15]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],15] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received collective data
> cmd
> [xserve02.local:42180] [[24811,0],2] odls: collecting data from child
> [[24811,1],15]
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],4]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],4] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received collective data
> cmd
> [xserve01.cluster:42519] [[24811,0],1] odls: collecting data from child
> [[24811,1],4]
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],6]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],6] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received collective data
> cmd
> [xserve01.cluster:42519] [[24811,0],1] odls: collecting data from child
> [[24811,1],6]
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],8]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],8] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received collective data
> cmd
> [xserve01.cluster:42519] [[24811,0],1] odls: collecting data from child
> [[24811,1],8]
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],14]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],14] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received collective data
> cmd
> [xserve01.cluster:42519] [[24811,0],1] odls: collecting data from child
> [[24811,1],14]
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],1]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],1] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received collective data
> cmd
> [xserve02.local:42180] [[24811,0],2] odls: collecting data from child
> [[24811,1],1]
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,1],3]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,1],3] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received collective data
> cmd
> [xserve02.local:42180] [[24811,0],2] odls: collecting data from child
> [[24811,1],3]
> [xserve02.local:42180] [[24811,0],2] odls: executing collective
> [xserve02.local:42180] [[24811,0],2] odls: daemon collective called
> [saturna.cluster:17660] [[24811,0],0] orted_recv_cmd: received message from
> [[24811,0],2]
> [xserve02.local:42180] [[24811,0],2] odls: daemon collective for job
> [24811,1] from [[24811,0],2] type 1 num_collected 1 num_participating 1
> num_contributors 8
> [xserve02.local:42180] [[24811,0],2] odls: daemon collective not the HNP -
> sending to parent [[24811,0],0]
> [xserve02.local:42180] [[24811,0],2] odls: collective completed
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor: processing
> commands completed
> [saturna.cluster:17660] defining message event: orted/orted_comm.c 159
> [saturna.cluster:17660] [[24811,0],0] orted_recv_cmd: reissued recv
> [saturna.cluster:17660] [[24811,0],0] orted_recv_cmd: received message from
> [[24811,0],1]
> [saturna.cluster:17660] defining message event: orted/orted_comm.c 159
> [saturna.cluster:17660] [[24811,0],0] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],0]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],0] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received collective data
> cmd
> [xserve01.cluster:42519] [[24811,0],1] odls: collecting data from child
> [[24811,1],0]
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,1],2]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,1],2] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received collective data
> cmd
> [xserve01.cluster:42519] [[24811,0],1] odls: collecting data from child
> [[24811,1],2]
> [xserve01.cluster:42519] [[24811,0],1] odls: executing collective
> [xserve01.cluster:42519] [[24811,0],1] odls: daemon collective called
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:cmd:processor called by
> [[24811,0],2] for tag 1
> [saturna.cluster:17660] [[24811,0],0] orted_cmd: received collective data
> cmd
> [saturna.cluster:17660] [[24811,0],0] odls: daemon collective called
> [saturna.cluster:17660] [[24811,0],0] odls: daemon collective for job
> [24811,1] from [[24811,0],2] type 1 num_collected 1 num_participating 2
> num_contributors 8
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:cmd:processor: processing
> commands completed
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:cmd:processor called by
> [[24811,0],1] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] odls: daemon collective for job
> [24811,1] from [[24811,0],1] type 1 num_collected 1 num_participating 1
> num_contributors 8
> [xserve01.cluster:42519] [[24811,0],1] odls: daemon collective not the HNP
> - sending to parent [[24811,0],0]
> [xserve01.cluster:42519] [[24811,0],1] odls: collective completed
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor:
> processing commands completed
> [saturna.cluster:17660] [[24811,0],0] orted_cmd: received collective data
> cmd
> [saturna.cluster:17660] [[24811,0],0] odls: daemon collective called
> [saturna.cluster:17660] [[24811,0],0] odls: daemon collective for job
> [24811,1] from [[24811,0],1] type 1 num_collected 2 num_participating 2
> num_contributors 16
> [saturna.cluster:17660] [[24811,0],0] odls: daemon collective HNP -
> xcasting to job [24811,1]
> [saturna.cluster:17660] defining message event: grpcomm_bad_module.c 183
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:cmd:processor: processing
> commands completed
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:cmd:processor called by
> [[24811,0],0] for tag 1
> [saturna.cluster:17660] [[24811,0],0] orted_cmd: received
> message_local_procs
> [saturna.cluster:17660] [[24811,0],0] orted:comm:message_local_procs
> delivering message to job [24811,1] tag 17
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:send_relay
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:send_relay sending relay
> msg to 1
> [saturna.cluster:17660] [[24811,0],0] orte:daemon:send_relay sending relay
> msg to 2
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: received message
> from [[24811,0],0]
> [xserve01.cluster:42519] defining message event: orted/orted_comm.c 159
> [xserve01.cluster:42519] [[24811,0],1] orted_recv_cmd: reissued recv
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:cmd:processor called by
> [[24811,0],0] for tag 1
> [xserve01.cluster:42519] [[24811,0],1] orted_cmd: received
> message_local_procs
> [xserve01.cluster:42519] [[24811,0],1] orted:comm:message_local_procs
> delivering message to job [24811,1] tag 17
> [xserve01.cluster:42519] [[24811,0],1] odls: sending message to tag 17 on
> child [[24811,1],0]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending message to tag 17 on
> child [[24811,1],2]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending message to tag 17 on
> child [[24811,1],4]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending message to tag 17 on
> child [[24811,1],6]
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: received message from
> [[24811,0],0]
> [xserve02.local:42180] defining message event: orted/orted_comm.c 159
> [xserve02.local:42180] [[24811,0],2] orted_recv_cmd: reissued recv
> [xserve02.local:42180] [[24811,0],2] orte:daemon:cmd:processor called by
> [[24811,0],0] for tag 1
> [xserve02.local:42180] [[24811,0],2] orted_cmd: received
> message_local_procs
> [xserve02.local:42180] [[24811,0],2] orted:comm:message_local_procs
> delivering message to job [24811,1] tag 17
> [xserve02.local:42180] [[24811,0],2] odls: sending message to tag 17 on
> child [[24811,1],1]
> [xserve02.local:42180] [[24811,0],2] odls: sending message to tag 17 on
> child [[24811,1],3]
> [xserve02.local:42180] [[24811,0],2] odls: sending message to tag 17 on
> child [[24811,1],5]
> [xserve02.local:42180] [[24811,0],2] odls: sending message to tag 17 on
> child [[24811,1],7]
> [xserve02.local:42180] [[24811,0],2] odls: sending message to tag 17 on
> child [[24811,1],9]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending message to tag 17 on
> child [[24811,1],8]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending message to tag 17 on
> child [[24811,1],10]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending message to tag 17 on
> child [[24811,1],12]
> [xserve01.cluster:42519] [[24811,0],1] odls: sending message to tag 17 on
> child [[24811,1],14]
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:send_relay
> [xserve01.cluster:42519] [[24811,0],1] orte:daemon:send_relay - recipient
> list is empty!
> [xserve02.local:42180] [[24811,0],2] odls: sending message to tag 17 on
> child [[24811,1],11]
> [xserve02.local:42180] [[24811,0],2] odls: sending message to tag 17 on
> child [[24811,1],13]
> [xserve02.local:42180] [[24811,0],2] odls: sending message to tag 17 on
> child [[24811,1],15]
> [xserve02.local:42180] [[24811,0],2] orte:daemon:send_relay
> [xserve02.local:42180] [[24811,0],2] orte:daemon:send_relay - recipient
> list is empty!
> [saturna.cluster:17660] defining message event: iof_hnp_receive.c 227
> [xserve02.local][[24811,1],1][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack]
> received unexpected process identifier [[24811,1],2]
> [saturna.cluster:17660] defining message event: iof_hnp_receive.c 227
> [xserve01.cluster][[24811,1],2][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack]
> received unexpected process identifier [[24811,1],5]