[clus9:27962] mca: base: components_open: Looking for errmgr components
[clus9:27962] mca: base: components_open: opening errmgr components
[clus9:27962] mca: base: components_open: found loaded component app
[clus9:27962] mca: base: components_open: component app has no register function
[clus9:27962] mca: base: components_open: component app open function successful
[clus9:27962] mca: base: components_open: found loaded component hnp
[clus9:27962] mca: base: components_open: component hnp has no register function
[clus9:27962] errmgr:hnp: open()
[clus9:27962] errmgr:hnp: open: priority = 50
[clus9:27962] errmgr:hnp: open: verbosity = 0
[clus9:27962] errmgr:hnp: open: --- CR Migration Options ---
[clus9:27962] errmgr:hnp: open: Process Migration = Disabled
[clus9:27962] errmgr:hnp: open: timing = Disabled
[clus9:27962] errmgr:hnp: open: --- Auto. Recovery Options ---
[clus9:27962] errmgr:hnp: open: Auto. Recover = Disabled
[clus9:27962] errmgr:hnp: open: timing = Disabled
[clus9:27962] errmgr:hnp: open: recover_delay = 1
[clus9:27962] mca: base: components_open: component hnp open function successful
[clus9:27962] mca: base: components_open: found loaded component orted
[clus9:27962] mca: base: components_open: component orted has no register function
[clus9:27962] mca: base: components_open: component orted open function successful
[clus9:27962] mca:base:select: Auto-selecting errmgr components
[clus9:27962] mca:base:select:(errmgr) Querying component [app]
[clus9:27962] mca:base:select:(errmgr) Skipping component [app]. Query failed to return a module
[clus9:27962] mca:base:select:(errmgr) Querying component [hnp]
[clus9:27962] errmgr:hnp:component_query()
[clus9:27962] mca:base:select:(errmgr) Query of component [hnp] set priority to 50
[clus9:27962] mca:base:select:(errmgr) Querying component [orted]
[clus9:27962] mca:base:select:(errmgr) Skipping component [orted]. Query failed to return a module
[clus9:27962] mca:base:select:(errmgr) Selected component [hnp]
[clus9:27962] mca: base: close: component app closed
[clus9:27962] mca: base: close: unloading component app
[clus9:27962] mca: base: close: component orted closed
[clus9:27962] mca: base: close: unloading component orted
[clus9:27962] mca_oob_tcp_init: creating listen socket
[clus9:27962] snapc:single: module_init: Global Snapshot Coordinator (disabled)
[clus9:27962] [[56972,0],0] hostfile: checking hostfile ../hostfile for nodes
[clus9:27962] [[56972,0],0] hostfile: filtering nodes through hostfile ../hostfile
[clus9:27962] progressed_wait: ../../../../../orte/mca/plm/rsh/plm_rsh_module.c 1378
Daemon was launched on clus3 - beginning to initialize
Daemon was launched on clus1 - beginning to initialize
Daemon was launched on clus4 - beginning to initialize
[clus3:04377] mca_oob_tcp_init: creating listen socket
[clus3:04377] mca: base: components_open: Looking for errmgr components
[clus3:04377] mca: base: components_open: opening errmgr components
[clus3:04377] mca: base: components_open: found loaded component app
[clus3:04377] mca: base: components_open: component app has no register function
[clus3:04377] mca: base: components_open: component app open function successful
[clus3:04377] mca: base: components_open: found loaded component hnp
[clus3:04377] mca: base: components_open: component hnp has no register function
[clus3:04377] errmgr:hnp: open()
[clus3:04377] errmgr:hnp: open: priority = 50
[clus3:04377] errmgr:hnp: open: verbosity = 0
[clus3:04377] errmgr:hnp: open: --- CR Migration Options ---
[clus3:04377] errmgr:hnp: open: Process Migration = Disabled
[clus3:04377] errmgr:hnp: open: timing = Disabled
[clus3:04377] errmgr:hnp: open: --- Auto. Recovery Options ---
[clus3:04377] errmgr:hnp: open: Auto. Recover = Disabled
[clus3:04377] errmgr:hnp: open: timing = Disabled
[clus3:04377] errmgr:hnp: open: recover_delay = 1
[clus3:04377] mca: base: components_open: component hnp open function successful
[clus3:04377] mca: base: components_open: found loaded component orted
[clus3:04377] mca: base: components_open: component orted has no register function
[clus3:04377] mca: base: components_open: component orted open function successful
[clus3:04377] mca:base:select: Auto-selecting errmgr components
[clus3:04377] mca:base:select:(errmgr) Querying component [app]
[clus3:04377] mca:base:select:(errmgr) Skipping component [app]. Query failed to return a module
[clus3:04377] mca:base:select:(errmgr) Querying component [hnp]
[clus3:04377] errmgr:hnp:component_query()
[clus3:04377] mca:base:select:(errmgr) Skipping component [hnp]. Query failed to return a module
[clus3:04377] mca:base:select:(errmgr) Querying component [orted]
[clus3:04377] mca:base:select:(errmgr) Query of component [orted] set priority to 10
[clus3:04377] mca:base:select:(errmgr) Selected component [orted]
[clus3:04377] mca: base: close: component app closed
[clus3:04377] mca: base: close: unloading component app
[clus3:04377] errmgr:hnp: close()
[clus3:04377] mca: base: close: component hnp closed
[clus3:04377] mca: base: close: unloading component hnp
[clus1:15593] mca_oob_tcp_init: creating listen socket
[clus1:15593] mca: base: components_open: Looking for errmgr components
[clus1:15593] mca: base: components_open: opening errmgr components
[clus1:15593] mca: base: components_open: found loaded component app
[clus1:15593] mca: base: components_open: component app has no register function
[clus1:15593] mca: base: components_open: component app open function successful
[clus1:15593] mca: base: components_open: found loaded component hnp
[clus1:15593] mca: base: components_open: component hnp has no register function
[clus1:15593] errmgr:hnp: open()
[clus1:15593] errmgr:hnp: open: priority = 50
[clus1:15593] errmgr:hnp: open: verbosity = 0
[clus1:15593] errmgr:hnp: open: --- CR Migration Options ---
[clus1:15593] errmgr:hnp: open: Process Migration = Disabled
[clus1:15593] errmgr:hnp: open: timing = Disabled
[clus1:15593] errmgr:hnp: open: --- Auto. Recovery Options ---
[clus1:15593] errmgr:hnp: open: Auto. Recover = Disabled
[clus1:15593] errmgr:hnp: open: timing = Disabled
[clus1:15593] errmgr:hnp: open: recover_delay = 1
[clus1:15593] mca: base: components_open: component hnp open function successful
[clus1:15593] mca: base: components_open: found loaded component orted
[clus1:15593] mca: base: components_open: component orted has no register function
[clus1:15593] mca: base: components_open: component orted open function successful
[clus1:15593] mca:base:select: Auto-selecting errmgr components
[clus1:15593] mca:base:select:(errmgr) Querying component [app]
[clus1:15593] mca:base:select:(errmgr) Skipping component [app]. Query failed to return a module
[clus1:15593] mca:base:select:(errmgr) Querying component [hnp]
[clus1:15593] errmgr:hnp:component_query()
[clus1:15593] mca:base:select:(errmgr) Skipping component [hnp]. Query failed to return a module
[clus1:15593] mca:base:select:(errmgr) Querying component [orted]
[clus1:15593] mca:base:select:(errmgr) Query of component [orted] set priority to 10
[clus1:15593] mca:base:select:(errmgr) Selected component [orted]
[clus1:15593] mca: base: close: component app closed
[clus1:15593] mca: base: close: unloading component app
[clus1:15593] errmgr:hnp: close()
[clus1:15593] mca: base: close: component hnp closed
[clus1:15593] mca: base: close: unloading component hnp
[clus4:15362] mca_oob_tcp_init: creating listen socket
[clus4:15362] mca: base: components_open: Looking for errmgr components
[clus4:15362] mca: base: components_open: opening errmgr components
[clus4:15362] mca: base: components_open: found loaded component app
[clus4:15362] mca: base: components_open: component app has no register function
[clus4:15362] mca: base: components_open: component app open function successful
[clus4:15362] mca: base: components_open: found loaded component hnp
[clus4:15362] mca: base: components_open: component hnp has no register function
[clus4:15362] errmgr:hnp: open()
[clus4:15362] errmgr:hnp: open: priority = 50
[clus4:15362] errmgr:hnp: open: verbosity = 0
[clus4:15362] errmgr:hnp: open: --- CR Migration Options ---
[clus4:15362] errmgr:hnp: open: Process Migration = Disabled
[clus4:15362] errmgr:hnp: open: timing = Disabled
[clus4:15362] errmgr:hnp: open: --- Auto. Recovery Options ---
[clus4:15362] errmgr:hnp: open: Auto. Recover = Disabled
[clus4:15362] errmgr:hnp: open: timing = Disabled
[clus4:15362] errmgr:hnp: open: recover_delay = 1
[clus4:15362] mca: base: components_open: component hnp open function successful
[clus4:15362] mca: base: components_open: found loaded component orted
[clus4:15362] mca: base: components_open: component orted has no register function
[clus4:15362] mca: base: components_open: component orted open function successful
[clus4:15362] mca:base:select: Auto-selecting errmgr components
[clus4:15362] mca:base:select:(errmgr) Querying component [app]
[clus4:15362] mca:base:select:(errmgr) Skipping component [app]. Query failed to return a module
[clus4:15362] mca:base:select:(errmgr) Querying component [hnp]
[clus4:15362] errmgr:hnp:component_query()
[clus4:15362] mca:base:select:(errmgr) Skipping component [hnp]. Query failed to return a module
[clus4:15362] mca:base:select:(errmgr) Querying component [orted]
[clus4:15362] mca:base:select:(errmgr) Query of component [orted] set priority to 10
[clus4:15362] mca:base:select:(errmgr) Selected component [orted]
[clus4:15362] mca: base: close: component app closed
[clus4:15362] mca: base: close: unloading component app
[clus4:15362] errmgr:hnp: close()
[clus4:15362] mca: base: close: component hnp closed
[clus4:15362] mca: base: close: unloading component hnp
[clus1:15593] snapc:single: module_init: Local Snapshot Coordinator (disabled)
[clus9:27962] defining message event: ../../../../../orte/mca/grpcomm/bad/grpcomm_bad_module.c 164
[clus9:27962] progressed_wait: ../../../../orte/mca/plm/base/plm_base_launch_support.c 357
[clus9:27962] [[56972,0],0] orte:daemon:cmd:processor called by [[56972,0],0] for tag 1
[clus9:27962] [[56972,0],0] orte:daemon:send_relay
[clus9:27962] [[56972,0],0] orte:daemon:send_relay sending relay msg to 1
[clus9:27962] [[56972,0],0] orte:daemon:send_relay sending relay msg to 2
[clus9:27962] [[56972,0],0] orte:daemon:send_relay sending relay msg to 3
[clus9:27962] [[56972,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_ADD_LOCAL_PROCS
[clus9:27962] [[56972,0],0] orted_cmd: received add_local_procs
[clus3:04377] snapc:single: module_init: Local Snapshot Coordinator (disabled)
Daemon [[56972,0],2] checking in as pid 4377 on host clus3
[clus3:04377] [[56972,0],2] orted: up and running - waiting for commands!
[clus3:04377] [[56972,0],2] orted_recv_cmd: received message from [[56972,0],0]
[clus3:04377] defining message event: ../../orte/orted/orted_comm.c 173
[clus3:04377] [[56972,0],2] orted_recv_cmd: reissued recv
[clus3:04377] [[56972,0],2] orte:daemon:cmd:processor called by [[56972,0],0] for tag 1
[clus3:04377] [[56972,0],2] node[0].name clus9 daemon 0
[clus3:04377] [[56972,0],2] node[1].name node1 daemon 1
[clus3:04377] [[56972,0],2] node[2].name node3 daemon 2
[clus3:04377] [[56972,0],2] node[3].name node4 daemon 3
[clus3:04377] [[56972,0],2] orte:daemon:send_relay
[clus3:04377] [[56972,0],2] orte:daemon:send_relay - recipient list is empty!
[clus3:04377] [[56972,0],2] orted:comm:process_commands() Processing Command: ORTE_DAEMON_ADD_LOCAL_PROCS
[clus3:04377] [[56972,0],2] orted_cmd: received add_local_procs
Daemon [[56972,0],1] checking in as pid 15593 on host clus1
[clus1:15593] [[56972,0],1] orted: up and running - waiting for commands!
[clus1:15593] [[56972,0],1] orted_recv_cmd: received message from [[56972,0],0]
[clus1:15593] defining message event: ../../orte/orted/orted_comm.c 173
[clus1:15593] [[56972,0],1] orted_recv_cmd: reissued recv
[clus1:15593] [[56972,0],1] orte:daemon:cmd:processor called by [[56972,0],0] for tag 1
[clus1:15593] [[56972,0],1] node[0].name clus9 daemon 0
[clus1:15593] [[56972,0],1] node[1].name node1 daemon 1
[clus1:15593] [[56972,0],1] node[2].name node3 daemon 2
[clus1:15593] [[56972,0],1] node[3].name node4 daemon 3
[clus1:15593] [[56972,0],1] orte:daemon:send_relay
[clus1:15593] [[56972,0],1] orte:daemon:send_relay - recipient list is empty!
[clus1:15593] [[56972,0],1] orted:comm:process_commands() Processing Command: ORTE_DAEMON_ADD_LOCAL_PROCS
[clus1:15593] [[56972,0],1] orted_cmd: received add_local_procs
[clus1:15593] [[56972,0],1] errmgr:orted got state LAUNCHED for proc [[56972,1],1] pid 15594
[clus9:27962] errmgr:hnp:update_state() [[56972,0],0]) ------- App. Process state updated for process [[56972,1],0]
[clus9:27962] [[56972,0],0] errmgr:hnp: job [56972,1] reported state LAUNCHED for proc [[56972,1],0] state LAUNCHED pid 27966 exit_code 0
[clus9:27962] errmgr:hnp:update_state() [[56972,0],0]) ------- App. Process state updated for process NULL
[clus9:27962] [[56972,0],0] errmgr:hnp: job [56972,1] reported state RUNNING for proc NULL state UNDEFINED pid 0 exit_code 1
[clus9:27962] [[56972,0],0] errmgr:hnp: job [56972,1] reported state RUNNING
[clus9:27962] errmgr:hnp:update_state() [[56972,0],0]) ------- App. Process state updated for process [[56972,1],2]
[clus9:27962] [[56972,0],0] errmgr:hnp: job [56972,1] reported state UNDEFINED for proc [[56972,1],2] state RUNNING pid 4378 exit_code 0
[clus9:27962] errmgr:hnp:update_state() [[56972,0],0]) ------- App. Process state updated for process [[56972,1],1]
[clus9:27962] [[56972,0],0] errmgr:hnp: job [56972,1] reported state UNDEFINED for proc [[56972,1],1] state RUNNING pid 15594 exit_code 0
[clus9:27962] errmgr:hnp:update_state() [[56972,0],0]) ------- App. Process state updated for process [[56972,1],3]
[clus9:27962] [[56972,0],0] errmgr:hnp: job [56972,1] reported state UNDEFINED for proc [[56972,1],3] state RUNNING pid 15363 exit_code 0
[clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228
[clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228
[clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228
[clus3:04377] [[56972,0],2] errmgr:orted got state LAUNCHED for proc [[56972,1],2] pid 4378
[clus4:15362] [[56972,0],3] errmgr:orted got state LAUNCHED for proc [[56972,1],3] pid 15363
[clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228
[1,1]:[clus1:15594] mca: base: components_open: Looking for errmgr components
[clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228
[1,2]:[clus3:04378] mca: base: components_open: Looking for errmgr components
[clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228
[1,3]:[clus4:15363] mca: base: components_open: Looking for errmgr components
[clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228
[1,1]:[clus1:15594] mca: base: components_open: opening errmgr components
[1,1]:[clus1:15594] mca: base: components_open: found loaded component app
[1,1]:[clus1:15594] mca: base: components_open: component app has no register function
[1,1]:[clus1:15594] mca: base: components_open: component app open function successful
[1,1]:[clus1:15594] mca: base: components_open: found loaded component hnp
[1,1]:[clus1:15594] mca: base: components_open: component hnp has no register function
[1,1]:[clus1:15594] errmgr:hnp: open()
[1,1]:[clus1:15594] errmgr:hnp: open: priority = 50
[1,1]:[clus1:15594] errmgr:hnp: open: verbosity = 0
[1,1]:[clus1:15594] errmgr:hnp: open: --- CR Migration Options ---
[1,1]:[clus1:15594] errmgr:hnp: open: Process Migration = Disabled
[1,1]:[clus1:15594] errmgr:hnp: open: timing = Disabled
[1,1]:[clus1:15594] errmgr:hnp: open: --- Auto. Recovery Options ---
[1,1]:[clus1:15594] errmgr:hnp: open: Auto. Recover = Disabled
[1,1]:[clus1:15594] errmgr:hnp: open: timing = Disabled
[1,1]:[clus1:15594] errmgr:hnp: open: recover_delay = 1
[1,1]:[clus1:15594] mca: base: components_open: component hnp open function successful
[1,1]:[clus1:15594] mca: base: components_open: found loaded component orted
[1,1]:[clus1:15594] mca: base: components_open: component orted has no register function
[1,1]:[clus1:15594] mca: base: components_open: component orted open function successful
[1,1]:[clus1:15594] mca:base:select: Auto-selecting errmgr components
[1,1]:[clus1:15594] mca:base:select:(errmgr) Querying component [app]
[1,1]:[clus1:15594] mca:base:select:(errmgr) Query of component [app] set priority to 10
[1,1]:[clus1:15594] mca:base:select:(errmgr) Querying component [hnp]
[1,1]:[clus1:15594] errmgr:hnp:component_query()
[1,1]:[clus1:15594] mca:base:select:(errmgr) Skipping component [hnp]. Query failed to return a module
[1,1]:[clus1:15594] mca:base:select:(errmgr) Querying component [orted]
[1,1]:[clus1:15594] mca:base:select:(errmgr) Skipping component [orted]. Query failed to return a module
[1,1]:[clus1:15594] mca:base:select:(errmgr) Selected component [app]
[1,1]:[clus1:15594] errmgr:hnp: close()
[1,1]:[clus1:15594] mca: base: close: component hnp closed
[1,1]:[clus1:15594] mca: base: close: unloading component hnp
[1,1]:[clus1:15594] mca: base: close: component orted closed
[1,1]:[clus1:15594] mca: base: close: unloading component orted
[clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228
[1,2]:[clus3:04378] mca: base: components_open: opening errmgr components
[1,2]:[clus3:04378] mca: base: components_open: found loaded component app
[1,2]:[clus3:04378] mca: base: components_open: component app has no register function
[1,2]:[clus3:04378] mca: base: components_open: component app open function successful
[1,2]:[clus3:04378] mca: base: components_open: found loaded component hnp
[1,2]:[clus3:04378] mca: base: components_open: component hnp has no register function
[1,2]:[clus3:04378] errmgr:hnp: open()
[1,2]:[clus3:04378] errmgr:hnp: open: priority = 50
[1,2]:[clus3:04378] errmgr:hnp: open: verbosity = 0
[1,2]:[clus3:04378] errmgr:hnp: open: --- CR Migration Options ---
[1,2]:[clus3:04378] errmgr:hnp: open: Process Migration = Disabled
[1,2]:[clus3:04378] errmgr:hnp: open: timing = Disabled
[1,2]:[clus3:04378] errmgr:hnp: open: --- Auto. Recovery Options ---
[1,2]:[clus3:04378] errmgr:hnp: open: Auto. Recover = Disabled
[1,2]:[clus3:04378] errmgr:hnp: open: timing = Disabled
[1,2]:[clus3:04378] errmgr:hnp: open: recover_delay = 1
[1,2]:[clus3:04378] mca: base: components_open: component hnp open function successful
[1,2]:[clus3:04378] mca: base: components_open: found loaded component orted
[1,2]:[clus3:04378] mca: base: components_open: component orted has no register function
[1,2]:[clus3:04378] mca: base: components_open: component orted open function successful
[1,2]:[clus3:04378] mca:base:select: Auto-selecting errmgr components
[1,2]:[clus3:04378] mca:base:select:(errmgr) Querying component [app]
[1,2]:[clus3:04378] mca:base:select:(errmgr) Query of component [app] set priority to 10
[1,2]:[clus3:04378] mca:base:select:(errmgr) Querying component [hnp]
[1,2]:[clus3:04378] errmgr:hnp:component_query()
[1,2]:[clus3:04378] mca:base:select:(errmgr) Skipping component [hnp]. Query failed to return a module
[1,2]:[clus3:04378] mca:base:select:(errmgr) Querying component [orted]
[1,2]:[clus3:04378] mca:base:select:(errmgr) Skipping component [orted]. Query failed to return a module
[1,2]:[clus3:04378] mca:base:select:(errmgr) Selected component [app]
[1,2]:[clus3:04378] errmgr:hnp: close()
[1,2]:[clus3:04378] mca: base: close: component hnp closed
[1,2]:[clus3:04378] mca: base: close: unloading component hnp
[1,2]:[clus3:04378] mca: base: close: component orted closed
[1,2]:[clus3:04378] mca: base: close: unloading component orted
[clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228
[1,3]:[clus4:15363] mca: base: components_open: opening errmgr components
[1,3]:[clus4:15363] mca: base: components_open: found loaded component app
[1,3]:[clus4:15363] mca: base: components_open: component app has no register function
[1,3]:[clus4:15363] mca: base: components_open: component app open function successful
[1,3]:[clus4:15363] mca: base: components_open: found loaded component hnp
[1,3]:[clus4:15363] mca: base: components_open: component hnp has no register function
[1,3]:[clus4:15363] errmgr:hnp: open()
[1,3]:[clus4:15363] errmgr:hnp: open: priority = 50
[1,3]:[clus4:15363] errmgr:hnp: open: verbosity = 0
[1,3]:[clus4:15363] errmgr:hnp: open: --- CR Migration Options ---
[1,3]:[clus4:15363] errmgr:hnp: open: Process Migration = Disabled
[1,3]:[clus4:15363] errmgr:hnp: open: timing = Disabled
[1,3]:[clus4:15363] errmgr:hnp: open: --- Auto. Recovery Options ---
[1,3]:[clus4:15363] errmgr:hnp: open: Auto. Recover = Disabled
[1,3]:[clus4:15363] errmgr:hnp: open: timing = Disabled
[1,3]:[clus4:15363] errmgr:hnp: open: recover_delay = 1
[1,3]:[clus4:15363] mca: base: components_open: component hnp open function successful
[1,3]:[clus4:15363] mca: base: components_open: found loaded component orted
[1,3]:[clus4:15363] mca: base: components_open: component orted has no register function
[1,3]:[clus4:15363] mca: base: components_open: component orted open function successful
[1,3]:[clus4:15363] mca:base:select: Auto-selecting errmgr components
[1,3]:[clus4:15363] mca:base:select:(errmgr) Querying component [app]
[1,3]:[clus4:15363] mca:base:select:(errmgr) Query of component [app] set priority to 10
[1,3]:[clus4:15363] mca:base:select:(errmgr) Querying component [hnp]
[1,3]:[clus4:15363] errmgr:hnp:component_query()
[1,3]:[clus4:15363] mca:base:select:(errmgr) Skipping component [hnp]. Query failed to return a module
[1,3]:[clus4:15363] mca:base:select:(errmgr) Querying component [orted]
[1,3]:[clus4:15363] mca:base:select:(errmgr) Skipping component [orted]. Query failed to return a module
[1,3]:[clus4:15363] mca:base:select:(errmgr) Selected component [app]
[1,3]:[clus4:15363] errmgr:hnp: close()
[1,3]:[clus4:15363] mca: base: close: component hnp closed
[1,3]:[clus4:15363] mca: base: close: unloading component hnp
[1,3]:[clus4:15363] mca: base: close: component orted closed
[1,3]:[clus4:15363] mca: base: close: unloading component orted
[clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228
[1,1]:[clus1:15594] mca_oob_tcp_init: creating listen socket
[clus1:15593] [[56972,0],1] orted_recv_cmd: received message from [[56972,1],1]
[clus1:15593] defining message event: ../../orte/orted/orted_comm.c 173
[clus1:15593] [[56972,0],1] orted_recv_cmd: reissued recv
[clus1:15593] [[56972,0],1] orte:daemon:cmd:processor called by [[56972,1],1] for tag 1
[clus1:15593] [[56972,0],1] orted:comm:process_commands() Processing Command: ORTE_DAEMON_SYNC_WANT_NIDMAP
[clus1:15593] [[56972,0],1] orted_recv: received sync+nidmap from local proc [[56972,1],1]
[clus1:15593] [[56972,0],1] errmgr:orted got state SYNC REGISTERED for proc [[56972,1],1] pid 0
[clus1:15593] [[56972,0],1] errmgr:orted: sending contact info to HNP
[clus1:15593] [[56972,0],1] orte:daemon:cmd:processor: processing commands completed
[clus9:27962] errmgr:hnp:update_state() [[56972,0],0]) ------- App. Process state updated for process [[56972,1],1]
[clus9:27962] [[56972,0],0] errmgr:hnp: job [56972,1] reported state SYNC REGISTERED for proc [[56972,1],1] state SYNC REGISTERED pid 0 exit_code 1
[clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228
[1,2]:[clus3:04378] mca_oob_tcp_init: creating listen socket
[clus3:04377] [[56972,0],2] orted_recv_cmd: received message from [[56972,1],2]
[clus3:04377] defining message event: ../../orte/orted/orted_comm.c 173
[clus3:04377] [[56972,0],2] orted_recv_cmd: reissued recv
[clus3:04377] [[56972,0],2] orte:daemon:cmd:processor called by [[56972,1],2] for tag 1
[clus3:04377] [[56972,0],2] orted:comm:process_commands() Processing Command: ORTE_DAEMON_SYNC_WANT_NIDMAP
[clus3:04377] [[56972,0],2] orted_recv: received sync+nidmap from local proc [[56972,1],2]
[clus3:04377] [[56972,0],2] errmgr:orted got state SYNC REGISTERED for proc [[56972,1],2] pid 0
[clus3:04377] [[56972,0],2] errmgr:orted: sending contact info to HNP
[clus9:27962] errmgr:hnp:update_state() [[56972,0],0]) ------- App. Process state updated for process [[56972,1],2]
[clus9:27962] [[56972,0],0] errmgr:hnp: job [56972,1] reported state SYNC REGISTERED for proc [[56972,1],2] state SYNC REGISTERED pid 0 exit_code 1
[clus3:04377] [[56972,0],2] orte:daemon:cmd:processor: processing commands completed
[clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228
[1,1]:[clus1:15594] snapc:single: module_init: Application Snapshot Coordinator
[clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228
[1,3]:[clus4:15363] mca_oob_tcp_init: creating listen socket
[clus4:15362] [[56972,0],3] orted_recv_cmd: received message from [[56972,1],3]
[clus4:15362] defining message event: ../../orte/orted/orted_comm.c 173
[clus4:15362] [[56972,0],3] orted_recv_cmd: reissued recv
[clus4:15362] [[56972,0],3] orte:daemon:cmd:processor called by [[56972,1],3] for tag 1
[clus4:15362] [[56972,0],3] orted:comm:process_commands() Processing Command: ORTE_DAEMON_SYNC_WANT_NIDMAP
[clus4:15362] [[56972,0],3] orted_recv: received sync+nidmap from local proc [[56972,1],3]
[clus9:27962] errmgr:hnp:update_state() [[56972,0],0]) ------- App. Process state updated for process [[56972,1],3]
[clus9:27962] [[56972,0],0] errmgr:hnp: job [56972,1] reported state SYNC REGISTERED for proc [[56972,1],3] state SYNC REGISTERED pid 0 exit_code 1
[clus4:15362] [[56972,0],3] errmgr:orted got state SYNC REGISTERED for proc [[56972,1],3] pid 0
[clus4:15362] [[56972,0],3] errmgr:orted: sending contact info to HNP
[clus4:15362] [[56972,0],3] orte:daemon:cmd:processor: processing commands completed
[clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228
[1,2]:[clus3:04378] snapc:single: module_init: Application Snapshot Coordinator
[clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228
[1,3]:[clus4:15363] snapc:single: module_init: Application Snapshot Coordinator
[clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228
[1,1]:[clus1:15594] pml_v: loaded
[clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228
[1,2]:[clus3:04378] pml_v: loaded
[clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228
[1,3]:[clus4:15363] pml_v: loaded
[clus1:15593] defining message event: ../../../../orte/mca/grpcomm/base/grpcomm_base_coll.c 898
[clus9:27962] defining message event: ../../../../orte/mca/grpcomm/base/grpcomm_base_coll.c 898
[1,0]:[clus9:27966] mca: base: components_open: Looking for errmgr components
[clus3:04377] defining message event: ../../../../orte/mca/grpcomm/base/grpcomm_base_coll.c 898
[clus9:27962] defining message event: ../../../../orte/mca/grpcomm/base/grpcomm_base_coll.c 898
[1,0]:[clus9:27966] mca: base: components_open: opening errmgr components
[1,0]:[clus9:27966] mca: base: components_open: found loaded component app
[1,0]:[clus9:27966] mca: base: components_open: component app has no register function
[1,0]:[clus9:27966] mca: base: components_open: component app open function successful
[1,0]:[clus9:27966] mca: base: components_open: found loaded component hnp
[1,0]:[clus9:27966] mca: base: components_open: component hnp has no register function
[1,0]:[clus9:27966] errmgr:hnp: open()
[1,0]:[clus9:27966] errmgr:hnp: open: priority = 50
[1,0]:[clus9:27966] errmgr:hnp: open: verbosity = 0
[1,0]:[clus9:27966] errmgr:hnp: open: --- CR Migration Options ---
[1,0]:[clus9:27966] errmgr:hnp: open: Process Migration = Disabled
[1,0]:[clus9:27966] errmgr:hnp: open: timing = Disabled
[1,0]:[clus9:27966] errmgr:hnp: open: --- Auto. Recovery Options ---
[1,0]:[clus9:27966] errmgr:hnp: open: Auto. Recover = Disabled
[1,0]:[clus9:27966] errmgr:hnp: open: timing = Disabled
[1,0]:[clus9:27966] errmgr:hnp: open: recover_delay = 1
[1,0]:[clus9:27966] mca: base: components_open: component hnp open function successful
[1,0]:[clus9:27966] mca: base: components_open: found loaded component orted
[1,0]:[clus9:27966] mca: base: components_open: component orted has no register function
[1,0]:[clus9:27966] mca: base: components_open: component orted open function successful
[1,0]:[clus9:27966] mca:base:select: Auto-selecting errmgr components
[1,0]:[clus9:27966] mca:base:select:(errmgr) Querying component [app]
[1,0]:[clus9:27966] mca:base:select:(errmgr) Query of component [app] set priority to 10
[1,0]:[clus9:27966] mca:base:select:(errmgr) Querying component [hnp]
[1,0]:[clus9:27966] errmgr:hnp:component_query()
[1,0]:[clus9:27966] mca:base:select:(errmgr) Skipping component [hnp]. Query failed to return a module
[1,0]:[clus9:27966] mca:base:select:(errmgr) Querying component [orted]
[1,0]:[clus9:27966] mca:base:select:(errmgr) Skipping component [orted]. Query failed to return a module
[1,0]:[clus9:27966] mca:base:select:(errmgr) Selected component [app]
[1,0]:[clus9:27966] errmgr:hnp: close()
[1,0]:[clus9:27966] mca: base: close: component hnp closed
[1,0]:[clus9:27966] mca: base: close: unloading component hnp
[1,0]:[clus9:27966] mca: base: close: component orted closed
[1,0]:[clus9:27966] mca: base: close: unloading component orted
[clus4:15362] defining message event: ../../../../orte/mca/grpcomm/base/grpcomm_base_coll.c 898
[clus9:27962] defining message event: ../../../../orte/mca/grpcomm/base/grpcomm_base_coll.c 898
[1,0]:[clus9:27966] mca_oob_tcp_init: creating listen socket
[clus9:27962] [[56972,0],0] orted_recv_cmd: received message from [[56972,1],0]
[clus9:27962] defining message event: ../../orte/orted/orted_comm.c 173
[clus9:27962] [[56972,0],0] orted_recv_cmd: reissued recv
[clus9:27962] [[56972,0],0] orte:daemon:cmd:processor called by [[56972,1],0] for tag 1
[clus9:27962] [[56972,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_SYNC_WANT_NIDMAP
[clus9:27962] [[56972,0],0] orted_recv: received sync+nidmap from local proc [[56972,1],0]
[clus9:27962] errmgr:hnp:update_state() [[56972,0],0]) ------- App. Process state updated for process [[56972,1],0]
[clus9:27962] [[56972,0],0] errmgr:hnp: job [INVALID] reported state UNDEFINED for proc [[56972,1],0] state SYNC REGISTERED pid 0 exit_code 0
[clus9:27962] [[56972,0],0] orte:daemon:cmd:processor: processing commands completed
[1,0]:[clus9:27966] snapc:single: module_init: Application Snapshot Coordinator
[1,0]:[clus9:27966] pml_v: loaded
[clus9:27962] defining message event: ../../../../orte/mca/grpcomm/base/grpcomm_base_coll.c 898
[clus9:27962] defining message event: ../../../../../orte/mca/grpcomm/bad/grpcomm_bad_module.c 164
[clus9:27962] [[56972,0],0] orte:daemon:cmd:processor called by [[56972,0],0] for tag 1
[clus9:27962] [[56972,0],0] orte:daemon:send_relay
[clus9:27962] [[56972,0],0] orte:daemon:send_relay sending relay msg to 1
[clus9:27962] [[56972,0],0] orte:daemon:send_relay sending relay msg to 2
[clus9:27962] [[56972,0],0] orte:daemon:send_relay sending relay msg to 3
[clus9:27962] [[56972,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_MESSAGE_LOCAL_PROCS
[clus9:27962] [[56972,0],0] orted_cmd: received message_local_procs
[clus9:27962] [[56972,0],0] orted:comm:message_local_procs delivering message to job [56972,1] tag 15
[clus1:15593] [[56972,0],1] orted_recv_cmd: received message from [[56972,0],0]
[clus1:15593] defining message event: ../../orte/orted/orted_comm.c 173
[clus3:04377] [[56972,0],2] orted_recv_cmd: received message from [[56972,0],0]
[clus1:15593] [[56972,0],1] orted_recv_cmd: reissued recv
[clus1:15593] [[56972,0],1] orte:daemon:cmd:processor called by [[56972,0],0] for tag 1
[clus1:15593] [[56972,0],1] orte:daemon:send_relay
[clus1:15593] [[56972,0],1] orte:daemon:send_relay - recipient list is empty!
[clus1:15593] [[56972,0],1] orted:comm:process_commands() Processing Command: ORTE_DAEMON_MESSAGE_LOCAL_PROCS [clus1:15593] [[56972,0],1] orted_cmd: received message_local_procs [clus1:15593] [[56972,0],1] orted:comm:message_local_procs delivering message to job [56972,1] tag 15 [clus3:04377] defining message event: ../../orte/orted/orted_comm.c 173 [clus3:04377] [[56972,0],2] orted_recv_cmd: reissued recv [clus3:04377] [[56972,0],2] orte:daemon:cmd:processor called by [[56972,0],0] for tag 1 [clus3:04377] [[56972,0],2] orte:daemon:send_relay [clus3:04377] [[56972,0],2] orte:daemon:send_relay - recipient list is empty! [clus3:04377] [[56972,0],2] orted:comm:process_commands() Processing Command: ORTE_DAEMON_MESSAGE_LOCAL_PROCS [clus3:04377] [[56972,0],2] orted_cmd: received message_local_procs [clus3:04377] [[56972,0],2] orted:comm:message_local_procs delivering message to job [56972,1] tag 15 [clus1:15593] [[56972,0],1] orted_recv_cmd: received message from [[56972,1],1] [clus1:15593] defining message event: ../../orte/orted/orted_comm.c 173 [clus1:15593] [[56972,0],1] orted_recv_cmd: reissued recv [clus1:15593] [[56972,0],1] orte:daemon:cmd:processor called by [[56972,1],1] for tag 1 [clus1:15593] [[56972,0],1] orted:comm:process_commands() Processing Command: Unknown Command! [clus1:15593] [[56972,0],1] orted_recv: received request protector from local proc [[56972,1],1] [clus1:15593] [[56972,0],1] orte:daemon:cmd:processor: processing commands completed [clus3:04377] [[56972,0],2] orted_recv_cmd: received message from [[56972,1],2] [clus3:04377] defining message event: ../../orte/orted/orted_comm.c 173 [clus3:04377] [[56972,0],2] orted_recv_cmd: reissued recv [clus3:04377] [[56972,0],2] orte:daemon:cmd:processor called by [[56972,1],2] for tag 1 [clus3:04377] [[56972,0],2] orted:comm:process_commands() Processing Command: Unknown Command! 
[clus3:04377] [[56972,0],2] orted_recv: received request protector from local proc [[56972,1],2] [clus4:15362] [[56972,0],3] orted_recv_cmd: received message from [[56972,0],0] [clus4:15362] defining message event: ../../orte/orted/orted_comm.c 173 [clus4:15362] [[56972,0],3] orted_recv_cmd: reissued recv [clus4:15362] [[56972,0],3] orte:daemon:cmd:processor called by [[56972,0],0] for tag 1 [clus4:15362] [[56972,0],3] orte:daemon:send_relay [clus4:15362] [[56972,0],3] orte:daemon:send_relay - recipient list is empty! [clus4:15362] [[56972,0],3] orted:comm:process_commands() Processing Command: ORTE_DAEMON_MESSAGE_LOCAL_PROCS [clus4:15362] [[56972,0],3] orted_cmd: received message_local_procs [clus4:15362] [[56972,0],3] orted:comm:message_local_procs delivering message to job [56972,1] tag 15 [clus4:15362] [[56972,0],3] orted_recv_cmd: received message from [[56972,1],3] [clus3:04377] [[56972,0],2] orte:daemon:cmd:processor: processing commands completed [clus4:15362] defining message event: ../../orte/orted/orted_comm.c 173 [clus4:15362] [[56972,0],3] orted_recv_cmd: reissued recv [clus4:15362] [[56972,0],3] orte:daemon:cmd:processor called by [[56972,1],3] for tag 1 [clus4:15362] [[56972,0],3] orted:comm:process_commands() Processing Command: Unknown Command! 
[clus4:15362] [[56972,0],3] orted_recv: received request protector from local proc [[56972,1],3] [clus1:15593] [[56972,0],1]-[[56972,1],1] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104) [clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228 [clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228 [clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228 [clus1:15593] defining message event: ../../../../orte/mca/odls/base/odls_base_default_fns.c 2710 [clus1:15593] defining message event: ../../../../../orte/mca/iof/orted/iof_orted_read.c 218 [clus1:15593] [[56972,0],1] orte:daemon:cmd:processor called by [[56972,0],1] for tag 1 [clus9:27962] [[56972,0],0] orted_recv_cmd: received message from [[56972,1],0] [clus9:27962] defining message event: ../../orte/orted/orted_comm.c 173 [clus9:27962] [[56972,0],0] orted_recv_cmd: reissued recv [clus1:15593] [[56972,0],1] orted:comm:process_commands() Processing Command: ORTE_DAEMON_WAITPID_FIRED [clus1:15593] [[56972,0],1] orted_cmd: received waitpid_fired cmd [clus1:15593] [[56972,0],1] orte:daemon:cmd:processor: processing commands completed [clus1:15593] [[56972,0],1] orte:daemon:cmd:processor called by [[56972,0],1] for tag 1 [clus1:15593] [[56972,0],1] orted:comm:process_commands() Processing Command: ORTE_DAEMON_IOF_COMPLETE [clus1:15593] [[56972,0],1] orted_cmd: received iof_complete cmd [clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228 [clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228 [clus1:15593] [[56972,0],1] errmgr:orted got state TERMINATED WITHOUT SYNC for proc [[56972,1],1] pid 15594 [clus3:04377] [[56972,0],2]-[[56972,1],2] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104) [clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_receive.c 228 [1,0]:[clus9:27966] 
pml_v: vprotocol select: initializing vprotocol component receiver [clus9:27962] [[56972,0],0] orte:daemon:cmd:processor called by [[56972,1],0] for tag 1 [clus9:27962] [[56972,0],0] orted:comm:process_commands() Processing Command: Unknown Command! [clus1:15593] [[56972,0],1] errmgr:orted RADIC enabled, ignorando abort del proc [[56972,1],1] (OK, let's restart it) [clus1:15593] *** Process received signal *** [clus1:15593] Signal: Segmentation fault (11) [clus1:15593] Signal code: (128) [clus1:15593] Failing at address: (nil) [clus3:04377] defining message event: ../../../../../orte/mca/iof/orted/iof_orted_read.c 218 [clus3:04377] [[56972,0],2] orte:daemon:cmd:processor called by [[56972,0],2] for tag 1 [clus3:04377] [[56972,0],2] orted:comm:process_commands() Processing Command: ORTE_DAEMON_WAITPID_FIRED [clus3:04377] [[56972,0],2] orted_cmd: received waitpid_fired cmd [clus3:04377] [[56972,0],2] orte:daemon:cmd:processor: processing commands completed [clus3:04377] [[56972,0],2] orte:daemon:cmd:processor called by [[56972,0],2] for tag 1 [clus9:27962] [[56972,0],0] orte:daemon:cmd:processor: processing commands completed [clus3:04377] [[56972,0],2] orted:comm:process_commands() Processing Command: ORTE_DAEMON_IOF_COMPLETE [clus3:04377] [[56972,0],2] orted_cmd: received iof_complete cmd [1,2]:[clus3:04378] pml_v: vprotocol select: initializing vprotocol component receiver [1,3]:[clus4:15363] pml_v: vprotocol select: initializing vprotocol component receiver [1,1]:[clus1:15594] pml_v: vprotocol select: initializing vprotocol component receiver [clus3:04377] [[56972,0],2] errmgr:orted got state TERMINATED WITHOUT SYNC for proc [[56972,1],2] pid 4378 [clus3:04377] [[56972,0],2] errmgr:orted RADIC enabled, ignorando abort del proc [[56972,1],2] (OK, let's restart it) [clus3:04377] *** Process received signal *** [clus3:04377] Signal: Segmentation fault (11) [clus3:04377] Signal code: (128) [clus3:04377] Failing at address: (nil) [clus4:15362] [[56972,0],3] 
orte:daemon:cmd:processor: processing commands completed [clus4:15362] [[56972,0],3]-[[56972,1],3] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104) [clus4:15362] defining message event: ../../../../orte/mca/odls/base/odls_base_default_fns.c 2710 [clus4:15362] defining message event: ../../../../../orte/mca/iof/orted/iof_orted_read.c 218 [clus4:15362] [[56972,0],3] orte:daemon:cmd:processor called by [[56972,0],3] for tag 1 [clus4:15362] [[56972,0],3] orted:comm:process_commands() Processing Command: ORTE_DAEMON_WAITPID_FIRED [clus4:15362] [[56972,0],3] orted_cmd: received waitpid_fired cmd [clus4:15362] [[56972,0],3] orte:daemon:cmd:processor: processing commands completed [clus4:15362] [[56972,0],3] orte:daemon:cmd:processor called by [[56972,0],3] for tag 1 [clus4:15362] [[56972,0],3] orted:comm:process_commands() Processing Command: ORTE_DAEMON_IOF_COMPLETE [clus4:15362] [[56972,0],3] orted_cmd: received iof_complete cmd [clus4:15362] [[56972,0],3] errmgr:orted got state TERMINATED WITHOUT SYNC for proc [[56972,1],3] pid 15363 [clus4:15362] [[56972,0],3] errmgr:orted RADIC enabled, ignorando abort del proc [[56972,1],3] (OK, let's restart it) [clus4:15362] *** Process received signal *** [clus4:15362] Signal: Segmentation fault (11) [clus4:15362] Signal code: (128) [clus4:15362] Failing at address: (nil) [clus9:27962] [[56972,0],0]-[[56972,1],0] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104) [clus9:27962] errmgr:hnp:update_state() [[56972,0],0]) ------- App. 
Process state updated for process [[56972,1],0] [clus9:27962] [[56972,0],0] errmgr:hnp: job [56972,1] reported state COMMUNICATION FAILURE for proc [[56972,1],0] state COMMUNICATION FAILURE pid 0 exit_code 1 [clus9:27962] defining message event: ../../../../orte/mca/odls/base/odls_base_default_fns.c 2710 [clus9:27962] defining message event: ../../../../../orte/mca/iof/hnp/iof_hnp_read.c 292 [clus9:27962] [[56972,0],0] orte:daemon:cmd:processor called by [[56972,0],0] for tag 1 [clus9:27962] [[56972,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_WAITPID_FIRED [clus9:27962] [[56972,0],0] orted_cmd: received waitpid_fired cmd [clus9:27962] [[56972,0],0] orte:daemon:cmd:processor: processing commands completed [clus9:27962] [[56972,0],0] orte:daemon:cmd:processor called by [[56972,0],0] for tag 1 [clus9:27962] [[56972,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_IOF_COMPLETE [clus9:27962] [[56972,0],0] orted_cmd: received iof_complete cmd [clus9:27962] errmgr:hnp:update_state() [[56972,0],0]) ------- App. 
Process state updated for process [[56972,1],0] [clus9:27962] [[56972,0],0] errmgr:hnp: job [INVALID] reported state UNDEFINED for proc [[56972,1],0] state TERMINATED WITHOUT SYNC pid 27966 exit_code 127 [clus9:27962] [[56972,0],0] errmgr:hnp:check_job_completed proc [[56972,1],0] terminated without sync [clus9:27962] [[56972,0],0]:../../../../../orte/mca/errmgr/hnp/errmgr_hnp.c(1100) updating exit status to 127 [clus9:27962] [[56972,0],0] errmgr:hnp:check_job_completed job [56972,1] is not terminated (1:4) [clus9:27962] [[56972,0],0] errmgr:hnp:check_job_completed at least one job is not terminated [clus9:27962] [[56972,0],0] errmgr:hnp: abort called on job [56972,1] with status 127 [clus9:27962] defining timeout: 0 sec 3000 usec at ../../../../orte/mca/plm/base/plm_base_orted_cmds.c:186 [clus9:27962] progressed_wait: ../../../../orte/mca/plm/base/plm_base_orted_cmds.c 189 [clus9:27962] defining message event: ../../../../orte/mca/plm/base/plm_base_orted_cmds.c 198 [clus9:27962] [[56972,0],0] orte:daemon:cmd:processor: processing commands completed [clus9:27962] [[56972,0],0] orte:daemon:cmd:processor called by [[56972,0],0] for tag 1 [clus9:27962] [[56972,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_EXIT_CMD [clus9:27962] [[56972,0],0] orted_cmd: received exit cmd [clus9:27962] [[56972,0],0] orte:daemon:cmd:processor: processing commands completed [clus1:15593] [ 0] /lib64/libpthread.so.0 [0x2aaaabb03d40] [clus1:15593] [ 1] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2aaaaad760db] [clus1:15593] [ 2] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2aaaaad75aa4] [clus1:15593] [ 3] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/openmpi/mca_errmgr_orted.so [0x2aaaae2d2fdd] [clus1:15593] [ 4] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_odls_base_notify_iof_complete+0x1da) [0x2aaaaad42cb0] [clus1:15593] [ 5] 
/home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon_process_commands+0x1068) [0x2aaaaad19ca6] [clus1:15593] [ 6] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon_cmd_processor+0x81b) [0x2aaaaad18a55] [clus1:15593] [ 7] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2aaaaad9710e] [clus1:15593] [ 8] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2aaaaad974bb] [clus1:15593] [ 9] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(opal_event_loop+0x1a) [0x2aaaaad972ad] [clus1:15593] [10] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(opal_event_dispatch+0xe) [0x2aaaaad97166] [clus1:15593] [11] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon+0x2322) [0x2aaaaad17556] [clus1:15593] [12] /home/hmeyer/desarrollo/radic-ompi/binarios/bin/orted [0x4008a3] [clus1:15593] [13] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2aaaabd2d8a4] [clus1:15593] [14] /home/hmeyer/desarrollo/radic-ompi/binarios/bin/orted [0x400799] [clus1:15593] *** End of error message *** [clus3:04377] [ 0] /lib64/libpthread.so.0 [0x2aaaabb03d40] [clus9:27962] [[56972,0],0]-[[56972,0],1] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104) [clus3:04377] [ 1] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2aaaaad760db] [clus3:04377] [ 2] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2aaaaad75aa4] [clus3:04377] [ 3] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/openmpi/mca_errmgr_orted.so [0x2aaaae2d2fdd] [clus3:04377] [ 4] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_odls_base_notify_iof_complete+0x1da) [0x2aaaaad42cb0] [clus9:27962] errmgr:hnp:update_state() [[56972,0],0]) ------- Daemon state updated for process [[56972,0],1] [clus9:27962] [[56972,0],0] errmgr:hnp: job [56972,0] reported state COMMUNICATION FAILURE for proc [[56972,0],1] state COMMUNICATION FAILURE pid 0 
exit_code 1 [clus9:27962] [[56972,0],0] Daemons terminating - recording daemon [[56972,0],1] as gone [clus3:04377] [ 5] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon_process_commands+0x1068) [0x2aaaaad19ca6] [clus3:04377] [ 6] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon_cmd_processor+0x81b) [0x2aaaaad18a55] [clus3:04377] [ 7] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2aaaaad9710e] [clus3:04377] [ 8] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2aaaaad974bb] [clus3:04377] [ 9] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(opal_event_loop+0x1a) [0x2aaaaad972ad] bash: line 1: 15593 Segmentation fault /home/hmeyer/desarrollo/radic-ompi/binarios/bin/orted --debug-daemons -mca ess env -mca orte_ess_jobid 3733716992 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 4 --hnp-uri "3733716992.0;tcp://192.168.12.9:42503" -mca mca_base_param_file_prefix ../ft-radic -mca mca_base_param_file_path /home/hmeyer/desarrollo/radic-ompi/binarios/share/openmpi/amca-param-sets:/home/hmeyer/desarrollo/Pruebas/codes/coll -mca mca_base_param_file_path_force /home/hmeyer/desarrollo/Pruebas/codes/coll -mca plm rsh [clus3:04377] [10] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(opal_event_dispatch+0xe) [0x2aaaaad97166] [clus3:04377] [11] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon+0x2322) [0x2aaaaad17556] [clus3:04377] [12] /home/hmeyer/desarrollo/radic-ompi/binarios/bin/orted [0x4008a3] [clus3:04377] [13] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2aaaabd2d8a4] [clus3:04377] [14] /home/hmeyer/desarrollo/radic-ompi/binarios/bin/orted [0x400799] [clus4:15362] [ 0] /lib64/libpthread.so.0 [0x2aaaabb03d40] [clus9:27962] [[56972,0],0]-[[56972,0],2] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104) [clus4:15362] [10] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(opal_event_dispatch+0xe) 
[0x2aaaaad97166] [clus4:15362] [11] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon+0x2322) [0x2aaaaad17556] [clus4:15362] [12] /home/hmeyer/desarrollo/radic-ompi/binarios/bin/orted [0x4008a3] [clus4:15362] [13] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2aaaabd2d8a4] [clus4:15362] [14] /home/hmeyer/desarrollo/radic-ompi/binarios/bin/orted [0x400799] [clus4:15362] *** End of error message *** [clus9:27962] [[56972,0],0] errmgr:hnp: job [56972,0] reported state COMMUNICATION FAILURE for proc [[56972,0],2] state COMMUNICATION FAILURE pid 0 exit_code 1 [clus9:27962] [[56972,0],0] Daemons terminating - recording daemon [[56972,0],2] as gone bash: line 1: 15362 Segmentation fault /home/hmeyer/desarrollo/radic-ompi/binarios/bin/orted --debug-daemons -mca ess env -mca orte_ess_jobid 3733716992 -mca orte_ess_vpid 3 -mca orte_ess_num_procs 4 --hnp-uri "3733716992.0;tcp://192.168.12.9:42503" -mca mca_base_param_file_prefix ../ft-radic -mca mca_base_param_file_path /home/hmeyer/desarrollo/radic-ompi/binarios/share/openmpi/amca-param-sets:/home/hmeyer/desarrollo/Pruebas/codes/coll -mca mca_base_param_file_path_force /home/hmeyer/desarrollo/Pruebas/codes/coll -mca plm rsh [clus9:27962] [[56972,0],0]-[[56972,0],3] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104) [clus9:27962] errmgr:hnp:update_state() [[56972,0],0]) ------- Daemon state updated for process [[56972,0],3] [clus9:27962] [[56972,0],0] errmgr:hnp: job [56972,0] reported state COMMUNICATION FAILURE for proc [[56972,0],3] state COMMUNICATION FAILURE pid 0 exit_code 1 [clus9:27962] [[56972,0],0] Daemons terminating - recording daemon [[56972,0],3] as gone [clus9:27962] [[56972,0],0] orteds complete - exiting [clus9:27962] errmgr:hnp: close() [clus9:27962] mca: base: close: component hnp closed [clus9:27962] mca: base: close: unloading component hnp