Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-10-07 10:57:40


Mostyn --

Is the 10.173.128.48/49 network visible/reachable from every node in
the job (including the node where mpirun is executing)? It looks like
the problematic one.

If it is not, you might want to keep Open MPI off that network by
listing only the usable interfaces in the oob_tcp_if_include and
btl_tcp_if_include MCA parameters, for example:

   mpirun --mca oob_tcp_if_include eth0,ib1 --mca btl_tcp_if_include eth0,ib1 ...

(IIRC, we had a mismatch in the MCA param name forms before 1.2.4 --
so if you have an older version, you might want to check "ompi_info
--param btl tcp" and "ompi_info --param oob tcp" to ensure you have
the right param names)
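
The same restriction can also be set through the environment rather than
on the mpirun command line, since Open MPI reads any variable named
OMPI_MCA_<param> as the value of that MCA parameter. A minimal sketch,
assuming the parameter names above are the ones your version reports in
ompi_info, and that eth0 and ib1 (the example names from the command
above, not necessarily yours) are the interfaces you want to keep:

```sh
# Restrict both the out-of-band (oob) and MPI point-to-point (btl) TCP
# traffic to the listed interfaces. The interface names are examples only;
# substitute the names that "ifconfig -a" reports on your nodes.
export OMPI_MCA_oob_tcp_if_include=eth0,ib1
export OMPI_MCA_btl_tcp_if_include=eth0,ib1

# Confirm the parameter names for your version first (see the note above):
#   ompi_info --param oob tcp
#   ompi_info --param btl tcp
# Then launch as usual, without the --mca options:
#   mpirun -np 4 -machinefile j ./a.out
```

Values given with --mca on the command line take precedence over the
environment, so the two forms can be mixed while experimenting.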

On Oct 2, 2007, at 2:09 AM, Mostyn Lewis wrote:

> More information. Sorry about the length of this.
> I switched on -mca oob_tcp_debug 1000 and the result is below.
> Later on there's an "ifconfig -a", as the trace seems to show you are
> trying connections to all 3 interfaces in oob: 5.* is InfiniBand IPoIB,
> 7.* is a private ethernet with no connection (cable), and 10.* is the
> general ethernet, which I thought was the only one I was using.
> At the end there's an ompi_info.
>
> Is this expected behavio(u)r?
>
> Regards,
> Mostyn
>
> Script started on Mon 01 Oct 2007 04:34:35 PM PDT
>
> mostyn_at_s0120:/ctmp8/mostyn/glamex/pi> $OPENMPI_GCC/bin/mpirun -mca
> oob_tcp_debug 1000 -np 4 -machinefile j ./a.out
> [s0120:13160] [0,0] accepting connections via event library
> [s0120:13160] [0,0] mca_oob_tcp_accept: 5.6.128.49:59075
> [s0120:13160] [0,0]-[0,1] accepted: 5.6.128.48 - 5.6.128.49 nodelay
> 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 197
> [s0120:13160] [0,0]-[0,0] mca_oob_tcp_send_nb: tag 20 size 333
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 20 size 333
> [s0120:13160] [0,0]-[0,0] mca_oob_tcp_send_nb: tag 4 size 1441
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 4 size 1441
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 1218
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 117
> [s0121:15383] [1,0]-[0,1] mca_oob_tcp_send_nb: tag 4 size 26
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 154
> [s0121:15383] [1,0]-[0,1] mca_oob_tcp_peer_try_connect: connecting
> port 0 to: 10.173.128.49:45984
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 176
> [s0121:15383] [1,0]-[0,1] connected: 10.173.128.49 - 10.173.128.49
> nodelay 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 117
> [s0121:15386] [1,3]-[0,1] mca_oob_tcp_send_nb: tag 4 size 26
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 154
> [s0121:15386] [1,3]-[0,1] mca_oob_tcp_peer_try_connect: connecting
> port 0 to: 10.173.128.49:45984
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 176
> [s0121:15386] [1,3]-[0,1] connected: 10.173.128.49 - 10.173.128.49
> nodelay 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 116
> [s0121:15383] [1,0] accepting connections via event library
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 117
> [s0121:15384] [1,1]-[0,1] mca_oob_tcp_send_nb: tag 4 size 26
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 154
> [s0121:15384] [1,1]-[0,1] mca_oob_tcp_peer_try_connect: connecting
> port 0 to: 10.173.128.49:45984
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 176
> [s0121:15384] [1,1]-[0,1] connected: 10.173.128.49 - 10.173.128.49
> nodelay 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 116
> [s0121:15386] [1,3] accepting connections via event library
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 117
> [s0121:15385] [1,2]-[0,1] mca_oob_tcp_send_nb: tag 4 size 26
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 154
> [s0121:15385] [1,2]-[0,1] mca_oob_tcp_peer_try_connect: connecting
> port 0 to: 10.173.128.49:45984
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 176
> [s0121:15385] [1,2]-[0,1] connected: 10.173.128.49 - 10.173.128.49
> nodelay 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 116
> [s0121:15384] [1,1] accepting connections via event library
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 116
> [s0121:15385] [1,2] accepting connections via event library
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 119
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_send_nb: tag 2 size 1190
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 155
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_peer_try_connect: connecting
> port 34143 to: 7.8.82.120:45243
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 158
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_peer_try_connect: connecting
> port 34143 to: 10.173.128.48:45243
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 289
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_peer_try_connect: connect to
> 10.173.128.48:45243 failed: Software caused connection abort (103)
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_peer_try_connect: connecting
> port 34143 to: 5.6.128.48:45243
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0] mca_oob_tcp_accept: 5.6.128.49:59081
> [s0120:13160] [0,0]-[1,0] accepted: 5.6.128.48 - 5.6.128.49 nodelay
> 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 170
> [s0121:15383] [1,0]-[0,0] connected: 5.6.128.49 - 5.6.128.48
> nodelay 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_msg_recv_handler: size 1190
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 119
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_send_nb: tag 2 size 1190
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 155
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_peer_try_connect: connecting
> port 38806 to: 7.8.82.120:45243
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 158
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_peer_try_connect: connecting
> port 38806 to: 10.173.128.48:45243
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0] mca_oob_tcp_accept: 5.6.128.49:59083
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 289
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_peer_try_connect: connect to
> 10.173.128.48:45243 failed: Software caused connection abort (103)
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_peer_try_connect: connecting
> port 38806 to: 5.6.128.48:45243
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,1] accepted: 5.6.128.48 - 5.6.128.49 nodelay
> 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 170
> [s0121:15384] [1,1]-[0,0] connected: 5.6.128.49 - 5.6.128.48
> nodelay 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_msg_recv_handler: size 1190
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 119
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_send_nb: tag 2 size 1190
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 155
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_peer_try_connect: connecting
> port 50390 to: 7.8.82.120:45243
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 158
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_peer_try_connect: connecting
> port 50390 to: 10.173.128.48:45243
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0] mca_oob_tcp_accept: 5.6.128.49:59085
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 289
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_peer_try_connect: connect to
> 10.173.128.48:45243 failed: Software caused connection abort (103)
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_peer_try_connect: connecting
> port 50390 to: 5.6.128.48:45243
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 218
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_send_nb: tag 2 size 1190
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_peer_try_connect: connecting
> port 56284 to: 7.8.82.120:45243
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,2] accepted: 5.6.128.48 - 5.6.128.49 nodelay
> 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 170
> [s0121:15385] [1,2]-[0,0] connected: 5.6.128.49 - 5.6.128.48
> nodelay 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 158
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_peer_try_connect: connecting
> port 56284 to: 10.173.128.48:45243
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_msg_recv_handler: size 1190
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0] mca_oob_tcp_accept: 5.6.128.49:59087
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 289
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_peer_try_connect: connect to
> 10.173.128.48:45243 failed: Software caused connection abort (103)
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_peer_try_connect: connecting
> port 56284 to: 5.6.128.48:45243
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,3] accepted: 5.6.128.48 - 5.6.128.49 nodelay
> 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 170
> [s0121:15386] [1,3]-[0,0] connected: 5.6.128.49 - 5.6.128.48
> nodelay 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_msg_recv_handler: size 1190
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_send_nb: tag 7 size 2130
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_send_nb: tag 7 size 2130
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_send_nb: tag 7 size 2130
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_send_nb: tag 7 size 2130
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 122
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_msg_recv_handler: size 2130
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 122
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_msg_recv_handler: size 2130
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 122
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_msg_recv_handler: size 2130
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 122
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_msg_recv_handler: size 2130
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 220
> [s0121:15383] [1,0]-[1,1] mca_oob_tcp_send_nb: tag 18 size 28
> [s0121:15383] [1,0]-[1,1] mca_oob_tcp_peer_try_connect: connecting
> port 34143 to: 10.173.128.49:38806
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 116
> [s0121:15384] [1,1] mca_oob_tcp_accept: 10.173.128.49:60961
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 176
> [s0121:15383] [1,0]-[1,1] connected: 10.173.128.49 - 10.173.128.49
> nodelay 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 175
> [s0121:15384] [1,1]-[1,0] accepted: 10.173.128.49 - 10.173.128.49
> nodelay 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15384] [1,1]-[1,0] mca_oob_tcp_msg_recv_handler: size 28
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 220
> [s0121:15383] [1,0]-[1,2] mca_oob_tcp_send_nb: tag 18 size 28
> [s0121:15383] [1,0]-[1,2] mca_oob_tcp_peer_try_connect: connecting
> port 34143 to: 10.173.128.49:50390
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 116
> [s0121:15385] [1,2] mca_oob_tcp_accept: 10.173.128.49:34601
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 176
> [s0121:15383] [1,0]-[1,2] connected: 10.173.128.49 - 10.173.128.49
> nodelay 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 175
> [s0121:15385] [1,2]-[1,0] accepted: 10.173.128.49 - 10.173.128.49
> nodelay 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15385] [1,2]-[1,0] mca_oob_tcp_msg_recv_handler: size 28
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 220
> [s0121:15383] [1,0]-[1,3] mca_oob_tcp_send_nb: tag 18 size 28
> [s0121:15383] [1,0]-[1,3] mca_oob_tcp_peer_try_connect: connecting
> port 34143 to: 10.173.128.49:56284
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 116
> [s0121:15386] [1,3] mca_oob_tcp_accept: 10.173.128.49:36463
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 176
> [s0121:15383] [1,0]-[1,3] connected: 10.173.128.49 - 10.173.128.49
> nodelay 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 175
> [s0121:15386] [1,3]-[1,0] accepted: 10.173.128.49 - 10.173.128.49
> nodelay 1 sndbuf 262142 rcvbuf 262142 flags 00000802
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15386] [1,3]-[1,0] mca_oob_tcp_msg_recv_handler: size 28
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 242
> [s0121:15383] [1,0]-[1,1] mca_oob_tcp_send_nb: tag 18 size 28
> [s0121:15383] [1,0]-[1,2] mca_oob_tcp_send_nb: tag 18 size 28
> [s0121:15383] [1,0]-[1,3] mca_oob_tcp_send_nb: tag 18 size 28
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15384] [1,1]-[1,0] mca_oob_tcp_msg_recv_handler: size 28
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15385] [1,2]-[1,0] mca_oob_tcp_msg_recv_handler: size 28
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15386] [1,3]-[1,0] mca_oob_tcp_msg_recv_handler: size 28
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_msg_recv_handler: size 140
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 118
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_send_nb: tag 2 size 140
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 118
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_send_nb: tag 2 size 100
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_msg_recv_handler: size 100
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 118
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_send_nb: tag 2 size 140
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_msg_recv_handler: size 140
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_msg_recv_handler: size 140
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 118
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_send_nb: tag 2 size 140
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_msg_recv_handler: size 140
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 118
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_send_nb: tag 2 size 140
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_msg_recv_handler: size 100
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 118
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_send_nb: tag 2 size 100
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_msg_recv_handler: size 100
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 118
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_send_nb: tag 2 size 100
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_msg_recv_handler: size 100
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_send_nb: tag 7 size 170
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_send_nb: tag 7 size 170
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_send_nb: tag 7 size 170
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_send_nb: tag 7 size 170
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 118
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_send_nb: tag 2 size 100
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 121
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_msg_recv_handler: size 170
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 121
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_msg_recv_handler: size 170
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 121
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_msg_recv_handler: size 170
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 80
> Process 0 of 4 on s0121
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 80
> Process 1 of 4 on s0121
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 121
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_msg_recv_handler: size 170
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 80
> Process 3 of 4 on s0121
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 80
> Process 2 of 4 on s0121
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 96
> 15383:a.out *->2 (f=noaffinity,0,1,2,3)
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 96
> 15384:a.out *->2 (f=noaffinity,0,1,2,3)
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 96
> 15385:a.out *->3 (f=noaffinity,0,1,2,3)
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 96
> 15386:a.out *->2 (f=noaffinity,0,1,2,3)
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_msg_recv_handler: size 140
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_msg_recv_handler: size 100
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 244
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_send_nb: tag 2 size 140
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_send_nb: tag 2 size 100
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_msg_recv_handler: size 140
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_msg_recv_handler: size 100
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_msg_recv_handler: size 140
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_msg_recv_handler: size 100
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 244
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_send_nb: tag 2 size 140
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_send_nb: tag 2 size 100
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 308
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_send_nb: tag 2 size 140
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_send_nb: tag 2 size 100
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 118
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_send_nb: tag 2 size 140
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_msg_recv_handler: size 140
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 118
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_send_nb: tag 2 size 100
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_msg_recv_handler: size 100
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_send_nb: tag 7 size 170
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_send_nb: tag 7 size 170
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_send_nb: tag 7 size 170
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_send_nb: tag 7 size 170
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 121
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_msg_recv_handler: size 170
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 121
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_msg_recv_handler: size 170
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 121
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_msg_recv_handler: size 170
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 121
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_msg_recv_handler: size 170
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_msg_recv_handler: size 140
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_msg_recv_handler: size 140
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_msg_recv_handler: size 105
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 118
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_send_nb: tag 2 size 140
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 244
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_send_nb: tag 2 size 140
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_send_nb: tag 2 size 105
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_msg_recv_handler: size 105
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_msg_recv_handler: size 140
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 118
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_send_nb: tag 2 size 140
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 182
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_send_nb: tag 2 size 105
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_msg_recv_handler: size 105
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 182
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_send_nb: tag 2 size 105
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 118
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_send_nb: tag 2 size 140
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_msg_recv_handler: size 140
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 118
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_send_nb: tag 2 size 105
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_msg_recv_handler: size 105
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_send_nb: tag 7 size 183
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_send_nb: tag 7 size 183
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_send_nb: tag 7 size 183
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_send_nb: tag 7 size 183
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 120
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_msg_recv_handler: size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_msg_recv: peer closed connection
> [s0120:13160] [0,0]-[1,3] mca_oob_tcp_peer_close(0x52cb20) sd 11 state 4
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_msg_recv: peer closed connection
> [s0120:13160] [0,0]-[1,1] mca_oob_tcp_peer_close(0x52c5e0) sd 9 state 4
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_msg_recv: peer closed connection
> [s0120:13160] [0,0]-[1,2] mca_oob_tcp_peer_close(0x52c880) sd 10 state 4
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 121
> [s0121:15384] [1,1]-[0,0] mca_oob_tcp_msg_recv_handler: size 183
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 121
> [s0121:15385] [1,2]-[0,0] mca_oob_tcp_msg_recv_handler: size 183
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 121
> [s0121:15386] [1,3]-[0,0] mca_oob_tcp_msg_recv_handler: size 183
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 140
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 121
> [s0121:15383] [1,0]-[0,0] mca_oob_tcp_msg_recv_handler: size 183
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 127
> [s0121:15383] [1,0]-[1,3] mca_oob_tcp_msg_recv: peer closed connection
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 106
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 129
> [s0121:15383] [1,0]-[1,3] mca_oob_tcp_peer_close(0x51d500) sd 14 state 4
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 200
> [s0121:15383] [1,0]-[1,2] mca_oob_tcp_msg_recv: peer closed connection
> [s0121:15383] [1,0]-[1,2] mca_oob_tcp_peer_close(0x51d260) sd 13 state 4
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 200
> [s0121:15383] [1,0]-[1,1] mca_oob_tcp_msg_recv: peer closed connection
> [s0121:15383] [1,0]-[1,1] mca_oob_tcp_peer_close(0x51cfc0) sd 12 state 4
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 5 size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 140
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 106
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 140
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_msg_recv: peer closed connection
> [s0120:13160] [0,0]-[1,0] mca_oob_tcp_peer_close(0x52c340) sd 8 state 4
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 106
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 56
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 140
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_msg_recv_handler: size 106
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 2 size 35
> [s0120:13160] [0,0]-[0,0] mca_oob_tcp_send_nb: tag 4 size 26
> [s0120:13160] [0,0]-[0,1] mca_oob_tcp_send_nb: tag 4 size 26
> mostyn_at_s0120:/ctmp8/mostyn/glamex/pi> ifconfig -a
> eth0 Link encap:Ethernet HWaddr 00:1B:24:3E:13:63
> inet addr:7.8.82.120 Bcast:7.8.255.255 Mask:255.255.0.0
> UP BROADCAST MULTICAST MTU:1500 Metric:1
> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
> Interrupt:233 Base address:0xe000
>
> eth1 Link encap:Ethernet HWaddr 00:1B:24:3E:13:64
> inet addr:10.173.128.48 Bcast:10.173.255.255 Mask:255.255.0.0
> inet6 addr: fe80::21b:24ff:fe3e:1364/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:17976983 errors:0 dropped:0 overruns:0 frame:0
> TX packets:4204911 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:10709547521 (10213.4 Mb) TX bytes:586325842 (559.1 Mb)
> Interrupt:50
>
> ib1 Link encap:Ethernet HWaddr 02:00:00:00:00:01
> inet addr:5.6.128.48 Bcast:5.6.255.255 Mask:255.255.0.0
> inet6 addr: fe80::ff:fe00:1/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
> RX packets:8153 errors:0 dropped:0 overruns:0 frame:0
> TX packets:4116 errors:0 dropped:7 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:1638837 (1.5 Mb) TX bytes:677364 (661.4 Kb)
>
> lo Link encap:Local Loopback
> inet addr:127.0.0.1 Mask:255.0.0.0
> inet6 addr: ::1/128 Scope:Host
> UP LOOPBACK RUNNING MTU:16436 Metric:1
> RX packets:6081 errors:0 dropped:0 overruns:0 frame:0
> TX packets:6081 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:6011701 (5.7 Mb) TX bytes:6011701 (5.7 Mb)
>
> sit0 Link encap:IPv6-in-IPv4
> NOARP MTU:1480 Metric:1
> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
> mostyn_at_s0120:/ctmp8/mostyn/glamex/pi> $OPENMPI_GCC/bin/ompi_info
> Open MPI: 1.3a1svn09302007
> Open MPI SVN revision: svn09302007
> Open RTE: 1.3a1svn09302007
> Open RTE SVN revision: svn09302007
> OPAL: 1.3a1svn09302007
> OPAL SVN revision: svn09302007
> Prefix: /tools/openmpi/1.3a1r16272_svn/ethernet/gcc64/4.1.0/tcp/suse_sles_10/x86_64/opteron
> Configured architecture: x86_64-unknown-linux-gnu
> Configure host: s0191
> Configured by: root
> Configured on: Sun Sep 30 15:11:05 PDT 2007
> Configure host: s0191
> Built by: mostyn
> Built on: Sun Sep 30 15:20:43 PDT 2007
> Built host: s0191
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: gcc
> C compiler absolute: /usr/bin/gcc
> C++ compiler: g++
> C++ compiler absolute: /usr/bin/g++
> Fortran77 compiler: gfortran
> Fortran77 compiler abs: /usr/bin/gfortran
> Fortran90 compiler: gfortran
> Fortran90 compiler abs: /usr/bin/gfortran
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: no
> Thread support: posix (mpi: no, progress: no)
> Sparse Groups: no
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> Heterogeneous support: yes
> mpirun default --prefix: no
> MPI I/O support: yes
> MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.3)
> MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.3)
> MCA paffinity: linux (MCA v1.0, API v1.1, Component v1.3)
> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.3)
> MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.3)
> MCA timer: linux (MCA v1.0, API v1.0, Component v1.3)
> MCA installdirs: env (MCA v1.0, API v1.0, Component v1.3)
> MCA installdirs: config (MCA v1.0, API v1.0, Component v1.3)
> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.3)
> MCA coll: inter (MCA v1.0, API v1.0, Component v1.3)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.3)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.3)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.3)
> MCA io: romio (MCA v1.0, API v1.0, Component v1.3)
> MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.3)
> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.3)
> MCA pml: cm (MCA v1.0, API v1.0, Component v1.3)
> MCA pml: dr (MCA v1.0, API v1.0, Component v1.3)
> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.3)
> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.3)
> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.3)
> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.3)
> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.3)
> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.3)
> MCA topo: unity (MCA v1.0, API v1.0, Component v1.3)
> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.3)
> MCA osc: rdma (MCA v1.0, API v1.0, Component v1.3)
> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.3)
> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.3)
> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.3)
> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.3)
> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.3)
> MCA grpcomm: basic (MCA v1.0, API v2.0, Component v1.3)
> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.3)
> MCA iof: svc (MCA v1.0, API v1.0, Component v1.3)
> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.3)
> MCA ns: replica (MCA v1.0, API v2.0, Component v1.3)
> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA odls: default (MCA v1.0, API v1.3, Component v1.3)
> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.3)
> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.3)
> MCA ras: localhost (MCA v1.0, API v1.3, Component v1.3)
> MCA ras: slurm (MCA v1.0, API v1.3, Component v1.3)
> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.3)
> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.3)
> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.3)
> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.3)
> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.3)
> MCA rml: oob (MCA v1.0, API v1.0, Component v1.3)
> MCA routed: unity (MCA v1.0, API v1.0, Component v1.3)
> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.3)
> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.3)
> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.3)
> MCA pls: slurm (MCA v1.0, API v1.3, Component v1.3)
> MCA sds: env (MCA v1.0, API v1.0, Component v1.3)
> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.3)
> MCA sds: seed (MCA v1.0, API v1.0, Component v1.3)
> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.3)
> MCA sds: slurm (MCA v1.0, API v1.0, Component v1.3)
> MCA filem: rsh (MCA v1.0, API v1.0, Component v1.3)
> mostyn_at_s0120:/ctmp8/mostyn/glamex/pi> exit
>
> Script done on Mon 01 Oct 2007 04:35:03 PM PDT
>
>
> On Sun, 30 Sep 2007, Mostyn Lewis wrote:
>
>> Any ideas about this? One dual-core Opteron box talking to another
>> using Infinicon/SilverStorm/QLogic hardware and mvapi (actually it's
>> the same just using ethernet and tcp):
>>
>> $OPENMPI_INFINICON_GCC_MVAPI/bin/mpicc cpi.c
>> $OPENMPI_INFINICON_GCC_MVAPI/bin/mpirun -np 4 -machinefile j ./a.out
>> [s0121:07450] [1,0]-[0,0] mca_oob_tcp_peer_try_connect: connect to
>> 10.173.128.48:43359 failed: Software caused connection abort (103)
>> [s0121:07451] [1,1]-[0,0] mca_oob_tcp_peer_try_connect: connect to
>> 10.173.128.48:43359 failed: Software caused connection abort (103)
>> [s0121:07453] [1,3]-[0,0] mca_oob_tcp_peer_try_connect: connect to
>> 10.173.128.48:43359 failed: Software caused connection abort (103)
>> [s0121:07452] [1,2]-[0,0] mca_oob_tcp_peer_try_connect: connect to
>> 10.173.128.48:43359 failed: Software caused connection abort (103)
>> Process 2 of 4 on s0121
>> Process 0 of 4 on s0121
>> Process 1 of 4 on s0121
>> Process 3 of 4 on s0121
>> 7451:a.out *->3 (f=noaffinity,0,1,2,3)
>> 7453:a.out *->2 (f=noaffinity,0,1,2,3)
>> 7450:a.out *->3 (f=noaffinity,0,1,2,3)
>> 7452:a.out *->3 (f=noaffinity,0,1,2,3)
>>
>> The Process msgs and the affinity stuff mean it ran. The oob msgs
>> are somewhat annoying (imagine hundreds of nodes). The 10.173.128.48
>> address is the launch node (s0120).
>> This is SuSE SLES10:
>> s0120 Sun Sep 30 16:15:02 PDT 2007
>> SUSE Linux Enterprise Server 10 (x86_64)
>> Linux version 2.6.16.21-0.8-smp.lustre-1.6.1.X2200.MRL-0.8-smp
>> (geeko_at_buildhost) (gcc version 4.1.0 (SUSE Linux)) #1 SMP Tue Aug
>> 28 09:51:26 PDT 2007
>> Machine Model Sun Fire X2200 M2
>> Bus Speed 202 MHz
>> 4 Cpus
>> CPU0 Dual-Core AMD Opteron(tm) Processor 2220(2814.485Mhz) stepping 3
>> L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
>> L2 cache: 1024 KB
>> CPU1 Dual-Core AMD Opteron(tm) Processor 2220(2814.485Mhz) stepping 3
>> L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
>> L2 cache: 1024 KB
>> CPU2 Dual-Core AMD Opteron(tm) Processor 2220(2814.485Mhz) stepping 3
>> L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
>> L2 cache: 1024 KB
>> CPU3 Dual-Core AMD Opteron(tm) Processor 2220(2814.485Mhz) stepping 3
>> L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
>> L2 cache: 1024 KB
>> 16.0 GB memory
>>
>> Regards,
>> Mostyn
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
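
If it helps when picking values for oob_tcp_if_include and
btl_tcp_if_include, here is a rough Python sketch that reads
"ifconfig -a" style text and keeps only the non-loopback interfaces that
have an IPv4 address and are flagged both UP and RUNNING. On the output
quoted above that would drop eth0 (UP but not RUNNING, i.e. no cable)
and keep eth1 and ib1. The function names and the trimmed SAMPLE text
are made up for illustration, and the parsing assumes the classic
net-tools layout with blank lines between interfaces. Note that RUNNING
says nothing about whether the other nodes can actually reach that
address, so treat the result as a starting list to prune by hand.

```python
import re

# Trimmed copy of the ifconfig output quoted above (illustrative only).
SAMPLE = """\
eth0      Link encap:Ethernet  HWaddr 00:1B:24:3E:13:63
          inet addr:7.8.82.120  Bcast:7.8.255.255  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1

eth1      Link encap:Ethernet  HWaddr 00:1B:24:3E:13:64
          inet addr:10.173.128.48  Bcast:10.173.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

ib1       Link encap:Ethernet  HWaddr 02:00:00:00:00:01
          inet addr:5.6.128.48  Bcast:5.6.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
"""


def running_ipv4_interfaces(ifconfig_text):
    """Return {name: ipv4} for non-loopback interfaces that are UP and RUNNING."""
    result = {}
    # Each interface is one stanza; stanzas are separated by blank lines.
    for stanza in re.split(r"\n\s*\n", ifconfig_text.strip()):
        lines = stanza.splitlines()
        name = lines[0].split()[0]
        addr = re.search(r"inet addr:(\S+)", stanza)
        # The flags line is the one that starts with "UP" once stripped.
        flags = next(
            (l.split() for l in lines if l.strip().startswith("UP ")), []
        )
        if addr and "RUNNING" in flags and "LOOPBACK" not in flags:
            result[name] = addr.group(1)
    return result


def mca_if_include_args(names):
    """Build mpirun arguments restricting both oob and btl TCP to `names`."""
    value = ",".join(names)
    return [
        "--mca", "oob_tcp_if_include", value,
        "--mca", "btl_tcp_if_include", value,
    ]


if __name__ == "__main__":
    usable = running_ipv4_interfaces(SAMPLE)
    print(usable)
    print(" ".join(["mpirun"] + mca_if_include_args(usable)))
```

If the 10.173.* network turns out not to be reachable from every node,
remove eth1 from the list before passing it to mpirun.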

-- 
Jeff Squyres
Cisco Systems