
Open MPI User's Mailing List Archives


Subject: [OMPI users] Connection errors: Socket is not connected (57), but one message to each host gets through first; depends on machine order
From: atexannamedbob_at_[hidden]
Date: 2011-03-05 00:43:42


Dear Open MPI users,
We are currently running on four identical iMacs (Mac OS X 10.5.8), all on the same network, using Open MPI 1.4.1.
We get an error that we cannot find any help on.
Sometimes we get the error "Socket Connection (79)". We also see:
[30451,1],1][btl_tcp_endpoint.c:298:mca_btl_tcp_endpoint_send_blocking] send() failed: Socket is not connected (57)
Strangest of all, the error only happens when we run on certain machines in a certain order.

Our PATH (output of echo $PATH):
/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/texbin

We compile with:
mpicc -m64 -lpthread -w -lm -std="c99" inc/*.h lib/*.c -o dispatcher

The strange part is that all dispatchers are able to send one small message to each other before this error occurs.

Does not work:
mpirun -H juhu,hama -n 2 dispatcher
mpirun -H hama,juhu -n 2 dispatcher
mpirun -H hama,tuvalu -n 2 dispatcher
mpirun -H juhu,tuvalu -n 2 dispatcher

Works:
mpirun -H tuvalu,juhu -n 2 dispatcher
mpirun -H tuvalu,hama -n 2 dispatcher

Dispatcher is a multithreaded application that sends messages to other dispatchers.

ifconfig output for machine 1 (one of the machines with the problem):

lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
    inet 127.0.0.1 netmask 0xff000000
    inet6 ::1 prefixlen 128
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0<> mtu 1280
fw0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 4078
    lladdr 00:1f:f3:ff:fe:6e:5d:26
    media: autoselect <full-duplex> status: inactive
    supported media: autoselect <full-duplex>
en1: flags=8823<UP,BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500
    ether 00:1f:5b:c9:3b:8f
    media: autoselect (<unknown type>) status: inactive
    supported media: autoselect
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    inet 131.179.224.186 netmask 0xffffff00 broadcast 131.179.224.255
    ether 00:1f:f3:59:d2:3d
    media: autoselect (100baseTX <full-duplex>) status: active
    supported media: autoselect 10baseT/UTP <half-duplex> 10baseT/UTP <full-duplex> 10baseT/UTP <full-duplex,hw-loopback> 10baseT/UTP <full-duplex,flow-control> 100baseTX <half-duplex> 100baseTX <full-duplex> 100baseTX <full-duplex,hw-loopback> 100baseTX <full-duplex,flow-control> 1000baseT <full-duplex> 1000baseT <full-duplex,hw-loopback> 1000baseT <full-duplex,flow-control> none
vmnet8: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    inet 172.16.181.1 netmask 0xffffff00 broadcast 172.16.181.255
    ether 00:50:56:c0:00:08
vmnet1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    inet 172.16.32.1 netmask 0xffffff00 broadcast 172.16.32.255
    ether 00:50:56:c0:00:01

ifconfig output for machine 2 (another machine with the problem):

lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
    inet 127.0.0.1 netmask 0xff000000
    inet6 ::1 prefixlen 128
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0<> mtu 1280
fw0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 4078
    lladdr 00:1f:5b:ff:fe:20:ae:1e
    media: autoselect <full-duplex> status: inactive
    supported media: autoselect <full-duplex>
en1: flags=8823<UP,BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500
    ether 00:1f:5b:c9:10:1d
    media: autoselect (<unknown type>) status: inactive
    supported media: autoselect
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    inet6 fe80::21e:c2ff:fe1a:c673%en0 prefixlen 64 scopeid 0x6
    inet 131.179.224.185 netmask 0xffffff00 broadcast 131.179.224.255
    ether 00:1e:c2:1a:c6:73
    media: autoselect (100baseTX <full-duplex>) status: active
    supported media: autoselect 10baseT/UTP <half-duplex> 10baseT/UTP <full-duplex> 10baseT/UTP <full-duplex,hw-loopback> 10baseT/UTP <full-duplex,flow-control> 100baseTX <half-duplex> 100baseTX <full-duplex> 100baseTX <full-duplex,hw-loopback> 100baseTX <full-duplex,flow-control> 1000baseT <full-duplex> 1000baseT <full-duplex,hw-loopback> 1000baseT <full-duplex,flow-control> none
vboxnet0: flags=8842<BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    ether 0a:00:27:00:00:00
vmnet1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    inet 192.168.138.1 netmask 0xffffff00 broadcast 192.168.138.255
    ether 00:50:56:c0:00:01
vmnet8: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    inet 192.168.56.1 netmask 0xffffff00 broadcast 192.168.56.255
    ether 00:50:56:c0:00:08
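
Besides en0 (131.179.224.185 and .186, netmask 0xffffff00), both machines have VMware interfaces (vmnet1, vmnet8) on private subnets that differ between the hosts (172.16.x.x on machine 1, 192.168.x.x on machine 2). A minimal sketch for checking which address pairs share a subnet is below. The helper name same_subnet is our own, hypothetical choice (not part of Open MPI), and it assumes the mask is written as ifconfig prints it (host-order hex such as 0xffffff00).

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper (not part of Open MPI): returns true when the
 * dotted-quad IPv4 addresses a and b share a network prefix under mask,
 * where mask is given the way ifconfig prints it (e.g. 0xffffff00).
 * Returns false if either address fails to parse. */
static bool same_subnet(const char *a, const char *b, uint32_t mask)
{
    struct in_addr ia, ib;

    if (inet_pton(AF_INET, a, &ia) != 1 || inet_pton(AF_INET, b, &ib) != 1)
        return false;

    /* inet_pton stores addresses in network byte order; ntohl converts
     * them to host order so they line up with the host-order mask. */
    return (ntohl(ia.s_addr) & mask) == (ntohl(ib.s_addr) & mask);
}
```

With the addresses listed above, the two en0 addresses pass this check, while the vmnet pairs (for example 172.16.181.1 on machine 1 and 192.168.56.1 on machine 2) do not.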

Thanks!
Oren