Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Connection Errors: Socket is not connected (57) but works for a one messages to each place at first. Works on machine order.
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-03-17 11:07:52


Sorry for the delayed reply.

Is there any chance you can upgrade to the latest version of Open MPI?

Also, I'm not an IPv6 expert -- could you try disabling IPv6? (I can't tell offhand from your output whether it's enabled or disabled)

I say this because we *did* have a whacko problem on OS X regarding IPv6 (see http://blogs.cisco.com/performance/why_mpi_is_good_for_you/ and the linked Open MPI commit message for some details, if you care). This fix was included in Open MPI 1.4.2 and the entire 1.5.x series. If you can upgrade to 1.4.2, you may not need to change your IPv6 settings.
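
To check whether IPv6 is actually in play on each machine, one option is to filter the ifconfig output for inet6 entries on anything other than loopback. The following is only a minimal sketch: the function name `ipv6_interfaces` is made up for illustration, and it assumes BSD / OS X style ifconfig output in which each interface block starts with a line like `en0: flags=...`.

```shell
# Hypothetical helper: list non-loopback interfaces that carry an IPv6 address.
# Reads ifconfig-style text on stdin and prints "<interface> <address>" per hit.
ipv6_interfaces() {
    awk '
        # Interface header line, e.g. "en0: flags=8863<UP,...> mtu 1500".
        $2 ~ /^flags=/ { iface = $1; sub(/:$/, "", iface); next }
        # Address line: report inet6 entries, skipping the loopback interface.
        $1 == "inet6" && iface != "lo0" { print iface, $2 }
    '
}

# Typical use on each node (assumes ifconfig is on the PATH):
#   ifconfig | ipv6_interfaces
```

No output means no non-loopback interface has an IPv6 address. Reading the quoted output below by eye, the second machine's en0 lists a link-local inet6 address while the first machine's en0 shows none, so the two nodes may not be configured identically.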

On Mar 5, 2011, at 12:43 AM, <atexannamedbob_at_[hidden]> wrote:

> Dear Open-mpi users,
> Currently we are running on 4 identical iMacs (Mac OS X 10.5.8), all on the same network, using Open MPI version 1.4.1.
> We get an error that we cannot seem to find any help on.
> Sometimes we get the error Socket Connection (79)
> [30451,1],1][btl_tcp_endpoint.c:298:mca_btl_tcp_endpoint_send_blocking] send() failed: Socket is not connected (57)
> The strangest thing is that the error only happens when we run with certain machines in a certain order.
>
>
> echo $PATH
> /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/texbin
>
> mpicc -m64 -lpthread -w -lm -std="c99" inc/*.h lib/*.c -o dispatcher
>
> The strange thing is that all dispatchers are able to send one small message to each other before this error occurs.
> Does not work:
> mpirun -H juhu,hama -n 2 dispatcher
> mpirun -H hama,juhu -n 2 dispatcher
> mpirun -H hama,tuvalu -n 2 dispatcher
> mpirun -H juhu,tuvalu -n 2 dispatcher
> Works:
> mpirun -H tuvalu,juhu -n 2 dispatcher
> mpirun -H tuvalu,hama -n 2 dispatcher
>
> Dispatcher is a multithreaded application that sends messages to other dispatchers.
>
>
> ifconfig output for machine 1 with the problem
>
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
> inet 127.0.0.1 netmask 0xff000000
> inet6 ::1 prefixlen 128
> gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
> stf0: flags=0<> mtu 1280
> fw0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 4078
> lladdr 00:1f:f3:ff:fe:6e:5d:26
> media: autoselect <full-duplex> status: inactive
> supported media: autoselect <full-duplex>
> en1: flags=8823<UP,BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500
> ether 00:1f:5b:c9:3b:8f
> media: autoselect (<unknown type>) status: inactive
> supported media: autoselect
> en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> inet 131.179.224.186 netmask 0xffffff00 broadcast 131.179.224.255
> ether 00:1f:f3:59:d2:3d
> media: autoselect (100baseTX <full-duplex>) status: active
> supported media: autoselect 10baseT/UTP <half-duplex> 10baseT/UTP <full-duplex> 10baseT/UTP <full-duplex,hw-loopback> 10baseT/UTP <full-duplex,flow-control> 100baseTX <half-duplex> 100baseTX <full-duplex> 100baseTX <full-duplex,hw-loopback> 100baseTX <full-duplex,flow-control> 1000baseT <full-duplex> 1000baseT <full-duplex,hw-loopback> 1000baseT <full-duplex,flow-control> none
> vmnet8: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> inet 172.16.181.1 netmask 0xffffff00 broadcast 172.16.181.255
> ether 00:50:56:c0:00:08
> vmnet1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> inet 172.16.32.1 netmask 0xffffff00 broadcast 172.16.32.255
> ether 00:50:56:c0:00:01
>
> ifconfig output for machine 2 with the problem
>
>
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
> inet 127.0.0.1 netmask 0xff000000
> inet6 ::1 prefixlen 128
> gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
> stf0: flags=0<> mtu 1280
> fw0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 4078
> lladdr 00:1f:5b:ff:fe:20:ae:1e
> media: autoselect <full-duplex> status: inactive
> supported media: autoselect <full-duplex>
> en1: flags=8823<UP,BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500
> ether 00:1f:5b:c9:10:1d
> media: autoselect (<unknown type>) status: inactive
> supported media: autoselect
> en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> inet6 fe80::21e:c2ff:fe1a:c673%en0 prefixlen 64 scopeid 0x6
> inet 131.179.224.185 netmask 0xffffff00 broadcast 131.179.224.255
> ether 00:1e:c2:1a:c6:73
> media: autoselect (100baseTX <full-duplex>) status: active
> supported media: autoselect 10baseT/UTP <half-duplex> 10baseT/UTP <full-duplex> 10baseT/UTP <full-duplex,hw-loopback> 10baseT/UTP <full-duplex,flow-control> 100baseTX <half-duplex> 100baseTX <full-duplex> 100baseTX <full-duplex,hw-loopback> 100baseTX <full-duplex,flow-control> 1000baseT <full-duplex> 1000baseT <full-duplex,hw-loopback> 1000baseT <full-duplex,flow-control> none
> vboxnet0: flags=8842<BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> ether 0a:00:27:00:00:00
> vmnet1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> inet 192.168.138.1 netmask 0xffffff00 broadcast 192.168.138.255
> ether 00:50:56:c0:00:01
> vmnet8: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> inet 192.168.56.1 netmask 0xffffff00 broadcast 192.168.56.255
> ether 00:50:56:c0:00:08
>
>
> Thanks!
> Oren
> <ompi_info.txt>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/