This was fixed by adding the following option to the mpirun command:
--mca btl_tcp_if_exclude lo,eth0,vmnet0,vmnet1,vmnet8
The machines were trying to connect to each other through the virtual network interfaces (the vmnet* devices) on each machine. Thanks!
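
For anyone hitting the same error, here is a rough sketch of how the option fits into a complete command line. The interface list is the one from the fix above and will likely differ on other machines (check `ifconfig -a` on each host). The hostnames and the program name are taken from the original message below, and `fixed_cmd` is only a hypothetical helper name used for illustration. The script prints the commands rather than running them.

```shell
#!/bin/sh
# Interface list copied from the fix above; adjust it to match
# the output of `ifconfig -a` on your own hosts.
EXCLUDE="lo,eth0,vmnet0,vmnet1,vmnet8"

# fixed_cmd (hypothetical helper) prints the mpirun command line
# for one host ordering, with the exclude list applied.
fixed_cmd() {
    printf 'mpirun --mca btl_tcp_if_exclude %s -H %s -n 2 dispatcher\n' "$EXCLUDE" "$1"
}

# The four host orderings reported as failing in the original message.
for hosts in juhu,hama hama,juhu hama,tuvalu juhu,tuvalu; do
    fixed_cmd "$hosts"
done
```

To actually run the commands, the output can be piped to `sh`. Open MPI also reads MCA parameters from the environment, so `export OMPI_MCA_btl_tcp_if_exclude="$EXCLUDE"` before a plain `mpirun` should have the same effect.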

From: email@example.com
To: firstname.lastname@example.org
Date: Fri, 4 Mar 2011 23:43:42 -0600
Subject: [OMPI users] Connection Errors: Socket is not connected (57) but works for a one messages to each place at first. Works on machine order.

Dear Open MPI users,

We are currently running on 4 identical iMacs, all on Mac OS X 10.5.8 and all on the same network, using Open MPI version 1.4.1. We get an error that we cannot find any help on. Sometimes we get the error Socket Connection (79):

[[30451,1],1][btl_tcp_endpoint.c:298:mca_btl_tcp_endpoint_send_blocking] send() failed: Socket is not connected (57)

The strangest thing is that the error only happens when we run with certain machines in a certain order.

The program is compiled with:

mpicc -m64 -lpthread -w -lm -std="c99" inc/*.h lib/*.c -o dispatcher

The strange part is that all dispatchers are able to send one small message to each other before this error occurs.

Does not work:
mpirun -H juhu,hama -n 2 dispatcher
mpirun -H hama,juhu -n 2 dispatcher
mpirun -H hama,tuvalu -n 2 dispatcher
mpirun -H juhu,tuvalu -n 2 dispatcher

Works:
mpirun -H tuvalu,juhu -n 2 dispatcher
mpirun -H tuvalu,hama -n 2 dispatcher

Dispatcher is a multithreaded application that sends messages to other dispatchers.