On May 10, 2010, at 11:00 AM, Guanyinzhu wrote:
> Did "--mca mpi_preconnect_all 1" work?
> I am also seeing this "readv failed: connection timed out" problem in our production environment. Our engineer reproduced it on 20 nodes with gigabit Ethernet, with one Ethernet link limited to 2 MB/s. The test is an MPI_Isend/MPI_Recv ring: each node calls MPI_Isend to send a large message to the next node and then calls MPI_Recv to receive from the previous node, repeated for many cycles. We then get the following error log:
> [btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection timed out (110)
FWIW, a customer of mine had these kinds of issues just last week; in every case, he tracked the problem down to hardware (e.g., he swapped out Ethernet cables and the problems went away).
Keep in mind that Open MPI is simply reporting what the OS tells us. Specifically, Linux has decided to close the socket with a "timed out" error when we tried to read from it.
> I thought it might be because the network fd is set nonblocking: the nonblocking connect() call might fail, and epoll_wait() is woken up by the error but treats it as success and calls mca_btl_tcp_endpoint_recv_handler(). That handler then makes a nonblocking readv() call on an fd whose connection failed, so readv() returns -1 and sets errno to 110, which means "Connection timed out".
Hmm. That's an interesting scenario; do you know that that is happening?
But even if it is -- meaning that we're simply printing out the wrong error message -- the connect() shouldn't fail.
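If you want to test that theory outside of Open MPI, here is a minimal sketch in plain POSIX C of how the result of a nonblocking connect() is usually checked: wait for the socket to become writable, then read SO_ERROR with getsockopt(). Note that this is not taken from Open MPI's TCP BTL; the function names pending_socket_error() and finish_nonblocking_connect() are hypothetical and exist only for illustration.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper: return 0 if fd has no pending socket error,
 * otherwise the pending errno value (e.g., ETIMEDOUT, ECONNREFUSED).
 * If getsockopt() itself fails, return its errno instead. */
static int pending_socket_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0) {
        return errno;
    }
    return err;
}

/* Hypothetical helper: wait up to timeout_ms for a nonblocking
 * connect() on fd to complete.  Returns 0 if the connection is
 * established, ETIMEDOUT if poll() itself timed out, or the errno
 * value that describes why the connect failed. */
static int finish_nonblocking_connect(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int rc;

    /* Retry if a signal interrupts the wait. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0) {
        return errno;
    }
    if (rc == 0) {
        return ETIMEDOUT;
    }

    /* Being woken up is not proof of success: a failed connect also
     * wakes the waiter, so the pending error must be checked. */
    return pending_socket_error(fd);
}
```

If a failed connect() were being treated as a success, I would expect a check like this to return a nonzero value (such as ETIMEDOUT, which is 110 on Linux) at the point where the socket first becomes writable, i.e., before any readv() is attempted on it.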