Open MPI User's Mailing List Archives

From: Robert Cummins (rcummins_at_[hidden])
Date: 2006-08-02 11:11:54


I'm trying to run a 64-way MPI benchmark on my system. I
*always* get the following error, and I'm wondering how to
debug the problem node. I cannot reproduce the problem
with a smaller number of nodes.

snip...
[p1d049:18547] [0,1,48]-[0,1,20] mca_oob_tcp_peer_complete_connect: connect() failed with errno=113
[p1d049:18547] [0,1,48]-[0,1,21] mca_oob_tcp_peer_complete_connect: connect() failed with errno=113
[p1d049:18547] [0,1,48]-[0,1,24] mca_oob_tcp_peer_complete_connect: connect() failed with errno=113
[p1d049:18547] [0,1,48]-[0,1,25] mca_oob_tcp_peer_complete_connect: connect() failed with errno=113
...

It looks like I have well over 128 lines of similar output. A quick
eyeball of the output suggests that about half of all nodes are
reporting this problem.
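
Rather than eyeballing the output, the failures could be tallied with a
short script. The following is only a minimal sketch, not part of Open MPI:
the function names (unwrap, tally_failures) are made up for illustration,
and it assumes the log lines have exactly the shape shown above, possibly
re-wrapped by a mail client. It counts failed connects per reporting host
and per destination process name, and looks up the errno value. errno
numbering is platform-specific; on Linux, 113 is EHOSTUNREACH ("No route
to host").

```python
import errno
import os
import re
from collections import Counter

# Assumed line shape (taken from the output quoted in this post):
# [host:pid] [src]-[dst] mca_oob_tcp_peer_complete_connect: connect() failed with errno=N
LINE_RE = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\]\s+"
    r"\[(?P<src>[\d,]+)\]-\[(?P<dst>[\d,]+)\]\s+"
    r"mca_oob_tcp_peer_complete_connect:\s*"
    r"connect\(\) failed with errno=(?P<errno>\d+)"
)


def unwrap(text):
    """Undo mail-client wrapping: drop every newline not followed by '['.

    Assumption: each real log record starts with '[' at the start of a line.
    """
    return re.sub(r"\n(?!\[)", "", text)


def tally_failures(text):
    """Return Counters keyed by reporting host, destination name, and errno."""
    by_host, by_dst, codes = Counter(), Counter(), Counter()
    for m in LINE_RE.finditer(unwrap(text)):
        by_host[m.group("host")] += 1
        by_dst[m.group("dst")] += 1
        codes[int(m.group("errno"))] += 1
    return by_host, by_dst, codes


# Two records copied from the output above, in their wrapped form.
sample = """\
[p1d049:18547] [0,1,48]-[0,1,20] mca_oob_tcp_peer_complete_connect:
connect() fa
iled with errno=113
[p1d049:18547] [0,1,48]-[0,1,21] mca_oob_tcp_peer_complete_connect:
connect() fa
iled with errno=113
"""

by_host, by_dst, codes = tally_failures(sample)
print("failures per reporting host:", dict(by_host))
print("failures per destination:", dict(by_dst))
for code, count in codes.items():
    # The symbolic name and message depend on the platform running this script.
    name = errno.errorcode.get(code, "unknown")
    print(f"errno={code} ({name}: {os.strerror(code)}) seen {count} times")
```

Run over the full job output, the per-destination counts would show
whether the failures cluster on a few unreachable nodes or are spread
evenly across the job.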

I have checked the error counters on my IB switch and I
have 0 new errors during the run.

TIA.

R.