
Open MPI User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-08-03 07:34:08


Greetings Robert.

Can you send all the information listed here:

    http://www.open-mpi.org/community/help/

Of particular interest will be the version that you are using. We had some
bugs with the TCP connection code that were recently fixed. Can you try the
latest 1.1.1 beta tarball and see if it fixes your problem?

    http://www.open-mpi.org/software/ompi/v1.1/

On 8/2/06 11:11 AM, "Robert Cummins" <rcummins_at_[hidden]> wrote:

> I'm trying to run a 64-way MPI benchmark on my system. I
> *always* get the following error, and I'm wondering how to
> debug the problem node. I cannot reproduce the problem
> with a smaller number of nodes.
>
> snip...
> [p1d049:18547] [0,1,48]-[0,1,20] mca_oob_tcp_peer_complete_connect: connect() failed with errno=113
> [p1d049:18547] [0,1,48]-[0,1,21] mca_oob_tcp_peer_complete_connect: connect() failed with errno=113
> [p1d049:18547] [0,1,48]-[0,1,24] mca_oob_tcp_peer_complete_connect: connect() failed with errno=113
> [p1d049:18547] [0,1,48]-[0,1,25] mca_oob_tcp_peer_complete_connect: connect() failed with errno=113
> ...
>
> It looks like I have well over 128 lines of similar output. A quick
> eyeball of the output suggests that about half of all nodes are
> reporting this problem.
>
> I have checked the error counters on my IB switch and I
> have 0 new errors during the run.
>
> TIA.
>
>
> R.
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems