Open MPI User's Mailing List Archives

From: Jonathan Underwood (jonathan.underwood_at_[hidden])
Date: 2007-06-11 19:05:37


Hi Adrian,

On 11/06/07, Adrian Knoth <adi_at_[hidden]> wrote:
> Which OMPI version?
>

1.2.2

> > $ perl -e 'die$!=110'
> > Connection timed out at -e line 1.
>
> Looks pretty much like a routing issue. Can you sniff on eth1 on the
> frontend node?
>

I don't have root access, so I'm afraid I can't.
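
For what it's worth, a plain TCP connect from user space needs no root and can at least show whether a given address and port are reachable from a node. The following is only a rough sketch: the function name `probe` is made up, and the host and port to test would have to be taken from the actual cluster setup (nothing in this thread says which port the TCP BTL is using). A timeout here would correspond to the errno 110 (ETIMEDOUT, on Linux) quoted above.

```python
import errno
import socket


def probe(host, port, timeout=5.0):
    """Attempt a TCP connect and return (ok, errno_name, message).

    Hypothetical helper: host and port are supplied by the caller.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None, "connected"
    except socket.timeout:
        # No answer within the timeout we set ourselves.
        return False, "ETIMEDOUT", "no reply within %.1f s" % timeout
    except OSError as exc:
        # Map the numeric errno to its symbolic name where one exists.
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, name, str(exc)
```

Running `probe(host, port)` from a compute node against the frontend's private address, and again against its public one, might show which route is the one that times out.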

> > This error message occurs the first time one of the compute nodes,
> > which are on a private network, attempts to send data to the frontend
>
> > In actual fact, it seems that the error occurs the first time a
> > process on the frontend tries to send data to another process on the
> > frontend.
>
> What's the exact problem? compute-node -> frontend? I don't think you
> have two processes on the frontend node, and even if you do, they should
> use shared memory.
>
> > Any advice would be very welcome
>
> Use tcpdump and/or recompile with debug enabled. In addition, set
> WANT_PEER_DUMP in ompi/mca/btl/tcp/btl_tcp_endpoint.c to 1 (line 120)
> and recompile, thus giving you more debug output.
>
> Depending on your OMPI version, you can also add
>
> mpi_preconnect_all=1
>
> to your ~/.openmpi/mca-params.conf, by this establishing all connections
> during MPI_Init().
>

OK, will try these things.
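
For the mca-params.conf part, a small sketch of how the line could be added without duplicating it on repeated runs. The parameter name and file location are taken from the suggestion above; the helper `ensure_mca_param` is invented for illustration, and whether the parameter is honoured depends on the Open MPI version, as noted.

```python
import os


def ensure_mca_param(name, value, path):
    """Append 'name=value' to an MCA params file unless already present.

    Returns True if the file was modified, False if the line was there.
    """
    wanted = "%s=%s" % (name, value)
    text = ""
    if os.path.exists(path):
        with open(path) as fh:
            text = fh.read()
    for raw in text.splitlines():
        # Ignore spaces and tabs so 'name = value' also counts as present.
        if raw.replace(" ", "").replace("\t", "") == wanted:
            return False
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "a") as fh:
        if text and not text.endswith("\n"):
            fh.write("\n")  # keep the new entry on its own line
        fh.write(wanted + "\n")
    return True


if __name__ == "__main__":
    conf = os.path.expanduser("~/.openmpi/mca-params.conf")
    ensure_mca_param("mpi_preconnect_all", 1, conf)
```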

> If nothing helps, exclude the frontend from computation.
>
>

OK.

Thanks for the suggestions!

Jonathan