
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI hangs on multiple nodes
From: Gus Correa (gus_at_[hidden])
Date: 2011-09-20 11:07:45


Ole Nielsen wrote:
> Thanks for your suggestion Gus, we need a way of debugging what is going
> on. I am pretty sure the problem lies with our cluster configuration. I
> know MPI simply relies on the underlying network. However, we can ping
> and ssh to all nodes (and between any pair as well) so it is
> currently a mystery why MPI doesn't communicate across nodes on our cluster.
> Two further questions for the group
>
> 1. I would love to run the test program connectivity.c, but cannot
> find it anywhere. Can anyone help please?

If you downloaded the Open MPI tarball, it is in examples/connectivity.c
wherever you untarred it [not where you installed it].
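If the tarball isn't handy, a minimal stand-in would look something like
the sketch below (my own illustration, not the actual examples/connectivity.c
shipped with Open MPI): every pair of ranks exchanges one integer, so a hang
or timeout points at the link between those two hosts.

/* conncheck.c -- minimal pairwise connectivity sketch (illustration only,
 * not the examples/connectivity.c from the Open MPI tarball).
 * Build and run, for example:
 *   mpicc conncheck.c -o conncheck
 *   mpirun -np 8 --hostfile myhosts ./conncheck
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, peer, token;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (peer = 0; peer < size; peer++) {
        if (peer == rank)
            continue;
        if (rank < peer) {   /* lower rank sends first, then receives */
            token = rank;
            MPI_Send(&token, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, peer, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {             /* higher rank receives first, then sends */
            MPI_Recv(&token, 1, MPI_INT, peer, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            token = rank;
            MPI_Send(&token, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
        }
    }

    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0)
        printf("all %d ranks exchanged messages with each other\n", size);

    MPI_Finalize();
    return 0;
}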

> 2. After having left the job hanging over night we got the message
> [node5][[9454,1],1][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
> mca_btl_tcp_frag_recv: readv failed: Connection timed out (110).
> Does anyone know what this means?
>
>
> Cheers and thanks
> Ole
> PS - I don't see how separate buffers would help. Recall that the test
> program I use works fine on other installations and indeed when run on
> the cores of a single node.
>

It probably won't help, as Eugene explained.
Your program works here, and it also worked for Davendra Rai.
If you were using MPI_Isend [non-blocking],
then you would need separate buffers.

For large amounts of data and many processes,
I would rather use non-blocking communication [and separate
buffers], especially if you do work between the send and the recv.
But that's not what hangs your program.
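Just to illustrate the pattern I mean (a sketch of my own, not your test
program): post the receive and the send as non-blocking calls into separate
buffers, do your work, then wait.

#include <stdio.h>
#include <mpi.h>

#define N 10000

int main(int argc, char **argv)
{
    double sendbuf[N], recvbuf[N];   /* separate buffers for send and recv */
    int rank, size, right, left, i;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    right = (rank + 1) % size;
    left  = (rank - 1 + size) % size;

    for (i = 0; i < N; i++)
        sendbuf[i] = (double) rank;

    /* post both non-blocking operations up front */
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... do useful work here, but leave sendbuf and recvbuf alone ... */

    /* only after Waitall is it safe to reuse either buffer */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    printf("rank %d got %g from rank %d\n", rank, recvbuf[0], left);

    MPI_Finalize();
    return 0;
}

The key point is that sendbuf and recvbuf are distinct, and neither may be
touched until MPI_Waitall returns.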

Gus Correa

>
>
>
> Message: 11
> Date: Mon, 19 Sep 2011 10:37:02 -0400
> From: Gus Correa <gus_at_[hidden] <mailto:gus_at_[hidden]>>
> Subject: Re: [OMPI users] RE : MPI hangs on multiple nodes
> To: Open MPI Users <users_at_[hidden] <mailto:users_at_[hidden]>>
> Message-ID: <4E77538E.3070007_at_[hidden]
> <mailto:4E77538E.3070007_at_[hidden]>>
> Content-Type: text/plain; charset=iso-8859-1; format=flowed
>
> Hi Ole
>
> You could try the examples/connectivity.c program in the
> Open MPI source tree, to test if everything is all right.
> It also hints at how to solve the buffer re-use issue
> that Sebastien [rightfully] pointed out [i.e., declare separate
> buffers for MPI_Send and MPI_Recv].
>
> Gus Correa
>
>