Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OMPI Connection Retry Policy
From: George Bosilca (bosilca_at_[hidden])
Date: 2009-09-08 06:29:51


Charles,

A listening socket is always posted on every MPI process. It fires whenever
a remote process connects, so the connection is set up even if you have not
yet posted the receive.

So yes, the first MPI_Send to each peer will always call connect(). On
the remote process, accept() is called automatically; no MPI_Recv is
needed for this.
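
To illustrate the behaviour described above, here is a minimal sketch using plain TCP sockets. It is not Open MPI code: the `Peer` class, its `send` method, and the background accept thread are hypothetical stand-ins for the library's internals. It shows only the general pattern, where the listener is posted at startup, connect() happens on the first send to a peer, and accept() runs without the application ever calling a receive.

```python
import socket
import threading


class Peer:
    """Toy endpoint (hypothetical): listens from startup, connects lazily."""

    def __init__(self):
        # The listening socket is posted at startup, before any send or receive.
        self.listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        self.listener.listen()
        self.listener.settimeout(0.1)  # lets the accept loop notice shutdown
        self.address = self.listener.getsockname()
        self.outgoing = {}  # peer address -> connected socket (lazy cache)
        self.incoming = []  # sockets accepted in the background
        self.accepted = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._accept_loop, daemon=True)
        self._thread.start()

    def _accept_loop(self):
        # Stand-in for a library event loop: accept() runs without any
        # receive call from the application.
        while not self._stop.is_set():
            try:
                conn, _ = self.listener.accept()
            except socket.timeout:
                continue
            except OSError:
                return
            self.incoming.append(conn)
            self.accepted.set()

    def send(self, peer_address, payload):
        """Send payload; return True if this call had to open the connection."""
        sock = self.outgoing.get(peer_address)
        first_use = sock is None
        if first_use:
            # connect() happens only on the first send to a given peer.
            sock = socket.create_connection(peer_address)
            self.outgoing[peer_address] = sock
        sock.sendall(payload)
        return first_use

    def close(self):
        self._stop.set()
        self._thread.join()
        for s in list(self.outgoing.values()) + self.incoming:
            s.close()
        self.listener.close()


def demo():
    a, b = Peer(), Peer()
    first = a.send(b.address, b"hello")   # opens the connection
    second = a.send(b.address, b"again")  # reuses the cached socket
    got_connection = b.accepted.wait(timeout=5)  # b never called a receive
    n_incoming = len(b.incoming)
    a.close()
    b.close()
    return first, second, got_connection, n_incoming
```

Calling `demo()` shows that only the first `send` pays for connection setup, and that peer `b` ends up with an accepted connection even though it never asked to receive anything.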

Your second question has no direct answer, because what you describe
is not what we do. If the connect fails for any reason, we retry a
few times to establish the connection before giving up. However, if
this happens (and usually it doesn't), it only affects the first
MPI_Send to each peer.
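
The "try a few times, then give up" idea can be sketched as follows. This is only an illustration with plain sockets: the function name `connect_with_retry`, the attempt count, and the delay are assumptions made for the example, not Open MPI's actual values or code.

```python
import socket
import time


def connect_with_retry(address, attempts=3, delay=0.05):
    """Try to connect up to `attempts` times; return (socket, tries_used).

    `attempts` and `delay` are illustrative values, not Open MPI settings.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return socket.create_connection(address, timeout=1.0), attempt
        except OSError as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)  # brief pause before the next try
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error


def demo():
    # Case 1: a listener is already posted, so the first attempt succeeds.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    sock, tries = connect_with_retry(listener.getsockname())
    sock.close()
    listener.close()

    # Case 2: nothing is listening on the port, so every attempt fails.
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    probe.bind(("127.0.0.1", 0))
    dead_address = probe.getsockname()
    probe.close()
    try:
        connect_with_retry(dead_address, attempts=3, delay=0.01)
        gave_up = False
    except ConnectionError:
        gave_up = True
    return tries, gave_up
```

Because the connected socket would be cached after a successful connect, any retry cost in a scheme like this is paid at most once per peer, on the first send.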

I think you're right about the FAQ ;)

   george.

On Sep 8, 2009, at 13:11 , Charles Salvia wrote:

> According to the OpenMPI FAQ, OpenMPI creates point-to-point socket
> connections "lazily", i.e. only when needed.
>
> I have a few questions about this, and how it affects program
> performance.
>
> 1) Does this mean that MPI_Send will call connect() if necessary,
> and MPI_Recv will call accept()?
>
> 2) If so, what is the policy for dealing with the race condition
> where one process calls connect() before the destination process is
> listening with accept()? Is there a retry interval? And if so, how
> long is the interval and how many times will it retry? I ask
> because I want to know how much of a performance impact this has.
>
> 3) I'm confused as to something the FAQ says regarding this issue.
> The OpenMPI FAQ says "Open MPI opens sockets as they are required --
> so the first time a process sends a message to a peer and there is a
> TCP connection between the two, Open MPI will automatically open a
> new socket." Shouldn't this read "so the first time a process sends
> a message to a peer and there is *NO* TCP connection between the
> two"? Or am I misunderstanding something here?
>
> I appreciate any feedback regarding this issue.
> Thanks,
>
> Charles Salvia