Don, Galen, and I talked about this in depth on the phone today and
think that it is a symptom of the same issue discussed in this thread:
Note my message in that thread from just a few minutes ago:
We think that the proposed solution in that thread will also fix the
mpi_preconnect_all issues (i.e., the ping-pong that Don proposes in
his mail should not be necessary).
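For context, here is a minimal sketch of the send-and-receive
("ping-pong") preconnect pattern Don describes, written against the
standard MPI C API. This is illustrative only, not actual Open MPI
preconnect code; the function name preconnect_all_pingpong is
hypothetical. It requires an MPI installation and mpirun to build and
run, so treat it as a sketch rather than tested code.

```c
/* Illustrative sketch (not Open MPI internals): every rank both sends
 * to and receives from every other rank, so connection establishment
 * is driven from both sides of each pair. */
#include <mpi.h>

/* Hypothetical helper name for this sketch. */
static void preconnect_all_pingpong(MPI_Comm comm)
{
    int rank, size, peer;
    char token = 0;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    for (peer = 0; peer < size; peer++) {
        if (peer == rank) {
            continue;
        }
        /* MPI_Sendrecv pairs a send and a receive with each peer in one
         * call; per the MPI standard it cannot deadlock the way a naive
         * loop of blocking sends followed by receives could. */
        MPI_Sendrecv(&token, 1, MPI_CHAR, peer, 0,
                     &token, 1, MPI_CHAR, peer, 0,
                     comm, MPI_STATUS_IGNORE);
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    preconnect_all_pingpong(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```

The point of the pairwise exchange is that each connection is touched by
both endpoints, so a rank that otherwise never communicates (like rank 0
in Don's example) still progresses its peers' connection attempts.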
On Oct 17, 2007, at 10:54 AM, Don Kerr wrote:
> I have noticed an issue in the 1.2 branch when mpi_preconnect_all=1:
> a one-way communication pattern (ranks either send to or receive from
> each other) may not fully establish connections with peers. For
> example, if I run a 3-process MPI job and rank 0 does not do any MPI
> communication after MPI_Init(), the other ranks' attempts to connect
> will not be progressed (I have seen this with tcp and udapl).
> The preconnect pattern has changed slightly in the trunk, but it is
> still one-way communication: either a send or a receive with each
> rank. So although the issue I see in the 1.2 branch does not appear in
> the trunk, I wonder whether it will show up again.
> An alternative preconnect pattern that comes to mind would be to
> perform both a send and a receive between all ranks, to ensure that
> connections have been fully established.
> Does anyone have thoughts or comments on this, or reasons not to have
> all ranks send to and receive from all others?