> Re: MPI_Ssend(). This indeed fixes bug3: the process at rank 0 has
> reasonable memory usage and the execution proceeds normally.
> Re: scalable: One second. I know well that bug3 is not scalable, and when
> to use MPI_Isend(). The point is that programmers want to count on the MPI
> spec as written, as Richard pointed out. We want to send small messages
> quickly and efficiently, without the danger of overloading the receiver's
> resources. We can use MPI_Ssend(), but it is slow compared to MPI_Send().
Your last statement is not necessarily true. By synchronizing processes
using MPI_Ssend(), you can potentially avoid large numbers of unexpected
messages that need to be buffered and copied, and that also need to be
searched every time a receive is posted. There is no guarantee that the
protocol overhead on each message incurred with MPI_Ssend() slows down an
application more than the buffering, copying, and searching overhead of a
large number of unexpected messages.
It is true that MPI_Ssend() is slower than MPI_Send() in ping-pong
micro-benchmarks, but the unexpected-message queue doesn't have to grow
very long before the two perform about the same.
> Since identifying this behavior we have implemented the desired flow
> control in our application.
It would be interesting to see performance results comparing doing flow
control in the application versus having MPI do it for you....