Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_Reduce performance
From: Alex A. Granovsky (gran_at_[hidden])
Date: 2010-09-09 18:37:44


You did not take into account the dispersion/dephasing between different processes. As the cluster size and the
number of instances of the parallel process increase, the dispersion increases as well, so the different instances
drift somewhat out of sync: not truly out of sync, but offset because of different execution speeds on different nodes, delays, etc.
If you account for this, you get the result I mentioned.
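
To make "dispersion" concrete, here is a minimal sketch of processes that never wait for one another. Everything in it is an assumption made for illustration: the function names, the uniform step times, and the absence of any communication are hypothetical and do not come from Open MPI or from the application under discussion. It only shows that the spread in finishing times cannot shrink as more processes are included; it says nothing by itself about deadlock.

```python
import random


def finishing_times(num_procs, num_steps, seed=0):
    """Total run time of each process when no process ever waits for another.

    Step durations are drawn uniformly from [0.5, 1.5]; this distribution is
    an assumption of the sketch, not a measurement.
    """
    rng = random.Random(seed)
    return [
        sum(rng.uniform(0.5, 1.5) for _ in range(num_steps))
        for _ in range(num_procs)
    ]


def spread(times):
    """Gap between the slowest and the fastest process."""
    return max(times) - min(times)


if __name__ == "__main__":
    times = finishing_times(1024, 100)
    # Growing prefixes of one sample: the spread can only stay equal or grow.
    for n in (4, 32, 256, 1024):
        print(n, round(spread(times[:n]), 2))
```

Taking growing prefixes of a single sample keeps the comparison deterministic: each larger group contains the smaller one, so its spread is at least as large.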


  ----- Original Message -----
  From: Eugene Loh
  To: Open MPI Users
  Sent: Thursday, September 09, 2010 11:32 PM
  Subject: Re: [OMPI users] MPI_Reduce performance

  Alex A. Granovsky wrote:
    Isn't it evident from the theory of random processes and probability theory that, in the limit of an infinitely
    large cluster and parallel process, the probability of deadlocks with the current implementation is unfortunately
    quite a finite quantity, and in the limit approaches unity regardless of any particular details of the program?
  No, not at all. Consider simulating a physical volume. Each process is assigned to some small subvolume. It updates conditions locally, but on the surface of its simulation subvolume it needs information from "nearby" processes. It cannot proceed along the surface until it has that neighboring information. Its neighbors, in turn, cannot proceed until their neighbors have reached some point. Two distant processes can be quite out of step with one another, but only by some bounded amount. At some point, a leading process has to wait for information from a laggard to propagate to it. All processes proceed together, in some loose lock-step fashion. Many applications behave in this fashion. Actually, in many applications, the synchronization is tightened in that "physics" is made to propagate faster than neighbor-by-neighbor.

  As the number of processes increases, the laggard might seem relatively slower in comparison, but that isn't deadlock.
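
The loose lock-step argument above can be sketched with a toy model. This is only an illustration under stated assumptions: a 1-D chain of processes, random step durations, and a rule that a process may start step s only after it and its immediate neighbors have finished step s-1. The names `simulate_chain`, `steps_done`, and `max_neighbor_skew` are hypothetical; nothing here calls MPI or models a real application.

```python
import random


def simulate_chain(num_procs, num_steps, seed=0):
    """Return finish[p][s], the time at which process p completes step s.

    finish[p][0] is 0.0 for every process (nothing has been done yet).
    Step durations are uniform in [0.5, 1.5], an assumption of this sketch.
    """
    rng = random.Random(seed)
    finish = [[0.0] * (num_steps + 1) for _ in range(num_procs)]
    for s in range(1, num_steps + 1):
        for p in range(num_procs):
            # Wait for this process and its neighbors to finish step s-1.
            deps = [q for q in (p - 1, p, p + 1) if 0 <= q < num_procs]
            start = max(finish[q][s - 1] for q in deps)
            finish[p][s] = start + rng.uniform(0.5, 1.5)
    return finish


def steps_done(finish_row, t):
    """Number of steps one process has completed by time t."""
    return sum(1 for s in range(1, len(finish_row)) if finish_row[s] <= t)


def max_neighbor_skew(finish, t):
    """Largest difference in completed steps between adjacent processes."""
    done = [steps_done(row, t) for row in finish]
    return max(abs(a - b) for a, b in zip(done, done[1:]))


if __name__ == "__main__":
    finish = simulate_chain(num_procs=16, num_steps=50)
    for t in (5.0, 20.0, 40.0):
        done = [steps_done(row, t) for row in finish]
        print(t, max_neighbor_skew(finish, t), max(done) - min(done))
```

In this model adjacent processes are never more than one step apart, and two processes d positions apart are never more than d steps apart, so the skew is bounded by the chain length rather than growing without limit. Every process also completes every step, because each wait is on a strictly earlier step.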

  As the size of the cluster increases, the chances of a system component failure increase, but that also is a different matter.

