No synchronization operation in MPI promises that all tasks will exit at the same time. For MPI_Barrier they will exit as close to the same time as the implementation can reasonably support, but as long as the application is distributed and there are delays in the interconnect, strict exit synchronization is impossible to provide.
If a task involved in an MPI_Barrier happens to be de-scheduled by the OS in the middle of the call, the skew can be quite significant (even several milliseconds).
The MPI standard only stipulates that no task in the group may exit MPI_Barrier until all tasks have entered.
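To make the entry/exit guarantee concrete, here is a minimal pure-Python analogy (not actual MPI; `threading.Barrier` stands in for MPI_Barrier, and the timings are illustrative): the only guarantee is that no task leaves before all have arrived, while exit times still skew with OS scheduling.

```python
import threading
import time
import random

N = 4
barrier = threading.Barrier(N)
enter_times = [0.0] * N
exit_times = [0.0] * N

def task(rank):
    # Tasks arrive at the barrier at different times (simulated compute skew).
    time.sleep(random.uniform(0, 0.05))
    enter_times[rank] = time.monotonic()
    barrier.wait()               # stands in for MPI_Barrier
    exit_times[rank] = time.monotonic()

threads = [threading.Thread(target=task, args=(r,)) for r in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The guarantee: no task exits before the last task has entered.
assert min(exit_times) >= max(enter_times)
# The non-guarantee: exits are not simultaneous; some skew is normal.
print("exit skew (s):", max(exit_times) - min(exit_times))
```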
As covered in an extensive discussion a couple of weeks back, very few applications actually require MPI_Barrier synchronization at all. Applications in which tasks are affected by outside events or use non-MPI communications can require MPI_Barrier, and tasks that use MPI_ANY_SOURCE or MPI_ANY_TAG receives can act in unexpected ways without judicious use of MPI_Barrier.
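The wildcard-receive hazard can be sketched with a pure-Python analogy (not MPI; a shared queue stands in for a receive posted with MPI_ANY_SOURCE/MPI_ANY_TAG, and the "iter" labels are illustrative): without a barrier between iterations, a fast sender's next-iteration message can match a receive intended for the current iteration.

```python
import queue
import threading

# One shared mailbox stands in for a receive with MPI_ANY_SOURCE/MPI_ANY_TAG:
# a message from any sender, from any iteration, can match it.
mailbox = queue.Queue()

def fast_sender():
    mailbox.put(("iter-1", "A"))
    # With no barrier between iterations, this sender races ahead:
    mailbox.put(("iter-2", "A"))

def slow_sender():
    mailbox.put(("iter-1", "B"))

# The fast sender finishes both iterations before the slow one even starts.
t = threading.Thread(target=fast_sender); t.start(); t.join()
t = threading.Thread(target=slow_sender); t.start(); t.join()

# The receiver wants the two iteration-1 messages, but the wildcard
# receive matches whatever arrived first -- here, an iteration-2 message.
got = [mailbox.get() for _ in range(2)]
assert any(label == "iter-2" for label, _ in got)
```

A barrier between iterations would have kept iteration-2 traffic out of this receive window, which is the "judicious use" the text refers to.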
Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363
Ralph Castain wrote on 03/23/2009 05:11:05 PM ("Re: [OMPI users] Collective operations and synchronization", Open MPI Users list):
> Just one point to emphasize - Eugene said it, but many times people
> don't fully grasp the implication.
> On an MPI_Allreduce, the algorithm requires that all processes -enter-
> the call before anyone can exit.
> It does -not- require that they all exit at the same time.
> So if you want to synchronize on the -exit-, as your question
> indicated, then you must add the MPI_Barrier as you describe.
> On Mar 23, 2009, at 3:01 PM, Eugene Loh wrote:
> > Shaun Jackman wrote:
> >> I've just read in the Open MPI documentation 
> > That's the MPI spec, actually.
> >> that collective operations, such as MPI_Allreduce, may synchronize,
> >> but do not necessarily synchronize. My algorithm requires a
> >> collective operation and synchronization; is there a better (more
> >> efficient?) method than simply calling MPI_Allreduce followed by
> >> MPI_Barrier?
> > MPI_Allreduce is a case that actually "requires" synchronization in
> > that no participating process may exit before all processes have
> > entered. So, there should be no need to add additional
> > synchronization. A special case might be an MPI_Allreduce of a 0-
> > length message, in which case I suppose an MPI implementation could
> > simply "do nothing", and the synchronization side-effect would be
> > lost.
> > The MPI spec is mainly talking about a "typical" collective where
> > one could imagine a process exiting before some processes have
> > entered. E.g., in a broadcast or scatter, the root could exit
> > before any other process has entered the operation. In a reduce or
> > gather, the root could enter after all other processes have exited.
> > For all-to-all, allreduce, or allgather, however, no process can
> > exit before all processes have entered, which is the synchronization
> > condition effected by a barrier. (Again, null message lengths can
> > change things.)
> >>  http://www.mpi-forum.org/docs/mpi21-report-bw/node85.htm
> >> _______________________________________________
> >> users mailing list
> >> email@example.com
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
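The allreduce entry guarantee discussed above can be modeled in a pure-Python sketch (an analogy, not MPI; the lock, barrier, and names are illustrative): no task can obtain the reduced result before every task has contributed, even though nothing forces them to exit together.

```python
import threading

N = 4
values = [1, 2, 3, 4]
partial = []
lock = threading.Lock()
all_in = threading.Barrier(N)   # models "no one exits before all enter"
results = [None] * N

def allreduce_sum(rank):
    # Each task contributes its value...
    with lock:
        partial.append(values[rank])
    # ...and cannot obtain the global sum until every task has contributed.
    all_in.wait()
    results[rank] = sum(partial)

threads = [threading.Thread(target=allreduce_sum, args=(r,)) for r in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every task sees the full reduction over all contributions.
assert results == [10, 10, 10, 10]
```

In real MPI, if the exit point must also be synchronized (as Ralph notes above), the pattern is simply MPI_Allreduce followed by MPI_Barrier; a one-sided collective such as a broadcast or gather gives no comparable entry guarantee for every rank.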