Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] failure with zero-length Reduce() and both sbuf=rbuf=NULL
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-02-10 12:19:46


On Feb 10, 2010, at 11:59 AM, Lisandro Dalcin wrote:

> > If I remember correctly, the HPCC pingpong test synchronizes occasionally by
> > having one process send a zero-byte broadcast to all other processes.
> > What's a zero-byte broadcast? Well, some MPIs apparently send no data, but
> > do have synchronization semantics. (No non-root process can exit before the
> > root process has entered.) Other MPIs treat the zero-byte broadcasts as
> > no-ops; there is no synchronization and then timing results from the HPCC
> > pingpong test are very misleading. So far as I can tell, the MPI standard
> > doesn't address which behavior is correct.
>
> Yep... for p2p communication things are more clear (and behavior more
> consistent across the MPIs out there) regarding zero-length messages...
> IMHO, collectives should be no-ops only in the sense that no actual
> reduction is made because there are no elements to operate on. I mean,
> if Reduce(count=1) implies a sync, Reduce(count=0) should also imply a
> sync...

Sorry to disagree again. :-)

The *only* MPI collective operation that guarantees synchronization is barrier. The lack of a synchronization guarantee for all other collective operations is very explicit in the MPI spec. Hence, it is perfectly valid for an MPI implementation to do something like a no-op when no data transfer actually needs to take place (except, of course, the fact that Reduce(count=1) isn't defined ;-) ).

> > The test strikes me as
> > deficient: it would have been just as easy to have a single-word broadcast
> > to implement the synchronization they were looking for.
>
> Or use MPI_Barrier() ...

This one I agree with. ;-)

There's still jitter in when individual processes *leave* a barrier, but MPIs do actually strive to reduce that jitter when possible. It's definitely a higher level of synchronization than a short broadcast (but then again, you could probably emulate a barrier with short broadcasts if you really wanted to ;-) ).
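
To make the "emulate a barrier with short broadcasts" idea concrete, here is a rough sketch. It is a toy in-process model using Python threads and queues, not MPI code; `ToyComm`, `bcast`, `barrier_via_bcasts`, and `run` are hypothetical names made up for illustration. The toy broadcast deliberately gives the root no synchronization (it returns right after sending), while a non-root cannot return until the root's one-word token has arrived. Letting every rank take one turn as root then means no rank can leave before every rank has entered.

```python
import queue
import threading
import time


class ToyComm:
    """Toy in-process stand-in for a communicator (hypothetical, not MPI)."""

    def __init__(self, size):
        self.size = size
        # chan[root][rank] carries one-word tokens from `root` to `rank`.
        self.chan = [[queue.Queue() for _ in range(size)] for _ in range(size)]

    def bcast(self, rank, root, token=1):
        """One-word broadcast with no synchronization guarantee for the root."""
        if rank == root:
            for peer in range(self.size):
                if peer != root:
                    self.chan[root][peer].put(token)
            return token  # the root returns without waiting for anyone
        # A non-root blocks until the root has entered and sent the token.
        return self.chan[root][rank].get()


def barrier_via_bcasts(comm, rank):
    """Every rank takes one turn as root; costs `size` broadcasts."""
    for root in range(comm.size):
        comm.bcast(rank, root)


def run(delays):
    """Start one thread per rank; return per-rank (entered, left) timestamps."""
    comm = ToyComm(len(delays))
    entered, left = {}, {}

    def worker(rank):
        time.sleep(delays[rank])  # ranks arrive at different times
        entered[rank] = time.monotonic()
        barrier_via_bcasts(comm, rank)
        left[rank] = time.monotonic()

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(len(delays))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return entered, left
```

Calling `entered, left = run([0.0, 0.05, 0.02])` and comparing `min(left.values())` with `max(entered.values())` shows the barrier-like property in this model. In real MPI code the analogous loop would presumably issue one single-word broadcast per root, which is far more expensive than just calling MPI_Barrier().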

-- 
Jeff Squyres
jsquyres_at_[hidden]