Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Question about collective messages implementation
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-11-02 09:50:09

On Nov 2, 2010, at 6:21 AM, Jerome Reybert wrote:

> Each host_comm communicator groups tasks by machine. I ran this version,
> but performance is worse than with the current version (each task performing
> its own Lapack function). I have several questions:

> - in my implementation, is MPI_Bcast aware that it should use shared
> memory communication? Does the data go through the network? It seems it is
> the case, considering the first results.

It should use shared memory by default.

> - are there any other methods to group tasks by machine, with OpenMPI being
> aware that it is grouping tasks by shared memory?

The MPI API does not expose this kind of functionality, but there's at least one proposal in front of the MPI Forum to do this kind of thing.

As Ashley mentioned, you might want to do this MPI_Comm_split once and then just use that communicator from then on. The code snippet you sent leaks the host_comm, for example.

> - is it possible to assign a policy (in this case, a shared memory policy) to
> a Bcast or a Barrier call?

Not really, no.

> - do you have any better idea for this problem? :)

Ashley probably hit the nail on the head. The short version is that OMPI aggressively polls for progress. Forcing the degraded mode will help (because it'll yield), but it won't solve the problem because it'll still be aggressively polling -- it'll just yield every time it polls. But it's still polling.

We've had many discussions about this topic, but have never really addressed it -- the need for low latency has been greater than the need for blocking/not-consuming-CPU progress.

Jeff Squyres