Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] is there an equiv of iprove for bcast?
From: Randolph Pullen (randolph_pullen_at_[hidden])
Date: 2011-05-10 22:14:13

The messages are small and frequent (they flash metadata across the cluster).  The current approach works fine for small to medium clusters but I want it to be able to go big.  Maybe up to several hundred or even thousands of nodes.

It's these larger deployments that concern me.  The current scheme may see the clearinghouse become overloaded in a very large cluster.
From what you have said, a possible strategy may be to combine the listener and worker into a single process, using the non-blocking bcast just for that group, while each worker scans its own port for an incoming request, which it would in turn bcast to its peers.
As you have indicated though, this would depend on the load the non-blocking bcast would cause.  At least the load would be fairly even over the cluster.

--- On Mon, 9/5/11, Jeff Squyres <jsquyres_at_[hidden]> wrote:

From: Jeff Squyres <jsquyres_at_[hidden]>
Subject: Re: [OMPI users] is there an equiv of iprove for bcast?
To: randolph_pullen_at_[hidden]
Cc: "Open MPI Users" <users_at_[hidden]>
Received: Monday, 9 May, 2011, 11:27 PM

On May 3, 2011, at 8:20 PM, Randolph Pullen wrote:

> Sorry, I meant to say:
> - on each node there is 1 listener and 1 worker.
> - all workers act together when any of the listeners send them a request.
> - currently I must use an extra clearinghouse process to receive from any of the listeners and bcast to the workers; this is unfortunate because of the potential scaling issues.
> I think you have answered this in that I must wait for MPI-3's non-blocking collectives.

Yes and no.  If each worker starts N non-blocking broadcasts just to be able to test for completion of any of them, you might end up consuming a bunch of resources for them (I'm *anticipating* that pending non-blocking collective requests may be more heavyweight than pending non-blocking point-to-point requests).

But then again, if N is small, it might not matter.

> Can anyone suggest another way?  I don't like the serial clearinghouse approach.

If you only have a few workers and/or the broadcast message is small and/or the broadcasts aren't frequent, then MPI's built-in broadcast algorithms might not offer much more optimization than doing your own with point-to-point mechanisms.  I don't usually recommend this, but it may be possible for your case.
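
As an illustration only (not code from this thread), here is a minimal sketch of the bookkeeping a hand-rolled point-to-point broadcast might use so that any rank can originate a message without a clearinghouse. It assumes a binomial tree rooted at the originating rank; the helper name bcast_tree_children is hypothetical, and the actual MPI calls are deliberately left out. In a real program each worker could poll with MPI_Iprobe using MPI_ANY_SOURCE, receive the message with MPI_Recv, read the originator's rank from the payload (an assumption about the message layout), and then MPI_Send the message to each child this helper returns.

```c
#include <assert.h>

/*
 * Hypothetical helper: compute which ranks `rank` should forward a message
 * to when `root` originated it, using a binomial tree over `size` ranks.
 * Writes the child ranks into `children` (caller provides room for at least
 * 32 entries) and returns how many there are.
 */
int bcast_tree_children(int rank, int root, int size, int *children)
{
    assert(size > 0 && rank >= 0 && rank < size && root >= 0 && root < size);

    /* Renumber ranks so that the originator is always relative rank 0. */
    int rel = (rank - root + size) % size;

    /* Find the lowest set bit of rel (or the first power of two >= size
     * when rel is 0): this rank owns the subtree below that bit. */
    int mask = 1;
    while (mask < size && !(rel & mask))
        mask <<= 1;

    /* Children sit at rel + each smaller power of two, largest first. */
    int n = 0;
    for (mask >>= 1; mask > 0; mask >>= 1) {
        if (rel + mask < size)
            children[n++] = (rel + mask + root) % size;
    }
    return n;
}
```

With 5 ranks and rank 3 as the originator, rank 3 would forward to ranks 2, 0 and 4, and rank 0 would forward to rank 1. Each rank sends at most about log2(size) messages, so no single process carries the whole fan-out; whether this beats the library's own broadcast would have to be measured.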

Jeff Squyres