In this case, creating all those communicators really doesn't buy you anything since you aren't using any collective operations across all the subgroups you would be creating.
For this sort of coarse-grained parallelism, your best bet is probably a master/slave (producer/consumer, worker-pool) model. Have one process (master) generate valid sets for your first X (of Y total) parameters. The master then sends a unique set of these parameters to each slave process. Each slave generates all possible sets of the remaining parameters, evaluates the function for those parameter sets, stores the local max/min and returns this value to the master. Upon receiving the max/min from a slave, the master compares it to the global max/min and sends that slave a new set of the first X parameters. Repeat until the master has sent all possible sets of X parameters and all slaves have processed all their work.
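To make the division of labor concrete, here is a minimal single-machine sketch of the scheme just described. It is not MPI code: a Python thread pool stands in for the slave processes, and the toy constraint (`is_valid`), the toy function (`objective`), and the sizes (`VALUES`, `N_PARAMS`, `PREFIX_LEN`) are all made up for illustration. In a real MPI program the hand-off and the returned min/max would be point-to-point messages (for example `MPI_Send`/`MPI_Recv`) rather than pool calls.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical toy problem: 4 parameters, each taking values 0..4.
VALUES = range(5)
N_PARAMS = 4      # Y: total number of parameters
PREFIX_LEN = 2    # X: leading parameters enumerated by the master


def is_valid(params):
    """Hypothetical constraint: adjacent parameters must differ."""
    return all(a != b for a, b in zip(params, params[1:]))


def objective(params):
    """Hypothetical function being swept."""
    return sum((i + 1) * p for i, p in enumerate(params))


def worker(prefix):
    """Slave side: expand one branch and return its local (min, max)."""
    lo, hi = float("inf"), float("-inf")
    for rest in product(VALUES, repeat=N_PARAMS - len(prefix)):
        params = prefix + rest
        if is_valid(params):
            value = objective(params)
            lo, hi = min(lo, value), max(hi, value)
    return lo, hi


def master(n_workers=3):
    """Master side: enumerate valid prefixes, hand them out, reduce results."""
    prefixes = [p for p in product(VALUES, repeat=PREFIX_LEN) if is_valid(p)]
    g_lo, g_hi = float("inf"), float("-inf")
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each idle worker picks up the next pending prefix from the queue.
        for lo, hi in pool.map(worker, prefixes):
            g_lo, g_hi = min(g_lo, lo), max(g_hi, hi)
    return g_lo, g_hi


if __name__ == "__main__":
    print(master())
```

Python threads will not actually speed up a CPU-bound sweep; the pool is only there to show the shape of the protocol. Note that only two numbers travel back per branch, which is why no collective operations (and no extra communicators) are needed.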
Looking at it as a tree, the master process traverses the top of the tree, handing each slave a branch and letting the slave traverse the remainder of the tree. For load balancing, you want a lot more branches than slaves so that each slave is always kept busy. But you also want each branch to contain enough work that the slaves are not constantly going back to the master for the next set of parameters. You strike this balance by adjusting the depth to which the master process traverses the parameter tree.
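One rough way to pick that depth, sketched under assumptions of my own: go just deep enough that the number of valid branches reaches some multiple of the number of slaves. The helper names and the default factor of 10 branches per slave below are hypothetical, not a recommendation from any MPI documentation, and the brute-force count is only practical for the shallow depths the master would use.

```python
from itertools import product


def count_branches(values, depth, is_valid):
    """Number of valid parameter prefixes of the given length."""
    return sum(1 for p in product(values, repeat=depth) if is_valid(p))


def choose_depth(values, n_params, n_workers,
                 branches_per_worker=10, is_valid=lambda p: True):
    """Smallest master depth giving at least branches_per_worker * n_workers
    branches; falls back to n_params - 1 if no shallower depth is enough."""
    target = branches_per_worker * n_workers
    for depth in range(1, n_params):
        if count_branches(values, depth, is_valid) >= target:
            return depth
    return n_params - 1


if __name__ == "__main__":
    # 10 values per parameter, 10 parameters, 256 slaves.
    print(choose_depth(range(10), 10, 256))
```

Going deeper than this gives finer load balancing but smaller tasks, so more time is spent talking to the master; going shallower risks leaving slaves idle near the end of the run.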
Hope this helps. Good luck.
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
> Behalf Of Hicham Mouline
> Sent: Tuesday, November 23, 2010 3:56 PM
> To: 'Open MPI Users'
> Subject: Re: [OMPI users] MPI_Comm_split
> > -----Original Message-----
> > From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
> > Behalf Of Bill Rankin
> > Sent: 23 November 2010 19:32
> > To: Open MPI Users
> > Subject: Re: [OMPI users] MPI_Comm_split
> > Hicham:
> > > If I have a 256 mpi processes in 1 communicator, am I able to split
> > > that communicator, then again split the resulting 2 subgroups, then
> > > again the resulting 4 subgroups and so on, until potentially having
> > 256
> > > subgroups?
> > You can. But as the old saying goes: "just because you *can* do
> > something doesn't necessarily mean you *should* do it." :-)
> > What is your intent in creating all these communicators?
> > > Is this insane in terms of performance?
> > Well, how much "real" work are you doing? Operations on communicators
> > are collectives, so they are expensive. However if you do this only
> > once at the beginning of something like a three-week long simulation
> > run then you probably won't notice the impact.
> > In any case, I suspect there is a better way.
> > -bill
> I have need for a parallel parameter sweep. I have arguments x0 to x9
> say of
> a function.
> I need to evaluate this function for every acceptable combination of arguments.
> This list of acceptable combinations forms what I can view as a tree:
> . under the root node, all possible values of x0 (say there are 10 of them,
> x0_0 to x0_9)
> . under each of these nodes, all possible values of x1 that agree with the
> args defined so far, e.g.
> if x1_0 is not possible with x0_0, then it's not part of the tree...
> . and so on until reaching the leaf nodes. At those nodes, I evaluate the
> function and I want the global maximum and/or minimum.
> the order of magnitude is 128 for the depth of the tree, and 100
> values for each x.
> each eval takes a couple of ms though.
> I thought this facility of splitting communicators maps nicely the
> nature of
> my problem.
> what do you think?
> I'm actually not exactly sure how I'm gonna do it, but wished to have your
> opinion about whether it's just crazy