Have you looked at the self-scheduling algorithm described in "Using
MPI" by Gropp, Lusk, and Skjellum? I have seen efficient
implementations of it for large satellite data assimilation problems in
numerical weather prediction, where load distribution across processors
cannot be predicted in advance. It is somewhat analogous to dynamic
scheduling in OpenMP, where the number of 'tasks' is significantly
larger than the number of processors.
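
To make the idea concrete, here is a rough sketch of the self-scheduling
(manager/worker) pattern. It is not the code from the book and it does not
use MPI at all: Python threads and queues stand in for MPI ranks and
messages so that the sketch runs anywhere. The names COMMANDS, worker, and
self_schedule are made up for illustration. Each task is sent as a command
name plus its data, which is one possible way to tell a worker which
function to run with which parameters.

```python
import queue
import threading

# Hypothetical command table: names and functions are illustrative only.
COMMANDS = {
    "sum": lambda chunk: sum(chunk),
    "max": lambda chunk: max(chunk),
}

STOP = None  # sentinel playing the role of a "no more work" message


def worker(rank, inbox, results):
    """Receive a task, run the named command, report back, repeat until STOP."""
    while True:
        task = inbox.get()  # stands in for a receive from the manager
        if task is STOP:
            break
        task_id, name, chunk = task
        # Stands in for a send of the result back to the manager.
        results.put((rank, task_id, COMMANDS[name](chunk)))


def self_schedule(tasks, n_workers):
    """Manager: give each worker one task, then refill whichever finishes first."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(r, inboxes[r], results))
        for r in range(n_workers)
    ]
    for t in threads:
        t.start()

    pending = iter(enumerate(tasks))

    def dispatch(rank):
        """Send the next task to `rank`, or STOP if none remain. Returns 1 or 0."""
        nxt = next(pending, None)
        if nxt is None:
            inboxes[rank].put(STOP)
            return 0
        task_id, (name, chunk) = nxt
        inboxes[rank].put((task_id, name, chunk))
        return 1

    # Prime every worker with one task.
    outstanding = sum(dispatch(r) for r in range(n_workers))

    answers = {}
    while outstanding:
        # Results arrive from whichever worker finishes first; in MPI this
        # would be a receive with MPI_ANY_SOURCE.
        rank, task_id, value = results.get()
        answers[task_id] = value
        outstanding -= 1
        outstanding += dispatch(rank)

    for t in threads:
        t.join()
    return [answers[i] for i in range(len(answers))]
```

For example, self_schedule([("sum", [1, 2, 3]), ("max", [4, 9, 2])], 2)
hands one task to each of two workers and returns the results in task
order. Because a worker only gets new work when it reports back, slow
tasks do not hold up the others, which is the point when the cost per
task is unpredictable.
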
On Sat, 2007-02-03 at 23:48 -0600, Bo Peng wrote:
> Dear list,
> I have a python module written in C++ to help users manipulate a huge
> amount of genetics data. Using this module, users can write a script
> to create/load/manipulate data easily. For efficiency and memory
> management reasons, I would like to write an MPI version of the module
> so that I can spread the data to other machines.
> I have some experience with MPI-1 so I started with the conventional
> design. That is to say, a fixed number of nodes are started and
> execute the same script. The data is split across nodes but all nodes
> can read/write data as if the data is local. That is to say, a write
> operation is done on one of the nodes that has that piece of data, and
> the results of a read operation are broadcast so that they appear to be
> local to all the nodes. The broadcast is needed to ensure identical
> execution logic of the script on all nodes.
> Although a test module is up and running, making sure all scripts
> *see* the same data and execute the same script has proven to be very
> inefficient and difficult. For example, if a script performs some
> action based on a local random number, different nodes would probably
> be out of sync.
> I am thinking of an implementation in which only the head node
> executes the script. It creates the slave nodes and asks them to act
> on their local data if needed. RMA can be used so that the head node
> can access data from slave nodes directly. This looks like an
> efficient solution but I am not sure how to instruct the slave nodes
> on what they should do. I mean, it is difficult to tell a slave node
> to execute a certain function with such and such parameters. Treating
> slave nodes as memory storage and using RMA for all the operations does
> not sound like a good idea either.
> I have been evaluating different approaches and have not decided which
> way to go. I would highly appreciate any advice on how to design and
> implement such a module.
> Many thanks in advance.