I have a Python module written in C++ to help users manipulate a huge
amount of genetics data. Using this module, users can write a script
to create/load/manipulate data easily. For efficiency and memory
management reasons, I would like to write an MPI version of the module
so that I can spread the data to other machines.
I have some experience with MPI-1 so I started with the conventional
design. That is to say, a fixed number of nodes are started and
execute the same script. The data is split across nodes but all nodes
can read/write data as if the data were local. Specifically, a write
operation is performed on the node that holds that piece of data, and
the results of a read operation are broadcast so that they appear to be
local to all the nodes. The broadcast is needed to ensure identical
execution logic of the script on all nodes.
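
To make the "split across nodes" part concrete, here is a minimal sketch of a block partition, assuming the data can be treated as a flat sequence of n_items entries. The names block_bounds and owner_of are hypothetical helpers for illustration, not part of the module or of MPI.

```python
def block_bounds(n_items, n_nodes, rank):
    """Return the [start, stop) range of global indices owned by `rank`."""
    base, extra = divmod(n_items, n_nodes)
    # The first `extra` ranks hold one additional item each.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def owner_of(n_items, n_nodes, index):
    """Return (rank, local_offset) for a global index."""
    if not 0 <= index < n_items:
        raise IndexError(index)
    base, extra = divmod(n_items, n_nodes)
    # Indices below `boundary` live on ranks that hold base + 1 items.
    boundary = extra * (base + 1)
    if index < boundary:
        rank = index // (base + 1)
    else:
        rank = extra + (index - boundary) // base
    start, _ = block_bounds(n_items, n_nodes, rank)
    return rank, index - start
```

With a mapping like this, a write to a global index would be carried out only on the rank returned by owner_of, and a read would be broadcast from that rank to the others.
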
Although a test module is up and running, making sure all nodes
*see* the same data and execute the same script in step has proven to
be very inefficient and difficult. For example, if a script performs
some action based on a local random number, different nodes would
probably go out of sync.
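
One common remedy for the random-number case is to pick a seed on one node and share it, so that every node's generator produces the same sequence. The sketch below only simulates this in a single process with Python's standard random module. In a real MPI program the seed would presumably be broadcast from one rank. The names make_node_rngs and script_decisions are hypothetical.

```python
import random


def make_node_rngs(n_nodes, shared_seed):
    """Simulate one generator per node, all seeded with the same value."""
    # Assumption: in an MPI run, shared_seed would be chosen on one rank
    # and broadcast to the others before any random numbers are drawn.
    return [random.Random(shared_seed) for _ in range(n_nodes)]


def script_decisions(rng, n_steps):
    """Stand-in for a script that branches on random numbers."""
    return [rng.random() < 0.5 for _ in range(n_steps)]


rngs = make_node_rngs(4, shared_seed=12345)
decisions = [script_decisions(rng, 10) for rng in rngs]
# Every simulated node takes the same branches, in the same order.
all_in_sync = all(d == decisions[0] for d in decisions)
```

This only helps if every node draws the same number of random values in the same order, so it reduces the problem rather than removing it.
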
I am thinking of an implementation in which only the head node
executes the script. It creates the slave nodes and asks them to act
on their local data if needed. RMA can be used so that the head node
can access data from slave nodes directly. This looks like an
efficient solution but I am not sure how to instruct the slave nodes
on what they should do. I mean, it is difficult to tell a slave node
to execute a certain function with such and such parameters. Treating
slave nodes as memory storage and using RMA for all the operations does
not sound like a good idea either.
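
One way the "tell a slave node what to do" part might look is a command table: the head node sends a (name, arguments) message, and each slave looks the name up in a registry of functions that act on its local data. The sketch below runs in a single process with no MPI at all. Worker, Head, COMMANDS, count_allele and set_value are all hypothetical names, and in a real implementation handle() would sit behind a receive loop instead of being called directly.

```python
# Registry mapping command names to functions that act on local data.
COMMANDS = {}


def command(func):
    """Register a function so it can be invoked by name."""
    COMMANDS[func.__name__] = func
    return func


@command
def count_allele(local_data, allele):
    """Count how many local entries equal `allele`."""
    return sum(1 for value in local_data if value == allele)


@command
def set_value(local_data, offset, value):
    """Overwrite one local entry."""
    local_data[offset] = value


class Worker:
    """Stand-in for a slave node that owns one piece of the data."""

    def __init__(self, local_data):
        self.local_data = local_data

    def handle(self, message):
        # A message is a (command name, argument tuple) pair.
        name, args = message
        return COMMANDS[name](self.local_data, *args)


class Head:
    """Stand-in for the head node, the only one running the user script."""

    def __init__(self, workers):
        self.workers = workers

    def call_all(self, name, *args):
        # Ask every worker to run the same command on its own data.
        return [worker.handle((name, args)) for worker in self.workers]

    def call_one(self, rank, name, *args):
        # Ask a single worker, e.g. the owner of a given index.
        return self.workers[rank].handle((name, args))


head = Head([Worker([0, 1, 1]), Worker([1, 0, 0])])
total_before = sum(head.call_all("count_allele", 1))
head.call_one(1, "set_value", 2, 1)
total_after = sum(head.call_all("count_allele", 1))
```

Because messages are just a name plus plain arguments, only small requests and reduced results would cross the network, and the bulk data stays where it is.
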
I have been evaluating different approaches and have not decided which
way to go. I would highly appreciate any advice on how to design and
implement such a module.
Many thanks in advance.