I'm currently working on optimistic message logging and would like to
implement an optimistic message logging protocol in Open MPI. Optimistic
message logging protocols piggyback information about dependencies
between processes on the application messages, so that a consistent
global state can be reconstructed after a failure. That's why I'm
interested in the problem of piggybacking information on MPI messages.
Is there any existing work on this problem at the moment?
Has anyone already implemented a mechanism in Open MPI to piggyback
data on MPI messages?
Oleg Morajko wrote:
> I'm developing a causality-chain tracking library and need a mechanism
> to attach extra data to every MPI message, so-called piggybacking.
> As far as I know, there are a few solutions to this problem, of which
> the two fundamental ones are the following:
> * Dynamic datatype wrapping - if a user calls MPI_Send with, say,
>   1024 doubles, the wrapped send implementation dynamically creates
>   a derived datatype that is a structure composed of a pointer to
>   the 1024 doubles and the extra fields to be piggybacked. The
>   datatype is constructed with absolute addresses to avoid copying
>   the original buffer. The receiving side creates the equivalent
>   datatype to receive the original data and the extra data. The
>   performance of this solution depends on how well the implementation
>   handles derived datatypes, but it seems to be lightweight. (A
>   sketch follows after the question below.)
> * Sending the extra data in a separate message -- this seems to incur
>   much more significant overhead.
> Do you know of any other portable solution?
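For reference, here is a minimal sketch of the datatype-wrapping
approach for the point-to-point case, as I understand it from your
description. The function name pb_send and the single int piggyback
field are illustrative assumptions, not part of any existing library,
and error checking is omitted:

    #include <mpi.h>

    /* Wrapped send: describe the user's buffer and the piggyback field
     * in place with absolute addresses, so nothing is copied. */
    int pb_send(const void *buf, int count, MPI_Datatype dtype,
                int dest, int tag, MPI_Comm comm, int piggyback)
    {
        MPI_Datatype wrapped;
        int          blens[2] = { count, 1 };
        MPI_Aint     displs[2];
        MPI_Datatype types[2] = { dtype, MPI_INT };

        MPI_Get_address(buf, &displs[0]);
        MPI_Get_address(&piggyback, &displs[1]);

        MPI_Type_create_struct(2, blens, displs, types, &wrapped);
        MPI_Type_commit(&wrapped);

        /* With absolute displacements the buffer argument is MPI_BOTTOM. */
        int rc = MPI_Send(MPI_BOTTOM, 1, wrapped, dest, tag, comm);

        MPI_Type_free(&wrapped);
        return rc;
    }

The receiver would build the matching struct type around its own
buffers and post a single MPI_Recv with MPI_BOTTOM as well.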
> I have implemented the first solution for P2P operations and it works
> pretty well. However, there are problems with collective operations.
> Two classes of collective calls are problematic:
> 1. Single-receiver calls, like MPI_Gather. The sender tasks in a
>    gather can be handled in the same way as a normal send: the data
>    item is wrapped and the extra data is piggybacked with the message.
>    The problem is on the receiver side, where the root gathers N data
>    items that must be received into an array big enough to hold all
>    the items, strided by the datatype extent.
>    In particular, it seems impossible to construct a datatype that
>    contains a data item plus the extra data (i.e., a struct type with
>    absolute addresses) AND make an array of these datatypes separated
>    by a fixed extent. For example: the data item to receive from every
>    process is a vector of 1024 doubles, and the extra data is a single
>    integer. The user provides a receive buffer with room for N * 1024
>    doubles, and the library allocates an array of N integers to
>    receive the piggybacked data. How can one construct a datatype that
>    can be used to receive the data in MPI_Gather? (See the first
>    sketch after this list.)
> 2. MPI_Reduce calls. There is no problem with datatypes here, as the
>    receiver gets a single data item rather than an array as in the
>    previous case. The problem is the reduction operator itself
>    (MPI_Op), because the predefined operators do not work with wrapped
>    datatypes. So I can create a new operator that recognizes the
>    wrapped datatype, extracts the original data (skipping the extra
>    data), and performs the original reduction. The point is how to
>    invoke the original reduction on an existing datatype. I have found
>    that Open MPI internally calls ompi_op_reduce(op, inbuf, rbuf,
>    count, dtype), which solves the problem, but it makes the code
>    MPI-implementation-dependent. Any idea on more portable options?
>    (See the second sketch after this list.)
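On the gather case: since a struct type built from absolute addresses
pins down one fixed memory layout, I also don't see how to replicate it
across ranks with a stride. A fallback, at the cost of your second
(separate-message) option, is to run one extra gather for the piggyback
data. A sketch, where pb_gather and the single int piggyback field are
again just assumptions:

    #include <mpi.h>

    /* Gather the user's payload and the piggybacked ints separately.
     * recv_pb must point to one int per rank on the root; it may be
     * NULL on the other ranks. Error checking is mostly omitted. */
    int pb_gather(const void *sendbuf, int count, MPI_Datatype dtype,
                  void *recvbuf, int root, MPI_Comm comm,
                  int piggyback, int *recv_pb)
    {
        /* First gather: the payload, exactly as the user requested. */
        int rc = MPI_Gather(sendbuf, count, dtype,
                            recvbuf, count, dtype, root, comm);
        if (rc != MPI_SUCCESS)
            return rc;

        /* Second gather: one piggyback int per rank into the library's
         * side array, sidestepping the impossible strided struct type. */
        return MPI_Gather(&piggyback, 1, MPI_INT,
                          recv_pb, 1, MPI_INT, root, comm);
    }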
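On the reduce case: if you can rely on MPI 2.2, MPI_Reduce_local is a
portable way to invoke a predefined reduction from inside a user-defined
operator, which is what the internal ompi_op_reduce call gives you. A
sketch under the assumption of a contiguous wrapped element of 1024
doubles followed by one int (a real library would recover the layout
from the datatype rather than hard-code it):

    #include <mpi.h>

    #define PAYLOAD_N 1024  /* assumed fixed payload size */

    typedef struct {
        double payload[PAYLOAD_N];
        int    piggyback;
    } wrapped_t;

    /* User-defined MPI_Op: apply the original reduction (here MPI_SUM)
     * to the payload and combine the piggyback fields (here: max). */
    static void wrapped_sum(void *in, void *inout, int *len,
                            MPI_Datatype *dt)
    {
        wrapped_t *a = (wrapped_t *)in;
        wrapped_t *b = (wrapped_t *)inout;
        (void)dt;  /* layout is assumed fixed in this sketch */
        for (int i = 0; i < *len; i++) {
            /* MPI_Reduce_local portably invokes the predefined op. */
            MPI_Reduce_local(a[i].payload, b[i].payload,
                             PAYLOAD_N, MPI_DOUBLE, MPI_SUM);
            if (a[i].piggyback > b[i].piggyback)
                b[i].piggyback = a[i].piggyback;
        }
    }

The operator would be registered with MPI_Op_create(wrapped_sum, 1, &op)
and used with a matching contiguous datatype. If MPI 2.2 is not
available, the callback has to reimplement the predefined operations by
hand.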
> Thank you in advance for any comment.