
Open MPI User's Mailing List Archives


From: Brian Barrett (bbarrett_at_[hidden])
Date: 2007-07-16 10:43:19


On Jul 15, 2007, at 10:05 PM, Isaac Huang wrote:

> Hello, I read in the FAQ that current Open MPI releases don't
> support end-to-end data reliability. But I still have some questions
> that I couldn't resolve by googling or reading the FAQ:
>
> 1. I read in "MPI - The Complete Reference" that "MPI provides the
> user with reliable message transmission. A message sent is always
> received correctly, and the user does not need to check for
> transmission errors, timeouts, or other error conditions." But the
> standard is sort of vague about what exactly this "reliable message
> transmission" is. Does it at least require reliable delivery? Or, does
> Open MPI notice and re-transmit lost data?

Yes, the MPI standard guarantees that messages are reliably delivered
in order. MPI implementations have taken this to mean that if the
transport is "reliable", then the MPI implementation doesn't have to do
anything special. So we assume that TCP delivers our data and headers
intact, and the same goes for shared memory, Myrinet, and InfiniBand
(the RC protocol, anyway). We also assume that any data sent arrives
on the other side.

We have an experimental point-to-point engine, DR, that provides
reliable transport even over networks that corrupt and/or drop
packets. The engine isn't available in a stable release, as it is
still in the experimental phase. Checksums and timers are used to
detect corruption and loss and to recover from them. This allows us
to experiment with unreliable network protocols such as UDP or
InfiniBand's UD protocol.
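For illustration only, here is a minimal C sketch of the general technique (a checksum plus a retransmission timer, stop-and-wait style). It is not DR's code: the frame layout, the choice of CRC-32, the simulated `lossy_link`, and the names `frame_crc` and `reliable_send` are all hypothetical, and the timer is simulated by treating a lost frame as an expired timeout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_PAYLOAD 64
#define MAX_ATTEMPTS 5

/* Hypothetical frame layout: sequence number, length, payload, and a
   CRC-32 covering every byte that precedes the crc field. */
struct frame {
    uint32_t seq;
    uint32_t len;
    uint8_t  payload[MAX_PAYLOAD];
    uint32_t crc;
};

/* Bitwise CRC-32 (IEEE 802.3 polynomial, reflected form). */
static uint32_t crc32(const uint8_t *buf, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

static uint32_t frame_crc(const struct frame *f) {
    return crc32((const uint8_t *)f, offsetof(struct frame, crc));
}

/* Simulated unreliable link (hypothetical): the 1st transmission is lost,
   the 2nd arrives with one payload bit flipped, later ones arrive intact.
   Returns 1 if a frame arrived, 0 if it was lost. */
static int lossy_link(const struct frame *in, struct frame *out, int attempt) {
    if (attempt == 1)
        return 0;
    *out = *in;
    if (attempt == 2)
        out->payload[0] ^= 0x01;
    return 1;
}

/* Sender side: transmit, then either the frame is lost (the retransmission
   timer would expire) or the receiver checks the CRC and acks only an
   intact frame. Anything other than an ack leads to a retransmission.
   Returns the attempt that succeeded, or -1 after MAX_ATTEMPTS. */
static int reliable_send(const char *msg, struct frame *delivered) {
    struct frame f;
    memset(&f, 0, sizeof f);               /* deterministic unused payload bytes */
    f.seq = 1;
    f.len = (uint32_t)strlen(msg);
    assert(f.len <= MAX_PAYLOAD);          /* caller keeps messages small */
    memcpy(f.payload, msg, f.len);
    f.crc = frame_crc(&f);

    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
        struct frame rx;
        if (!lossy_link(&f, &rx, attempt)) {
            printf("attempt %d: no ack before timeout, retransmitting\n", attempt);
            continue;
        }
        if (frame_crc(&rx) != rx.crc) {    /* receiver discards, sends no ack */
            printf("attempt %d: checksum mismatch, retransmitting\n", attempt);
            continue;
        }
        *delivered = rx;                   /* receiver accepts and acks */
        printf("attempt %d: delivered \"%.*s\"\n", attempt,
               (int)rx.len, (const char *)rx.payload);
        return attempt;
    }
    return -1;
}
```

Calling `reliable_send("hello", &got)` against this simulated link takes three attempts: the first frame is lost, the second fails the CRC check, and the third is accepted.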

In truth, however, the reliability guaranteed by the transports
currently used by Open MPI is more than enough to meet the needs of
almost all users. Most of the supported networks have some form of
error detection or correction whose protection is only slightly
weaker, statistically, than what we could provide within Open MPI,
but at a much lower cost.

> 2. When data corruption happens (in message data), is the data in
> the message envelope still reliable? Or, does Open MPI or the MPI
> standard guarantee data integrity of message envelopes? I'm
> particularly interested in MPI_TAG, which I use to encode things.

In my opinion, any guarantee that applies to the message applies to
the meta-data (tag, source, length) as well. The DR component will
provide the same level of protection to the headers as it does to the
payload.
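A minimal C sketch of what covering the headers could look like, assuming a hypothetical header with source, tag, and length fields and a single CRC-32 run over the header followed by the payload (this is not DR's actual format; `msg_header`, `message_crc`, and `message_ok` are illustrative names). Because the tag sits inside the checksummed bytes, a flipped bit in MPI_TAG is caught just like one in the payload. The header is hashed in host byte order, which only works when both ends share a representation; a real wire format would fix the byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical match header carrying the envelope fields. */
struct msg_header {
    int32_t  source;
    int32_t  tag;
    uint32_t len;
};

/* Bitwise CRC-32 update (reflected IEEE polynomial). Returning the raw
   register lets one checksum run continue from header into payload. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

/* A single CRC covering the header followed by the payload. */
static uint32_t message_crc(const struct msg_header *h, const uint8_t *payload) {
    uint32_t crc = 0xFFFFFFFFu;
    crc = crc32_update(crc, (const uint8_t *)h, sizeof *h);
    crc = crc32_update(crc, payload, h->len);
    return ~crc;
}

/* Receiver check: a flipped bit in source, tag, length, or payload makes
   this return 0, so the message is rejected instead of being matched. */
static int message_ok(const struct msg_header *h, const uint8_t *payload,
                      size_t payload_cap, uint32_t wire_crc) {
    if (h->len > payload_cap)   /* corrupted length: never read past the buffer */
        return 0;
    return message_crc(h, payload) == wire_crc;
}
```

For example, with source 3, tag 42, and a 5-byte payload, flipping one bit of the tag (42 becomes 46) makes `message_ok` return 0 even though the payload itself is untouched.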

Brian

-- 
   Brian W. Barrett
   Networking Team, CCS-1
   Los Alamos National Laboratory