If you have positive confirmation that such things have happened, this will
go a long way. I will not trust the code until this has also been done with
multiple independent network paths. I very rarely express such strong
opinions, even if I don't agree with what is being done, but this is the
core of correct MPI functionality, and first-hand experience has shown that,
just by thinking through the logic, I can miss some of the race conditions.
The code here has been running for 8+ years in two production MPIs running
on very large clusters, so I am very reluctant to make changes for what
seems to amount to people's taste - maintenance is not an issue in this
case. Had this not been such a key bit of code, I would not even bat an
eye. I suppose if you can go through some formal verification, this would
also be good - actually better than hoping that one will hit out-of-order
On 12/14/07 2:20 AM, "Gleb Natapov" <glebn_at_[hidden]> wrote:
> On Thu, Dec 13, 2007 at 06:16:49PM -0500, Richard Graham wrote:
>> The situation that needs to be triggered, just as George has mentioned, is
>> where we have a lot of unexpected messages, to make sure that when one that
>> we can match against comes in, all the unexpected messages that can be
>> matched with pre-posted receives are matched. Since we attempt to match
>> only when a new fragment comes in, we need to make sure that we don't leave
>> other unexpected messages that can be matched in the unexpected queue, as
>> these (if the out of order scenario is just right) would block any new
>> matches from occurring.
>> For example: Say the next expected message is 25
>> Unexpected message queue has: 26 28 29 ..
>> If 25 comes in, and is handled, if 26 is not pulled off the unexpected
>> message queue, when 27 comes in it won't be able to be matched, as 26 is
>> sitting in the unexpected queue, and will never be looked at again ...
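
The 25/26/27 scenario above can be sketched as a toy model. This is only an illustration, not the actual Open MPI matching code: the class `SequenceMatcher` and its method names are hypothetical, and it models sequence ordering only, ignoring tags, communicators, and posted receives. The point it shows is that after an in-order fragment is handled, the unexpected queue must be drained of every successor that has now become matchable.

```python
from typing import Dict, List


class SequenceMatcher:
    """Toy model of in-order fragment handling for a single peer (hypothetical)."""

    def __init__(self, next_expected: int) -> None:
        self.next_expected = next_expected
        # Fragments that arrived ahead of their turn, keyed by sequence number.
        self.unexpected: Dict[int, str] = {}
        # Sequence numbers in the order they were actually handled.
        self.delivered: List[int] = []

    def on_fragment(self, seq: int, payload: str = "") -> None:
        if seq != self.next_expected:
            # Out of order: park it until its predecessors arrive.
            self.unexpected[seq] = payload
            return
        self._deliver(seq)
        # Key step: keep pulling successors off the unexpected queue
        # until the next expected sequence number is not there.
        while self.next_expected in self.unexpected:
            ready = self.next_expected
            self.unexpected.pop(ready)
            self._deliver(ready)

    def _deliver(self, seq: int) -> None:
        self.delivered.append(seq)
        self.next_expected = seq + 1


matcher = SequenceMatcher(next_expected=25)
for seq in (26, 28, 29):
    matcher.on_fragment(seq)   # all parked as unexpected
matcher.on_fragment(25)        # handles 25, then drains 26
matcher.on_fragment(27)        # handles 27, then drains 28 and 29
print(matcher.delivered)
```

Without the `while` loop, 26 would stay parked after 25 is handled, and 27 would never become the expected sequence number, which is the stall described above.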
> This situation is triggered constantly with openib BTL. OpenIB BTL has
> two ways to receive a packet: over a send queue or over an eager RDMA path.
> The receiver polls both of them and may reorder packets locally. In fact,
> there is currently a bug in the openib BTL where one channel may starve the
> other at the receiver, so if a match fragment with the next sequence number is
> in the starved path, tens of thousands of fragments can be reordered. The test case attached
> to ticket #1158 triggers this case and my patch handles all reordered packets.
> And, by the way, the code is much simpler now and can be reviewed easily ;)