
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Begginers question: why does this program hangs?
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-03-18 10:51:25


On Mar 18, 2008, at 10:32 AM, George Bosilca wrote:

> Jeff hinted at the real problem in his email. Even if the program uses
> the correct MPI functions, it is not 100% correct.

I think we disagree here -- the sample program is correct according to
the MPI spec. It's an implementation artifact that makes it deadlock.

The upcoming v1.3 series doesn't suffer from this issue; we revamped
our transport system to distinguish between early and normal
completions. The pml_ob1_use_eager_completion MCA param was added to
v1.2.6 so that correct MPI apps can disable the early-completion
optimization -- a proper fix is coming in the v1.3 series.

> It might pass in some situations, but it can lead to fake "deadlocks"
> in others. The problem comes from flow control. If the messages
> are small (which is the case in the test example), Open MPI will
> send them eagerly. Without flow control, these messages will be
> buffered by the receiver, which can exhaust the receiver's memory.
> Once this happens, some of the messages may get dropped, but the
> most visible result is that progress happens very (VERY) slowly.
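George's flow-control point can be illustrated with a toy simulation. This is a hypothetical model, not Open MPI's ob1/sm code: the function `simulate` and its parameters are invented for illustration. It assumes a sender that can issue one small message per tick and a receiver that matches one message every `recv_every` ticks, and it models neither message dropping nor retransmission.

```python
from collections import deque
from typing import Optional, Tuple


def simulate(num_msgs: int, recv_every: int,
             window: Optional[int] = None) -> Tuple[int, int]:
    """Toy model of a fast sender and a slow receiver (hypothetical).

    Each tick the sender may issue one small message; every
    `recv_every` ticks the receiver matches one queued message.

    window=None: eager sends with no flow control. Every send completes
    at once and the message waits in the receiver's queue.
    window=N: credit-based flow control. The sender stalls while N
    messages are still unmatched at the receiver.

    Returns (ticks until all messages are received, peak queue length).
    """
    if window is not None and window < 1:
        raise ValueError("window must be at least 1")
    queue = deque()  # messages buffered at the receiver, not yet matched
    sent = received = peak = tick = 0
    while received < num_msgs:
        tick += 1
        # Sender side: send if messages remain and a credit is available.
        if sent < num_msgs and (window is None or len(queue) < window):
            queue.append(sent)
            sent += 1
        peak = max(peak, len(queue))
        # Receiver side: match one buffered message every recv_every ticks.
        if tick % recv_every == 0 and queue:
            queue.popleft()
            received += 1
    return tick, peak


print(simulate(1000, 4))            # no flow control:   (4000, 751)
print(simulate(1000, 4, window=8))  # window of 8 credits: (4000, 8)
```

In this model both runs take the same number of ticks, because the slow receiver sets the pace either way. The difference is the receiver-side buffer: without flow control it grows with the number of messages sent (751 of 1,000 queued at peak), while a window of 8 caps it at 8.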

Your text implies that we can actually *drop* (and retransmit)
messages in the sm btl. That doesn't sound right to me -- is that
what you meant?

-- 
Jeff Squyres
Cisco Systems