
Open MPI User's Mailing List Archives



Subject: Re: [OMPI users] Beginner's question: why does this program hang?
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-03-18 10:51:25

On Mar 18, 2008, at 10:32 AM, George Bosilca wrote:

> Jeff hinted at the real problem in his email. Even if the program uses
> the correct MPI functions, it is not 100% correct.

I think we disagree here -- the sample program is correct according to
the MPI spec. It's an implementation artifact that makes it deadlock.

The upcoming v1.3 series doesn't suffer from this issue; we revamped
our transport system to distinguish between early and normal
completions. In the meantime, the pml_ob1_use_eager_completion MCA
parameter was added in v1.2.6 so that correct MPI apps can disable
this optimization until v1.3 is available.

> It might pass in some situations, but it can lead to fake "deadlocks"
> in others. The problem comes from flow control. If the messages
> are small (which is the case in the test example), Open MPI will
> send them eagerly. Without flow control, these messages will be
> buffered by the receiver, which will exhaust the memory on the
> receiver. Once this happens, some of the messages may get dropped,
> but the most visible result is that progress happens very
> (VERY) slowly.
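The flow-control argument can be made concrete with a toy model. The Python sketch below is not Open MPI code; the function `simulate` and its `credits` parameter are hypothetical, and the "credit" scheme is a generic flow-control technique, not necessarily what Open MPI implements. A sender pushes one small message per tick, and a slower receiver drains one message every `recv_every` ticks. With no flow control, the receiver's buffer of unexpected messages grows with every send. With credits, the sender may only have a fixed number of messages outstanding. The model tracks only buffer growth; it does not model memory exhaustion or dropped messages.

```python
from collections import deque
from typing import Optional


def simulate(num_msgs: int, recv_every: int, credits: Optional[int] = None) -> int:
    """Toy model of eager sends to a slow receiver (hypothetical, illustrative only).

    Each tick the sender tries to send one small message eagerly.
    Every `recv_every` ticks the receiver drains one buffered message.
    If `credits` is None there is no flow control; otherwise at most
    `credits` messages may sit in the receiver's buffer at once.
    Returns the peak number of messages buffered at the receiver.
    """
    if recv_every < 1:
        raise ValueError("recv_every must be >= 1")
    buffered: deque = deque()
    sent = 0
    peak = 0
    tick = 0
    while sent < num_msgs or buffered:
        tick += 1
        # Sender: send eagerly unless all credits are in use.
        if sent < num_msgs and (credits is None or len(buffered) < credits):
            buffered.append(sent)
            sent += 1
        peak = max(peak, len(buffered))
        # Receiver: consume one message every `recv_every` ticks.
        # Draining a message implicitly hands one credit back to the sender.
        if tick % recv_every == 0 and buffered:
            buffered.popleft()
    return peak
```

For example, `simulate(1000, 10)` peaks at 901 buffered messages, while `simulate(1000, 10, credits=8)` never exceeds 8. The trade-off is that, with credits, the sender stalls instead of completing its sends immediately, which bounds receiver memory at the cost of sender-side waiting.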

Your text implies that we can actually *drop* (and retransmit)
messages in the sm BTL. That doesn't sound right to me -- is that
what you meant?

Jeff Squyres
Cisco Systems