Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Infinite loop when tcp free list max reached
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-06-04 13:18:35


On May 26, 2008, at 5:17 PM, Matt Hughes wrote:

> With the TCP btl, when free list items are exhausted, OMPI 1.2.6 falls
> into an infinite loop:
>
> #3981 0x0000002a98b4e23f in opal_condition_wait (c=0x2a98c541d0,
> m=0x2a98c54180) at ../../../../opal/threads/condition.h:81

[snip]

Yoinks.

> The call used to get a free list item is OMPI_FREE_LIST_WAIT(), which
> is supposed to block until an item is available. However, it calls
> opal_condition_wait(), which in turn calls opal_progress(), which then
> waits for a free list item..... It seems strange to me that
> opal_condition_wait() calls opal_progress(), but I'm not that familiar
> with the code.

We do that because OMPI is single-threaded: no other thread is running
that could make progress while we wait on the condition variable, so
the wait itself has to drive the progress engine.
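
To make the failure mode concrete, here is a toy model of that pattern.
It is a sketch only, not the Open MPI source: every name in it
(toy_state, toy_progress, toy_free_list_wait, TOY_MAX_DEPTH) is made up
for illustration, and it assumes, as the backtrace suggests, that the
progress path itself asks for a free list item. The depth guard exists
only so the sketch terminates; the real report had no such guard.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are Open MPI APIs. */
#define TOY_MAX_DEPTH 8  /* hypothetical guard so the sketch terminates */

typedef struct {
    int  free_items;          /* items currently on the free list */
    int  pending_returns;     /* items that progress is able to hand back */
    bool progress_needs_item; /* true: progress itself requests an item */
    int  depth;               /* current nesting of toy_progress() */
    int  max_depth_seen;      /* deepest nesting observed */
    bool gave_up;             /* set once the guard trips */
} toy_state;

static bool toy_free_list_wait(toy_state *s);

/* Drive outstanding work once. With a single thread, nobody else will. */
static void toy_progress(toy_state *s)
{
    s->depth++;
    if (s->depth > s->max_depth_seen)
        s->max_depth_seen = s->depth;

    if (s->depth >= TOY_MAX_DEPTH) {
        s->gave_up = true;            /* stand-in for unbounded recursion */
    } else if (s->progress_needs_item) {
        (void)toy_free_list_wait(s);  /* re-entrant request for an item */
    } else if (s->pending_returns > 0) {
        s->pending_returns--;         /* a completed send returns its item */
        s->free_items++;
    }

    s->depth--;
    assert(s->depth >= 0);
}

/* Block until an item is available by calling progress in a loop.
 * Returns false only if the guard tripped. Like a real blocking wait,
 * it spins forever if nothing can ever return an item. */
static bool toy_free_list_wait(toy_state *s)
{
    while (s->free_items == 0) {
        if (s->gave_up)
            return false;
        toy_progress(s);
    }
    s->free_items--;
    return true;
}
```

With pending_returns = 1 and progress_needs_item = false, a single call
to toy_progress() refills the list and the wait returns. With
progress_needs_item = true and nothing to return, each wait nests
another progress call until the guard stops it, which is the shape of
the several-thousand-frame backtrace in the original report.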

> Is it possible that this has been fixed in 1.3?

It is possible -- there were some changes with regard to how free
list waiting was done, etc. Would it be possible to try your test
with a trunk nightly tarball?

     http://www.open-mpi.org/nightly/trunk/

> I haven't tried 1.3 yet because I will have to file a truckload of
> bugs against 1.3 first.

Do you have a truckload of bugs to file for v1.3? If so, now is the
time to do so -- we're gearing up for the v1.3 release...

> Should I be posting this stuff to the devel list?

If your questions go beyond the naive-user-level questions, you might
get a quicker response on the devel list.

-- 
Jeff Squyres
Cisco Systems