On May 26, 2008, at 5:17 PM, Matt Hughes wrote:
> With the TCP btl, when free list items are exhausted, OMPI 1.2.6 falls
> into an infinite loop:
> #3981 0x0000002a98b4e23f in opal_condition_wait (c=0x2a98c541d0, m=0x2a98c54180) at ../../../../opal/threads/condition.h:81
> The call used to get a free list item is OMPI_FREE_LIST_WAIT(), which
> is supposed to block until an item is available. However, it calls
> opal_condition_wait(), which in turn calls opal_progress(), which then
> waits for a free list item... It seems strange to me that
> opal_condition_wait() calls opal_progress(), but I'm not that familiar
> with the code.
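The cycle described above can be reproduced in a few lines. This is a minimal sketch, not Open MPI code: `free_list_wait()`, `progress()`, `free_items` and `DEPTH_LIMIT` are hypothetical stand-ins, and the assumption is that handling incoming data inside progress itself needs a free-list item. The frame number in the backtrace (#3981) suggests the real stack keeps growing in the same way; the sketch stops at an artificial depth limit instead of overflowing the stack.

```c
/* Hypothetical free list that is already exhausted and never refilled. */
static int free_items = 0;
static int depth = 0;       /* current nesting of free_list_wait() */
static int max_depth = 0;   /* deepest nesting observed */
#define DEPTH_LIMIT 4000    /* artificial stand-in for running out of stack */

static int progress(void);

/* "Blocks" until an item is available by driving progress.
 * Returns 0 if an item was obtained, -1 if the depth limit was hit. */
static int free_list_wait(void)
{
    int rc = 0;
    if (++depth > max_depth)
        max_depth = depth;
    if (depth > DEPTH_LIMIT) {
        rc = -1;                      /* give up instead of recursing forever */
    } else {
        while (free_items == 0) {
            if (progress() != 0) {    /* progress re-enters free_list_wait() */
                rc = -1;
                break;
            }
        }
        if (rc == 0)
            --free_items;             /* take the item */
    }
    --depth;
    return rc;
}

/* Hypothetical progress engine: handling an incoming fragment needs a
 * free-list item, so it calls back into free_list_wait(). */
static int progress(void)
{
    return free_list_wait();
}
```

Calling `free_list_wait()` with an empty list nests `DEPTH_LIMIT + 1` calls deep before failing; no item is ever returned to the list because every level is waiting on the level below it.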
We do that because OMPI is single-threaded. Otherwise, there would be
no way to make progress while waiting for the condition variable to
become true.
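To illustrate the rationale: with no other thread around to signal the condition, the waiting code has to poll the progress engine itself until the condition holds. The sketch below is hypothetical; `wait_until()`, `progress()`, `done()` and the three-poll completion are illustrative assumptions, not the actual opal_condition_wait() implementation.

```c
#include <stdbool.h>

/* Hypothetical pending operation that completes only after it has been
 * polled a few times, standing in for network I/O that advances only
 * when the progress engine runs. */
static int polls_remaining = 3;
static int poll_count = 0;

static void progress(void)
{
    ++poll_count;
    if (polls_remaining > 0)
        --polls_remaining;
}

static bool done(void)
{
    return polls_remaining == 0;
}

/* Single-threaded "condition wait": nobody else can make the predicate
 * true, so the waiter drives progress until it becomes true. */
static void wait_until(bool (*pred)(void), void (*progress_fn)(void))
{
    while (!pred())
        progress_fn();
}
```

For example, `wait_until(done, progress)` returns after exactly three calls to `progress()`. A blocking wait without the progress call would never return here, which is why the wait has to drive progress; the flip side is that anything progress calls can re-enter the waiting code.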
> Is it possible that this has been fixed in 1.3?
It is possible -- there were some changes with regard to how free
list waiting was done, etc. Would it be possible to try your test
with a trunk nightly tarball?
> I haven't tried 1.3 yet because I will have to file a truckload of
> bugs against 1.3 first.
Do you have a truckload of bugs to file for v1.3? If so, now is the
time to do so -- we're gearing up for the v1.3 release...
> Should I be posting this stuff to the devel list?
If your questions go beyond naive user-level questions, you might
get a quicker response on the devel list.