Are you looking at this?
On 8/29/07 9:56 AM, "Gleb Natapov" <glebn_at_[hidden]> wrote:
> On Wed, Aug 29, 2007 at 04:48:07PM +0300, Gleb Natapov wrote:
>> Is this trunk or 1.2?
> Oops. I should read more carefully :) This is trunk.
>> On Wed, Aug 29, 2007 at 09:40:30AM -0400, Terry D. Dontje wrote:
>>> I have a program that does a simple bucket brigade of sends and receives,
>>> where rank 0 is the start and repeatedly sends to rank 1 until a certain
>>> amount of time has passed, and then it sends an all-done packet.
>>> Running this under np=2 always works. However, when I run with more
>>> than 2 processes using only the SM BTL, the program usually hangs, and one
>>> of the processes has a long stack containing many repetitions of these 3 calls:
>>>  opal_progress(), line 187 in "opal_progress.c"
>>>  mca_btl_sm_component_progress(), line 397 in "btl_sm_component.c"
>>>  mca_bml_r2_progress(), line 110 in "bml_r2.c"
>>> When stepping through the ompi_fifo_write_to_head routine, it looks like
>>> the fifo has overflowed.
>>> I am wondering whether what is happening is that rank 0 has sent a bunch
>>> of messages that have exhausted the resources, such that one of the middle
>>> ranks, which is in the process of sending, cannot send and therefore
>>> never gets to the point of trying to receive the messages from rank 0.
>>> Is the above a possible scenario, or are messages periodically bled off
>>> the SM BTL's fifos?
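To make the suspected scenario concrete, here is a toy model in C. It is not Open MPI's SM BTL code: the single shared pool of fragments, the greedy scheduling of rank 0, and the names `struct brigade`, `simulate`, and `drain_while_blocked` are all hypothetical, chosen only to illustrate the question. It compares a middle rank that only retries its blocked send with one that also pulls inbound messages off its queue while blocked, freeing their fragments.

```c
#include <stdbool.h>

/* Hypothetical toy model (not Open MPI code) of a three-rank bucket
 * brigade 0 -> 1 -> 2 whose senders share one pool of message fragments.
 * Sending a message takes a fragment from the pool; the receiver returns
 * it to the pool when it pulls the message off its inbound queue. */
struct brigade {
    int pool;      /* free fragments left in the shared pool */
    int q01;       /* messages queued from rank 0 to rank 1 */
    int q12;       /* messages queued from rank 1 to rank 2 */
    int held;      /* messages rank 1 has received but not yet forwarded */
    int delivered; /* messages consumed by rank 2 */
};

/* Runs `steps` rounds and returns how many messages reached rank 2.
 * Each round: rank 0 sends until the pool is empty, then rank 1 and
 * rank 2 each take one turn. While rank 1 is blocked forwarding a message,
 * it either only retries the send (drain_while_blocked == false) or also
 * pulls one inbound message per round, freeing that fragment
 * (drain_while_blocked == true). */
int simulate(int pool_size, int steps, bool drain_while_blocked)
{
    struct brigade b = { pool_size, 0, 0, 0, 0 };

    for (int step = 0; step < steps; step++) {
        /* Rank 0: the fast producer grabs every free fragment. */
        while (b.pool > 0) {
            b.pool--;
            b.q01++;
        }

        /* Rank 1 */
        if (b.held == 0) {
            /* Idle: receive one message, returning its fragment. */
            if (b.q01 > 0) {
                b.q01--;
                b.pool++;
                b.held++;
            }
        } else {
            /* Blocked in a send. Optionally drain one inbound message
             * into a local buffer, which frees its fragment. */
            if (drain_while_blocked && b.q01 > 0) {
                b.q01--;
                b.pool++;
                b.held++;
            }
            /* Retry the forward; it needs a free fragment. */
            if (b.pool > 0) {
                b.pool--;
                b.held--;
                b.q12++;
            }
        }

        /* Rank 2: consume one message, returning its fragment. */
        if (b.q12 > 0) {
            b.q12--;
            b.pool++;
            b.delivered++;
        }
    }
    return b.delivered;
}
```

In this model `simulate(8, 100, false)` returns 0: rank 0 reclaims every fragment rank 1 frees, so rank 1 stays stuck in its send and never drains rank 0's messages. `simulate(8, 100, true)` returns 99, with one message reaching rank 2 per round after the first. Whether the real progress engine behaves like either variant is exactly the open question.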
>>> Note: I have seen np=3 pass sometimes, and I can get it to pass reliably
>>> if I raise the shared memory space used by the BTL. This is using the
>>> devel mailing list