
Open MPI Development Mailing List Archives


From: Terry D. Dontje (Terry.Dontje_at_[hidden])
Date: 2007-08-29 09:58:46


Trunk.

--td
Gleb Natapov wrote:

>Is this trunk or 1.2?
>
>On Wed, Aug 29, 2007 at 09:40:30AM -0400, Terry D. Dontje wrote:
>
>
>>I have a program that does a simple bucket brigade of sends and receives
>>where rank 0 is the start and repeatedly sends to rank 1 until a certain
>>amount of time has passed, and then it sends an all-done packet.
>>
>>Running this under np=2 always works. However, when I run with np greater
>>than 2 using only the SM BTL, the program usually hangs, and one of the
>>processes has a long stack made up largely of the following 3 calls, repeated:
>>
>> [25] opal_progress(), line 187 in "opal_progress.c"
>> [26] mca_btl_sm_component_progress(), line 397 in "btl_sm_component.c"
>> [27] mca_bml_r2_progress(), line 110 in "bml_r2.c"
>>
>>When stepping through the ompi_fifo_write_to_head routine it looks like
>>the fifo has overflowed.
>>
>>I am wondering if what is happening is that rank 0 has sent a bunch of
>>messages that have exhausted the resources, such that one of the middle
>>ranks, which is in the process of sending, cannot send and therefore
>>never gets to the point of trying to receive the messages from rank 0?
>>
>>Is the above a possible scenario or are messages periodically bled off
>>the SM BTL's fifos?
>>
>>Note, I have seen np=3 pass sometimes and I can get it to pass reliably
>>if I raise the shared memory space used by the BTL. This is using the
>>trunk.
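
The scenario asked about above can be explored with a toy model. The sketch below is not Open MPI code and makes no claim about how the SM BTL actually manages its FIFOs or fragments. It assumes a single shared pool of fragments, a greedy rank 0, and middle ranks that stay stuck in a pending send unless an optional "drain while blocked" flag is set. All names (simulate, free_frags, queued, holding, drain_while_blocked) are hypothetical.

```c
#include <assert.h>

#define MAX_RANKS 16

/*
 * Toy model of a bucket brigade over a shared pool of `pool` fragments.
 * Rank 0 sends to rank 1, each middle rank forwards to the next rank,
 * and the last rank only receives.
 * Returns 1 if all `total_msgs` messages reach the last rank,
 * or 0 if a full round passes in which no rank can make progress (a hang).
 */
int simulate(int np, int pool, int total_msgs, int drain_while_blocked)
{
    int queued[MAX_RANKS] = {0};  /* fragments waiting in each rank's inbound queue */
    int holding[MAX_RANKS] = {0}; /* messages received by a rank but not yet forwarded */
    int free_frags = pool;        /* fragments still available in the shared pool */
    int sent = 0, delivered = 0;

    assert(np >= 2 && np <= MAX_RANKS);

    while (delivered < total_msgs) {
        int progressed = 0;

        /* Rank 0: sends as fast as the pool allows (worst-case assumption). */
        while (sent < total_msgs && free_frags > 0) {
            free_frags--;
            queued[1]++;
            sent++;
            progressed = 1;
        }

        /* Middle ranks: receive one message, then forward it. */
        for (int r = 1; r < np - 1; r++) {
            if (holding[r] > 0) {
                if (free_frags > 0) {
                    /* The pending send gets a fragment and completes. */
                    free_frags--;
                    holding[r]--;
                    queued[r + 1]++;
                    progressed = 1;
                } else if (drain_while_blocked && queued[r] > 0) {
                    /* Optional: drain inbound traffic while the send is blocked,
                       copying it to (unbounded) local storage. */
                    queued[r]--;
                    holding[r]++;
                    free_frags++;
                    progressed = 1;
                }
                /* Otherwise the rank is stuck in its send and never receives. */
            } else if (queued[r] > 0) {
                /* Receive: the fragment goes back to the shared pool. */
                queued[r]--;
                holding[r]++;
                free_frags++;
                progressed = 1;
            }
        }

        /* Last rank: only receives. */
        if (queued[np - 1] > 0) {
            queued[np - 1]--;
            free_frags++;
            delivered++;
            progressed = 1;
        }

        if (!progressed)
            return 0; /* nobody can move: the model has hung */
    }
    return 1;
}
```

In this model, simulate(2, 4, 100, 0) completes, simulate(3, 4, 100, 0) stalls once rank 0 has refilled the pool while rank 1 is waiting for a fragment, and simulate(3, 100, 100, 0) completes because the pool is large enough to hold every message. Setting drain_while_blocked to 1 also lets the np=3, pool=4 case finish. The greedy scheduling is a deliberate worst case; a real run is timing dependent, which would be consistent with np=3 passing only sometimes.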
>>
>>
>>--td
>>
>>
>>_______________________________________________
>>devel mailing list
>>devel_at_[hidden]
>>http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>
>--
> Gleb.