George Bosilca wrote:
> We had a discussion about this a few weeks ago. I have a version that
> modifies this behavior (SM progress will not return as long as there
> are pending acks). There was no benefit from doing so (even if one
> might think that fewer calls to opal_progress would improve
> performance).
But my concern is not the raw performance of MPI_Iprobe in this case but
rather the interaction between MPI and an application. If it takes two
MPI_Iprobe calls (instead of one) to get to the real message, could this
induce a synchronization delay in the application? That is, because the
application does not see the "real" message on the first MPI_Iprobe, it
may decide to do other work while the other processes are potentially
blocked waiting for it to do some communication.
> In fact TCP has the potential to exhibit the same behavior. However,
> after each successful poll TCP empties the socket, so it might read
> more than one message. As we have to empty the temporary buffer, we
> interpret most of the messages inside, and this is why TCP exhibits a
> different behavior.
I guess this difference in behavior between the SM BTL and the TCP BTL
is disturbing to me. Is processing just one fifo entry per sm_progress
call per connection buying us performance? Would draining the acks be
detrimental to performance? Wouldn't providing the messages at the time
they arrive meet the rule of obviousness for application writers?
I know there is a slippery slope here: once you've read one message,
should you keep reading until there are none left on the fifo? I believe
that is really debatable and could go either way depending on the
application. But ack messages are not visible to users, which is why I
was only asking about draining the ack packets.
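
To make the question concrete, here is a rough, self-contained sketch in
C of the two policies being compared. It is not the SM BTL code: the
fifo layout, the entry kinds, and every function name below are made up
for illustration, and real fifo entries carry far more state than a
single tag.

```c
#include <stddef.h>

/* Hypothetical entry kinds; these are not Open MPI's actual types. */
enum entry_kind { ENTRY_ACK, ENTRY_FRAG };

/* Toy fifo: a read-only array of entries plus a read cursor. */
struct fifo {
    const enum entry_kind *entries;
    size_t len;
    size_t head;
};

/* Policy A: consume at most one entry per progress call.
 * Returns 1 if a user-visible fragment was delivered, 0 otherwise. */
static int progress_one(struct fifo *f)
{
    if (f->head == f->len) {
        return 0;
    }
    return f->entries[f->head++] == ENTRY_FRAG;
}

/* Policy B: keep consuming acks until a fragment is delivered
 * or the fifo is empty. Never reads past the first fragment. */
static int progress_drain_acks(struct fifo *f)
{
    while (f->head < f->len) {
        if (f->entries[f->head++] == ENTRY_FRAG) {
            return 1;
        }
    }
    return 0;
}

/* Count how many progress calls (think: MPI_Iprobe calls) are needed
 * before the first fragment is delivered. Returns 0 if the fifo holds
 * no fragment at all. */
static int calls_until_frag(int (*progress)(struct fifo *),
                            const enum entry_kind *entries, size_t len)
{
    struct fifo f = { entries, len, 0 };
    int calls = 0;

    while (f.head < f.len) {
        calls++;
        if (progress(&f)) {
            return calls;
        }
    }
    return 0;
}
```

With a fifo holding one ack followed by one fragment, the one-entry
policy needs two progress calls before the fragment is seen, while the
ack-draining policy needs one. The draining loop stops at the first
fragment, so it does not go down the slippery slope of reading every
pending message.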
> On Jun 19, 2008, at 2:16 PM, Terry Dontje wrote:
>> Galen, George and others that might have SM BTL interest.
>> In my quest to look at MPI_Iprobe performance I found what I think
>> is an issue. If you have an application that is using the SM BTL and
>> does a small message send (<=256) followed by an MPI_Iprobe, the
>> mca_btl_sm_component function that is eventually called as a result
>> of the opal_progress will receive an ack message from its send and
>> then return. The net effect is that the real message, which sits
>> behind the ack message, doesn't get read until a second MPI_Iprobe
>> is made.
>> It seems to me that mca_btl_sm_component should read all Ack messages
>> from a particular fifo until it either finds a real send fragment or
>> no more messages on the fifo. Otherwise, we are forcing calls like
>> MPI_Iprobe to not return messages that are really there. I am not
>> sure about IB, but I know that the TCP BTL does not show this issue
>> (which doesn't surprise me since I imagine the BTL is relying on TCP
>> to handle this type of protocol stuff).
>> Before I go munging the code I wanted to make sure I am not
>> overlooking something here. One concern: if I change the code to
>> drain all the ack messages, is that going to disrupt performance
>> devel mailing list