
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] RFC: Eliminate ompi/class/ompi_[circular_buffer_]fifo.h
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-02-13 08:05:36


George -- can you confirm/deny? Is this something we need to fix for
v1.3.1?

On Feb 12, 2009, at 10:15 PM, Eugene Loh wrote:

> Got it, thanks.
>
> Is anyone else looking at that ticket? I'm still a newbie and I
> suspect someone else could figure this problem out a lot faster than
> I could. So, I'm curious how much I should be looking at this ticket.
>
> If amateurs are allowed to speculate, however, my guess is that this
> isn't really a BTL thing. It reminds me of trac ticket 1468 (aka
> 1516). In that case, there was a lot of one-way traffic. We needed
> a way to return frags to the sender. I guess that was solved.
>
> So, the present problem is something different. My guess is that
> senders are overrunning receivers. Could it be that some receiver
> (like the root in the MPI_Reduce) ends up with too many incoming
> messages? It has to queue up unexpected messages, which slows it
> down further, which means it has to deal with even more unexpected
> messages, and so on. Those messages have to be placed somewhere,
> which means memory has to be allocated for them.
>
> Just a theory. I don't know the PML well enough to judge its
> soundness.
>
> But if this is the case, it's a PML issue rather than a BTL issue.
> Maybe there should be some flow control -- particularly in our
> implementation of collectives!
>
> Ralph Castain wrote:
>
>> The connection is only that, if you are going to modify the sm BTL
>> as you say, you might at least want to be aware that we have a
>> problem in it so you (a) don't make it worse than it already is,
>> and (b) might keep an eye open for the problem as you are changing
>> things.
>>
>> On Feb 12, 2009, at 3:58 PM, Eugene Loh wrote:
>>
>>> Sorry, what's the connection? Are we talking about
>>> https://svn.open-mpi.org/trac/ompi/ticket/1791 ? Are you simply
>>> saying that if I'm doing some sm BTL work, I should also look at
>>> 1791? I'm trying to figure out if there's some more specific
>>> connection I'm missing.
>>>
>>> Ralph Castain wrote:
>>>
>>>> You might want to look at ticket #1791 while you are doing this
>>>> - Brad added some valuable data earlier today.
>>>

-- 
Jeff Squyres
Cisco Systems
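
For illustration only, here is a toy model of the theory quoted above: many senders pushing messages at one root faster than it can drain them, with and without a simple credit-based limit. This is not Open MPI code and says nothing about how the PML or the sm BTL actually behave; the function name, the tick-based timing, and the credit scheme are all assumptions made up for this sketch.

```python
from collections import deque

def simulate(num_senders, msgs_per_sender, drain_per_tick, credits=None):
    """Return the peak depth of the root's unexpected-message queue.

    Hypothetical model: each tick, every sender tries to send one message,
    then the root drains up to `drain_per_tick` messages.
    credits=None means no flow control; otherwise a sender may have at
    most `credits` messages outstanding (sent but not yet drained).
    """
    remaining = [msgs_per_sender] * num_senders   # messages still to send
    outstanding = [0] * num_senders               # sent, not yet drained
    queue = deque()                               # root's unexpected queue
    peak = 0
    while any(remaining) or queue:
        for s in range(num_senders):
            can_send = credits is None or outstanding[s] < credits
            if remaining[s] and can_send:
                remaining[s] -= 1
                outstanding[s] += 1
                queue.append(s)
        peak = max(peak, len(queue))
        # Root processes a few messages; each one returns a credit.
        for _ in range(min(drain_per_tick, len(queue))):
            s = queue.popleft()
            outstanding[s] -= 1
    return peak

if __name__ == "__main__":
    print("no flow control, peak queue depth:", simulate(8, 100, 2))
    print("4 credits/sender, peak queue depth:", simulate(8, 100, 2, credits=4))
```

In this model the queue depth with credits can never exceed num_senders * credits, because every queued message holds one credit of its sender. Without credits the depth grows for as long as the senders outpace the root.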