Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] calling sendi earlier in the PML
From: Brian W. Barrett (brbarret_at_[hidden])
Date: 2009-03-03 15:59:21


On Tue, 3 Mar 2009, Jeff Squyres wrote:

> On Mar 3, 2009, at 3:31 PM, Eugene Loh wrote:
>
>> First, this behavior is basically what I was proposing and what George
>> didn't feel comfortable with. It is arguably no compromise at all. (Uggh,
>> why must I be so honest?) For eager messages, it favors BTLs with sendi
>> functions, which could lead to those BTLs becoming overloaded. I think
>> favoring BTLs with sendi for short messages is good. George thinks that
>> load balancing BTLs is good.
>>
>> Second, the implementation can be simpler than you suggest:
>>
>> *) You don't need a separate list since testing for a sendi-enabled BTL is
>> relatively cheap (I think... could verify).
>> *) You don't need to shuffle the list. The mechanism used by ob1 just
>> resumes the BTL search from the last BTL used. E.g., check
>> https://svn.open-mpi.org/source/xref/ompi_1.3/ompi/mca/pml/ob1/pml_ob1_sendreq.h#mca_pml_ob1_send_request_start
>> . You use mca_bml_base_btl_array_get_next(&btl_eager) to round-robin over
>> BTLs in a totally fair manner (remembering where the last loop left off),
>> and using mca_bml_base_btl_array_get_size(&btl_eager) to make sure you
>> don't loop endlessly.
>
> Cool / fair enough.
>
> How about an MCA parameter to switch between this mechanism (early sendi) and
> the original behavior (late sendi)?
>
> This is the usual way that we resolve "I want to do X / I want to do Y"
> disputes. :-)

Of all the options presented, this is the one I dislike most :).

This is *THE* critical path of the OB1 PML. It's already horribly complex
and hard to follow (as Eugene is finding out the hard way). Making it
more complex as a way to settle this argument is pain and suffering just
to avoid conflict.

However, one more option just occurred to me. If (AND ONLY IF) ob1/r2
detects that there are at least two BTLs to the same peer at the same
priority, and at least one has a sendi and at least one does not, what
about an MCA parameter to disable all sendi functions to that peer?
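
To make the detection rule concrete, here is a rough sketch in plain C. It
is not ob1/r2 code: struct btl_entry, its fields, and both function names
are hypothetical stand-ins, and it assumes the rule means "some pair of
BTLs at the same priority differs in whether it has a sendi".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one BTL reachable to a peer (not a real r2 type). */
struct btl_entry {
    int  priority;   /* hypothetical: whatever value r2 ranks BTLs by */
    bool has_sendi;  /* true if this BTL provides a sendi function */
};

/* True if two BTLs at the same priority differ in sendi support,
 * i.e. at least one has a sendi and at least one does not. */
static bool peer_has_mixed_sendi(const struct btl_entry *btls, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (btls[i].priority == btls[j].priority &&
                btls[i].has_sendi != btls[j].has_sendi) {
                return true;
            }
        }
    }
    return false;
}

/* If the (hypothetical) MCA parameter is set and the peer is mixed,
 * turn off sendi on every BTL to that peer; otherwise change nothing. */
static void maybe_disable_sendi(struct btl_entry *btls, size_t n,
                                bool param_enabled)
{
    if (!param_enabled || !peer_has_mixed_sendi(btls, n)) {
        return;
    }
    for (size_t i = 0; i < n; i++) {
        btls[i].has_sendi = false;
    }
}
```

The check would run once per peer at setup time, so none of this lands on
the send path itself.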

There's only a 1% gain in the FAIR protocol Eugene proposed, so we'd lose
that 1% in the heterogeneous multi-nic case (the least common case).
There would be a much bigger gain in the sendi-homogeneous multi-nic and
all single-nic cases (much more common), because the FAST protocol would
be used.

That way, we get the FAST protocol in all cases for sm, which is what I
really want ;).
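
For reference, the early-sendi search Eugene describes in the quoted text
(resume from the last BTL used, bound the loop by the array size) might look
roughly like this. It is a toy model only: the toy_* types and functions are
made up for illustration and merely imitate the role of
mca_bml_base_btl_array_get_next() and mca_bml_base_btl_array_get_size();
the sendi_ok flag is an assumption standing in for "the sendi call succeeded".

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy BTL: has_sendi says whether a sendi exists, sendi_ok whether
 * calling it would succeed right now (both hypothetical fields). */
struct toy_btl {
    bool has_sendi;
    bool sendi_ok;
};

/* Toy BTL array that remembers where the previous search stopped. */
struct toy_btl_array {
    struct toy_btl *items;
    size_t size;
    size_t next;
};

/* Return the next BTL in round-robin order and advance the cursor.
 * Callers must ensure the array is non-empty. */
static struct toy_btl *toy_get_next(struct toy_btl_array *a)
{
    struct toy_btl *b = &a->items[a->next];
    a->next = (a->next + 1) % a->size;
    return b;
}

/* Visit each BTL at most once, starting after the last one used.
 * Returns the index of the BTL whose sendi took the message, or -1
 * if none did and the caller should fall back to the normal path. */
static int toy_try_early_sendi(struct toy_btl_array *a)
{
    for (size_t i = 0; i < a->size; i++) {
        struct toy_btl *b = toy_get_next(a);
        if (b->has_sendi && b->sendi_ok) {
            return (int)(b - a->items);
        }
    }
    return -1;
}
```

Because the cursor persists across calls, successive eager sends rotate over
the sendi-capable BTLs and skip the ones without a sendi.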

Brian