Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: eliminating "descriptor" argument from sendi function
From: George Bosilca (bosilca_at_[hidden])
Date: 2009-02-23 15:25:58

On Feb 23, 2009, at 12:14 , Eugene Loh wrote:

> I'm a newbie and George is a veteran. So, this feels rather like
> David and Goliath. (Hmm, David won and became king. Gee, I kinda
> like that.) Anyhow...

That's an old story, we're living in modern times now ;)

> George Bosilca wrote:
>> It doesn't sound reasonable to me. There is a reason for this, and
>> I think it's a good reason. The sendi function works for some
>> devices as a fast path for sending data, when the network is not
>> flooded. However, in the case sendi cannot do the job we expect,
>> the fact that it returns the descriptor saves us a call (we don't
>> have to do the alloc call later).
> This does not make any sense to me. In what sense are we "saving a
> call"? Not in the sense of run-time performance since the BTL is
> now having to allocate a descriptor it did not otherwise need. The
> amount of work is the same (one descriptor allocation either way),
> but you're just pushing that work into the BTLs.

The descriptor is a BTL resource. If sendi doesn't return one, the
PML has to call back into the BTL to allocate it (in this case the
calls look like this: btl_sendi, then btl_alloc, then btl_send). I'm
not looking only at SM; I want all of the BTLs to have the
opportunity to get good performance.

If sendi returns a descriptor when it fails to send the data, the call
list is shorter: btl_sendi followed by btl_send. I'm trying to
decrease the number of jumps between the layers (PML/BTL), not the
number of lines of code in the BTL.
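
To make the two call sequences concrete, here is a minimal C sketch
that counts PML-to-BTL calls on the failure path. Everything in it is
a hypothetical stand-in (the toy_* types and functions, the single
descriptor pool, the free_slots counter); none of it is the real Open
MPI BTL interface or its signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real Open MPI types. */
typedef struct { int in_use; } toy_descriptor_t;

typedef struct {
    int free_slots;         /* 0 models a flooded network: sendi fails */
    int crossings;          /* number of PML -> BTL calls made so far */
    toy_descriptor_t pool;  /* one-entry descriptor pool */
} toy_btl_t;

/* Allocation done inside the BTL: no layer crossing is counted. */
static toy_descriptor_t *toy_alloc_internal(toy_btl_t *btl)
{
    btl->pool.in_use = 1;
    return &btl->pool;
}

/* Allocation requested by the PML: one layer crossing. */
static toy_descriptor_t *toy_btl_alloc(toy_btl_t *btl)
{
    btl->crossings++;
    return toy_alloc_internal(btl);
}

/* Fast path. On failure, hands back a descriptor if des is non-NULL. */
static int toy_btl_sendi(toy_btl_t *btl, toy_descriptor_t **des)
{
    btl->crossings++;
    if (btl->free_slots > 0) {
        btl->free_slots--;
        if (des != NULL) *des = NULL;
        return 0;
    }
    if (des != NULL) *des = toy_alloc_internal(btl);
    return -1;
}

/* Regular send path using an already allocated descriptor. */
static int toy_btl_send(toy_btl_t *btl, toy_descriptor_t *des)
{
    btl->crossings++;
    des->in_use = 0;
    return 0;
}

/* sendi returns the descriptor: btl_sendi, then btl_send. */
static int toy_pml_send_with_descriptor(toy_btl_t *btl)
{
    toy_descriptor_t *des = NULL;
    if (toy_btl_sendi(btl, &des) == 0) return btl->crossings;
    assert(des != NULL);
    toy_btl_send(btl, des);
    return btl->crossings;
}

/* sendi returns nothing: btl_sendi, then btl_alloc, then btl_send. */
static int toy_pml_send_without_descriptor(toy_btl_t *btl)
{
    if (toy_btl_sendi(btl, NULL) == 0) return btl->crossings;
    toy_descriptor_t *des = toy_btl_alloc(btl);
    toy_btl_send(btl, des);
    return btl->crossings;
}
```

In this toy model the failure path makes two BTL calls when sendi
hands back the descriptor and three when it does not. The allocation
work itself is the same either way; only the number of layer
crossings differs.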

> We are certainly not "saving a call" in the sense of reducing source
> code. The PML has to have code to allocate a descriptor anyway
> since there may not even be any sendi functions. So, the code to
> allocate the descriptor is already in the PML. By asking sendi
> functions to do the same, you're replicating that code in every
> sendi function... possibly multiple times per BTL since a sendi
> function might have multiple "out of resource" return paths.
>> Therefore, in the PML we already have the descriptor and we can
>> hand it back to the BTL, which gives a chance for asynchronous
>> progress later on. Without this descriptor, the only option the
>> PML has is to put the PML request in a queue, and to try to send
>> it later, which is very expensive.
> This also makes no sense to me. We're not talking about doing
> without the descriptor. The PML is prepared to allocate it anyhow.
> The issue is where the descriptor is allocated in the case that
> sendi functions exist but cannot succeed. One alternative is to use
> a single allocation point in the PML. The other alternative (what
> we have today) is to replicate that code out to multiple sites,
> adding unnecessary source code and interface arguments.

As I said previously, this saves one jump from the PML to the BTL, at
the cost of one more return argument to the sendi function and some
lines of code in every BTL. Not a big deal, as a correctly written BTL
can do it pretty smartly (for example, with a single error-return
path that every failure case jumps to).
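
A rough C sketch of that single error-return path, so that the
descriptor allocation appears once per sendi function no matter how
many "out of resource" checks it has. All names here (demo_sendi,
demo_alloc, the endpoint fields, the return codes) are hypothetical
and chosen for illustration only; they are not taken from any
existing BTL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes, not the real OMPI constants. */
enum { DEMO_SUCCESS = 0, DEMO_ERR_OUT_OF_RESOURCE = -1 };

typedef struct { int owner_id; } demo_descriptor_t;

typedef struct {
    int fifo_free;          /* free slots for the fast path */
    size_t eager_limit;     /* largest message the fast path accepts */
    int alloc_calls;        /* how many descriptors were allocated */
    demo_descriptor_t slot; /* one-entry descriptor pool */
} demo_endpoint_t;

static demo_descriptor_t *demo_alloc(demo_endpoint_t *ep)
{
    ep->alloc_calls++;
    return &ep->slot;
}

static int demo_sendi(demo_endpoint_t *ep, size_t len,
                      demo_descriptor_t **des)
{
    assert(des != NULL);

    /* Several independent failure checks, one shared exit. */
    if (len > ep->eager_limit) goto no_resources;
    if (ep->fifo_free == 0)    goto no_resources;

    /* Fast path succeeded: nothing to hand back. */
    ep->fifo_free--;
    *des = NULL;
    return DEMO_SUCCESS;

no_resources:
    /* The only place where a descriptor is allocated for the caller. */
    *des = demo_alloc(ep);
    return DEMO_ERR_OUT_OF_RESOURCE;
}
```

Adding another failure check to a function shaped like this costs one
more goto, not another copy of the allocation code.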


> The PML code is in
> #mca_pml_ob1_send_request_start_copy
> Existing BTL sendi functions are at
> btl_sm.c#mca_btl_sm_sendi
> btl_mx.c#mca_btl_mx_sendi
> #mca_btl_portals_sendi