Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Remote key sizes
From: George Bosilca (bosilca_at_[hidden])
Date: 2011-11-08 19:25:27


On Nov 8, 2011, at 10:36, Nathan T. Hjelm wrote:

> On Tue, 8 Nov 2011 06:36:03 -0800, Rolf vandeVaart <rvandevaart_at_[hidden]> wrote:
>>> george.
>>>
>>> PS: Regarding the hand-copy instead of the memcpy, we tried to avoid using
>>> memcpy in performance critical codes, especially when we know the size of
>>> the data and the alignment. This relieves the compiler of adding ugly intrinsics,
>>> allowing it to nicely pipeline the load/stores. Anyway, with both approaches
>>> you will copy more data than needed for all BTLs except uGNI.
>>
>> I was looking at a case in a BTL I was working on where I actually need 64
>> bytes (yes, bytes) as the remote key size as opposed to the current 16
>> bytes (128 bits).
>> Not sure how I can handle that yet. (I assume configure is my friend, but
>> even in that case, all headers will need to carry around the extra data.)
>>
>
> I have been thinking about this a little bit. What I think should be done
> (and I am sure George will disagree) is to allow BTLs to define how long a

Well, I'm really sorry to disappoint you …

> segment is. The PML would then just memcpy the segments into the send
> buffer (instead of copying each member).

The only valid reason I can find now for having the seg_key as it is defined today is code simplicity. Read on and you will understand.

Otherwise I completely agree with you: the seg_key is something belonging to the BTLs, and all knowledge about it should be limited to the BTLs (i.e., the PML should just move it around). The solution you propose makes sense…

However, there are a few things that I think make it more challenging to implement than it looks.

1. endianness: Apparently the BTL is already responsible for storing the key in network order, as no translation is done on the key in the PMLs. As I don't think any of them do, I will assume this is already [somehow] taken care of.

2. one-sided: A quick look at the OSC seems to indicate there is some special handling to be done in the RDMA one. Look at ompi_osc_rdma_sendreq_t in osc_rdma_sendreq.h: it uses a trick to store the remote segments. First, the mca_btl_base_segment_t entries are stored at the end of the structure, in order to allow for dynamic allocation. Second, OSC doesn't seem to manipulate pointers to mca_btl_base_segment_t, but the content itself. I didn't go too deep here, but I think particular attention should be paid to OSC.

3. PML: In addition to seg_len we use the seg_addr field extensively all over the code base, so it should be exposed in the mca_btl_base_segment_t as well.

4. How do we keep the capability of dealing with multiple mca_btl_base_segment_t? Just imagine what the macro MCA_PML_OB1_COMPUTE_SEGMENT_LENGTH will look like…
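
To make points 2 and 4 a bit more concrete, here is a rough sketch (not taken from the Open MPI tree) of what storing variable-size segments at the end of a request and summing their lengths might look like. Only mca_btl_base_segment_t with its seg_len member comes from the proposal; example_sendreq_t and the example_* functions are hypothetical names invented for illustration, and the real ompi_osc_rdma_sendreq_t and MCA_PML_OB1_COMPUTE_SEGMENT_LENGTH are certainly different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Common header every BTL-specific segment would start with (per the proposal). */
struct mca_btl_base_segment_t {
    size_t seg_len;
};

/* Hypothetical request: the segments live as raw bytes at the end of the
 * structure because their size is only known at run time. */
struct example_sendreq_t {
    size_t seg_size;           /* bytes per segment, as reported by the BTL */
    size_t seg_count;          /* number of segments stored in seg_data */
    unsigned char seg_data[];  /* C99 flexible array member */
};

/* Allocate a request with room for seg_count segments of seg_size bytes each. */
struct example_sendreq_t *example_sendreq_alloc(size_t seg_size, size_t seg_count)
{
    struct example_sendreq_t *req;

    assert(seg_size >= sizeof(struct mca_btl_base_segment_t));
    req = calloc(1, sizeof(*req) + seg_size * seg_count);
    if (NULL != req) {
        req->seg_size = seg_size;
        req->seg_count = seg_count;
    }
    return req;
}

/* Set seg_len of segment i; memcpy avoids any alignment assumption. */
void example_set_seg_len(struct example_sendreq_t *req, size_t i, size_t len)
{
    struct mca_btl_base_segment_t base = { len };

    assert(i < req->seg_count);
    memcpy(req->seg_data + i * req->seg_size, &base, sizeof(base));
}

/* Sum seg_len over all segments, stepping by the BTL-reported segment size
 * instead of sizeof(mca_btl_base_segment_t). */
size_t example_total_length(const struct example_sendreq_t *req)
{
    size_t total = 0;

    for (size_t i = 0; i < req->seg_count; ++i) {
        struct mca_btl_base_segment_t base;
        memcpy(&base, req->seg_data + i * req->seg_size, sizeof(base));
        total += base.seg_len;
    }
    return total;
}
```

The stride in the loop is the part that changes: plain array indexing on mca_btl_base_segment_t no longer works once each BTL decides how long a segment is.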

Everything else should be quite trivial ;)

  george.

> For example mca_btl_base_segment_t would become:
>
> struct mca_btl_base_segment_t {
>     size_t seg_len;
> };
>
> since the pml needs the segment size (it does not need anything else).
>
> and then each btl would define its own segment like:
> struct mca_btl_ugni_segment_t {
>     struct mca_btl_base_segment_t base;
>     gni_mem_handle_t seg_key;
> };
>
> and we would add:
> size_t btl_segment_len;
>
> to the mca_btl_base_module_t or the base frag so the pml knows how much it
> needs to copy.
>
> This design would address George's criticism of the length of the seg_key
> and also allow BTLs to do what they need to. It would require a memcpy but
> I disagree this would slow the critical path. Even if it does it would be
> relatively minor (I think) and the flexibility is worth more in the long
> run.
>
> -Nathan
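
The layout proposed in the quoted message can be sketched roughly as follows. This is an illustration under stated assumptions, not code from Open MPI: example_mem_handle_t is a stand-in for gni_mem_handle_t so that the sketch builds without the uGNI headers, example_btl_module_t carries only the proposed btl_segment_len field rather than the full mca_btl_base_module_t, and example_pml_pack_segment is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Base segment as proposed: the PML only sees the length. */
struct mca_btl_base_segment_t {
    size_t seg_len;
};

/* Stand-in for gni_mem_handle_t (assumption, to avoid needing uGNI headers). */
typedef struct {
    unsigned long long qword1;
    unsigned long long qword2;
} example_mem_handle_t;

/* BTL-specific segment: base header first, then the BTL-private key. */
struct example_btl_segment_t {
    struct mca_btl_base_segment_t base;
    example_mem_handle_t seg_key;
};

/* Only the proposed new field; the real module structure has many more. */
struct example_btl_module_t {
    size_t btl_segment_len;
};

/* Copy one opaque segment into the send buffer without looking at the key.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
size_t example_pml_pack_segment(const struct example_btl_module_t *btl,
                                const struct mca_btl_base_segment_t *seg,
                                void *buf, size_t buf_len)
{
    assert(btl->btl_segment_len >= sizeof(*seg));
    if (buf_len < btl->btl_segment_len) {
        return 0;
    }
    memcpy(buf, seg, btl->btl_segment_len);
    return btl->btl_segment_len;
}
```

A BTL would set btl_segment_len to sizeof its own segment type, and the PML would pass a pointer to the embedded base member; the key bytes travel along without the PML knowing what they mean.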