Open MPI Development Mailing List Archives

From: Pavel Shamis (Pasha) (pasha_at_[hidden])
Date: 2007-08-13 12:41:30


Brian Barrett wrote:
> On Aug 13, 2007, at 9:33 AM, George Bosilca wrote:
>
>> On Aug 13, 2007, at 11:28 AM, Pavel Shamis (Pasha) wrote:
>>
>>> Jeff Squyres wrote:
>>>> I guess reading the graph that Pasha sent is difficult; Pasha -- can
>>>> you send the actual numbers?
>>>>
>>> OK, here are the numbers on my machines:
>>> 0 bytes
>>> mvapich with header caching: 1.56
>>> mvapich without header caching: 1.79
>>> ompi 1.2: 1.59
>>>
>>> So at zero bytes ompi is not so bad. We can also see that header
>>> caching decreases the mvapich latency by 0.23.
>>>
>>> 1 byte
>>> mvapich with header caching: 1.58
>>> mvapich without header caching: 1.83
>>> ompi 1.2: 1.73
>>>
>>> And here ompi makes a latency jump.
>>>
>>> In mvapich, header caching decreases the header size from 56 bytes
>>> to 12 bytes.
>>> What is the header size (pml + btl) in ompi?
>>
>> The match header size is 16 bytes, so it looks like ours is already
>> optimized ...
>
> Pasha -- Is your build of Open MPI built with
> --disable-heterogeneous? If not, our headers all grow slightly to
> support heterogeneous operations. For the heterogeneous case, a 1
> byte message includes:
I didn't build with "--disable-heterogeneous", so heterogeneous
support was enabled in the build.
>
> 16 bytes for the match header
> 4 bytes for the Open IB header
> 1 byte for the payload
> ----
> 21 bytes total
>
> If you are using eager RDMA, there's an extra 4 bytes for the RDMA
> length in the footer. Without heterogeneous support, 2 bytes get
> knocked off the size of the match header, so the whole thing will be
> 19 bytes (+ 4 for the eager RDMA footer).
I used eager RDMA; it is faster than send. So the message size on the
wire for a 1-byte message in my case was 25 bytes (16 + 4 + 1 + 4),
vs. 13 bytes in mvapich. If I build with --disable-heterogeneous, it
will decrease by 2 bytes, to 23. So it sounds like we are pretty
optimized.
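
To make the arithmetic easy to recheck, here is a rough sketch of the byte
counts discussed above. The constant and function names are made up for
illustration and are not Open MPI or mvapich identifiers. It also assumes
the 13 bytes for mvapich are simply the 12-byte cached header plus the
1-byte payload.

```python
# Sizes in bytes, taken from the figures quoted in this thread.
MATCH_HDR_HETEROGENEOUS = 16  # match header with heterogeneous support
HETEROGENEOUS_OVERHEAD = 2    # removed by --disable-heterogeneous
OPENIB_HDR = 4                # Open IB header
EAGER_RDMA_FOOTER = 4         # RDMA length in the footer (eager RDMA only)
MVAPICH_CACHED_HDR = 12       # mvapich header with header caching


def ompi_wire_size(payload, heterogeneous=True, eager_rdma=True):
    """Bytes on the wire for one eager Open MPI message (hypothetical helper)."""
    match_hdr = MATCH_HDR_HETEROGENEOUS
    if not heterogeneous:
        match_hdr -= HETEROGENEOUS_OVERHEAD
    size = match_hdr + OPENIB_HDR + payload
    if eager_rdma:
        size += EAGER_RDMA_FOOTER
    return size


def mvapich_wire_size(payload):
    """Assumption: cached header plus payload, nothing else."""
    return MVAPICH_CACHED_HDR + payload


if __name__ == "__main__":
    print("ompi, heterogeneous, eager RDMA:", ompi_wire_size(1))
    print("ompi, --disable-heterogeneous:  ",
          ompi_wire_size(1, heterogeneous=False))
    print("mvapich with header caching:    ", mvapich_wire_size(1))
```

For a 1-byte payload this gives 25, 23 and 13 bytes, matching the numbers
above.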

>
> There are also considerably more ifs in the code if heterogeneous is
> used, especially on x86 machines.
>
> Brian
>