
Subject: [OMPI devel] RFC: OB1 optimizations
From: Nathan Hjelm (hjelmn_at_[hidden])
Date: 2014-01-07 18:44:50


What: Push some ob1 optimizations to the trunk and 1.7.5.

Why: This patch contains two optimizations:

  - Introduce a fast send path for blocking send calls. This path uses
    the btl sendi function to put the data on the wire without setting
    up a send request. In the case of btl/vader it can also avoid
    allocating/initializing a new fragment. With btl/vader this
    optimization improves small-message latency by 50-200ns in
    ping-pong type benchmarks; larger messages may take a small hit in
    the range of 10-20ns. See the sketch after this list.

  - Use a stack-allocated receive request for blocking receives. This
    optimization saves the extra instructions associated with accessing
    the receive request free list. I was able to get another 50-200ns
    improvement in the small-message ping-pong with this optimization,
    and I see no hit for larger messages. A sketch of this idea also
    follows the list.
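
To make the first optimization concrete, here is a minimal, self-contained
C sketch of the try-immediate-then-fall-back shape. The names used here
(try_send_immediate, send_via_request, EAGER_LIMIT) are illustrative
stand-ins, not the actual ob1/btl symbols touched by the patch:

    /* Sketch only: a blocking send first tries an immediate send; only
     * when that declines do we pay for a full send request. */
    #include <stdbool.h>
    #include <stddef.h>

    #define EAGER_LIMIT 256            /* assumed small-message threshold */

    /* Stand-in for a btl sendi-style call: succeeds only when the message
     * fits in a single inline fragment, otherwise declines. */
    static bool try_send_immediate(const void *buf, size_t len)
    {
        if (len > EAGER_LIMIT) {
            return false;              /* too big, caller must fall back */
        }
        /* ... copy header + payload into a pre-posted fragment ... */
        (void) buf;
        return true;
    }

    /* Stand-in for the normal path: allocate a send request from the free
     * list, pack, schedule, and wait for completion. */
    static void send_via_request(const void *buf, size_t len)
    {
        (void) buf; (void) len;
    }

    /* Blocking send: take the fast path when possible; the request setup
     * cost is only paid when the immediate send declines. */
    static void blocking_send(const void *buf, size_t len)
    {
        if (try_send_immediate(buf, len)) {
            return;                    /* data is already on the wire */
        }
        send_via_request(buf, len);
    }

    int main(void)
    {
        char small_msg[8] = "hi";
        char large_msg[4096] = {0};
        blocking_send(small_msg, sizeof(small_msg));  /* fast path */
        blocking_send(large_msg, sizeof(large_msg));  /* falls back */
        return 0;
    }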
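
And a similarly illustrative sketch of the second optimization, with the
receive request living on the stack for the duration of the blocking call
(again, recv_request_t and the helpers are made up for the example):

    /* Sketch only: no free-list get/return on the critical path, because
     * a blocking receive knows its request cannot outlive the call. */
    #include <stddef.h>

    typedef struct recv_request {
        int    peer;
        int    tag;
        void  *buffer;
        size_t length;
        int    complete;
    } recv_request_t;

    /* Shared completion loop; it does not care how the request was
     * allocated, which is what makes the stack variant possible. */
    static void wait_for_completion(recv_request_t *req)
    {
        while (!req->complete) {
            /* ... poll the progress engine ... */
            req->complete = 1;         /* placeholder so the sketch ends */
        }
    }

    /* Blocking receive: the request is a local variable, so the free-list
     * accesses (and the extra loads/stores that go with them) disappear. */
    static void blocking_recv(void *buf, size_t len, int peer, int tag)
    {
        recv_request_t req = { .peer = peer, .tag = tag,
                               .buffer = buf, .length = len, .complete = 0 };
        /* ... post the match, let the btl deliver into req.buffer ... */
        wait_for_completion(&req);
    }

    int main(void)
    {
        char buf[64];
        blocking_recv(buf, sizeof(buf), 1, 0);
        return 0;
    }

The point in both sketches is the same: a blocking call knows the request
cannot outlive it, so the generic request machinery can be skipped on the
common path.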

When: These changes touch the critical path in ob1 and are targeted for
1.7.5, so I will set a moderately long timeout: next Friday (Jan 17).

Some results from osu_latency on Haswell:

[hjelmn@cn143 pt2pt]$ mpirun -n 2 --bind-to core -mca btl vader,self ./osu_latency
# OSU MPI Latency Test v4.0.1
# Size       Latency (us)
0                    0.11
1                    0.14
2                    0.14
4                    0.14
8                    0.14
16                   0.14
32                   0.15
64                   0.18
128                  0.36
256                  0.37
512                  0.46
1024                 0.56
2048                 0.80
4096                 1.12
8192                 1.68
16384                2.98
32768                5.10
65536                8.12
131072              14.07
262144              25.30
524288              47.40
1048576             91.71
2097152            195.56
4194304            487.05

Patch Attached.

-Nathan




