Attached are the following graphs:
1. sm NetPipe latencies up to size 150 bytes (run on a Sandy Bridge, 2 procs same core)
2. openib NetPipe latencies up to size 150 bytes (run on 2 old Xeons [pre-Nehalem] with old Mellanox ConnectX IB HCAs)
3. Same as #1, but all the way up to 8MB
4. Same as #2, but all the way up to 8MB
I also attached a tarball of all my raw NetPipe numbers (since the graphs are log-log).
There's definite weirdness here. Here are some observations:
a) Trunk openib latency is noticeably better in the mid-range as compared to v1.6 and v1.7. This is good! Is this change something that can be brought to v1.6 / v1.7?
b) The addition of the libnbc progress function to the progress loop has a non-zero impact on latency. It's most noticeable in graphs #1 and #2. Can something be done to add the libnbc progress function to the loop only when NBC operations are ongoing? Right now, the libnbc progress function is *always* added to the progress loop, even if you never use any NBCs.
c) There's a noticeable increase in small message latency for the openib BTL in v1.7 as compared to the trunk and v1.6 branches. I don't know if this is an openib thing, or the result of something else.
d) The trunk (without libnbc) has the best small message sm latency, period -- even better than v1.6. Yay! Is this decrease in latency (compared to v1.6) something that can be brought to v1.7?
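To make the suggestion in (b) concrete, here is a rough sketch of what I mean by registering the libnbc progress function lazily. This is not the actual Open MPI code: the callback table and every name in it (progress_register, progress_unregister, progress_poll, nbc_start, nbc_complete) are hypothetical stand-ins. The idea is just a counter of outstanding NBC operations: the first one adds the callback to the loop and the last one to complete removes it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a progress engine's callback table. */
typedef int (*progress_cb_t)(void);

#define MAX_CALLBACKS 8
static progress_cb_t callbacks[MAX_CALLBACKS];
static size_t num_callbacks = 0;

/* Add a callback to the progress loop; returns -1 if the table is full. */
int progress_register(progress_cb_t cb)
{
    if (num_callbacks == MAX_CALLBACKS) {
        return -1;
    }
    callbacks[num_callbacks++] = cb;
    return 0;
}

/* Remove a callback by swapping the last entry into its slot. */
int progress_unregister(progress_cb_t cb)
{
    for (size_t i = 0; i < num_callbacks; ++i) {
        if (callbacks[i] == cb) {
            callbacks[i] = callbacks[--num_callbacks];
            return 0;
        }
    }
    return -1;
}

/* One pass of the progress loop: call every registered callback. */
int progress_poll(void)
{
    int events = 0;
    for (size_t i = 0; i < num_callbacks; ++i) {
        events += callbacks[i]();
    }
    return events;
}

/* Hypothetical NBC side: count outstanding non-blocking collectives. */
static int nbc_active = 0;
static int nbc_progress_calls = 0;   /* instrumentation for this sketch only */

int nbc_progress(void)
{
    ++nbc_progress_calls;
    return 0;
}

/* Starting the first NBC operation hooks the callback into the loop. */
int nbc_start(void)
{
    if (nbc_active == 0 && progress_register(nbc_progress) != 0) {
        return -1;
    }
    ++nbc_active;
    return 0;
}

/* Completing the last NBC operation removes the callback again. */
void nbc_complete(void)
{
    if (nbc_active > 0 && --nbc_active == 0) {
        progress_unregister(nbc_progress);
    }
}
```

With something like this, an application that never starts an NBC never pays for the extra callback in the loop. In a real implementation, completion would presumably be detected inside the progress function itself, so removing the callback while the loop is iterating (and doing so thread-safely) would need more care than shown here.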