You are probably not seeing framework overhead so much as the
difference between send/recv for small messages, which is what Open MPI
uses, and RDMA for small messages. If you are comparing against another
implementation that uses RDMA for small messages then yes, you will see
lower latencies, but small-message RDMA has problems of its own. I have
written a paper that addresses these issues, which will be presented at
IPDPS.
The short of it:
small-message RDMA is effective for a small number of peers, but polling
costs begin to dominate wire time as the number of peers increases.
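
To make the scaling argument concrete, here is a toy cost model, not a measurement. It assumes the receiver has to check one RDMA buffer per peer, so polling cost grows linearly with the peer count, while the send/recv path pays a fixed cost through a shared completion queue. Every constant and function name below is a hypothetical chosen for illustration; real values depend on the hardware and the MPI implementation.

```python
# Toy latency model. All constants are illustrative assumptions,
# not measurements from Open MPI, MVAPICH, or any real fabric.
WIRE_NS = 3000          # assumed one-way wire time
POLL_NS_PER_PEER = 50   # assumed cost to check one peer's RDMA buffer
CQ_NS = 1000            # assumed fixed cost of the send/recv completion path


def rdma_latency_ns(n_peers: int) -> int:
    """Small-message RDMA: the receiver checks one buffer per peer."""
    return WIRE_NS + POLL_NS_PER_PEER * n_peers


def send_recv_latency_ns() -> int:
    """Send/recv: one shared completion queue, independent of peer count."""
    return WIRE_NS + CQ_NS


def crossover_peers() -> int:
    """Smallest peer count at which the RDMA model becomes slower."""
    n = 1
    while rdma_latency_ns(n) <= send_recv_latency_ns():
        n += 1
    return n


if __name__ == "__main__":
    for n in (2, 16, 64, 256):
        print(n, rdma_latency_ns(n), send_recv_latency_ns())
    print("crossover at", crossover_peers(), "peers")
```

With these assumed numbers the RDMA path wins below 21 peers, and at 64 peers the polling term alone (3200 ns) already exceeds the wire time. The shape of the curve is the point, not the specific values.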
Try comparing our latencies with another MPI that uses send/receive. If
you are using MVAPICH, you can use a compile-time flag to disable
small-message RDMA and force send/receive. Our results show lower
latencies for Open MPI under send/receive semantics, which indicates
that the Open MPI framework costs are lower.
We are looking at small-message RDMA for Open MPI, but since this is
primarily an optimization for very small clusters and micro-benchmarks,
the benefit for real applications may be nil.
On Feb 8, 2006, at 12:58 PM, Jean-Christophe Hugly wrote:
> Hi guys,
> Does someone know what the framework costs in terms of latency?
> Right now the latency I get with the openib btl is not great: 5.35 us. I
> was looking at what I could do to get it down. I tried to get openib to be
> the only btl but the build process refused.
> On the other hand I am not sure it could even work at all, as whenever I
> tried at run-time to limit the list to just one transport (be it tcp or
> openib, btw), mpi apps would not start.
> Either way, I'm curious if it's even worth trying and if there are other
> cuts that can be made to shave off one us or two (ok, I'll settle for
> 1.5 :-) )
> Any advice ?
> Jean-Christophe Hugly <jice_at_[hidden]>