On Aug 12, 2007, at 3:49 PM, Gleb Natapov wrote:
>> - Mellanox tested MVAPICH with the header caching; latency was around
>> - Mellanox tested MVAPICH without the header caching; latency was
>> around 1.9us
> As far as I remember the Mellanox results, and according to our testing,
> the difference between MVAPICH with header caching and OMPI is 0.2-0.3us,
> not 0.5us. And MVAPICH without header caching is actually worse than
> OMPI for small messages.
I guess reading the graph that Pasha sent is difficult; Pasha -- can
you send the actual numbers?
>> Given that OMPI is the lone outlier around 1.9us, I think we have no
>> choice except to implement the header caching and/or examine our
>> header to see if we can shrink it. Mellanox has volunteered to
>> implement header caching in the openib btl.
> I think we have a choice: not implement header caching, but just
> change the osu_latency benchmark to send each message with a
> different tag :)
If only. :-)
But that misses the point (and the fact that all the common ping-pong
benchmarks use a single tag: NetPIPE, IMB, osu_latency, etc.). *All
other MPIs* give us latency around 1.4us, but Open MPI is around
1.9us. So we need to do something.
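
For anyone skimming the thread, here is a rough sketch of why the single
tag matters. It is only an illustration of the general header caching
idea: it is not MVAPICH's code, and `match_hdr_t`, `hdr_cache_t`,
`hdr_encode` and `hdr_decode` are hypothetical names with a made-up
header layout (not the real OB1 header). Each side remembers the last
full header per peer; if the next message matches it (same context,
source, and tag, with the next sequence number), only a 1-byte marker
goes on the wire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical match header; NOT the real OB1 layout. */
typedef struct {
    uint16_t ctx;  /* communicator context id */
    int32_t  src;  /* source rank */
    int32_t  tag;  /* user tag */
    uint16_t seq;  /* per-peer sequence number */
} match_hdr_t;

/* Per-peer cache: the last header seen on this connection. */
typedef struct {
    int         valid;
    match_hdr_t last;
} hdr_cache_t;

enum { HDR_FULL = 0, HDR_CACHED = 1 };

/* Sender side: write the header into wire[]; return bytes used. */
size_t hdr_encode(hdr_cache_t *c, const match_hdr_t *h, uint8_t *wire)
{
    assert(c && h && wire);
    if (c->valid && h->ctx == c->last.ctx && h->src == c->last.src &&
        h->tag == c->last.tag && h->seq == (uint16_t)(c->last.seq + 1)) {
        wire[0] = HDR_CACHED;      /* cache hit: marker byte only */
        c->last.seq = h->seq;
        return 1;
    }
    wire[0] = HDR_FULL;            /* cache miss: send the whole header */
    memcpy(wire + 1, h, sizeof *h);
    c->last = *h;
    c->valid = 1;
    return 1 + sizeof *h;
}

/* Receiver side: rebuild the header; return bytes consumed. */
size_t hdr_decode(hdr_cache_t *c, const uint8_t *wire, match_hdr_t *out)
{
    assert(c && wire && out);
    if (wire[0] == HDR_CACHED) {
        assert(c->valid);          /* a full header must have come first */
        c->last.seq = (uint16_t)(c->last.seq + 1);
        *out = c->last;
        return 1;
    }
    memcpy(&c->last, wire + 1, sizeof c->last);
    c->valid = 1;
    *out = c->last;
    return 1 + sizeof c->last;
}
```

A ping-pong loop that reuses one tag takes the 1-byte path on every
message after the first; a benchmark that changes the tag on every send
never does.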
Are we optimizing for a benchmark? Yes. But we have to do it. Many
people know that these benchmarks are fairly useless, but too many
customers do not, and education alone won't fix that. "Sure
this MPI looks slower but, really, it isn't. Trust me; my name is
Joe Isuzu." That's a hard sell.
> I am not against header caching per se, but if it complicates the code
> even a little bit, I don't think we should implement it just to
> benefit one fabricated benchmark (AFAIR, before header caching was
> implemented in MVAPICH, mpi_latency actually sent messages with
> different tags).
That may be true and a reason for us to wail and gnash our teeth, but
it doesn't change the current reality.
> Also there is really nothing to cache in the openib BTL. The openib
> BTL header is 4 bytes long. The caching will have to be done in OB1,
> and there it will affect every other interconnect.
Surely there is *something* we can do -- what, exactly, is the
objection to peeking inside the PML header down in the btl? Is it
really so horrible for a btl to look inside the upper layer's
header? I agree that the PML looking into a btl header would
[obviously] be Bad.
All this being said -- is there another way to lower our latency?
My main goal here is to lower the latency. If header caching is
unattractive, then another method would be fine.