Re: [OMPI devel] OMPI vs Scali performance comparisons
I still think that the pml fast path fixes would be
As do I. Again, I think one needs to reach the BTL sendi as soon as
possible after entering the PML, which is what raised those thorny
discussions about exactly how we must preserve the current BTL ordering.
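A rough illustration of the ordering problem with an early sendi: a fast path may call the BTL's send-immediate hook only when nothing is already queued ahead of it for the same peer. Otherwise the new message could overtake earlier ones. Everything below (`btl_t`, `peer_t`, `queued_sends`, `eager_limit`, `pml_send`, `slow_path_send`, `stub_sendi`) is a hypothetical simplification and not Open MPI's actual PML/BTL interface. It is a sketch under those assumptions only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT Open MPI's real structures. */
typedef struct {
    size_t eager_limit;                          /* largest message sendi may take */
    int (*sendi)(const void *buf, size_t len);   /* immediate send; 0 on success */
} btl_t;

typedef struct {
    btl_t *btl;
    int queued_sends;   /* messages already queued on the slow path to this peer */
} peer_t;

/* Hypothetical sendi stub that always succeeds. */
static int stub_sendi(const void *buf, size_t len) {
    (void)buf;
    (void)len;
    return 0;
}

/* Hypothetical slow path: just records that a message was queued. */
static int slow_path_send(peer_t *peer, const void *buf, size_t len) {
    (void)buf;
    (void)len;
    peer->queued_sends++;
    return 1; /* 1 = message went down the slow path */
}

/* Returns 0 if sent via the fast path, 1 if it fell back to the slow path. */
static int pml_send(peer_t *peer, const void *buf, size_t len, bool contiguous) {
    btl_t *btl = peer->btl;
    /* Take the fast path only if nothing is queued ahead for this peer;
     * otherwise sendi could overtake earlier messages and break ordering. */
    if (contiguous && btl->sendi != NULL && len <= btl->eager_limit &&
        peer->queued_sends == 0) {
        if (btl->sendi(buf, len) == 0)
            return 0;
        /* sendi can fail (e.g. no send resources): fall back below. */
    }
    return slow_path_send(peer, buf, len);
}
```

This sketch does not model queued sends completing. A real implementation would decrement the queue count as those sends finish, which is what lets later messages use the fast path again.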
To catch up to Scali, we need a 2x improvement. The PML fast-path work I
showed in San Jose in February produced speedups from 30% to 3x,
depending on the hardware (that is, on whether the bottleneck is memory
performance or instruction-processing speed). I don't know where this
particular hardware sits in that range.
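One rough way to see why the gain depends on the bottleneck uses an assumed Amdahl-style model. This model was not measured in this thread. Split the per-message time T into a part the fast path shortens, T_sw (instruction processing), and a part it does not (memory traffic and so on). Suppose the fast path shortens T_sw by a factor k. Let f be the fraction of T spent in T_sw. Then the speedup S, and the condition for reaching the 2x target, are:

```latex
\begin{aligned}
f &= \frac{T_{\mathrm{sw}}}{T}, \qquad
S = \frac{T}{(T - T_{\mathrm{sw}}) + T_{\mathrm{sw}}/k} = \frac{1}{(1 - f) + f/k} \\
S \ge 2 &\iff (1 - f) + \frac{f}{k} \le \frac{1}{2}
        \iff f \ge \frac{k}{2(k - 1)} \qquad (k > 1)
\end{aligned}
```

Under this model, even an infinitely fast software path (k approaching infinity) reaches 2x only if more than half of the per-message time is instruction processing. On memory-bound hardware the fast path alone would therefore fall short.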