Nifty Tom Mitchell wrote:
I'm unclear on what you're asking. Are you asking whether a BTL should
limit the performance delivered to the application? E.g., the
interconnect is capable of 1 GByte/sec, but you deliver only 100
MByte/sec (or whatever the user selects) to the app, so the user can see
whether bandwidth is a sensitive parameter for the app?
On Thu, Jun 25, 2009 at 08:37:21PM -0400, Jeff Squyres wrote:
Subject: Re: [OMPI users] 50% performance reduction due to OpenMPI v1.3.2 forcing
all MPI traffic over Ethernet instead of using Infiniband
While the previous thread on "performance reduction" went left, right,
forward, and beyond the initial topic, it tickled an idea for application
profiling or characterization.
What if the various transports (BTLs) had knobs that permitted stepwise
insertion of bandwidth limits, latency limits, etc., so the application
might be characterized better?
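The knob idea above can be modeled in user space. A minimal sketch, assuming a hypothetical ThrottledLink class with a first-order latency-plus-serialization cost model (Open MPI's BTLs expose no such throttling parameters; every name here is illustrative):

```python
import time

class ThrottledLink:
    """Hypothetical model of a transport with tunable bandwidth/latency
    knobs. This is NOT an Open MPI API; it only models what a 'stepwise
    limit' knob would do to message delivery."""

    def __init__(self, bandwidth_bytes_per_s, latency_s):
        self.bandwidth = bandwidth_bytes_per_s
        self.latency = latency_s

    def transfer_time(self, nbytes):
        # First-order model: fixed latency plus serialization time.
        return self.latency + nbytes / self.bandwidth

    def send(self, payload: bytes):
        # Sleep for the modeled transfer time to emulate the slower link.
        time.sleep(self.transfer_time(len(payload)))

# Sweep the knob: cost of the same 1 MB message over progressively
# slower links (1 GB/s, 500 MB/s, 100 MB/s).
for bw in (1e9, 5e8, 1e8):
    link = ThrottledLink(bw, latency_s=2e-6)
    t = link.transfer_time(1 << 20)
    print(f"{bw/1e6:.0f} MB/s: {t*1e3:.2f} ms per 1 MB message")
```

Stepping the bandwidth down by factors like this is exactly the "stepwise insertion of limits" being proposed: the app is rerun at each setting to see where its wall-clock time starts to move.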
If so, I have a few thoughts.
1) The actual limitations of an MPI implementation may be hard to model,
e.g., the amount of handshaking and synchronization between processes.
2) For the most part, you could (actually, even should) try doing this
stuff much higher up than the BTLs. E.g., how about developing a PMPI
layer that does what you're talking about?
3) I think folks have tried this sort of thing in the past by
instrumenting the code and then "playing it back" or "simulating" with
other performance parameters. E.g., "I run for X cycles, then I send a
N-byte message, then compute another Y cycles, then post a receive,
then ..." and then turn the knobs for latency, bandwidth, etc., to see
at what point any of these become sensitive parameters. You might
see: gosh, as long as latency is lower than about 30-70 usec, it
really isn't important. Or, whatever. Offhand, I think different
people have tried this approach, and (without bothering to check my
notes to see if my memory is any good) I think Dimemas (associated
with Paraver and CEPBA in Barcelona) was one such tool.
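On suggestion 2): with the PMPI profiling layer, in C you would define your own MPI_Send() that perturbs timing and then calls PMPI_Send(). The same interception pattern can be sketched in plain Python with a stand-in send function (the decorator and fake_send are invented for illustration; no real MPI is involved):

```python
import functools
import time

def add_latency(extra_latency_s):
    """PMPI-style interposer sketch: in C you would define MPI_Send()
    to delay and then call PMPI_Send(); here we wrap a stand-in send."""
    def wrap(send):
        @functools.wraps(send)
        def instrumented(*args, **kwargs):
            time.sleep(extra_latency_s)   # the injected latency "knob"
            return send(*args, **kwargs)  # forward to the real transport
        return instrumented
    return wrap

@add_latency(50e-6)  # pretend every send costs an extra 50 usec
def fake_send(buf):
    # Stand-in for PMPI_Send: just report the bytes "sent".
    return len(buf)

print(fake_send(b"x" * 1024))  # -> 1024, after the injected delay
```

The point of doing this above the BTLs is that the application's MPI calls are intercepted portably, without touching any transport internals.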
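Suggestion 3), trace-and-replay in the Dimemas style, can be reduced to a toy: record a sequence of compute and send events, then predict runtime under different network settings. A minimal sketch (the trace contents and the first-order cost model are invented for illustration; this is not how Dimemas itself works internally):

```python
# Toy trace: alternating compute phases (seconds) and blocking sends (bytes).
TRACE = [("compute", 0.002), ("send", 64_000),
         ("compute", 0.001), ("send", 1_000_000)]

def replay(trace, latency_s, bandwidth_bytes_per_s):
    """Predict total runtime of the trace under the given network knobs."""
    total = 0.0
    for kind, value in trace:
        if kind == "compute":
            total += value  # compute time is unaffected by the network
        else:
            # Blocking send: fixed latency plus serialization time.
            total += latency_s + value / bandwidth_bytes_per_s
    return total

# Sweep latency to see where this (toy) app becomes latency-sensitive.
for lat_us in (1, 10, 30, 70, 300):
    t = replay(TRACE, lat_us * 1e-6, 1e9)
    print(f"latency {lat_us:4d} us: predicted runtime {t*1e3:.3f} ms")
```

Turning the latency knob over such a replay is what surfaces statements like "as long as latency is lower than about 30-70 usec, it really isn't important" for a given code.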
Most micro-benchmarks are designed to measure various hardware
characteristics, but it is moderately hard to know which of those
characteristics an application depends on.
The value of this is that:
* the application authors might learn something about their code
  that is hard to know at a well-abstracted API level.
* the purchasing decision maker would have the ability to access a
  well-instrumented cluster and build a weighted value equation to
  help structure the decision.
* the hardware vendor can learn what is valuable when deciding which
  features and functions need the most attention/transistors.
I.e., it might be as valuable to benchmark "your code" on a single
well-instrumented platform as it is to benchmark all the hardware you
can get "yer hands on".