Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Profiling performance by forcing transport choice.
From: Eugene Loh (Eugene.Loh_at_[hidden])
Date: 2009-07-23 12:43:57

Nifty Tom Mitchell wrote:
On Thu, Jun 25, 2009 at 08:37:21PM -0400, Jeff Squyres wrote:
Subject: Re: [OMPI users] 50% performance reduction due to OpenMPI v 1.3.2 forcing
	all MPI traffic over Ethernet instead of using Infiniband

While the previous thread on "performance reduction" went left, right,
forward, and beyond the initial topic, it tickled an idea for application
profiling or characterization.

What if the various transports (btl) had knobs that permitted stepwise 
insertion of bandwidth limits and latency limits etc. so the application
might be characterized better?
I'm unclear what you're asking about.  Are you asking that a BTL would limit the performance delivered to the application?  E.g., the interconnect is capable of 1 Gbyte/sec, but you only deliver 100 Mbyte/sec (or whatever the user selects) to the app so the user can see whether bandwidth is a sensitive parameter for the app?

If so, I have a few thoughts.

1)  The actual limitations of an MPI implementation may be hard to model.  E.g., the amount of handshaking between processes, synchronization delays, etc.

2)  For the most part, you could (actually, even should) try doing this stuff much higher up than the BTLs.  E.g., how about developing a PMPI layer that does what you're talking about?

3)  I think folks have tried this sort of thing in the past by instrumenting the code and then "playing it back" or "simulating" with other performance parameters.  E.g., "I run for X cycles, then I send an N-byte message, then compute another Y cycles, then post a receive, then ..." and then turn the knobs for latency, bandwidth, etc., to see at what point any of these become sensitive parameters.  You might see:  gosh, as long as latency is lower than about 30-70 usec, it really isn't important.  Or, whatever.  Offhand, I think different people have tried this approach and (without bothering to check my notes to see if my memory is any good) I think Dimemas (associated with Paraver and CEPBA Barcelona) was one such tool.
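
The "play it back" idea in point 3 can be sketched roughly as follows. This is only an illustration, not how Dimemas or any real tool works: the trace format, the names `replay` and `latency_sweep`, and the cost model (each message costs latency plus size divided by bandwidth, with no overlap, handshaking, or synchronization delays) are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

# Hypothetical trace format: ("compute", seconds) or ("message", bytes).
Event = Tuple[str, float]


def replay(trace: Iterable[Event], latency_s: float, bandwidth_Bps: float) -> float:
    """Estimate run time of a recorded trace under assumed network parameters.

    Assumed cost model: compute phases take their recorded time; each
    message costs latency_s + bytes / bandwidth_Bps and is never overlapped
    with computation.
    """
    if bandwidth_Bps <= 0:
        raise ValueError("bandwidth must be positive")
    total = 0.0
    for kind, amount in trace:
        if kind == "compute":
            total += amount
        elif kind == "message":
            total += latency_s + amount / bandwidth_Bps
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return total


def latency_sweep(
    trace: List[Event], latencies_s: Iterable[float], bandwidth_Bps: float
) -> List[Tuple[float, float]]:
    """Turn the latency knob: return (latency, estimated run time) pairs."""
    return [(lat, replay(trace, lat, bandwidth_Bps)) for lat in latencies_s]


if __name__ == "__main__":
    # Made-up trace: compute, send 8 KB, compute, send 1 MB.
    trace = [
        ("compute", 0.010),
        ("message", 8 * 1024),
        ("compute", 0.020),
        ("message", 1024 * 1024),
    ]
    # Sweep latency from 1 usec to 1 msec at an assumed 1 Gbyte/sec.
    for lat, t in latency_sweep(trace, [1e-6, 1e-5, 1e-4, 1e-3], 1e9):
        print(f"latency {lat * 1e6:8.1f} usec -> estimated {t * 1e3:.3f} ms")
```

Sweeping one knob while holding the others fixed shows roughly where the estimated run time starts to move, which is the "sensitive parameter" question. Per point 1, a model this simple will miss real MPI behavior, so treat the numbers as a rough guide at best.
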
Most micro benchmarks are designed to measure various hardware characteristics
but it is moderately hard to know what an application depends on.

The value of this is that:
	*the application authors might learn something
	about their code that is hard to know at a well 
	abstracted API level.

	*the purchasing decision maker would have the ability 
	to access a well instrumented cluster and build a 
	weighted value equation to help structure the decision.

	*the hardware vendor can learn what is valuable when deciding
	which features and functions need the most attention/transistors.

i.e. it might be as valuable to benchmark "your code" on a single well
instrumented platform as it might be to benchmark all the hardware you 
can get "yer hands on".