On Apr 1, 2010, at 4:17 PM, Oliver Geisler wrote:
> > However, reading through your initial description on Tuesday, none of these
> > fit: You want to actually measure the kernel time on TCP communication costs.
> Since the problem occurs also on node only configuration and mca-option
> btl = self,sm,tcp is used, I doubt it has to do with TCP communication.
I'm not sure what to make of this remark. Why would the raw performance of TCP be irrelevant? Open MPI uses TCP over Ethernet, so it can't be faster than TCP. More specifically: if something is making TCP slow, MPI will be slow as well. From the times you've listed, it almost sounds like you're getting a lot of TCP drops and retransmits (the lengthy times could be retransmission timeouts). Can you check your NIC / switch hardware to see if you're getting drops?
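On Linux you can get a quick first read on this from the kernel's own TCP counters, without any special tooling. A sketch (the awk one-liner just pulls the "RetransSegs" column out of /proc/net/snmp):

```shell
# TCP segment retransmissions since boot (Linux). If this number climbs
# quickly while your MPI job runs, packets are being lost somewhere
# between the nodes.
awk '/^Tcp:/ { for (i = 1; i <= NF; i++) h[i] = $i; getline;
               for (i = 1; i <= NF; i++) if (h[i] == "RetransSegs") print $i }' \
    /proc/net/snmp
```

You can also look at per-interface drop/error counters with `ethtool -S <your NIC>` (counter names are driver-dependent), and at the port counters on your switch.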
Also, you should probably test raw ping-pong performance:
a) between 2 MPI processes on the same node. E.g.:
mpirun -np 2 --mca btl sm,self _your_favorite_benchmark_
This will test shared memory latency/bandwidth/whatever of MPI on that node.
b) between 2 MPI processes on different nodes
mpirun -np 2 --host cluster-06,cluster-07 --mca btl tcp,self _your_favorite_benchmark_
This will test TCP latency/bandwidth/whatever of MPI between those two nodes.
Try NetPIPE -- it has both MPI communication benchmarking and TCP benchmarking. Then you can see if there is a noticeable difference between TCP and MPI (there shouldn't be). There's also a "memcpy" mode in NetPIPE, but it's not quite the same thing as shared memory message passing.
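For reference, a typical NetPIPE comparison might look something like this (NPtcp and NPmpi are the binaries shipped with NetPIPE; the hostnames are just the ones from your earlier example):

```shell
# Raw TCP test: start the receiver on one node...
cluster-06$ NPtcp
# ...then point the transmitter at it from the other node:
cluster-07$ NPtcp -h cluster-06

# MPI test over the same pair of nodes, forcing the TCP BTL:
mpirun -np 2 --host cluster-06,cluster-07 --mca btl tcp,self NPmpi
```

If the NPmpi numbers are dramatically worse than the NPtcp numbers on the same link, the problem is above TCP; if both are bad, it's the network.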