Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-07-23 20:53:43

On Jul 23, 2007, at 6:43 AM, Biagio Cosenza wrote:

> I'm working on a parallel real-time renderer: an embarrassingly
> parallel problem where latency is the barrier to high performance.
> Two observations:
> 1) I did a simple "ping-pong" test (the master does a Bcast + an
> Irecv for each node + a Waitall), similar to the actual renderer
> workload. Using a cluster of 37 nodes on Gigabit Ethernet, it seems
> that the latency is usually low (about 1-5 ms), but sometimes there
> are peaks of about 200 ms. I suspect the cause is a packet
> retransmission on one of the 37 connections, which blows the
> overall performance of the test (of course, the final Waitall is a
> synchronization point).
> 2) A research team argues in a paper that MPI has trouble
> managing latency dynamically. They also raise an interesting
> issue about enabling/disabling the Nagle algorithm. (I paste the
> relevant paragraph below.)
> So I have two questions:
> 1) Why does my test have these peaks? How can I deal with them
> (I'm thinking of the btl tcp parameters)?

They are probably beyond Open MPI's control -- OMPI mainly does
read() and write() on TCP sockets and relies on the kernel to do all
the low-level TCP protocol / wire transmission stuff.
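
For concreteness, here's a minimal sketch -- my reconstruction, not
your actual code -- of the Bcast + Irecv + Waitall pattern you
described. The final Waitall is exactly where one slow connection
stalls the whole step:

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size, i;
        char request = 0, reply = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (0 == rank) {
            /* master: broadcast a request, then collect one reply
               per worker */
            MPI_Request *reqs = malloc((size - 1) * sizeof(MPI_Request));
            char *replies = malloc(size - 1);
            MPI_Bcast(&request, 1, MPI_CHAR, 0, MPI_COMM_WORLD);
            for (i = 1; i < size; ++i) {
                MPI_Irecv(&replies[i - 1], 1, MPI_CHAR, i, 0,
                          MPI_COMM_WORLD, &reqs[i - 1]);
            }
            /* one retransmitting connection delays everyone here */
            MPI_Waitall(size - 1, reqs, MPI_STATUSES_IGNORE);
            free(reqs);
            free(replies);
        } else {
            /* worker: receive the request, send back a reply */
            MPI_Bcast(&request, 1, MPI_CHAR, 0, MPI_COMM_WORLD);
            MPI_Send(&reply, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }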

You might want to try increasing your TCP buffer sizes, but I think
the Linux kernel has some built-in limits. Other experts might want
to chime in here...
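
Here's a generic sketch (not OMPI code) of how a process asks the
kernel for bigger socket buffers; on Linux the kernel silently caps
the request at net.core.wmem_max / net.core.rmem_max, so you may
need to raise those via sysctl as well:

    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <unistd.h>

    int main(void)
    {
        int sd = socket(AF_INET, SOCK_STREAM, 0);
        int buf = 1 << 20;            /* ask for 1 MB each way */
        socklen_t len = sizeof(buf);

        setsockopt(sd, SOL_SOCKET, SO_SNDBUF, &buf, sizeof(buf));
        setsockopt(sd, SOL_SOCKET, SO_RCVBUF, &buf, sizeof(buf));

        /* read back what the kernel actually granted (Linux reports
           twice the usable size) */
        getsockopt(sd, SOL_SOCKET, SO_RCVBUF, &buf, &len);
        printf("effective SO_RCVBUF: %d bytes\n", buf);

        close(sd);
        return 0;
    }

I believe the TCP BTL also exposes btl_tcp_sndbuf / btl_tcp_rcvbuf
MCA parameters controlling the sizes it requests; "ompi_info --param
btl tcp" should show what your build has.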

> 2) When does Open MPI disable the Nagle algorithm? Suppose I
> DON'T need Nagle to be ON (focusing only on latency); how can I
> increase performance?

It looks like we disable Nagle (by setting TCP_NODELAY) right when
TCP BTL connections are made. Surprisingly, it looks like we don't
have a run-time option to toggle it for power users like you who
really want to tweak.

If you want to play with it, please edit
ompi/mca/btl/tcp/btl_tcp_endpoint.c. You'll see the references to
TCP_NODELAY in conjunction with setsockopt(). Set the optval to 0
instead of 1 (i.e., re-enable Nagle). A simple "make install" in
that directory will recompile the TCP component and re-install it
(assuming you have done a default build with OMPI components built
as standalone plugins). Let us know what you find.
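
Roughly what the relevant code boils down to (function and variable
names approximate, not the literal OMPI source):

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    static void btl_tcp_set_nodelay(int sd)
    {
        /* OMPI ships with optval = 1 (TCP_NODELAY on, Nagle off);
           change it to 0 to turn Nagle back on */
        int optval = 1;
        if (setsockopt(sd, IPPROTO_TCP, TCP_NODELAY,
                       &optval, sizeof(optval)) < 0) {
            /* OMPI emits a warning here */
        }
    }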

Jeff Squyres
Cisco Systems