
Open MPI User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-07-25 07:54:33

On Jul 25, 2007, at 7:45 AM, Biagio Cosenza wrote:

> Jeff, I did what you suggested
> However no noticeable changes seem to happen. Same peaks and same
> latency times.

Ok. This suggests that Nagle may not be the issue here. Is the code
tightly coupled? If so, this could be normal operating system
"jitter" -- one MPI process was swapped out to run some system daemon
and therefore other MPI processes saw a blocking effect until the
peer returned, causing performance ripples.
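
One rough way to check for this kind of jitter, independent of MPI, is to
spin on a monotonic clock and record the largest gap between two
consecutive reads. The sketch below assumes a POSIX system with
clock_gettime(); the names now_ns and max_gap_ns are made up for
illustration and are not part of Open MPI.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds (hypothetical helper). */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Do `iterations` back-to-back clock reads and return the largest gap,
 * in nanoseconds, between two consecutive reads. A gap far above the
 * typical one suggests the process was descheduled or interrupted. */
int64_t max_gap_ns(long iterations)
{
    assert(iterations > 0);

    int64_t prev = now_ns();
    int64_t worst = 0;

    for (long i = 0; i < iterations; i++) {
        int64_t cur = now_ns();
        if (cur - prev > worst) {
            worst = cur - prev;
        }
        prev = cur;
    }
    return worst;
}
```

Running max_gap_ns() for a few seconds on a compute node while the
cluster is otherwise idle gives a feel for how long a process can be
stalled by the OS alone. It says nothing about the network, so it can
only help rule jitter in or out as a contributor.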

> Are you sure that disabling Nagle's algorithm only requires
> changing optval to 0?
> I saw that, in btl_tcp_endpoint.c, the optval assignment is inside a
> #if defined(TCP_NODELAY) block.
> Where can this macro be defined?

It's usually within system header files. A trivial check can be used
to figure out if your system is compiling this block: put a syntax
error within the #if block and then rebuild the TCP component. If
the compile fails due to the syntax error, then you know that that
block is being compiled.
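
A less disruptive check is a tiny standalone program. The sketch below
is not Open MPI code, and the function names are made up for
illustration. It assumes a POSIX system, where TCP_NODELAY normally
comes from <netinet/tcp.h>. It reports whether the macro is visible at
compile time, then sets the option on a fresh socket and reads it back.
Note that with the usual POSIX semantics a nonzero optval turns
TCP_NODELAY on, which disables Nagle, and 0 leaves Nagle enabled.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>   /* usual home of TCP_NODELAY on POSIX systems */
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if TCP_NODELAY was visible at compile time, 0 otherwise. */
int tcp_nodelay_is_defined(void)
{
#if defined(TCP_NODELAY)
    return 1;
#else
    return 0;
#endif
}

/* Sets TCP_NODELAY to optval on a fresh TCP socket and reads it back.
 * Returns 1 if the option reads back as set, 0 if it reads back as
 * clear, and -1 on any error or if the macro is not defined. */
int tcp_nodelay_roundtrip(int optval)
{
    /* This sketch only exercises the two values discussed in the thread. */
    assert(optval == 0 || optval == 1);
#if defined(TCP_NODELAY)
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    int readback = 0;
    socklen_t len = sizeof(readback);
    int rc = setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval));
    if (rc == 0) {
        rc = getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &readback, &len);
    }
    close(fd);

    return rc == 0 ? (readback != 0) : -1;
#else
    return -1;
#endif
}
```

If tcp_nodelay_is_defined() returns 1 when built with the same compiler
and flags as Open MPI, the #if block in btl_tcp_endpoint.c is very
likely being compiled as well.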

> Any other ideas for managing latency peaks?
> Biagio
> On 7/24/07, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> > On Jul 23, 2007, at 6:43 AM, Biagio Cosenza wrote:
> > I'm working on a parallel real-time renderer: an embarrassingly
> > parallel problem where latency is the limiting factor for high
> > performance.
> >
> > Two observations:
> >
> > 1) I did a simple "ping-pong" test (the master does a Bcast + an
> > Irecv for each node + a Waitall) similar to the actual renderer
> > workload. Using a cluster of 37 nodes on Gigabit Ethernet, it seems
> > that the latency is usually low (about 1-5 ms), but sometimes there
> > are peaks of about 200 ms. I think the cause is a packet
> > retransmission on one of the 37 connections, which ruins the
> > overall performance of the test (of course, the final Waitall is a
> > synchronization point).
> >
> > 2) A research team argues in a paper that MPI struggles to manage
> > latency dynamically. They also raise an interesting issue about
> > enabling/disabling the Nagle algorithm. (I paste the relevant
> > paragraph below)
> >
> >
> > So I have two questions:
> >
> > 1) Why does my test have these peaks? How can I deal with them (I
> > am thinking of the btl tcp params)?
> They are probably beyond Open MPI's control -- OMPI mainly does
> read() and write() down TCP sockets and relies on the kernel to do
> all the low-level TCP protocol / wire transmission stuff.
> You might want to try increasing your TCP buffer sizes, but I think
> that the Linux kernel has some built in limits. Other experts might
> want to chime in here...
> > 2) When does Open MPI disable the Nagle algorithm? Suppose I DON'T
> > need Nagle to be ON (focusing only on latency); how can I
> > increase performance?
> It looks like we set the TCP_NODELAY socket option (the one that
> controls Nagle) right when TCP BTL connections are made.
> Surprisingly, it looks like we don't have a run-time option to
> change it for power users like you who really want to tweak things.
> If you want to play with it, please edit
> ompi/mca/btl/tcp/btl_tcp_endpoint.c. You'll see the references to
> TCP_NODELAY in conjunction with setsockopt(). Set the optval to 0
> instead of 1. A simple "make install" in that directory will
> recompile the TCP component and re-install it (assuming you have
> done a default build with OMPI components built as standalone
> plugins). Let us know what you find.
> --
> Jeff Squyres
> Cisco Systems
> _______________________________________________
> users mailing list
> users_at_[hidden]

Jeff Squyres
Cisco Systems