Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-07-25 07:54:33


On Jul 25, 2007, at 7:45 AM, Biagio Cosenza wrote:

> Jeff, I did what you suggested.
>
> However, there are no noticeable changes: same peaks and same
> latency times.

Ok. This suggests that Nagle may not be the issue here. Is the code
tightly coupled? If so, this could be normal operating system
"jitter" -- one MPI process was swapped out to run some system daemon
and therefore other MPI processes saw a blocking effect until the
peer returned, causing performance ripples.
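
One rough way to see whether a node suffers from this kind of jitter is to spin on a clock and record the largest gap between two consecutive reads; a gap far larger than the typical one suggests the process was descheduled. The sketch below is only an illustration under stated assumptions: it is not part of Open MPI, the names now_ns() and max_scheduling_gap_ns() are hypothetical, and it assumes a POSIX system with clock_gettime() and CLOCK_MONOTONIC.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() under strict modes */
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current monotonic time in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: busy-loop for roughly duration_ms milliseconds
 * and return the largest gap (in nanoseconds) seen between two
 * consecutive clock reads.  On an otherwise idle core the gap stays
 * small; a large value hints that something else ran in between.
 */
int64_t max_scheduling_gap_ns(int duration_ms)
{
    int64_t start = now_ns();
    int64_t end = start + (int64_t)duration_ms * 1000000LL;
    int64_t prev = start;
    int64_t max_gap = 0;

    for (;;) {
        int64_t t = now_ns();
        if (t - prev > max_gap) {
            max_gap = t - prev;
        }
        prev = t;
        if (t >= end) {
            break;
        }
    }
    return max_gap;
}
```

Running max_scheduling_gap_ns() for a second or so on each node, with and without the MPI job active, gives a crude idea of how long a process can be kept off the CPU. It says nothing about the network, so it can only rule the operating system in or out.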

> Are you sure that disabling Nagle's algorithm only requires
> changing optval to 0?
> I saw that, in btl_tcp_endpoint.c, the optval assignment is inside a
> #if defined(TCP_NODELAY) block.
>
> Where can this macro be defined?

It's usually within system header files. A trivial check can be used
to figure out if your system is compiling this block: put a syntax
error within the #if block and then rebuild the TCP component. If
the compile fails due to the syntax error, then you know that that
block is being compiled.
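
A less invasive variant of this check is to ask the compiler and the kernel directly. The sketch below is only an illustration, not Open MPI code: tcp_nodelay_is_defined() and tcp_nodelay_roundtrip() are hypothetical helper names, and it assumes a POSIX system where <netinet/tcp.h> provides TCP_NODELAY. Keep in mind that on POSIX sockets a nonzero TCP_NODELAY value turns Nagle's algorithm off, and zero leaves it on.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: 1 if TCP_NODELAY is visible at compile time. */
int tcp_nodelay_is_defined(void)
{
#if defined(TCP_NODELAY)
    return 1;
#else
    return 0;
#endif
}

/*
 * Hypothetical helper: set TCP_NODELAY to `want` (0 or nonzero) on a
 * fresh TCP socket, read the option back, and return it normalized to
 * 0 or 1.  Returns -1 if the macro is missing or any call fails.
 */
int tcp_nodelay_roundtrip(int want)
{
#if defined(TCP_NODELAY)
    int sd = socket(AF_INET, SOCK_STREAM, 0);
    if (sd < 0) {
        return -1;
    }

    int optval = want ? 1 : 0;   /* 1 = Nagle off, 0 = Nagle on */
    int got = -1;
    socklen_t len = sizeof(got);

    if (setsockopt(sd, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval)) == 0 &&
        getsockopt(sd, IPPROTO_TCP, TCP_NODELAY, &got, &len) == 0) {
        got = got ? 1 : 0;
    } else {
        got = -1;
    }

    close(sd);
    return got;
#else
    (void)want;
    return -1;
#endif
}
```

If tcp_nodelay_is_defined() returns 1 when built with the same compiler and flags as Open MPI, the #if block is most likely being compiled too. The round trip only shows that the kernel accepts the option on a new socket; it does not inspect the sockets Open MPI actually opens.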

> Any other ideas for managing latency peaks?
>
> Biagio
>
>
> On 7/24/07, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> On Jul 23, 2007, at 6:43 AM, Biagio Cosenza wrote:
>
> > I'm working on a parallel real-time renderer: an embarrassingly
> > parallel problem where latency is the limiting factor for
> > performance.
> >
> > Two observations:
> >
> > 1) I did a simple "ping-pong" test (the master does a Bcast + an
> > Irecv for each node + a Waitall) similar to the actual renderer
> > workload. Using a cluster of 37 nodes on Gigabit Ethernet, it seems
> > that the latency is usually low (about 1-5 ms), but sometimes there
> > are peaks of about 200 ms. I suspect the cause is a packet
> > retransmission in one of the 37 connections, which ruins the
> > overall performance of the test (of course, the final Waitall is a
> > synchronization point).
> >
> > 2) A research team argues in a paper that MPI is poor at
> > dynamically managing latency. They also raise an interesting
> > issue about enabling/disabling the Nagle algorithm. (I paste the
> > interesting paragraph below)
> >
> >
> > So I have two questions:
> >
> > 1) Why does my test have these peaks? How can I deal with them
> > (I'm thinking of the btl tcp params)?
>
> They are probably beyond Open MPI's control -- OMPI mainly does
> read() and write() down TCP sockets and relies on the kernel to do
> all the low-level TCP protocol / wire transmission stuff.
>
> You might want to try increasing your TCP buffer sizes, but I think
> that the Linux kernel has some built in limits. Other experts might
> want to chime in here...
>
> > 2) When does Open MPI disable the Nagle algorithm? Suppose I DON'T
> > need Nagle to be ON (focusing only on latency); how can I
> > increase performance?
>
> It looks like we enable Nagle right when TCP BTL connections are
> made. Surprisingly, it looks like we don't have a run-time option to
> turn it off for power-users like you who want to really tweak around.
>
> If you want to play with it, please edit
> ompi/mca/btl/tcp/btl_tcp_endpoint.c. You'll see the references to
> TCP_NODELAY in conjunction with setsockopt(). Set the optval to 0
> instead of 1. A simple "make install" in that directory will
> recompile the TCP component and re-install it (assuming you have done
> a default build with OMPI components built as standalone plugins).
> Let us know what you find.
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems