Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] btl_tcp_use_nagle is negated in openmpi-1.7.4rc1
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-01-08 10:15:40


You are quite correct - r28719 did indeed reverse the logic of that param. Thanks for tracking it down!

I pushed a fix to the trunk and scheduled it for 1.7.4.

On Jan 7, 2014, at 9:45 PM, tmishima_at_[hidden] wrote:

>
>
> Hi,
>
> I found that btl_tcp_use_nagle was negated in openmpi-1.7.4rc1, which
> causes a severe slowdown of the TCP network for smaller message sizes
> (< 1024 bytes) in our environment, as shown at the bottom.
>
> This happened in SVN r28719, where the new MCA variable system was added.
> The flag tcp_not_use_nodelay was newly introduced as the negation of
> tcp_use_nodelay in r28719 (btl_tcp_component.c).
>
> r28361 (btl_tcp_component.c):
>     mca_btl_tcp_component.tcp_use_nodelay =
>         !mca_btl_tcp_param_register_int ("use_nagle",
>             "Whether to use Nagle's algorithm or not (using Nagle's algorithm may increase short message latency)", 0);
>
> r28719 (btl_tcp_component.c):
>     mca_btl_tcp_param_register_int ("use_nagle",
>         "Whether to use Nagle's algorithm or not (using Nagle's algorithm may increase short message latency)", 0,
>         &mca_btl_tcp_component.tcp_not_use_nodelay);
>
> Despite this negation, btl_tcp_endpoint.c still sets the socket option
> directly from tcp_not_use_nodelay, exactly as it did from tcp_use_nodelay
> before, so the option ends up inverted. I think line 515 should be:
>
> optval = !mca_btl_tcp_component.tcp_not_use_nodelay; /* tmishima */
>
> I already confirmed that this fix worked well with openmpi-1.7.4rc1.
>
> btl_tcp_endpoint.c @ r28719:
> 514 #if defined(TCP_NODELAY)
> 515     optval = mca_btl_tcp_component.tcp_not_use_nodelay;
> 516     if(setsockopt(sd, IPPROTO_TCP, TCP_NODELAY, (char *)&optval, sizeof(optval)) < 0) {
> 517         BTL_ERROR(("setsockopt(TCP_NODELAY) failed: %s (%d)",
> 518                    strerror(opal_socket_errno), opal_socket_errno));
> 519     }
> 520 #endif
>
> Regards,
> Tetsuya Mishima
>
> [mishima_at_manage OMB-3.1.1]$ mpirun -np 2 -host manage,node05 -mca btl self,tcp osu_bw
> # OSU MPI Bandwidth Test v3.1.1
> # Size Bandwidth (MB/s)
> 1 0.00
> 2 0.01
> 4 0.01
> 8 0.03
> 16 0.05
> 32 0.10
> 64 0.16
> 128 0.35
> 256 0.74
> 512 20.30
> 1024 149.89
> 2048 182.88
> 4096 203.17
> 8192 217.08
> 16384 228.58
> 32768 232.21
> 65536 169.81
> 131072 232.67
> 262144 207.03
> 524288 224.22
> 1048576 233.30
> 2097152 233.51
> 4194304 234.64
>
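
The inversion discussed in this thread comes down to one fact: setting TCP_NODELAY to 1 disables Nagle's algorithm, so a "use_nagle" setting has to be negated before it is passed to setsockopt(). The following is a minimal standalone sketch of that mapping, not the Open MPI code itself. The helper names (nodelay_from_use_nagle, apply_use_nagle, nodelay_is_set) are hypothetical and introduced only for illustration; it assumes a POSIX system where TCP_NODELAY is defined in <netinet/tcp.h>.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: translate a user-facing "use_nagle" setting into the
 * value TCP_NODELAY expects. TCP_NODELAY = 1 turns Nagle's algorithm OFF,
 * so the two flags are logical opposites. */
static int nodelay_from_use_nagle(int use_nagle)
{
    return !use_nagle;
}

/* Hypothetical helper: apply the setting to an open TCP socket.
 * Returns 0 on success, -1 if setsockopt() fails (errno is set). */
static int apply_use_nagle(int sd, int use_nagle)
{
    assert(sd >= 0); /* caller must pass a valid descriptor */

    int optval = nodelay_from_use_nagle(use_nagle);
    return setsockopt(sd, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval));
}

/* Hypothetical helper: read TCP_NODELAY back from the socket.
 * Returns 1 if set, 0 if clear, -1 if getsockopt() fails. */
static int nodelay_is_set(int sd)
{
    int optval = 0;
    socklen_t len = sizeof(optval);

    if (getsockopt(sd, IPPROTO_TCP, TCP_NODELAY, &optval, &len) < 0) {
        return -1;
    }
    return optval != 0;
}
```

With use_nagle left at its default of 0, apply_use_nagle() requests TCP_NODELAY = 1, which is the low-latency behavior small messages need. Passing the un-negated value instead would leave Nagle's algorithm enabled, which is consistent with the collapse in bandwidth below 1024 bytes in the osu_bw output above.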