
Subject: Re: [OMPI users] TCP Bandwidth
From: Andy Georgi (Andy.Georgi_at_[hidden])
Date: 2008-08-19 05:57:08


George Bosilca wrote:

> Btw, can you run the Netpipe benchmark on this configuration please ?
> Once compiled with MPI support and once with TCP. This might give us
> more equitable details (same benchmark).

NPmpi and NPtcp both belong to NetPIPE, but that doesn't mean they do the same thing ;-). Anyway, here are
the results:

mpirun --hostfile my_hostfile_netpipe -mca btl_tcp_if_include eth2 -np 2 -nolocal NPmpi -l 20971520
-u 209715200
0: chel02
1: chel03
Now starting the main loop
   0: 20971517 bytes 3 times --> 6630.84 Mbps in 24129.67 usec
   1: 20971520 bytes 3 times --> 6631.67 Mbps in 24126.65 usec
   2: 20971523 bytes 3 times --> 6649.08 Mbps in 24063.47 usec
   3: 31457277 bytes 3 times --> 6680.40 Mbps in 35925.98 usec
   4: 31457280 bytes 3 times --> 6667.34 Mbps in 35996.36 usec
   5: 31457283 bytes 3 times --> 6667.53 Mbps in 35995.32 usec
   6: 41943037 bytes 3 times --> 6708.97 Mbps in 47697.35 usec
   7: 41943040 bytes 3 times --> 6700.24 Mbps in 47759.49 usec
   8: 41943043 bytes 3 times --> 6698.70 Mbps in 47770.50 usec
   9: 62914557 bytes 3 times --> 6724.57 Mbps in 71380.02 usec
  10: 62914560 bytes 3 times --> 6726.09 Mbps in 71363.85 usec
  11: 62914563 bytes 3 times --> 6728.26 Mbps in 71340.84 usec
  12: 83886077 bytes 3 times --> 6736.77 Mbps in 95000.98 usec
  13: 83886080 bytes 3 times --> 6741.62 Mbps in 94932.68 usec
  14: 83886083 bytes 3 times --> 6743.01 Mbps in 94913.16 usec
  15: 125829117 bytes 3 times --> 6765.21 Mbps in 141902.49 usec
  16: 125829120 bytes 3 times --> 6764.36 Mbps in 141920.33 usec
  17: 125829123 bytes 3 times --> 6765.18 Mbps in 141903.00 usec
  18: 167772157 bytes 3 times --> 6767.28 Mbps in 189145.53 usec
  19: 167772160 bytes 3 times --> 6775.90 Mbps in 188904.84 usec
  20: 167772163 bytes 3 times --> 6768.77 Mbps in 189103.80 usec

NPtcp -h 192.168.10.2 -l 20971520 -u 209715200
Send and receive buffers are 107374182 and 107374182 bytes
(A bug in Linux doubles the requested buffer sizes)
Now starting the main loop
   0: 20971517 bytes 3 times --> 8565.70 Mbps in 18679.14 usec
   1: 20971520 bytes 3 times --> 8561.11 Mbps in 18689.16 usec
   2: 20971523 bytes 3 times --> 8570.28 Mbps in 18669.17 usec
   3: 31457277 bytes 3 times --> 8554.48 Mbps in 28055.47 usec
   4: 31457280 bytes 3 times --> 8556.30 Mbps in 28049.51 usec
   5: 31457283 bytes 3 times --> 8566.58 Mbps in 28015.85 usec
   6: 41943037 bytes 3 times --> 8560.95 Mbps in 37379.03 usec
   7: 41943040 bytes 3 times --> 8554.36 Mbps in 37407.84 usec
   8: 41943043 bytes 3 times --> 8558.02 Mbps in 37391.82 usec
   9: 62914557 bytes 3 times --> 8546.83 Mbps in 56161.17 usec
  10: 62914560 bytes 3 times --> 8549.41 Mbps in 56144.20 usec
  11: 62914563 bytes 3 times --> 8561.18 Mbps in 56067.03 usec
  12: 83886077 bytes 3 times --> 8565.96 Mbps in 74714.34 usec
  13: 83886080 bytes 3 times --> 8563.17 Mbps in 74738.66 usec
  14: 83886083 bytes 3 times --> 8549.71 Mbps in 74856.32 usec
  15: 125829117 bytes 3 times --> 8580.90 Mbps in 111876.33 usec
  16: 125829120 bytes 3 times --> 8574.20 Mbps in 111963.83 usec
  17: 125829123 bytes 3 times --> 8572.41 Mbps in 111987.19 usec
  18: 167772157 bytes 3 times --> 8601.10 Mbps in 148818.17 usec
  19: 167772160 bytes 3 times --> 8602.33 Mbps in 148796.84 usec
  20: 167772163 bytes 3 times --> 8595.99 Mbps in 148906.67 usec

Let us be optimistic and say ~1075 MB/s with NPtcp and ~850 MB/s with NPmpi. Roughly the same
difference as before, but at lower values.
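
(For clarity: NetPIPE reports throughput in Mbps, i.e. megabits per second, so the MB/s figures above
are just the peak rates divided by 8, roughly:

    ~8600 Mbps / 8 ≈ 1075 MB/s  (NPtcp)
    ~6800 Mbps / 8 ≈  850 MB/s  (NPmpi)
)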

George Bosilca wrote:

> There are two parameters hat can slightly improve the
> behavior: btl_tcp_rdma_pipeline_send_length and
> btl_tcp_min_rdma_pipeline_size.

These two parameters don't exist. Here is the output of ompi_info --param btl tcp:

MCA btl: parameter "btl_base_debug" (current value: "0")
        If btl_base_debug is 1 standard debug is output, if > 1 verbose debug is output
MCA btl: parameter "btl" (current value: <none>)
        Default selection set of components for the btl framework (<none> means "use all components that
can be found")
MCA btl: parameter "btl_base_verbose" (current value: "0")
        Verbosity level for the btl framework (0 = no verbosity)
MCA btl: parameter "btl_tcp_if_include" (current value: <none>)
MCA btl: parameter "btl_tcp_if_exclude" (current value: "lo")
MCA btl: parameter "btl_tcp_free_list_num" (current value: "8")
MCA btl: parameter "btl_tcp_free_list_max" (current value: "-1")
MCA btl: parameter "btl_tcp_free_list_inc" (current value: "32")
MCA btl: parameter "btl_tcp_sndbuf" (current value: "393216")
MCA btl: parameter "btl_tcp_rcvbuf" (current value: "393216")
MCA btl: parameter "btl_tcp_endpoint_cache" (current value: "30720")
MCA btl: parameter "btl_tcp_exclusivity" (current value: "0")
MCA btl: parameter "btl_tcp_eager_limit" (current value: "65536")
MCA btl: parameter "btl_tcp_min_send_size" (current value: "65536")
MCA btl: parameter "btl_tcp_max_send_size" (current value: "131072")
MCA btl: parameter "btl_tcp_min_rdma_size" (current value: "131072")
MCA btl: parameter "btl_tcp_max_rdma_size" (current value: "2147483647")
MCA btl: parameter "btl_tcp_flags" (current value: "122")
MCA btl: parameter "btl_tcp_priority" (current value: "0")
MCA btl: parameter "btl_base_warn_component_unused" (current value: "1")
        This parameter is used to turn on warning messages when certain NICs are not used

The buffer sizes are already "tuned". Btw, what is optimal for the send values
("btl_tcp_min_send_size" and "btl_tcp_max_send_size")? A high value (only a few segmentations at the MPI
level) or a low one? We changed it both ways, but it had nearly no effect.
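
For reference, we changed them on the mpirun command line with the same -mca syntax as for
btl_tcp_if_include above, e.g. (the values below are only placeholders, not a recommendation):

mpirun --hostfile my_hostfile_netpipe -np 2 -nolocal \
    -mca btl_tcp_if_include eth2 \
    -mca btl_tcp_min_send_size 131072 \
    -mca btl_tcp_max_send_size 1048576 \
    NPmpi -l 20971520 -u 209715200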

Jon Mason wrote:

> That script should optimally tune your NIC. If you are still not
> satisfied with the performance, Chelsio should have people available to
> help. Since the TOE module is not opensource, there is not much anyone
> else can do. You can try tweaking any module parms that are exposed.
> Checkout the modinfo output for that module.

The NIC is well tuned, as the results at the TCP level show. There seems to be a bottleneck in Open MPI.

Jon Mason wrote:

> You can also try the new iWARP support in OMPI 1.3. The perf for that
> should be much better.

Yes, I will try it, but I can't offer an unstable version of Open MPI on a production system. So as long
as it is not officially released, the users have to work with 1.2.6.

Steve Wise wrote:

> So OMPI experts, what is the overhead you see on other TCP links for
> OMPI BW tests vs native sockets TCP BW tests?

This is exactly what I need to know :)

Thanks a lot for the interest and the hints so far.

Andy

-- 
Dresden University of Technology
Center for Information Services
and High Performance Computing (ZIH)
D-01062 Dresden
Germany
Phone:    (+49) 351/463-38783
Fax:      (+49) 351/463-38245
e-mail: Andy.Georgi_at_[hidden]
WWW:    http://www.tu-dresden.de/zih