
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] doubt on latency result with OpenMPI library
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-03-27 06:04:48


Try adding "--map-by node" to your command line to ensure the procs really
are running on separate nodes.
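
For example, a sketch based on the command from your mail (assuming your Open
MPI version supports these options):

   mpirun --hostfile hosts -np 2 --map-by node --mca btl openib,self,sm \
       --mca btl_openib_cpc_include rdmacm osu_latency

Adding --display-map or --report-bindings to the mpirun line will also show
where the ranks are placed, so you can confirm they really land on ib03 and
ib04 rather than both on one host, where the sm BTL would likely carry the
traffic and make the two runs look identical.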

On Thu, Mar 27, 2014 at 1:40 AM, Wang,Yanfei(SYS) <wangyanfei01_at_[hidden]> wrote:

> Hi,
>
>
>
> HW Test Topology:
>
> IP: 192.168.72.3/24 -- 192.168.72.4/24, with VLAN and RoCE enabled
>
> IB03 server 40G port -- 40G Ethernet switch -- IB04 server 40G port: configured
> as the RoCE link
>
> IP: 192.168.71.3/24 -- 192.168.71.4/24
>
> IB03 server 10G port -- 10G Ethernet switch -- IB04 server 10G port: configured
> as a normal TCP/IP Ethernet link (server management interface)
>
>
>
> MPI configuration:
>
> *MPI hosts file:*
>
> [root_at_bb-nsi-ib04 pt2pt]# cat hosts
>
> ib03 slots=1
>
> ib04 slots=1
>
> *Name resolution (/etc/hosts):*
>
> [root_at_bb-nsi-ib04 pt2pt]# cat /etc/hosts
>
> 192.168.71.3 ib03
>
> 192.168.71.4 ib04
>
> [root_at_bb-nsi-ib04 pt2pt]#
>
> This configuration provides two nodes for the MPI latency evaluation.
>
>
>
> Benchmark:
>
> osu-micro-benchmarks-4.3
>
>
>
> Result:
>
> a. Route traffic over the 10G TCP/IP link using the following
> /etc/hosts file
>
>
>
> [root_at_bb-nsi-ib04 pt2pt]# cat /etc/hosts
>
> 192.168.71.3 ib03
>
> 192.168.71.4 ib04
>
> The average latency reported by osu_latency is about 4.5 us; see the log below:
>
> [root_at_bb-nsi-ib04 pt2pt]# mpirun --hostfile hosts -np 2 osu_latency
>
> # OSU MPI Latency Test v4.3
>
> # Size Latency (us)
>
> 0 4.56
>
> 1 4.90
>
> 2 4.90
>
> 4 4.60
>
> 8 4.71
>
> 16 4.72
>
> 32 5.40
>
> 64 4.77
>
> 128 6.74
>
> 256 7.01
>
> 512 7.14
>
> 1024 7.63
>
> 2048 8.22
>
> 4096 10.39
>
> 8192 14.26
>
> 16384 20.80
>
> 32768 31.97
>
> 65536 37.75
>
> 131072 47.28
>
> 262144 80.40
>
> 524288 137.65
>
> 1048576 250.17
>
> 2097152 484.71
>
> 4194304 946.01
>
>
>
> b. Route traffic over the RoCE link using the /etc/hosts file below and
> mpirun --mca btl openib,self,sm …
>
> [root_at_bb-nsi-ib04 pt2pt]# cat /etc/hosts
>
> 192.168.72.3 ib03
>
> 192.168.72.4 ib04
>
> Result:
>
> [root_at_bb-nsi-ib04 pt2pt]# mpirun --hostfile hosts -np 2 --mca btl
> openib,self,sm --mca btl_openib_cpc_include rdmacm osu_latency
>
> # OSU MPI Latency Test v4.3
>
> # Size Latency (us)
>
> 0 4.83
>
> 1 5.17
>
> 2 5.12
>
> 4 5.25
>
> 8 5.38
>
> 16 5.40
>
> 32 5.19
>
> 64 5.04
>
> 128 6.74
>
> 256 7.04
>
> 512 7.34
>
> 1024 7.91
>
> 2048 8.17
>
> 4096 10.39
>
> 8192 14.22
>
> 16384 22.05
>
> 32768 31.68
>
> 65536 37.57
>
> 131072 48.25
>
> 262144 79.98
>
> 524288 137.66
>
> 1048576 251.38
>
> 2097152 485.66
>
> 4194304 947.81
>
> [root_at_bb-nsi-ib04 pt2pt]#
>
>
>
> *Questions:*
>
> *1. Why do the two cases show a similar latency of about 5 us? That seems
> too low to believe. In our test environment it takes more than 50 us to
> handle a TCP SYN and return the SYN-ACK, and an x86 server needs more than
> 20 us on average to do IP forwarding (measured with a professional hardware
> tester), so is this latency reasonable?*
>
> *2. Normally the switch alone introduces more than 1.5 us of switching
> time. Using Accelio, an open-source RDMA library released by Mellanox, a
> simple ping-pong test takes at least 4 us of round-trip latency. So the 5 us
> MPI latency above (for both TCP/IP and RoCE) is hard to believe… *
>
> *3. The fact that the TCP/IP transport and the RoCE RDMA transport show
> the same latency is very puzzling.*
>
>
>
>
>
> *Before digging deeply into what happens inside the MPI benchmark, could
> you give us some suggestions? Is the mpirun command used correctly here?*
>
> *There must be some mistake in this test; please correct me.*
>
>
>
> *E.g., TCP SYN & SYN-ACK latency (see attached image001.png):*
>
>
>
> *Thanks *
>
> *-Yanfei*
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/03/14400.php
>




image001.png