Muhammad,

Our configuration of TCP is tailored for 1 Gbps networks, so its performance on 10G might be sub-optimal. That being said, the remainder of this email is speculation, as I do not have access to a 10G system to test on.

There are two things that I would test to see if I can improve the performance.

1. The send and receive TCP buffers. These are controlled by btl_tcp_sndbuf and btl_tcp_rcvbuf. By default these are set to 128 KB, which is extremely small for a 10G network. Try 256 KB or maybe even 1 MB (you might need to adjust your kernel settings to get there).

2. Add more links between the processes by increasing the default value for btl_tcp_links to 2 or 4.

You might also try the following (but here I'm more skeptical): push the value of btl_tcp_endpoint_cache up. This parameter is not to be used eagerly in real applications with a complete communication pattern, but for a benchmark it might be a good use.
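A minimal sketch of how these suggestions could be tried from the command line, using mpirun's --mca option. The values (1 MB buffers, 2 links) are only starting points to experiment with, not measured recommendations. The hostfile name and launch options are taken from the job command quoted later in this thread, and osu_bw is the benchmark mentioned below. The commented sysctl lines assume a Linux kernel, where net.core.rmem_max and net.core.wmem_max cap the socket buffer sizes a process may request. The variable names BUF, LINKS and MCA_ARGS are illustrative.

```shell
#!/bin/sh
# Sketch only: values are guesses to experiment with, not tuned results.
BUF=$((1024 * 1024))      # 1 MB send/receive socket buffers, in bytes
LINKS=2                   # TCP links between each pair of processes; try 2 or 4

# Assumption: Linux, run as root. Raise the kernel caps so 1 MB buffers are allowed.
#   sysctl -w net.core.rmem_max=$BUF
#   sysctl -w net.core.wmem_max=$BUF

MCA_ARGS="--mca btl_tcp_sndbuf $BUF --mca btl_tcp_rcvbuf $BUF --mca btl_tcp_links $LINKS"

# Print the command first; drop the leading 'echo' to actually launch it.
echo mpirun -np 2 -hostfile machines -npernode 1 $MCA_ARGS ./osu_bw
```

btl_tcp_endpoint_cache can be appended in the same way with another --mca pair; the output of ompi_info --all shows the current value to start from.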

  George.

On Apr 16, 2014, at 06:30, Muhammad Ansar Javed <muhammad.ansar@seecs.edu.pk> wrote:

Hi Ralph,
Yes, you are right. I should have tested the NetPipe-MPI version earlier. I ran the NetPipe-MPI version on 10G Ethernet and the maximum bandwidth achieved was 5872 Mbps. The maximum bandwidth achieved by the osu_bw test was 6080 Mbps. I used OSU Micro-Benchmarks version 4.3.


On Wed, Apr 16, 2014 at 3:40 PM, Ralph Castain <rhc@open-mpi.org> wrote:
I apologize, but I am now confused. Let me see if I can translate:

* you ran the non-MPI version of the NetPipe benchmark and got 9.5 Gbps on a 10 Gbps network

* you ran iperf and got 9.61 Gbps; however, this has nothing to do with MPI, it just tests your TCP stack

* you tested your bandwidth program on a 1 Gbps network and got about 90% efficiency.

Is the above correct? If so, my actual suggestion was to run the MPI version of NetPipe and to use the OMB benchmark program as well. Your program might well be okay, but benchmarking is a hard thing to get right in a parallel world, so you might as well validate it by cross-checking the result.

I suggest this mostly because your performance numbers are far worse than anything we've measured using those standard benchmarks, and so we should first ensure we aren't chasing a ghost.





On Wed, Apr 16, 2014 at 1:41 AM, Muhammad Ansar Javed <muhammad.ansar@seecs.edu.pk> wrote:
Yes, I have tried NetPipe-Java and iperf to test bandwidth and configuration. NetPipe-Java achieves a maximum of 9.40 Gbps, while iperf achieves a maximum of 9.61 Gbps. I have also tested my bandwidth program on a 1 Gbps Ethernet connection, where it achieves 901 Mbps. I am using the same program for the 10G network benchmarks. Please find attached the source file of the bandwidth program.

As far as --bind-to core is concerned, I think it is working fine. Here is the output of the --report-bindings switch:
[host3:07134] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.]
[host4:10282] MCW rank 1 bound to socket 0[core 0[hwt 0]]: [B/././.]




On Tue, Apr 15, 2014 at 8:39 PM, Ralph Castain <rhc@open-mpi.org> wrote:
Have you tried a typical benchmark (e.g., NetPipe or OMB) to ensure the problem isn't in your program? Outside of that, you might want to explicitly tell it to --bind-to core just to be sure it does so - it's supposed to do that by default, but might as well be sure. You can check by adding --report-bindings to the cmd line.
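For illustration, the binding check described above could be added to the launch line used in this thread roughly as follows. This is a sketch: the hostfile name and latency.out binary come from the original job command, and the variable name CMD is illustrative.

```shell
#!/bin/sh
# The thread's launch line plus explicit core binding and a binding report.
CMD="mpirun -np 2 -hostfile machines -npernode 1 --bind-to core --report-bindings ./latency.out"

# Print the command; replace 'echo "$CMD"' with '$CMD' to actually run it.
echo "$CMD"
```

When run, each rank should report one line showing the socket and core it is bound to.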


On Apr 14, 2014, at 11:10 PM, Muhammad Ansar Javed <muhammad.ansar@seecs.edu.pk> wrote:

Hi,
I am trying to benchmark Open MPI performance on a 10G Ethernet network between two hosts. The benchmark numbers are lower than expected. The maximum bandwidth achieved by OMPI-C is 5678 Mbps, whereas I was expecting 9000+ Mbps. Latency is also considerably higher than expected, ranging from 37 to 59 us. Here is the complete set of numbers.

Latency
Open MPI C
Size    Time (us)

1       37.76
2       37.75
4       37.78
8       55.17
16      37.89
32      39.08
64      37.78
128     59.46
256     39.37
512     40.39
1024    47.18
2048    47.84

Bandwidth
Open MPI C
Size (Bytes)    Bandwidth (Mbps)

2048            412.22
4096            539.59
8192            827.73
16384           1655.35
32768           3274.3
65536           1995.22
131072          3270.84
262144          4316.22
524288          5019.46
1048576         5236.17
2097152         5362.61
4194304         5495.2
8388608         5565.32
16777216        5678.32
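To put the peak figure in context, a small sketch of the link-utilization arithmetic follows. It assumes the nominal line rate of the 10G link is 10000 Mbps; the variable names are illustrative.

```shell
#!/bin/sh
# Peak measured bandwidth from the table above and the nominal link rate, in Mbps.
PEAK_MBPS=5678.32
LINK_MBPS=10000

# awk does the floating-point division that plain sh arithmetic cannot.
EFF=$(awk -v p="$PEAK_MBPS" -v l="$LINK_MBPS" 'BEGIN { printf "%.1f", 100 * p / l }')
echo "Link utilization: ${EFF}%"
```

This gives roughly 57% of the nominal rate at the largest message size.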


My environment consists of two hosts with a point-to-point (switch-less) 10 Gbps Ethernet connection. The environment (OS, user, directory structure, etc.) on both hosts is exactly the same. There is no NAS or shared file system between the hosts. The following are the configuration and job launching commands that I am using. I have also attached the output of ompi_info --all.

Configuration command: ./configure --enable-mpi-java --prefix=/home/mpj/installed/openmpi_installed CC=/usr/bin/gcc --disable-mpi-fortran

Job launching command: mpirun -np 2 -hostfile machines -npernode 1 ./latency.out

Are these numbers okay? If not, please suggest performance tuning steps.

Thanks

--
Ansar Javed
HPC Lab
SEECS NUST
Contact: +92 334 438 9394
Email: muhammad.ansar@seecs.edu.pk
<ompi_info.tar.bz2>
_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

