Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem in Open MPI (v1.8) Performance on 10G Ethernet
From: Muhammad Ansar Javed (muhammad.ansar_at_[hidden])
Date: 2014-04-21 10:34:39


I am now able to achieve over 90% of the link bandwidth (a maximum of
9390 Mbps) on 10GE. There were configuration issues: disabling Intel
SpeedStep and interrupt coalescing helped in achieving the expected network
bandwidth. Varying the send and recv buffer sizes from 128 KB to 1 MB added
just 50 Mbps, with the maximum bandwidth achieved at a 1 MB buffer size.
Thanks for the support.
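
For anyone repeating the buffer-size runs, they could be launched roughly as sketched below. This is only a dry run that prints the command rather than starting a job. It assumes Linux (for the /proc paths), reuses the hostfile and launch options from the original post further down, and uses ./bandwidth.out as a hypothetical name for the benchmark binary. btl_tcp_sndbuf, btl_tcp_rcvbuf and btl_tcp_links are the MCA parameters George mentions below.

```shell
#!/bin/sh
# Dry run: print the mpirun command for one buffer size instead of launching it.

BUF_BYTES=$((1024 * 1024))   # 1 MB, the largest size tried in this thread
LINKS=1                      # 1 here; 2 or 4 were suggested as experiments

# On Linux the kernel caps socket buffer sizes; show the caps if readable.
for f in /proc/sys/net/core/rmem_max /proc/sys/net/core/wmem_max; do
    if [ -r "$f" ]; then
        echo "$f = $(cat "$f") (requested $BUF_BYTES)"
    fi
done

# ./bandwidth.out is a hypothetical binary name; "machines" is the hostfile
# from the original post.
CMD="mpirun -np 2 -hostfile machines -npernode 1 --bind-to core --report-bindings"
CMD="$CMD --mca btl_tcp_sndbuf $BUF_BYTES --mca btl_tcp_rcvbuf $BUF_BYTES"
CMD="$CMD --mca btl_tcp_links $LINKS ./bandwidth.out"

echo "$CMD"
```

Running the printed command launches the job. If the requested size is larger than the reported kernel caps, the caps probably need raising first, which is likely what the "fiddle with your kernel" remark below refers to.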

On Thu, Apr 17, 2014 at 6:05 AM, George Bosilca <bosilca_at_[hidden]> wrote:

> Muhammad,
>
> Our TCP configuration is tailored for 1 Gbps networks, so its
> performance on 10G might be sub-optimal. That being said, the remainder of
> this email is speculation, as I do not have access to a 10G system to
> test it.
>
> There are two things that I would test to see if I can improve the
> performance.
>
> 1. The TCP send and receive buffers. These are controlled by the
> btl_tcp_sndbuf and btl_tcp_rcvbuf parameters. By default they are set to
> 128 KB, which is extremely small for a 10G network. Try 256 KB or maybe
> even 1 MB (you might need to tune your kernel settings to get there).
>
> 2. Add more links between the processes by increasing the default value
> for btl_tcp_links to 2 or 4.
>
> You might also try the following (but here I’m more skeptical). Try
> pushing the value of btl_tcp_endpoint_cache up. This parameter is not to be
> used eagerly in real applications with a complete communication pattern,
> but for a benchmark it might be a good use.
>
> George.
>
> On Apr 16, 2014, at 06:30 , Muhammad Ansar Javed <
> muhammad.ansar_at_[hidden]> wrote:
>
> Hi Ralph,
> Yes, you are right. I should have tested the NetPipe-MPI version earlier.
> I ran the NetPipe-MPI version on 10G Ethernet and the maximum bandwidth
> achieved is 5872 Mbps. Moreover, the maximum bandwidth achieved by the
> osu_bw test is 6080 Mbps. I used OSU Micro-Benchmarks version 4.3.
>
>
> On Wed, Apr 16, 2014 at 3:40 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> I apologize, but I am now confused. Let me see if I can translate:
>>
>> * you ran the non-MPI version of the NetPipe benchmark and got 9.5 Gbps on
>> a 10 Gbps network
>>
>> * you ran iperf and got 9.61 Gbps - however, this has nothing to do with
>> MPI. It just tests your TCP stack
>>
>> * you tested your bandwidth program on a 1 Gbps network and got about 90%
>> efficiency.
>>
>> Is the above correct? If so, my actual suggestion was to run the MPI
>> version of NetPipe and to use the OMB benchmark program as well. Your
>> program might well be okay, but benchmarking is a hard thing to get right
>> in a parallel world, so you might as well validate it by cross-checking the
>> result.
>>
>> I suggest this mostly because your performance numbers are far worse than
>> anything we've measured using those standard benchmarks, and so we should
>> first ensure we aren't chasing a ghost.
>>
>>
>>
>>
>>
>> On Wed, Apr 16, 2014 at 1:41 AM, Muhammad Ansar Javed <
>> muhammad.ansar_at_[hidden]> wrote:
>>
>>> Yes, I have tried NetPipe-Java and iperf to test bandwidth and
>>> configuration. NetPipe-Java achieves a maximum of 9.40 Gbps, while iperf
>>> achieves a maximum of 9.61 Gbps. I have also tested my bandwidth program
>>> on a 1 Gbps Ethernet connection, where it achieves 901 Mbps. I am using
>>> the same program for the 10G network benchmarks. Please find the source
>>> file of the bandwidth program attached.
>>>
>>> As far as --bind-to core is concerned, I think it is working fine. Here
>>> is the output of the --report-bindings switch.
>>> [host3:07134] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.]
>>> [host4:10282] MCW rank 1 bound to socket 0[core 0[hwt 0]]: [B/././.]
>>>
>>>
>>>
>>>
>>> On Tue, Apr 15, 2014 at 8:39 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>
>>>> Have you tried a typical benchmark (e.g., NetPipe or OMB) to ensure the
>>>> problem isn't in your program? Outside of that, you might want to
>>>> explicitly tell it to --bind-to core just to be sure it does so - it's
>>>> supposed to do that by default, but might as well be sure. You can check by
>>>> adding --report-bindings to the cmd line.
>>>>
>>>>
>>>> On Apr 14, 2014, at 11:10 PM, Muhammad Ansar Javed <
>>>> muhammad.ansar_at_[hidden]> wrote:
>>>>
>>>> Hi,
>>>> I am trying to benchmark Open MPI performance on a 10G Ethernet network
>>>> between two hosts. The benchmark numbers are lower than expected. The
>>>> maximum bandwidth achieved by OMPI-C is 5678 Mbps, and I was expecting
>>>> 9000+ Mbps. Moreover, latency is also considerably higher than
>>>> expected, ranging from 37 to 59 us. Here is the complete set of numbers.
>>>>
>>>>
>>>>
>>>> *Latency, Open MPI C*
>>>>
>>>> Size    Time (us)
>>>> 1       37.76
>>>> 2       37.75
>>>> 4       37.78
>>>> 8       55.17
>>>> 16      37.89
>>>> 32      39.08
>>>> 64      37.78
>>>> 128     59.46
>>>> 256     39.37
>>>> 512     40.39
>>>> 1024    47.18
>>>> 2048    47.84
>>>>
>>>>
>>>>
>>>>
>>>> *Bandwidth, Open MPI C*
>>>>
>>>> Size (Bytes)    Bandwidth (Mbps)
>>>> 2048            412.22
>>>> 4096            539.59
>>>> 8192            827.73
>>>> 16384           1655.35
>>>> 32768           3274.3
>>>> 65536           1995.22
>>>> 131072          3270.84
>>>> 262144          4316.22
>>>> 524288          5019.46
>>>> 1048576         5236.17
>>>> 2097152         5362.61
>>>> 4194304         5495.2
>>>> 8388608         5565.32
>>>> 16777216        5678.32
>>>>
>>>>
>>>> My environment consists of two hosts with a point-to-point
>>>> (switch-less) 10 Gbps Ethernet connection. The environment (OS, user,
>>>> directory structure, etc.) on both hosts is exactly the same. There is no
>>>> NAS or shared file system between the hosts. Below are the configuration
>>>> and job launching commands that I am using. I have also attached the
>>>> output of ompi_info --all.
>>>>
>>>> Configuration command: ./configure --enable-mpi-java
>>>> --prefix=/home/mpj/installed/openmpi_installed CC=/usr/bin/gcc
>>>> --disable-mpi-fortran
>>>>
>>>> Job launching command: mpirun -np 2 -hostfile machines -npernode 1
>>>> ./latency.out
>>>>
>>>> Are these numbers okay? If not then please suggest performance tuning
>>>> steps...
>>>>
>>>> Thanks
>>>>
>>>> --
>>>> Ansar Javed
>>>> HPC Lab
>>>> SEECS NUST
>>>> Contact: +92 334 438 9394
>>>> Email: muhammad.ansar_at_[hidden]
>>>> <ompi_info.tar.bz2>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Regards
Ansar Javed
HPC Lab
SEECS NUST
Contact: +92 334 438 9394
Email: muhammad.ansar_at_[hidden]