Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem in Open MPI (v1.8) Performance on 10G Ethernet
From: Muhammad Ansar Javed (muhammad.ansar_at_[hidden])
Date: 2014-04-21 15:32:58


No, I have not tried multi-link.

On Mon, Apr 21, 2014 at 11:50 PM, George Bosilca <bosilca_at_[hidden]> wrote:

> Have you tried multi-link? Did it help?
>
> George.
>
>
> On Apr 21, 2014, at 10:34 , Muhammad Ansar Javed <
> muhammad.ansar_at_[hidden]> wrote:
>
> I am now able to achieve around 90% of the link bandwidth (maximum 9390 Mbps)
> on 10GE. There were configuration issues: disabling Intel SpeedStep and
> interrupt coalescing helped achieve the expected network bandwidth. Varying
> the send and receive buffer sizes from 128 KB to 1 MB added just 50 Mbps,
> with the maximum bandwidth achieved at a 1 MB buffer size.
> Thanks for the support.
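
To put the peak figures reported in this thread side by side, the small sketch below expresses each one as a share of line rate. The 10000 Mbps nominal rate for 10GbE is an assumption made for the arithmetic, and `pct` is a hypothetical helper name; the three measured values are the ones quoted in the messages.

```shell
#!/bin/sh
LINK_MBPS=10000   # assumed nominal 10GbE line rate, in Mbps

# Print a measured rate as a percentage of the nominal line rate.
pct() {
    awk -v m="$1" -v l="$LINK_MBPS" 'BEGIN { printf "%.1f\n", 100 * m / l }'
}

# 5678: original program, 6080: osu_bw, 9390: after the configuration fixes
for mbps in 5678 6080 9390; do
    echo "$mbps Mbps is $(pct "$mbps")% of line rate"
done
```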
>
>
> On Thu, Apr 17, 2014 at 6:05 AM, George Bosilca <bosilca_at_[hidden]> wrote:
>
>> Muhammad,
>>
>> Our configuration of TCP is tailored for 1 Gb/s networks, so its
>> performance on 10G might be sub-optimal. That being said, the remainder of
>> this email is speculation, as I do not have access to a 10G system to
>> test it.
>>
>> There are two things that I would test to see if I can improve the
>> performance.
>>
>> 1. The TCP send and receive buffers. These are controlled by
>> btl_tcp_sndbuf and btl_tcp_rcvbuf. By default these are set to 128 KB,
>> which is extremely small for a 10G network. Try 256 KB or maybe even 1 MB
>> (you might need to adjust your kernel settings to get there).
>>
>> 2. Add more links between the processes by increasing the default value
>> of btl_tcp_links to 2 or 4.
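
The two settings above could be tried along the following lines. This is a minimal sketch, not a tested recipe: the 1 MB buffer size and the link count of 2 are just the values suggested in the text, the hostfile `machines` and the binary `./latency.out` are taken from the original launch command (substitute the bandwidth benchmark as needed), and the `/proc/sys/net/core` paths assume a Linux host.

```shell
#!/bin/sh
# Sketch only: values follow the suggestions above, not measured optima.
BUF=$((1024 * 1024))   # 1 MB socket buffers, in bytes
LINKS=2                # TCP links between each pair of processes

# The kernel caps socket buffer sizes; on Linux the caps are exposed here.
for f in /proc/sys/net/core/rmem_max /proc/sys/net/core/wmem_max; do
    if [ -r "$f" ]; then
        max=$(cat "$f")
        if [ "$max" -lt "$BUF" ]; then
            echo "note: $f is $max, below the requested $BUF"
        fi
    fi
done

# Build the launch line and print it for review before running it.
CMD="mpirun -np 2 -hostfile machines -npernode 1 \
--mca btl_tcp_sndbuf $BUF --mca btl_tcp_rcvbuf $BUF \
--mca btl_tcp_links $LINKS ./latency.out"
echo "$CMD"
```

Printing the command first makes it easy to vary one parameter at a time and compare runs, which matters here because the suggestions are untested on 10G hardware.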
>>
>> You might also try the following (though here I’m more skeptical): try
>> pushing the value of btl_tcp_endpoint_cache up. This parameter should not be
>> pushed aggressively in real applications with a complete communication
>> pattern, but for a benchmark it might be useful.
>>
>> George.
>>
>> On Apr 16, 2014, at 06:30 , Muhammad Ansar Javed <
>> muhammad.ansar_at_[hidden]> wrote:
>>
>> Hi Ralph,
>> Yes, you are right. I should have tested the NetPipe-MPI version
>> earlier. I ran the NetPipe-MPI version on 10G Ethernet and the maximum
>> bandwidth achieved is 5872 Mbps. Moreover, the maximum bandwidth achieved by
>> the osu_bw test is 6080 Mbps. I used OSU-Micro-Benchmarks version 4.3.
>>
>>
>> On Wed, Apr 16, 2014 at 3:40 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>>> I apologize, but I am now confused. Let me see if I can translate:
>>>
>>> * you ran the non-MPI version of the NetPipe benchmark and got 9.5 Gbps on
>>> a 10 Gbps network
>>>
>>> * you ran iperf and got 9.61 Gbps - however, this has nothing to do with
>>> MPI. It just tests your TCP stack
>>>
>>> * you tested your bandwidth program on a 1 Gbps network and got about 90%
>>> efficiency.
>>>
>>> Is the above correct? If so, my actual suggestion was to run the MPI
>>> version of NetPipe and to use the OMB benchmark program as well. Your
>>> program might well be okay, but benchmarking is a hard thing to get right
>>> in a parallel world, so you might as well validate it by cross-checking the
>>> result.
>>>
>>> I suggest this mostly because your performance numbers are far worse
>>> than anything we've measured using those standard benchmarks, and so we
>>> should first ensure we aren't chasing a ghost.
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Apr 16, 2014 at 1:41 AM, Muhammad Ansar Javed <
>>> muhammad.ansar_at_[hidden]> wrote:
>>>
>>>> Yes, I have tried NetPipe-Java and iperf to test bandwidth and
>>>> configuration. NetPipe-Java achieves a maximum of 9.40 Gbps, while iperf
>>>> achieves a maximum of 9.61 Gbps. I have also tested my bandwidth
>>>> program on a 1 Gbps Ethernet connection, where it achieves 901 Mbps. I
>>>> am using the same program for the 10G network benchmarks. Please find the
>>>> source file of the bandwidth program attached.
>>>>
>>>> As far as --bind-to core is concerned, I think it is working fine. Here
>>>> is the output of the --report-bindings switch.
>>>> [host3:07134] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.]
>>>> [host4:10282] MCW rank 1 bound to socket 0[core 0[hwt 0]]: [B/././.]
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Apr 15, 2014 at 8:39 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>
>>>>> Have you tried a typical benchmark (e.g., NetPipe or OMB) to ensure
>>>>> the problem isn't in your program? Outside of that, you might want to
>>>>> explicitly tell it to --bind-to core just to be sure it does so - it's
>>>>> supposed to do that by default, but might as well be sure. You can check by
>>>>> adding --report-bindings to the cmd line.
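
As a rough illustration of the binding check described above, the launch line could look like the sketch below. The hostfile `machines` and the binary `./latency.out` come from the original launch command in this thread; everything else is just the two options named in the text.

```shell
#!/bin/sh
# Request core binding explicitly and ask mpirun to report what it did.
CMD="mpirun -np 2 -hostfile machines -npernode 1 --bind-to core --report-bindings ./latency.out"

# Print the launch line for review before running it.
echo "$CMD"
```

With --report-bindings, each rank reports the socket and core it was bound to, so a missing or unexpected binding shows up before any timing numbers are collected.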
>>>>>
>>>>>
>>>>> On Apr 14, 2014, at 11:10 PM, Muhammad Ansar Javed <
>>>>> muhammad.ansar_at_[hidden]> wrote:
>>>>>
>>>>> Hi,
>>>>> I am trying to benchmark Open MPI performance on a 10G Ethernet network
>>>>> between two hosts. The benchmark numbers are lower than
>>>>> expected. The maximum bandwidth achieved by OMPI-C is 5678 Mbps, whereas I
>>>>> was expecting 9000+ Mbps. Moreover, latency is also considerably higher than
>>>>> expected, ranging from 37 to 59 us. Here is the complete set of numbers.
>>>>>
>>>>>
>>>>>
>>>>> *Latency, Open MPI C*
>>>>> Size    Time (us)
>>>>> 1       37.76
>>>>> 2       37.75
>>>>> 4       37.78
>>>>> 8       55.17
>>>>> 16      37.89
>>>>> 32      39.08
>>>>> 64      37.78
>>>>> 128     59.46
>>>>> 256     39.37
>>>>> 512     40.39
>>>>> 1024    47.18
>>>>> 2048    47.84
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *Bandwidth, Open MPI C*
>>>>> Size (Bytes)    Bandwidth (Mbps)
>>>>> 2048            412.22
>>>>> 4096            539.59
>>>>> 8192            827.73
>>>>> 16384           1655.35
>>>>> 32768           3274.3
>>>>> 65536           1995.22
>>>>> 131072          3270.84
>>>>> 262144          4316.22
>>>>> 524288          5019.46
>>>>> 1048576         5236.17
>>>>> 2097152         5362.61
>>>>> 4194304         5495.2
>>>>> 8388608         5565.32
>>>>> 16777216        5678.32
>>>>>
>>>>>
>>>>> My environment consists of two hosts with a point-to-point
>>>>> (switch-less) 10 Gbps Ethernet connection. The environment (OS, user,
>>>>> directory structure, etc.) on both hosts is exactly the same. There is no
>>>>> NAS or shared file system between the hosts. The configuration and job
>>>>> launching commands that I am using are given below. Moreover, I have
>>>>> attached the output of ompi_info --all.
>>>>>
>>>>> Configuration command: ./configure --enable-mpi-java
>>>>> --prefix=/home/mpj/installed/openmpi_installed CC=/usr/bin/gcc
>>>>> --disable-mpi-fortran
>>>>>
>>>>> Job launching command: mpirun -np 2 -hostfile machines -npernode 1
>>>>> ./latency.out
>>>>>
>>>>> Are these numbers okay? If not then please suggest performance tuning
>>>>> steps...
>>>>>
>>>>> Thanks
>>>>>
>>>>> --
>>>>> Ansar Javed
>>>>> HPC Lab
>>>>> SEECS NUST
>>>>> Contact: +92 334 438 9394
>>>>> Email: muhammad.ansar_at_[hidden]
>>>>> <ompi_info.tar.bz2>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards
>>>>
>>>>
>>>> Ansar Javed
>>>> HPC Lab
>>>> SEECS NUST
>>>> Contact: +92 334 438 9394
>>>> Email: muhammad.ansar_at_[hidden]
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Regards
>>
>> Ansar Javed
>> HPC Lab
>> SEECS NUST
>> Contact: +92 334 438 9394
>> Email: muhammad.ansar_at_[hidden]
>>
>>
>>
>>
>
>
>
> --
> Regards
>
> Ansar Javed
> HPC Lab
> SEECS NUST
> Contact: +92 334 438 9394
> Email: muhammad.ansar_at_[hidden]
>
>
>
>

-- 
Regards
Ansar Javed
HPC Lab
SEECS NUST
Contact: +92 334 438 9394
Email: muhammad.ansar_at_[hidden]