Yevgeny,
The ibstat results:
CA 'mthca0'
       CA type: MT25208 (MT23108 compat mode)
       Number of ports: 2
       Firmware version: 4.7.600
       Hardware version: a0
       Node GUID: 0x0005ad00000c21e0
       System image GUID: 0x0005ad000100d050
       Port 1:
               State: Active
               Physical state: LinkUp
               Rate: 10
               Base lid: 4
               LMC: 0
               SM lid: 2
               Capability mask: 0x02510a68
               Port GUID: 0x0005ad00000c21e1
               Link layer: IB
       Port 2:
               State: Down
               Physical state: Polling
               Rate: 10
               Base lid: 0
               LMC: 0
               SM lid: 0
               Capability mask: 0x02510a68
               Port GUID: 0x0005ad00000c21e2
               Link layer: IB
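As a rough back-of-envelope for the expected bandwidth, the "Rate: 10" field can be turned into a payload ceiling. This is only a sketch: it assumes the rate is a 10 Gb/s signalling rate on a 4x SDR link with 8b/10b encoding, which the ibstat output itself does not state.

```python
# Rough payload ceiling implied by the ibstat "Rate" field.
# ASSUMPTION: "Rate: 10" is a 10 Gb/s signalling rate (4x SDR) and the link
# uses 8b/10b encoding, so only 8 of every 10 bits on the wire carry data.
signal_rate_gbps = 10.0
encoding_efficiency = 8 / 10

data_rate_gbps = signal_rate_gbps * encoding_efficiency
data_rate_mb_per_s = data_rate_gbps * 1000 / 8  # decimal megabytes per second

print(f"data rate: {data_rate_gbps:.1f} Gb/s, about {data_rate_mb_per_s:.0f} MB/s")
```

Under those assumptions, the PingPong figure of about 958 MB/sec quoted further down in this thread would already be close to the ceiling.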
And more interestingly, ib_write_bw:
                  RDMA_Write BW Test
 Number of qps  : 1
 Connection type : RC
 TX depth       : 300
 CQ Moderation  : 50
 Link type      : IB
 Mtu            : 2048
 Inline data is used up to 0 bytes message
 local address: LID 0x04 QPN 0x1c0407 PSN 0x48ad9e RKey 0xd86a0051 VAddr 0x002ae362870000
 remote address: LID 0x03 QPN 0x2e0407 PSN 0xf57209 RKey 0x8d98003b VAddr 0x002b533d366000
------------------------------------------------------------------
 #bytes    #iterations   BW peak[MB/sec]   BW average[MB/sec]
Conflicting CPU frequency values detected: 1600.000000 != 3301.000000
 65536    5000          0.00              0.00
------------------------------------------------------------------
What does "Conflicting CPU frequency values" mean?
Examining /proc/cpuinfo, however, shows:
processor      : 0
cpu MHz        : 3301.000
processor      : 1
cpu MHz        : 3301.000
processor      : 2
cpu MHz        : 1600.000
processor      : 3
cpu MHz        : 1600.000
Which seems odd to me...
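The warning presumably comes from comparing the "cpu MHz" entries across cores, which can differ when CPU frequency scaling is active. A minimal sketch of that comparison follows; the helper name cpu_frequencies is hypothetical, and the sample text is the excerpt above rather than a live read of /proc/cpuinfo.

```python
from collections import defaultdict


def cpu_frequencies(cpuinfo_text):
    """Map each distinct 'cpu MHz' value to the processors reporting it."""
    by_mhz = defaultdict(list)
    current = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            current = int(value)
        elif key == "cpu MHz" and current is not None:
            by_mhz[float(value)].append(current)
    return dict(by_mhz)


# Excerpt copied from the /proc/cpuinfo output above.
SAMPLE = """\
processor      : 0
cpu MHz        : 3301.000
processor      : 1
cpu MHz        : 3301.000
processor      : 2
cpu MHz        : 1600.000
processor      : 3
cpu MHz        : 1600.000
"""

freqs = cpu_frequencies(SAMPLE)
conflicting = len(freqs) > 1  # more than one distinct frequency reported
print(freqs, "conflicting" if conflicting else "consistent")
```

On a live system the same function could be fed the contents of /proc/cpuinfo instead of SAMPLE.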
________________________________
From: Yevgeny Kliteynik <kliteyn_at_[hidden]>
To: Randolph Pullen <randolph_pullen_at_[hidden]>; OpenMPI Users <users_at_[hidden]>
Sent: Thursday, 6 September 2012 6:03 PM
Subject: Re: [OMPI users] Infiniband performance Problem and stalling
On 9/3/2012 4:14 AM, Randolph Pullen wrote:
> No RoCE, Just native IB with TCP over the top.
Sorry, I'm confused - it's still not clear what a "Melanox III HCA 10G card" is.
Could you run "ibstat" and post the results?
What is the expected BW on your cards?
Could you run "ib_write_bw" between two machines?
Also, please see below.
> No, I haven't used 1.6; I was trying to stick with the standards on the Mellanox disk.
> Is there a known problem with 1.4.3?
>
>
> ------------------------------------------------------------------------
> *From:* Yevgeny Kliteynik <kliteyn_at_[hidden]>
> *To:* Randolph Pullen <randolph_pullen_at_[hidden]>; Open MPI Users <users_at_[hidden]>
> *Sent:* Sunday, 2 September 2012 10:54 PM
> *Subject:* Re: [OMPI users] Infiniband performance Problem and stalling
>
> Randolph,
>
> Some clarification on the setup:
>
> "Melanox III HCA 10G cards" - are those ConnectX 3 cards configured to Ethernet?
> That is, when you're using openib BTL, you mean RoCE, right?
>
> Also, have you had a chance to try some newer OMPI release?
> Any 1.6.x would do.
>
>
> -- YK
>
> On 8/31/2012 10:53 AM, Randolph Pullen wrote:
> > (reposted with consolidated information)
> > I have a test rig comprising 2 i7 systems with 8GB RAM and Melanox III HCA 10G cards,
> > running Centos 5.7 Kernel 2.6.18-274
> > Open MPI 1.4.3
> > MLNX_OFED_LINUX-1.5.3-1.0.0.2 (OFED-1.5.3-1.0.0.2):
> > On a Cisco 24-port switch
> > Normal performance is:
> > $ mpirun --mca btl openib,self -n 2 -hostfile mpi.hosts PingPong
> > results in:
> > Max rate = 958.388867 MB/sec Min latency = 4.529953 usec
> > and:
> > $ mpirun --mca btl tcp,self -n 2 -hostfile mpi.hosts PingPong
> > Max rate = 653.547293 MB/sec Min latency = 19.550323 usec
> > NetPipeMPI results show a max of 7.4 Gb/s at 8388605 bytes which seems fine.
> > log_num_mtt = 20 and log_mtts_per_seg = 2 params
> > My application exchanges about a gig of data between the processes, with 2 sender and 2 consumer processes on each node and 1 additional controller process on the starting node.
> > The program splits the data into 64K blocks and uses non-blocking sends and receives with busy/sleep loops to monitor progress until completion.
> > Each process owns a single buffer for these 64K blocks.
> > My problem is I see better performance under IPoIB than I do on native IB (RDMA_CM).
> > My understanding is that IPoIB is limited to about 1G/s so I am at a loss to know why it is faster.
> > These 2 configurations are equivalent (about 8-10 seconds per cycle)
> > mpirun --mca btl_openib_flags 2 --mca mpi_leave_pinned 1 --mca btl tcp,self -H vh2,vh1 -np 9 --bycore prog
> > mpirun --mca btl_openib_flags 3 --mca mpi_leave_pinned 1 --mca btl tcp,self -H vh2,vh1 -np 9 --bycore prog
When you say "--mca btl tcp,self", it means that openib btl is not enabled.
Hence "--mca btl_openib_flags" is irrelevant.
> > And this one produces similar run times but seems to degrade with repeated cycles:
> > mpirun --mca btl_openib_eager_limit 64 --mca mpi_leave_pinned 1 --mca btl openib,self -H vh2,vh1 -np 9 --bycore prog
You're running 9 ranks on two machines, but you're using IB for intra-node communication.
Is that intentional? If not, you can add the "sm" btl to improve performance.
-- YK
> > Other btl_openib_flags settings result in much lower performance.
> > Changing the first of the above configs to use openIB results in a 21 second run time at best. Sometimes it takes up to 5 minutes.
> > In all cases, OpenIB runs in twice the time it takes TCP, except if I push the small message max to 64K and force short messages. Then the openib times are the same as TCP and no faster.
> > With openib:
> > - Repeated cycles during a single run seem to slow down with each cycle
> > (usually by about 10 seconds).
> > - On occasions it seems to stall indefinitely, waiting on a single receive.
> > I'm still at a loss as to why. I can't find any errors logged during the runs.
> > Any ideas appreciated.
> > Thanks in advance,
> > Randolph
> >
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>