Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Infiniband performance Problem and stalling
From: Randolph Pullen (randolph_pullen_at_[hidden])
Date: 2012-09-07 00:43:24


Yevgeny,

The ibstat results:

CA 'mthca0'
        CA type: MT25208 (MT23108 compat mode)
        Number of ports: 2
        Firmware version: 4.7.600
        Hardware version: a0
        Node GUID: 0x0005ad00000c21e0
        System image GUID: 0x0005ad000100d050
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 10
                Base lid: 4
                LMC: 0
                SM lid: 2
                Capability mask: 0x02510a68
                Port GUID: 0x0005ad00000c21e1
                Link layer: IB
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510a68
                Port GUID: 0x0005ad00000c21e2
                Link layer: IB

And more interestingly, ib_write_bw:

                   RDMA_Write BW Test
 Number of qps   : 1
 Connection type : RC
 TX depth        : 300
 CQ Moderation   : 50
 Link type       : IB
 Mtu             : 2048
 Inline data is used up to 0 bytes message
 local address:  LID 0x04 QPN 0x1c0407 PSN 0x48ad9e RKey 0xd86a0051 VAddr 0x002ae362870000
 remote address: LID 0x03 QPN 0x2e0407 PSN 0xf57209 RKey 0x8d98003b VAddr 0x002b533d366000
------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
Conflicting CPU frequency values detected: 1600.000000 != 3301.000000
 65536      5000           0.00               0.00
------------------------------------------------------------------

What does "Conflicting CPU frequency values" mean? Examining /proc/cpuinfo, however, shows:

processor       : 0
cpu MHz         : 3301.000
processor       : 1
cpu MHz         : 3301.000
processor       : 2
cpu MHz         : 1600.000
processor       : 3
cpu MHz         : 1600.000

which seems odd to me...
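[The mismatched per-core MHz values above are the usual signature of cpufreq scaling (e.g. the "ondemand" governor) clocking idle cores down; ib_write_bw warns because it calibrates its timing from the CPU frequency, so its bandwidth figures are unreliable while cores run at different speeds. A quick way to check is sketched below; the sysfs paths can vary by kernel and distro.]

```shell
# Per-core clock as the kernel reports it; mismatched values across
# cores usually mean frequency scaling is active.
grep "cpu MHz" /proc/cpuinfo

# Current scaling governor per core, if the cpufreq sysfs interface exists:
for f in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    if [ -r "$f" ]; then
        echo "$f: $(cat "$f")"
    fi
done

# To give the benchmark a stable frequency, pin all cores to the
# "performance" governor (needs root; left commented out here):
# for f in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
#     echo performance | sudo tee "$f" >/dev/null
# done
```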
________________________________
From: Yevgeny Kliteynik <kliteyn_at_[hidden]>
To: Randolph Pullen <randolph_pullen_at_[hidden]>; OpenMPI Users <users_at_[hidden]>
Sent: Thursday, 6 September 2012 6:03 PM
Subject: Re: [OMPI users] Infiniband performance Problem and stalling

On 9/3/2012 4:14 AM, Randolph Pullen wrote:
> No RoCE, Just native IB with TCP over the top.

Sorry, I'm confused - still not clear what is "Melanox III HCA 10G card".
Could you run "ibstat" and post the results?
What is the expected BW on your cards?
Could you run "ib_write_bw" between two machines?
Also, please see below.

> No I haven't used 1.6 I was trying to stick with the standards on the mellanox disk.
> Is there a known problem with 1.4.3 ?
> ________________________________
> *From:* Yevgeny Kliteynik <kliteyn_at_[hidden]>
> *To:* Randolph Pullen <randolph_pullen_at_[hidden]>; Open MPI Users <users_at_[hidden]>
> *Sent:* Sunday, 2 September 2012 10:54 PM
> *Subject:* Re: [OMPI users] Infiniband performance Problem and stalling
>
> Randolph,
>
> Some clarification on the setup:
>
> "Melanox III HCA 10G cards" - are those ConnectX 3 cards configured to Ethernet?
> That is, when you're using openib BTL, you mean RoCE, right?
>
> Also, have you had a chance to try some newer OMPI release?
> Any 1.6.x would do.
>
> -- YK
>
> On 8/31/2012 10:53 AM, Randolph Pullen wrote:
> > (reposted with consolidated information)
> > I have a test rig comprising 2 i7 systems with 8GB RAM and Melanox III HCA 10G cards
> > running Centos 5.7, kernel 2.6.18-274
> > Open MPI 1.4.3
> > MLNX_OFED_LINUX-1.5.3-1.0.0.2 (OFED-1.5.3-1.0.0.2)
> > on a Cisco 24-port switch.
> >
> > Normal performance is:
> > $ mpirun --mca btl openib,self -n 2 -hostfile mpi.hosts PingPong
> > results in:
> > Max rate = 958.388867 MB/sec Min latency = 4.529953 usec
> > and:
> > $ mpirun --mca btl tcp,self -n 2 -hostfile mpi.hosts PingPong
> > Max rate = 653.547293 MB/sec Min latency = 19.550323 usec
> >
> > NetPipeMPI results show a max of 7.4 Gb/s at 8388605 bytes, which seems fine.
> > log_num_mtt = 20 and log_mtts_per_seg = 2.
> >
> > My application exchanges about a gig of data between the processes, with 2 sender and 2 consumer processes on each node plus 1 additional controller process on the starting node.
> > The program splits the data into 64K blocks and uses non-blocking sends and receives with busy/sleep loops to monitor progress until completion.
> > Each process owns a single buffer for these 64K blocks.
> >
> > My problem is I see better performance under IPoIB than I do on native IB (RDMA_CM).
> > My understanding is that IPoIB is limited to about 1G/s, so I am at a loss to know why it is faster.
> > These 2 configurations are equivalent (about 8-10 seconds per cycle):
> > mpirun --mca btl_openib_flags 2 --mca mpi_leave_pinned 1 --mca btl tcp,self -H vh2,vh1 -np 9 --bycore prog
> > mpirun --mca btl_openib_flags 3 --mca mpi_leave_pinned 1 --mca btl tcp,self -H vh2,vh1 -np 9 --bycore prog

When you say "--mca btl tcp,self", it means that the openib btl is not enabled.
Hence "--mca btl_openib_flags" is irrelevant.

> > And this one produces similar run times but seems to degrade with repeated cycles:
> > mpirun --mca btl_openib_eager_limit 64 --mca mpi_leave_pinned 1 --mca btl openib,self -H vh2,vh1 -np 9 --bycore prog

You're running 9 ranks on two machines, but you're using IB for intra-node communication.
Is it intentional? If not, you can add the "sm" btl and have performance improved.

-- YK

> > Other btl_openib_flags settings result in much lower performance.
> > Changing the first of the above configs to use openIB results in a 21 second run time at best. Sometimes it takes up to 5 minutes.
> > In all cases, openIB runs in twice the time it takes TCP, except if I push the small message max to 64K and force short messages. Then the openIB times are the same as TCP and no faster.
> >
> > With openib:
> > - Repeated cycles during a single run seem to slow down with each cycle (usually by about 10 seconds).
> > - On occasions it seems to stall indefinitely, waiting on a single receive.
> >
> > I'm still at a loss as to why. I can't find any errors logged during the runs.
> > Any ideas appreciated.
> >
> > Thanks in advance,
> > Randolph
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden] <mailto:users_at_[hidden]>
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
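[For readers following the thread: Yevgeny's suggestion is to list "sm" alongside the other btls so that ranks on the same node communicate through shared memory instead of looping through the HCA. A sketch of the adjusted launch line, reusing the hostnames, rank count, and program name from the commands above (exact behaviour depends on the OMPI 1.4.x build):]

```shell
# Same launch as in the thread, with the shared-memory (sm) btl added so
# that same-node ranks use shared memory and only inter-node traffic
# goes over InfiniBand:
mpirun --mca btl openib,sm,self --mca mpi_leave_pinned 1 \
       -H vh2,vh1 -np 9 --bycore prog
```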