
Open MPI User's Mailing List Archives


From: Gleb Natapov (glebn_at_[hidden])
Date: 2007-01-17 04:39:49


Hi Robin,

On Wed, Jan 17, 2007 at 04:12:10AM -0500, Robin Humble wrote:
>
> so this isn't really an OpenMPI question (I don't think), but you guys
> will have hit the problem if anyone has...
>
> basically I'm seeing wildly different bandwidths over InfiniBand 4x DDR
> when I use different kernels.
> I'm testing with netpipe-3.6.2's NPmpi, but a home-grown pingpong sees
> the same thing.
>
> the default 2.6.9-42.0.3.ELsmp (and also sles10's kernel) gives ok
> bandwidth (50% of peak I guess is good?) at ~10 Gbit/s, but a pile of
> newer kernels (2.6.19.2, 2.6.20-rc4, 2.6.18-1.2732.4.2.el5.OFED_1_1(*))
> all max out at ~5.3 Gbit/s.
>
> half the bandwidth! :-(
> latency is the same.
Try loading ib_mthca with the tune_pci=1 option on the kernels that are
slow.
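
A minimal sketch of how that could be tried on one of the slow nodes. The module name (ib_mthca) and the option (tune_pci=1) come from the suggestion above; the unload/reload sequence, the DRY_RUN switch, and the helper names run and reload_mthca are assumptions made for illustration. By default the script only prints the commands. Unloading ib_mthca can fail while other InfiniBand modules still depend on it, so on a real system the InfiniBand stack may need to be stopped first.

```shell
#!/bin/sh
# Sketch only: reload ib_mthca with tune_pci=1 on a node that shows low bandwidth.
# DRY_RUN=1 (the default) prints the commands instead of running them.
# Set DRY_RUN=0 and run as root to actually reload the driver.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: run a command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: unload the HCA driver, then load it with tune_pci=1.
reload_mthca() {
    run modprobe -r ib_mthca
    run modprobe ib_mthca tune_pci=1
}

reload_mthca
```

To keep the setting across reboots, a line such as "options ib_mthca tune_pci=1" in the modprobe configuration (/etc/modprobe.conf on CentOS 4) is the usual approach. After reloading, rerun NPmpi and compare against the ~5.3 Gbit/s figure.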

>
> the same OpenMPI (1.1.1 from OSCAR, rebuilt for openib support) and
> NPmpi was used with all kernels.
> I see an intermediate bandwidth if one kernel is the 'fast' 2.6.9 and
> the other is a 'slow' one, so they don't appear to be using completely
> different protocols.
> it doesn't make any difference if I try to make extra-sure it's using
> openib with:
> mpirun --mca btl openib --mca btl_tcp_if_exclude lo,eth0 ...
>
> OS is CentOS 4.4 x86_64 which AFAICT includes packages based on OFED 1.0.
> lspci says the PCIe card is:
> InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)
> and dmesg says that all kernels are using
> ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
> but also whinges that 'HCA FW version 1.0.700 is old'.
>
> any ideas?
> very odd that all new kernels (including for RHEL5) are slow.
>
> will OFED 1.1 make any difference? it didn't build cleanly when I
> tried, but I can try and try again...
>
> thanks for any hints.
>
> cheers,
> robin
>
> (*) rhel5 + OFED 1.1 test kernel, rebuilt for centos4.4 from src.rpm at
> http://people.redhat.com/dledford/Infiniband/kernel/2.6.18/1.2732.4.2.el5.OFED_1_1/x86_64/

--
			Gleb.