Open MPI User's Mailing List Archives

From: Gleb Natapov (glebn_at_[hidden])
Date: 2007-01-18 08:10:15


On Thu, Jan 18, 2007 at 07:17:13AM -0500, Robin Humble wrote:
> On Thu, Jan 18, 2007 at 11:08:04AM +0200, Gleb Natapov wrote:
> >On Thu, Jan 18, 2007 at 03:52:19AM -0500, Robin Humble wrote:
> >> On Wed, Jan 17, 2007 at 08:55:31AM -0700, Brian W. Barrett wrote:
> >> >On Jan 17, 2007, at 2:39 AM, Gleb Natapov wrote:
> >> >> On Wed, Jan 17, 2007 at 04:12:10AM -0500, Robin Humble wrote:
> >> >>> basically I'm seeing wildly different bandwidths over InfiniBand 4x DDR
> >> >>> when I use different kernels.
> >> >> Try to load ib_mthca with tune_pci=1 option on those kernels that are
> >> >> slow.
> >> >when an application has high buffer reuse (like NetPIPE), which can
> >> >be enabled by adding "-mca mpi_leave_pinned 1" to the mpirun command
> >> >line.
> >> thanks! :-)
> >> tune_pci=1 makes a huge difference at the top end, and
> >Well this is broken BIOS then. Look here for more explanation:
> >https://staging.openfabrics.org/svn/openib/gen2/branches/1.1/ofed/docs/mthca_release_notes.txt
> >search for "tune_pci=1".
>
> ok. thanks :-/
>
> >> -mca mpi_leave_pinned 1 adds lots of midrange bandwidth.
> >>
> >> latencies (~4us) and the low end performance are all unchanged.
> >>
> >> see attached for details.
> >> most curves are for 2.6.19.2 except the last couple (tagged as old)
> >> which are for 2.6.9-42.0.3.ELsmp and for which tune_pci changes nothing.
> >>
> >> why isn't tune_pci=1 the default I wonder?
> >> files in /sys/module/ib_mthca/ tell me it's off by default in
> >> 2.6.9-42.0.3.ELsmp, but the results imply that it's on... maybe PCIe
> >> handling is very different in that kernel.
> >This is explained in the link above.
>
> hmmm...
> but (sorry to harp on about this) /sys/module/ib_mthca/tune_pci is 0
> for 2.6.9-42.0.3.ELsmp.
> and even if that's lying, then mthca_tune_pci() appears identically
> invoked in mthca_main.c from both 2.6.9-42.0.3.ELsmp and 2.6.19.2.
> mthca_main.c is the only place in infiniband/hw/mthca that
> pci_write_config_word() is called from, so you'd think that's got to be
> how PCIe for IB was setup.
I really don't know the details and I don't have the sources of the older
module to check, but in the latest kernel sources the tune_pci parameter is
checked inside mthca_tune_pci(). If you want more details, you can ask on
the openib mailing list.

>
> basically it's not clear to me how or if tune_pci is being set in
> 2.6.9-42.0.3.ELsmp, nor why it's any different to 2.6.19.2 :-/
>
> maybe it's some other level in the kernel setting up PCIe differently?
> but that would presumably be unrelated to OFED.
The BIOS should configure MaxReadReq to the maximum value supported by the
chipset. Linux shouldn't touch this value at all.

>
> is there a way to check pci burst settings from userland? or BIOS?
You can see the PCI settings with lspci. Newer versions of lspci decode this
value for you; with older ones you'll have to dump the PCI config space to a
file and decode it yourself.
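
As a rough illustration of the manual decoding, here is a minimal Python
sketch. It assumes you already have the raw config space as bytes; the
function names (find_pcie_cap, decode_devctl, read_config) are made up for
this example. The register offsets and masks follow the PCI Express Device
Control register layout and use the constant names from the Linux
pci_regs.h header. Treat it as a sketch, not a replacement for lspci.

```python
import struct

# Offsets and masks, named as in the Linux pci_regs.h header.
PCI_STATUS = 0x06               # 16-bit status register
PCI_STATUS_CAP_LIST = 0x10      # status bit: capability list present
PCI_CAPABILITY_LIST = 0x34      # pointer to the first capability
PCI_CAP_ID_EXP = 0x10           # capability ID of PCI Express
PCI_EXP_DEVCTL = 0x08           # Device Control, relative to the capability
PCI_EXP_DEVCTL_PAYLOAD = 0x00E0  # Max_Payload_Size, bits 7:5
PCI_EXP_DEVCTL_READRQ = 0x7000   # Max_Read_Request_Size, bits 14:12


def read_config(path):
    """Read a raw config space dump (hypothetical helper)."""
    with open(path, "rb") as f:
        return f.read()


def find_pcie_cap(cfg):
    """Return the offset of the PCI Express capability, or None."""
    if len(cfg) < 0x40:
        raise ValueError("need at least the 64-byte standard header")
    (status,) = struct.unpack_from("<H", cfg, PCI_STATUS)
    if not status & PCI_STATUS_CAP_LIST:
        return None
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()  # guards against a looping capability chain
    while pos >= 0x40 and pos + 1 < len(cfg) and pos not in seen:
        seen.add(pos)
        if cfg[pos] == PCI_CAP_ID_EXP:
            return pos
        pos = cfg[pos + 1] & 0xFC  # follow the "next capability" pointer
    return None


def _size_in_bytes(encoding):
    """Encodings 0..5 mean 128..4096 bytes; 6 and 7 are reserved."""
    return 128 << encoding if encoding <= 5 else None


def decode_devctl(cfg):
    """Return MaxReadReq and MaxPayload in bytes, or None if not PCIe."""
    cap = find_pcie_cap(cfg)
    if cap is None or cap + PCI_EXP_DEVCTL + 2 > len(cfg):
        return None
    (devctl,) = struct.unpack_from("<H", cfg, cap + PCI_EXP_DEVCTL)
    return {
        "max_read_request": _size_in_bytes((devctl & PCI_EXP_DEVCTL_READRQ) >> 12),
        "max_payload": _size_in_bytes((devctl & PCI_EXP_DEVCTL_PAYLOAD) >> 5),
    }
```

On Linux the raw bytes can usually be read from the device's "config" file
under /sys/bus/pci/devices/, for example
decode_devctl(read_config("/sys/bus/pci/devices/0000:0a:00.0/config")),
where the device address is only a placeholder. Reading past the first 64
bytes typically needs root, so run it as root if the result is None for a
device you know is PCIe.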

>
> BTW, the card appears to be Voltaire and system is SGI xe (210 and 240)
> if that helps. /sys/class/infiniband/mthca0/board_id is VLT0050010001
> not that I'm blaming anyone! :-)
The hardware and firmware are produced by Mellanox :)

--
			Gleb.