
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Question on handling of memory for communications
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-07-08 10:57:33

On Jul 6, 2013, at 4:59 PM, Michael Thomadakis <drmichaelt7777_at_[hidden]> wrote:

> When your stack runs on Sandy Bridge nodes attached to HCAs over PCIe Gen 3, do you pay any special attention to the memory buffers according to which socket/memory controller their physical memory belongs to?
> For instance, if the HCA is attached to the PCIe Gen 3 lanes of Socket 1, do you do anything special when the read/write buffers map to physical memory belonging to Socket 2? Or do you avoid using buffers that map to memory belonging to (i.e., accessible via) the other socket?

It is not *necessary* to ensure that buffers are NUMA-local to the PCI device they are transferred to or from, but it certainly results in lower latency to read/write to PCI devices (regardless of flavor) that are attached to an MPI process's local NUMA node. The Hardware Locality (hwloc) tool "lstopo" can print a pretty picture of your server to show you where your PCI buses are connected.
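On Linux, the same locality information that lstopo draws can also be read straight from sysfs. The sketch below is a minimal illustration under that assumption (Linux only, not portable the way hwloc is): it reads a PCI device's `numa_node` attribute and that node's `cpulist`, then checks whether the calling process is pinned entirely to CPUs on that node. The PCI address `0000:81:00.0` is a hypothetical placeholder; substitute your HCA's address (e.g. from `lspci`). A `numa_node` value of -1 means the kernel does not know the device's locality.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist string such as "0-7,16-23" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an offline/empty node has an empty cpulist
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pci_numa_node(pci_addr: str) -> int:
    """Return the NUMA node the kernel reports for a PCI device (-1 if unknown)."""
    return int(Path(f"/sys/bus/pci/devices/{pci_addr}/numa_node").read_text())


def node_cpus(node: int) -> set[int]:
    """Return the set of CPUs that belong to a NUMA node."""
    path = Path(f"/sys/devices/system/node/node{node}/cpulist")
    return parse_cpulist(path.read_text())


def process_is_local_to(pci_addr: str) -> bool | None:
    """True if every CPU this process may run on is on the device's NUMA node.

    Returns None when the kernel does not report a node for the device.
    """
    node = pci_numa_node(pci_addr)
    if node < 0:
        return None
    return os.sched_getaffinity(0) <= node_cpus(node)


if __name__ == "__main__":
    addr = "0000:81:00.0"  # hypothetical HCA address; replace with your own
    if Path(f"/sys/bus/pci/devices/{addr}").exists():
        print(addr, "NUMA-local:", process_is_local_to(addr))
    else:
        print("no such PCI device:", addr)
```

Binding the process first (e.g. with mpirun's binding options) makes the affinity check meaningful; an unbound process will usually span both sockets and report False.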

For TCP, Open MPI will use all TCP devices that it finds by default (because it is assumed that latency is so high that NUMA locality doesn't matter). The openib (OpenFabrics) transport will use the "closest" HCA ports that it can find to each MPI process.
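The two selection policies described above can be sketched as follows. This is a simplified illustration, not Open MPI's actual openib code (which relies on hwloc): it assumes each device's NUMA node is already known and that a NUMA distance matrix is available in the form Linux exposes in `/sys/devices/system/node/node<N>/distance`. The device names used in the example (`mlx4_0`, `eth0`, ...) are hypothetical.

```python
from __future__ import annotations


def closest_devices(proc_node: int,
                    device_nodes: dict[str, int],
                    distance: list[list[int]]) -> list[str]:
    """Return the devices whose NUMA node is nearest to proc_node.

    device_nodes maps a device name to its NUMA node (-1 = unknown).
    distance[i][j] is the relative NUMA distance from node i to node j.
    Devices with unknown locality are used only if nothing else is available.
    """
    known = {dev: n for dev, n in device_nodes.items() if n >= 0}
    if not known:
        return sorted(device_nodes)
    best = min(distance[proc_node][n] for n in known.values())
    return sorted(dev for dev, n in known.items()
                  if distance[proc_node][n] == best)


def select_devices(transport: str, proc_node: int,
                   device_nodes: dict[str, int],
                   distance: list[list[int]]) -> list[str]:
    """TCP: use every device; OpenFabrics-style: use only the closest ones."""
    if transport == "tcp":
        return sorted(device_nodes)
    return closest_devices(proc_node, device_nodes, distance)
```

For example, on a two-socket node with distances `[[10, 21], [21, 10]]`, a process on node 1 with HCAs `mlx4_0` (node 0) and `mlx4_1` (node 1) would be given only `mlx4_1`, while the TCP policy returns every interface regardless of node.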

In our upcoming Cisco ultra low latency BTL, it defaults to using the closest Cisco VIC ports that it can find for short messages (i.e., to minimize latency), but uses all available VICs for long messages (i.e., to maximize bandwidth).
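A size-based split like the one described above might look roughly like this. The 64 KiB cutoff is a hypothetical value (the post does not state the real threshold), and dividing a long message into near-equal chunks per VIC is only one assumed way of "using all available VICs"; the real BTL may distribute traffic differently.

```python
from __future__ import annotations

# Hypothetical cutoff between "short" and "long" messages; not the BTL's real value.
LONG_MESSAGE_THRESHOLD = 64 * 1024


def vics_for_message(msg_bytes: int, closest: list[str], all_vics: list[str],
                     threshold: int = LONG_MESSAGE_THRESHOLD) -> list[str]:
    """Pick the VIC ports to use for one message.

    Short messages use only the closest port(s), to minimize latency.
    Long messages use every VIC, to maximize bandwidth.
    """
    if msg_bytes < threshold:
        return list(closest)
    return list(all_vics)


def stripe(msg_bytes: int, vics: list[str]) -> dict[str, int]:
    """Assumed striping scheme: split a message into near-equal byte counts,
    one chunk per VIC, with any remainder spread over the first VICs."""
    base, extra = divmod(msg_bytes, len(vics))
    return {v: base + (1 if i < extra else 0) for i, v in enumerate(vics)}
```

The trade-off is that a short message gains nothing from being split (per-fragment overhead dominates), whereas a long message is bandwidth-bound and benefits from every link, even the ones on the remote socket.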

> Has this situation improved with Ivy Bridge systems or Haswell?

It's the same overall architecture (i.e., NUMA).

Jeff Squyres