Open MPI User's Mailing List Archives

From: Brock Palen (brockp_at_[hidden])
Date: 2006-12-02 16:37:30

On Dec 2, 2006, at 10:31 AM, Jeff Squyres wrote:

> FWIW, especially on NUMA machines (like AMDs), physical access to
> network resources (such as NICs / HCAs) can be much faster on
> specific sockets.
>
> For example, we recently ran some microbenchmarks showing that if you
> run 2 MPI processes across 2 NUMA machines (e.g., a simple ping-pong
> benchmark across 2 machines), if you pin the MPI process to socket 0/
> core 0, you'll get noticeably better latency. If you don't, the MPI
> process may not be consistently located physically close to the NIC/
> HCA, resulting in more "jitter" in the delivered latency (or even
> worse, consistently worse latency).
>
> I *believe* that this has to do with physical setup within the
> machine (i.e., the NIC/HCA bus is physically "closer" to some
> sockets), but I'm not much of a hardware guy to know that for sure.
> Someone with more specific knowledge should chime in here...

This is true. It is because only a single CPU has a HyperTransport
(HT) connection to the chipset, which in turn connects to all other
devices (NIC, USB, hard drives). Every other CPU must send its data
over its HT link to the CPU that has the chipset connection. I think,
though (not sure about dual core), that in systems up to 8-way every
CPU has a connection to every other CPU. So while that one CPU would
have lower latency, all the others should have roughly the same
latency as each other.
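
On a Linux box, one way to see which socket a NIC hangs off of is to ask the kernel. The sketch below is only an illustration, not something from this thread: it assumes the kernel exposes a `numa_node` file under `/sys/class/net/<iface>/device/`, and the helper name `nic_numa_node` is made up for the example. Older kernels or BIOSes may report nothing useful.

```python
import os
from pathlib import Path
from typing import Optional


def nic_numa_node(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Return the NUMA node the kernel reports for a NIC, or None if unknown.

    `nic_numa_node` and `sysfs_root` are hypothetical names for this sketch.
    """
    node_file = Path(sysfs_root) / iface / "device" / "numa_node"
    try:
        node = int(node_file.read_text().strip())
    except (OSError, ValueError):
        # No such interface, a virtual device, or an unreadable file.
        return None
    # The kernel writes -1 when it has no locality information.
    return node if node >= 0 else None


if __name__ == "__main__":
    root = "/sys/class/net"
    if os.path.isdir(root):
        for iface in sorted(os.listdir(root)):
            print(iface, nic_numa_node(iface))
```

A result of `None` means the kernel did not report a node, so it says nothing either way about which CPU is closest.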

Personally, I have not run this test, nor do I know how. Have you
tried it yourself? I would like to know this for our own systems
(all AMD).
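
For the pinning half of such a test, a minimal sketch on Linux might look like the following. It is an assumption-laden illustration rather than the benchmark Jeff describes: it uses Python's `os.sched_setaffinity`, which is only available on Linux, the helper name `pin_to_core` is invented here, and the ping-pong loop itself is left out.

```python
import os


def pin_to_core(core: int) -> set:
    """Restrict the calling process to one core and return the new affinity set.

    `pin_to_core` is a hypothetical helper name used only in this sketch.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity control is not available on this platform")
    os.sched_setaffinity(0, {core})  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Pick the lowest-numbered core this process is currently allowed to use.
    first_allowed = min(os.sched_getaffinity(0))
    print("now pinned to:", pin_to_core(first_allowed))
    # The latency measurement (e.g. an MPI ping-pong loop) would run here,
    # repeated once per core so the per-core numbers can be compared.
```

The same effect can be had from the shell without touching the program, for example with `taskset -c 0 <command>`. Either way, the idea is to run the same latency test once per core and compare the results.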

> On Dec 1, 2006, at 2:13 PM, Greg Lindahl wrote:
>> On Fri, Dec 01, 2006 at 11:51:24AM +0100, Peter Kjellstrom wrote:
>>> This might be a bit naive but, if you spawn two procs on a dual
>>> core dual socket system then the linux kernel should automagically
>>> schedule them this way.
>> No, we checked this for OpenMP and MPI, and in both cases wiring the
>> processes to cores was significantly better. The Linux scheduler
>> (still) tends to migrate processes to the wrong core when OS threads
>> and processes wake up and go back to sleep.
>> Just like the OpenMPI guys, we don't have a clever solution for the
>> "what if the user wants to have 2 OpenMP or MPI jobs share the same
>> node?" Well, I have a plan, but it's annoying to implement.
>> -- greg
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
> --
> Jeff Squyres
> Server Virtualization Business Unit
> Cisco Systems