Hardware Locality Users' Mailing List Archives


Subject: Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2012-11-02 16:22:29


On 02/11/2012 21:03, Brock Palen wrote:
> This isn't a hwloc problem exactly, but maybe you can shed some insight.
>
> We have some 4 socket 10 core = 40 core nodes, HT off:
>
> depth 0: 1 Machine (type #1)
> depth 1: 4 NUMANodes (type #2)
> depth 2: 4 Sockets (type #3)
> depth 3: 4 Caches (type #4)
> depth 4: 40 Caches (type #4)
> depth 5: 40 Caches (type #4)
> depth 6: 40 Cores (type #5)
> depth 7: 40 PUs (type #6)
>
>
> We run RHEL 6.3 and use Torque to create cgroups for jobs. I get the following cgroup for this job; all 12 cores for the job are on one node:
> cat /dev/cpuset/torque/8845236.nyx.engin.umich.edu/cpus
> 0-1,4-5,8,12,16,20,24,28,32,36
>
> Not all nicely spaced, but 12 cores.
>
> I then start a code, even a simple serial one, with Open MPI 1.6.0 on all 12 cores:
> mpirun ./stream
>
> 45521 brockp 20 0 1885m 1.8g 456 R 100.0 0.2 4:02.72 stream
> 45522 brockp 20 0 1885m 1.8g 456 R 100.0 0.2 1:46.08 stream
> 45525 brockp 20 0 1885m 1.8g 456 R 100.0 0.2 4:02.72 stream
> 45526 brockp 20 0 1885m 1.8g 456 R 100.0 0.2 1:46.07 stream
> 45527 brockp 20 0 1885m 1.8g 456 R 100.0 0.2 4:02.71 stream
> 45528 brockp 20 0 1885m 1.8g 456 R 100.0 0.2 4:02.71 stream
> 45532 brockp 20 0 1885m 1.8g 456 R 100.0 0.2 1:46.05 stream
> 45529 brockp 20 0 1885m 1.8g 456 R 99.2 0.2 4:02.70 stream
> 45530 brockp 20 0 1885m 1.8g 456 R 99.2 0.2 4:02.70 stream
> 45531 brockp 20 0 1885m 1.8g 456 R 33.6 0.2 1:20.89 stream
> 45523 brockp 20 0 1885m 1.8g 456 R 32.8 0.2 1:20.90 stream
> 45524 brockp 20 0 1885m 1.8g 456 R 32.8 0.2 1:20.89 stream
>
> Note the processes that are not running at 100% CPU.
>
> hwloc-bind --get --pid 45523
> 0x00000011,0x11111133
> <the same mask is reported for all 12 processes>
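As an aside, that mask can be decoded by hand to confirm it matches the cgroup's cpuset. A minimal sketch (assuming hwloc's bitmap string convention of comma-separated 32-bit hex words, most significant first):

```python
def hwloc_mask_to_cpus(mask: str) -> list[int]:
    # Join the comma-separated 32-bit words into one integer
    # (most significant word first), then list the set bits.
    value = int(mask.replace("0x", "").replace(",", ""), 16)
    return [i for i in range(value.bit_length()) if value >> i & 1]

print(hwloc_mask_to_cpus("0x00000011,0x11111133"))
# -> [0, 1, 4, 5, 8, 12, 16, 20, 24, 28, 32, 36]
```

The result matches the cpuset 0-1,4-5,8,12,16,20,24,28,32,36 above, so the binding mask itself looks correct.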

Hello Brock,

I don't see anything helpful to answer here :/

Do you know which core is overloaded and which (two?) cores are idle?
Does that change during one run or from one run to another?
Pressing 1 in top should give that information in the very first lines.
Then you can try binding another process to one of the idle cores
to see whether the kernel accepts that.

You can also press "f" and "j" (or "f", then use the arrows and space to
select "last used cpu") to add a "P" column showing the last CPU used
by each process.
hwloc-bind --get-last-cpu-location --pid <pid> should give the same info,
but it seems broken on my machine right now; I'm going to debug it.
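In the meantime, the same "last used CPU" information can be read from /proc directly; per proc(5), the 39th field of /proc/<pid>/stat is the processor the task last ran on. A minimal sketch (not an hwloc tool, just a workaround):

```python
def last_cpu(pid: int) -> int:
    """Return the CPU that `pid` last ran on, from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The comm field is in parentheses and may contain spaces,
    # so split after the closing ')' before counting fields.
    fields = stat.rsplit(")", 1)[1].split()
    return int(fields[36])  # 'processor' is field 39 overall
```

Comparing last_cpu() across the 12 ranks should reveal which core is carrying two processes.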

One thing to check would be to start more than 12 processes and see where
the kernel puts them. If it keeps ignoring two cores, that would be funny :)

Brice