Hardware Locality Users' Mailing List Archives


Subject: Re: [hwloc-users] Using hwloc to map GPU layout on system
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2014-02-05 01:19:12


Hello Brock,

Some people reported the same issue in the past and that's why we added
the "nvml" objects. CUDA reorders devices by "performance".
Batch-schedulers are somehow supposed to use "nvml" for managing GPUs
without actually using them with CUDA directly. And the "nvml" order is
the "normal" order.

You need the TDK (https://developer.nvidia.com/tesla-deployment-kit) to
get the NVML library and development headers installed. Then hwloc can build
its "nvml" backend. Once that's ready, you'll see a hwloc "cudaX" and a hwloc
"nvmlY" object inside each NVIDIA PCI device, and you can get their
locality as usual.
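
For illustration, here is a minimal sketch (not from the original message; hwloc 1.x
C API, assuming hwloc was built with both the CUDA and NVML backends) that pairs
each "nvmlY" object with the "cudaX" object attached to the same NVIDIA PCI device
and prints where that device lives:

/* Minimal sketch, not from the original message: pair the "nvmlY" and "cudaX"
 * OS devices that hwloc attaches to the same NVIDIA PCI device, and print the
 * locality of that device. Assumes hwloc 1.x built with the CUDA and NVML
 * backends. */
#include <stdio.h>
#include <string.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_obj_t nvml, cuda, pci, ancestor;
    char cpuset[128];

    hwloc_topology_init(&topo);
    /* I/O objects (PCI devices and the cudaX/nvmlY OS devices) are discarded
     * unless this flag is set (hwloc 1.x API). */
    hwloc_topology_set_flags(topo, HWLOC_TOPOLOGY_FLAG_IO_DEVICES);
    hwloc_topology_load(topo);

    nvml = NULL;
    while ((nvml = hwloc_get_next_osdev(topo, nvml)) != NULL) {
        if (strncmp(nvml->name, "nvml", 4))
            continue;                       /* only look at nvmlY objects */
        pci = nvml->parent;                 /* normally the NVIDIA PCI device */
        if (!pci || pci->type != HWLOC_OBJ_PCI_DEVICE)
            continue;                       /* no PCI info available */

        /* find the cudaX OS device attached to the same PCI device */
        cuda = NULL;
        while ((cuda = hwloc_get_next_osdev(topo, cuda)) != NULL)
            if (!strncmp(cuda->name, "cuda", 4) && cuda->parent == pci)
                break;

        /* locality = first non-I/O ancestor (here a NUMA node) of the PCI device */
        ancestor = hwloc_get_non_io_ancestor_obj(topo, pci);
        hwloc_bitmap_snprintf(cpuset, sizeof(cpuset), ancestor->cpuset);

        printf("%s <-> %s  PCI %02x:%02x.%x  near cpuset %s\n",
               nvml->name, cuda ? cuda->name : "(no cuda object)",
               pci->attr->pcidev.bus, pci->attr->pcidev.dev,
               pci->attr->pcidev.func, cpuset);
    }

    hwloc_topology_destroy(topo);
    return 0;
}

If the batch system hands out GPU indexes in the "normal" (NVML) order, the
matching cudaX index from each pair would be the value to export in
CUDA_VISIBLE_DEVICES.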

Does this help?

Brice

On 05/02/2014 05:25, Brock Palen wrote:
> We are trying to build a system that restricts users to the GPUs they were assigned by our batch system (Torque).
>
> The batch system sets the GPUs into thread-exclusive mode when they are assigned to a job, so we want the GPU that the batch system assigns to be the one set in CUDA_VISIBLE_DEVICES.
>
> The problem is that on our nodes, what the batch system sees as GPU 0 is not the same GPU that CUDA_VISIBLE_DEVICES sees as 0. Actually, 0 is 2.
>
> You can see this behavior if you run nvidia-smi and look at the PCI IDs of the devices. You can then look at the PCI IDs output by deviceQuery from the SDK examples and see that they are in a different order.
>
> The IDs you would set in CUDA_VISIBLE_DEVICES match the order that deviceQuery sees, not the order that nvidia-smi sees.
>
> Example (all values converted to decimal to match deviceQuery):
>
> nvidia-smi order:  9, 10, 13, 14, 40, 43, 48, 51
> deviceQuery order: 13, 14, 9, 10, 40, 43, 48, 51
>
>
> Can hwloc help me with this? Right now I am hacking together a script based on the output of the two commands, building a map between the two, and then setting CUDA_VISIBLE_DEVICES.
>
> Any ideas would be great. Later, since we also use CPU sets, we want to pass GPU locality information to the scheduler so it can match GPUs to CPU sockets, because the performance of threads across QPI domains is very poor.
>
> Thanks
>
> Machine (64GB)
>   NUMANode L#0 (P#0 32GB)
>     Socket L#0 + L3 L#0 (20MB)
>       L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>       L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>       L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>       L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>       L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
>       L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
>       L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
>       L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
>     HostBridge L#0
>       PCIBridge
>         PCI 1000:0087
>           Block L#0 "sda"
>           Block L#1 "sdb"
>       PCIBridge
>         PCIBridge
>           PCIBridge
>             PCI 10de:1021
>               CoProc L#2 "cuda0"
>           PCIBridge
>             PCI 10de:1021
>               CoProc L#3 "cuda1"
>       PCIBridge
>         PCIBridge
>           PCIBridge
>             PCI 10de:1021
>               CoProc L#4 "cuda2"
>           PCIBridge
>             PCI 10de:1021
>               CoProc L#5 "cuda3"
>       PCIBridge
>         PCI 8086:1521
>           Net L#6 "eth0"
>         PCI 8086:1521
>           Net L#7 "eth1"
>       PCIBridge
>         PCI 102b:0533
>       PCI 8086:1d02
>   NUMANode L#1 (P#1 32GB)
>     Socket L#1 + L3 L#1 (20MB)
>       L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
>       L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
>       L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10)
>       L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11)
>       L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12)
>       L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13)
>       L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14)
>       L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
>     HostBridge L#12
>       PCIBridge
>         PCIBridge
>           PCIBridge
>             PCI 15b3:1003
>               Net L#8 "eth2"
>               Net L#9 "ib0"
>               Net L#10 "eoib0"
>               OpenFabrics L#11 "mlx4_0"
>       PCIBridge
>         PCIBridge
>           PCIBridge
>             PCI 10de:1021
>               CoProc L#12 "cuda4"
>           PCIBridge
>             PCI 10de:1021
>               CoProc L#13 "cuda5"
>       PCIBridge
>         PCIBridge
>           PCIBridge
>             PCI 10de:1021
>               CoProc L#14 "cuda6"
>           PCIBridge
>             PCI 10de:1021
>               CoProc L#15 "cuda7"
>
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> brockp_at_[hidden]
> (734)936-1985
>
>
>
>
>
> _______________________________________________
> hwloc-users mailing list
> hwloc-users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users