Unfortunately, I can’t commit time, but I would like to help where possible.


This code should map a CUdevice to a numa node (by enumerating all PCI devices). I have not compiled the the code in this form or tested it as is, but the calls should work fine for mapping any cuda device to the OS enumeration wrt to PCI device location:


long GetNumaNode(CUdevice dev)


    BOOL ret;

    DWORD lastError;

    // get the cuda device string

    char cuDevString[CUDA_DEV_STRING_LEN];

    unsigned long cudaBus;

    unsigned long cudaSubdevice;

    unsigned long cudaFunction;

    CUresult status = cuDeviceGetPCIBusId(cuDevString, CUDA_DEV_STRING_LEN, dev);


    assert(CUDA_SUCCESS == status);

    if (CUDA_SUCCESS != status) {

        return 0;



    char *tmp;

    char *tmp2;

    char del[] = ":.";

    // remove domain

    tmp = strtok_s(cuDevString, del, &tmp2);

    // get bus

    tmp = strtok_s(NULL, del, &tmp2);

    sscanf_s(tmp, "%x", &cudaBus);

    // get subdevice

    tmp = strtok_s(NULL, del, &tmp2);

    sscanf_s(tmp, "%x", &cudaSubdevice);

    // get function

    tmp = strtok_s(NULL, del, &tmp2);

    sscanf_s(tmp, "%x", &cudaFunction);


    // Use NULL as the first parameter as we need to look at non display devices too



    if(hNvDevInfo == INVALID_HANDLE_VALUE)



        return 0;



    // Find the deviceInfoData for each GPU

    DWORD deviceIndex;

    for (deviceIndex = 0; ; deviceIndex++)


        SP_DEVINFO_DATA deviceInfoData;

        unsigned long bus;

        unsigned long subdevice;

        unsigned long function;

        deviceInfoData.cbSize = sizeof(SP_DEVINFO_DATA);


        ret = SetupDiEnumDeviceInfo(hNvDevInfo, deviceIndex, &deviceInfoData);

        if (!ret)


            // MSDN says:

            //   call SetupDiEnumDeviceInfo until there are no more values (the function fails and a call

            //   to GetLastError returns ERROR_NO_MORE_ITEMS).

            lastError = GetLastError();

            assert(lastError == ERROR_NO_MORE_ITEMS);




        char locinfo[256];

        ret = SetupDiGetDeviceRegistryPropertyA(hNvDevInfo, &deviceInfoData, SPDRP_LOCATION_INFORMATION, NULL,

            (PBYTE)locinfo, sizeof(locinfo), NULL);

        if (!ret)


            lastError = GetLastError();



        bool dataSet = false;

        if (strncmp(locinfo, "PCI", 3) == 0) {

            char *busString = strstr(locinfo, "bus");

            if (busString) {

                busString += 3;

                char *deviceString = strstr(locinfo, ",");

                if (deviceString) {

                    deviceString[0] = 0;

                    bus = atoi(busString);


                    deviceString = strstr(deviceString, "device");


                    if (deviceString) {


                        char *functionStr = strstr(deviceString, ",");

                        if (functionStr) {

                            functionStr[0] = 0;

                            subdevice = atoi(deviceString);


                            functionStr = strstr(functionStr, "function");

                            if (functionStr) {


                                function = atoi(functionStr);

                                dataSet = true;









        if (dataSet &&

            (bus == cudaBus) &&

            (subdevice == cudaSubdevice) &&

            (function == cudaFunction))


            ret = SetupDiGetDeviceRegistryPropertyA(hNvDevInfo, &deviceInfoData, SPDRP_HARDWAREID, NULL,

                (PBYTE)locinfo, sizeof(locinfo), NULL);


            printf("locinfo %s\n", locinfo);


            int data[20];

            data[0] = 0;

            DEVPROPTYPE type;

            DEVPROPKEY key = DEVPKEY_Numa_Proximity_Domain;


            lastError = 0;


            ret =  SetupDiGetDeviceProperty(hNvDevInfo, &deviceInfoData,&key , &type, (PBYTE)&data[0], 20*sizeof(int), NULL,0);


            if (!ret)


                lastError = GetLastError();



            printf("DEVPKEY_Numa_Proximity_Domain %d err %d\n", data[0], lastError);

            key = DEVPKEY_Device_Numa_Node;

            lastError = 0;

            ret =  SetupDiGetDeviceProperty(hNvDevInfo, &deviceInfoData,&key , &type, (PBYTE)&data[0], 20*sizeof(int), NULL,0);


            if (!ret)


                lastError = GetLastError();



            printf("DEVPKEY_Device_Numa_Node %d err %d\n", data[0], lastError);


            return data[0];




    return -1;



From: hwloc-users [mailto:hwloc-users-bounces@open-mpi.org] On Behalf Of Brice Goglin
Sent: Monday, November 18, 2013 11:09 AM
To: Hardware locality user list
Subject: Re: [hwloc-users] windows PCI locality (was; DELL 8 core machine + Quadro K5000 GPU Card...)


This seems unrelated since he seems to be running Linux anyway.

We got that information a while ago but I couldn't do anything with it because (I think) I didn't have access to a Windows release that supported this. And, bigger problem, I don't have access to a Windows machine with more than one socket. I can't actually test the code anywhere.

Are you volunteering to write some code? I am not saying that you should write the entire hwloc support, but some example would help a lot.

Once we have the device locality, we'll need the devices too. The windows code misses the entire device listing code. Do you have any idea how to list PCI devices, match them with CUDA GPUs, etc ?


Le 18/11/2013 02:52, Ashley Reid a écrit :

Maybe not completely related to your issue, but the windows code misses the correct enumeration to see where the GPU is in a NUMA system. The code needs to look at:


Use “DEVPKEY_Numa_Proximity_Domain” and “DEVPKEY_Device_Numa_Node” when calling SetupDiGetDeviceProperty.




   “Windows Server 2003, Windows XP, and Windows 2000 do not support this property.” – So should be fine on win7 and win8?



But this only works if the bios has the right ACPI entries, we filed a bug and got a update for the z820 from HP. This relies on the _PXM  value in the ACPI tables.


You can use windbg and !nstree to view the tables. There inside should be some _PXM values.





From: hwloc-users [mailto:hwloc-users-bounces@open-mpi.org] On Behalf Of Solibakke Per Bjarte
Sent: Monday, November 18, 2013 10:15 AM
To: hwloc-users@open-mpi.org
Subject: [hwloc-users] DELL 8 core machine + Quadro K5000 GPU Card...




I recently got access to a very interesting and powerful machine: Dell 8 core + GPU Quadro K5000 (96 cores).

A total of 1536 cores in the original machine configuration.


I installed first HWLOC 1.7 version and I also installed the newly released beta 1.8. The final installation lines report PCI (linux) CUDA.

However, the commands:


Lstopo —whole-system and lstopo —whole-io


report only the 8 CPU-cores. No reference to PCI-Bridges, eth0, seas +++ and the GPUs.


Is the installation of the machine the problem or is my 

./configure —prefix=/usr/local/hwloc


missing some vital elements?





Dr.econ Per Bjarte Solibakke



Cell phone: 004790035606

Phone: 004771214238

This email message is for the sole use of the intended recipient(s) and may contain confidential information.  Any unauthorized review, use, disclosure or distribution is prohibited.  If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.

hwloc-users mailing list