
Hardware Locality Users' Mailing List Archives


Subject: Re: [hwloc-users] receive 0x0 from hwloc_cuda_get_device_cpuset
From: Albert Solernou (albert.solernou_at_[hidden])
Date: 2012-02-16 09:26:42


bffffff... painful answer.

Is there anything easy that the cluster administrators could do?
How could I persuade them that this is an easy fix?
:)

Thanks,
Albert

On Thu 16 Feb 2012 14:18:07 GMT, Brice Goglin wrote:
> Your machine has a buggy BIOS. It reports empty locality info for the
> PCI devices. That's why the hwloc cpuset is empty as well. I guess we
> should detect this case and return the entire machine cpuset instead.
>
> Something like this should help:
>
> Index: include/hwloc/cuda.h
> ===================================================================
> --- include/hwloc/cuda.h (revision 4302)
> +++ include/hwloc/cuda.h (working copy)
> @@ -92,6 +92,8 @@
>      return -1;
>
>    hwloc_linux_parse_cpumap_file(sysfile, set);
> +  if (hwloc_bitmap_iszero(set))
> +    hwloc_bitmap_copy(set, hwloc_topology_get_complete_cpuset(topology));
>
>    fclose(sysfile);
>  #else
>
>
> Brice
>
>
>
> On 16/02/2012 15:09, Albert Solernou wrote:
>> Hi Brice,
>> I attach a tar ball with the outputs.
>>
>> It may also be relevant to mention that I am running hwloc on a
>> cluster, so this is the output on a node with two GPU cards.
>>
>> Thank you,
>> Albert
>>
>> On 16/02/12 13:56, Brice Goglin wrote:
>>> Hello Albert,
>>> Does lstopo show PCI devices properly?
>>> Can you send these outputs?
>>> lstopo -.xml
>>> and
>>> for i in /sys/bus/pci/devices/* ; do echo -n "$i " ; cat $i/local_cpus ; done
>>> Brice
>>>
>>>
>>>
>>> On 16/02/2012 14:28, Albert Solernou wrote:
>>>> Hi,
>>>> I am receiving cpuset 0x0 when I call hwloc_cuda_get_device_cpuset.
>>>> The exact output of tests/cuda.c is:
>>>> got cpuset 0x0 for device 0
>>>> got cpuset 0x0 for device 1
>>>>
>>>>
>>>> I have tried hwloc 1.3 and 1.4, using the GNU and Intel compilers. I am
>>>> on a ROCKS cluster with two NVIDIA C2050 GPU cards.
>>>> Everything else seems to be working fine... What could I check for?
>>>> What information do you need to help me?
>>>>
>>>> Thank you,
>>>> Albert
>>
>>
>> _______________________________________________
>> hwloc-users mailing list
>> hwloc-users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>