Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] CPU affinity of OS Devices?
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2012-11-07 19:02:13


On 07/11/2012 16:56, Guy Streeter wrote:
> On 11/06/2012 05:20 PM, Brice Goglin wrote:
>> On 06/11/2012 23:55, Guy Streeter wrote:
>>> On 11/06/2012 03:53 PM, Brice Goglin wrote:
>>>> Hello Guy,
>>>>
>>>> I don't think OS devices ever had a cpuset. All objects that are not
>>>> things where you can bind processes usually have NULL cpusets. So when
>>>> you have a PCI or OS device, you walk up the obj->parent pointer until
>>>> you find an object with a non-NULL cpuset. That's the affinity you're
>>>> looking for.
>>>>
>>>> You can use hwloc_get_non_io_ancestor_obj() (in hwloc/helper.h) to find
>>>> the first parent with non-NULL cpuset.
>>>>
>>>> Brice
>>>>
>>> I didn't mean to imply that they had gone away. My question is how do I
>>> specify a binding like "not on the same CPU that is handling the Ethernet
>>> interrupts"?
>>>
>> Assuming you have the OS device for this interrupt, find the parent
>> object whose cpuset field isn't NULL, reverse this cpuset with
>> hwloc_bitmap_not() and bind to some object that is inside the resulting
>> cpuset.
>>
>> Brice
>>
> The problem with that is that it will exclude all of the CPUs associated with
> the node, or all of the CPUs on the machine if it isn't NUMA. The CPU affinity
> of the IRQ can be set independent of the bounding object of the PCI bridge the
> device is below.
> On my machine, the ahci device uses IRQ 90, and the affinity of that IRQ is
> CPU 3. The first parent with a cpuset above the OSDev block object for my disk
> drive is the machine itself.
>

That's exactly why I started my reply with "assuming you have the OS
device for this interrupt". I meant something like an "irq23" object,
which we don't support (that's what you requested earlier, right?).

Indeed, using the OSDev "eth3" instead of "irq23" wouldn't help at all.
The affinity of the NIC is usually at least an entire socket rather than
a single PU, and the interrupt can also be configured to go somewhere
else, not even close to the NIC (if the administrator or the irqbalance
daemon decides to do so).

Brice
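
Since hwloc has no "irq23"-style object (as discussed above), one way to
get "every CPU except the ones handling this IRQ" on Linux is to read the
IRQ's affinity mask from /proc/irq/<n>/smp_affinity and complement it.
The following is only a minimal sketch under stated assumptions: the
function names (parse_cpu_mask, read_irq_affinity, cpus_avoiding_irq) are
hypothetical and not part of hwloc, the machine is assumed to have at most
64 CPUs, and the mask is assumed to be in the usual hexadecimal format
with commas separating 32-bit groups.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/* Parse a Linux-style hexadecimal CPU mask such as "8" or
 * "00000000,00000008" (commas separate 32-bit groups, most significant
 * group first). Assumption of this sketch: only the low 64 CPUs are
 * kept; higher bits are silently dropped.
 * Returns 0 on success, -1 on malformed or empty input. */
int parse_cpu_mask(const char *text, uint64_t *mask)
{
    uint64_t value = 0;
    int digits = 0;

    for (const char *p = text; *p != '\0' && *p != '\n'; p++) {
        if (*p == ',')
            continue;                      /* group separator */
        if (!isxdigit((unsigned char)*p))
            return -1;
        int nibble = isdigit((unsigned char)*p)
                         ? *p - '0'
                         : tolower((unsigned char)*p) - 'a' + 10;
        value = (value << 4) | (uint64_t)nibble;
        digits++;
    }
    if (digits == 0)
        return -1;
    *mask = value;
    return 0;
}

/* Read the affinity mask of one IRQ from /proc/irq/<irq>/smp_affinity.
 * Returns 0 on success, -1 if the file cannot be read or parsed. */
int read_irq_affinity(int irq, uint64_t *mask)
{
    char path[64];
    char line[256];

    snprintf(path, sizeof path, "/proc/irq/%d/smp_affinity", irq);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *got = fgets(line, sizeof line, f);
    fclose(f);
    return got != NULL ? parse_cpu_mask(line, mask) : -1;
}

/* Return the CPUs in [0, ncpus) that are NOT in the IRQ's mask. */
uint64_t cpus_avoiding_irq(uint64_t irq_mask, unsigned ncpus)
{
    assert(ncpus >= 1 && ncpus <= 64);
    uint64_t all = (ncpus == 64) ? UINT64_MAX
                                 : ((UINT64_C(1) << ncpus) - 1);
    return all & ~irq_mask;
}
```

With Guy's example (IRQ 90 routed to CPU 3, so the file would contain a
mask with only bit 3 set) and a hypothetical 4-CPU machine, the result
would be CPUs 0-2. That mask could then be copied into an hwloc bitmap
(for instance with hwloc_bitmap_set() per bit) and passed to
hwloc_set_cpubind(). Note that the mask is only a snapshot: irqbalance or
the administrator may move the interrupt afterwards.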