
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2012-02-09 16:23:47


Here's what I would do:
During init, walk the list of hwloc PCI devices
(hwloc_get_next_pcidev()) and keep an array of pointers to the
interesting ones plus their locality (the hwloc cpuset of the parent
non-I/O object).
When you want the I/O device near a core, walk the array and find one
whose locality contains your core's hwloc cpuset.
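
Roughly, something like this (an untested sketch against the hwloc 1.x
API of the time; the struct and helper names are made up for the
example, and it assumes the topology was loaded with
HWLOC_TOPOLOGY_FLAG_IO_DEVICES so the PCI objects are present):

  #include <hwloc.h>

  struct dev_locality {
      hwloc_obj_t pcidev;            /* the hwloc PCI device object */
      hwloc_const_cpuset_t cpuset;   /* cpuset of its non-I/O ancestor */
  };

  /* During init: walk all PCI devices and record their locality. */
  static int build_dev_array(hwloc_topology_t topo,
                             struct dev_locality *devs, int max)
  {
      int n = 0;
      hwloc_obj_t pci = NULL;
      while ((pci = hwloc_get_next_pcidev(topo, pci)) != NULL && n < max) {
          /* filter the "interesting" devices here, e.g. by looking
             at pci->attr->pcidev.class_id */
          hwloc_obj_t anc = hwloc_get_non_io_ancestor_obj(topo, pci);
          if (anc == NULL || anc->cpuset == NULL)
              continue;
          devs[n].pcidev = pci;
          devs[n].cpuset = anc->cpuset;
          n++;
      }
      return n;
  }

  /* Later: return a device whose locality contains the given core. */
  static hwloc_obj_t find_near_core(struct dev_locality *devs, int n,
                                    hwloc_obj_t core)
  {
      int i;
      for (i = 0; i < n; i++)
          if (hwloc_bitmap_isincluded(core->cpuset, devs[i].cpuset))
              return devs[i].pcidev;
      return NULL;
  }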

If you need help, feel free to contact me offline.

Brice

On 09/02/2012 22:14, Ralph Castain wrote:
> Hmmm….guess we'll have to play with it. Our need is to start with a
> core or some similar object, and quickly determine the closest IO
> device of a certain type. We wound up having to write "summarizer"
> code to parse the hwloc tree into a more OMPI-usable form, so we can
> always do that with the IO tree as well if necessary.
>
>
> On Feb 9, 2012, at 2:09 PM, Brice Goglin wrote:
>
>> That doesn't really work with the hwloc model unfortunately. Also,
>> when you get to smaller objects (cores, threads, ...) there are
>> multiple "closest" objects at each depth.
>>
>> We have one "closest" object at some depth (usually Machine or NUMA
>> node). If you need something higher, you just walk the parent links.
>> If you need something smaller, you look at children.
>>
>> Also, each I/O device isn't directly attached to such a closest
>> object. It's usually attached under some bridge objects. There's a
>> tree of hwloc PCI bus objects exactly like you have a tree of hwloc
>> sockets/cores/threads/etc. At the top of the I/O tree, one (bridge)
>> object is attached to a regular object as explained earlier. So, when
>> you have a random hwloc PCI object, you get its locality by walking
>> up its parent link until you find a non-I/O object (one whose cpuset
>> isn't NULL). hwloc/helper.h gives you hwloc_get_non_io_ancestor_obj()
>> to do that.
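
To make that concrete, an untested couple of lines, assuming the
topology was built with I/O devices enabled and "pciobj" is some
hwloc PCI object:

  hwloc_obj_t anc = hwloc_get_non_io_ancestor_obj(topology, pciobj);
  /* equivalent manual walk up the parent links: */
  hwloc_obj_t p = pciobj;
  while (p != NULL && p->cpuset == NULL)
      p = p->parent;
  /* anc and p are the same non-I/O object; anc->cpuset is the locality */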
>>
>> Brice
>>
>>
>>
>> On 09/02/2012 14:34, Ralph Castain wrote:
>>> Ah, okay - in that case, having the I/O device attached to the
>>> "closest" object at each depth would be ideal from an OMPI perspective.
>>>
>>> On Feb 9, 2012, at 6:30 AM, Brice Goglin wrote:
>>>
>>>> The BIOS usually tells you which NUMA location is close to each
>>>> host-to-PCI bridge. So the answer is yes.
>>>> Brice
>>>>
>>>>
>>>> Ralph Castain <rhc_at_[hidden]> wrote:
>>>>
>>>> I'm not sure I understand this comment. A PCI device is
>>>> attached to the node, not to any specific location within the
>>>> node, isn't it? Can you really say that a PCI device is
>>>> "attached" to a specific NUMA location, for example?
>>>>
>>>>
>>>> On Feb 9, 2012, at 6:15 AM, Jeff Squyres wrote:
>>>>
>>>>> That doesn't seem too attractive from an OMPI perspective,
>>>>> though. We'd want to know where the PCI devices are actually
>>>>> rooted.
>>>>