Open MPI Development Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2012-02-09 16:23:47

Here's what I would do:
During init, walk the list of hwloc PCI devices
(hwloc_get_next_pcidev()) and keep an array of pointers to the
interesting ones plus their locality (the hwloc cpuset of the parent
non-I/O object).
When you want the I/O device near a core, walk the array and find one
whose locality contains your core's hwloc cpuset.
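A minimal, self-contained C sketch of this recipe follows. It is not hwloc code. `topo_obj_t`, `build_table()`, `device_near()` and `is_ib_class()` are hypothetical stand-ins, a 64-bit mask replaces `hwloc_bitmap_t`, and treating PCI classes 0x0c06 and 0x0207 (InfiniBand) as "interesting" is an assumption. In real code the init loop would be driven by `hwloc_get_next_pcidev()` on a topology loaded with I/O discovery enabled (`HWLOC_TOPOLOGY_FLAG_IO_DEVICES` in hwloc 1.x). The ancestor walk corresponds to `hwloc_get_non_io_ancestor_obj()`, and the containment check corresponds to `hwloc_bitmap_isincluded()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an hwloc object. As in hwloc, I/O objects
 * (host bridges, PCI bridges, PCI devices) carry a NULL cpuset. */
typedef struct topo_obj {
    const char *name;
    const uint64_t *cpuset;   /* one bit per PU; NULL for I/O objects */
    int is_pcidev;            /* nonzero for PCI device objects */
    unsigned short class_id;  /* PCI class/subclass, e.g. 0x0200 = Ethernet */
    struct topo_obj *parent;
} topo_obj_t;

/* One cached entry: an interesting device plus its locality. */
typedef struct {
    const topo_obj_t *dev;
    uint64_t locality;        /* cpuset of the first non-I/O ancestor */
} dev_locality_t;

/* Hypothetical filter: treat PCI classes 0x0c06 and 0x0207 (InfiniBand)
 * as the "interesting" devices. */
static int is_ib_class(unsigned short class_id)
{
    return class_id == 0x0c06 || class_id == 0x0207;
}

/* Follow parent links until an object with a cpuset is reached; real
 * hwloc code would call hwloc_get_non_io_ancestor_obj() instead. */
static const topo_obj_t *non_io_ancestor(const topo_obj_t *obj)
{
    while (obj != NULL && obj->cpuset == NULL)
        obj = obj->parent;
    return obj;
}

/* Init-time pass (the hwloc_get_next_pcidev() loop): record each
 * interesting PCI device together with its locality. Returns the count. */
static size_t build_table(const topo_obj_t *const *objs, size_t nobjs,
                          dev_locality_t *table, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < nobjs && n < max; i++) {
        const topo_obj_t *o = objs[i];
        if (!o->is_pcidev || !is_ib_class(o->class_id))
            continue;
        const topo_obj_t *anc = non_io_ancestor(o);
        if (anc == NULL)
            continue;
        table[n].dev = o;
        table[n].locality = *anc->cpuset;
        n++;
    }
    return n;
}

/* Lookup: first device whose locality contains every PU of the core's
 * cpuset (the hwloc_bitmap_isincluded(core, locality) test). */
static const topo_obj_t *device_near(const dev_locality_t *table, size_t n,
                                     uint64_t core_set)
{
    for (size_t i = 0; i < n; i++)
        if ((core_set & ~table[i].locality) == 0)
            return table[i].dev;
    return NULL;
}
```

The table is built once, so each lookup is just a linear scan over a handful of HCAs. If a core's cpuset straddles two NUMA nodes, no single device's locality contains it, and the lookup returns NULL.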

If you need help, feel free to contact me offline.


On 09/02/2012 22:14, Ralph Castain wrote:
> Hmmm... guess we'll have to play with it. Our need is to start with a
> core or some similar object, and quickly determine the closest IO
> device of a certain type. We wound up having to write "summarizer"
> code to parse the hwloc tree into a more OMPI-usable form, so we can
> always do that with the IO tree as well if necessary.
> On Feb 9, 2012, at 2:09 PM, Brice Goglin wrote:
>> That doesn't really work with the hwloc model unfortunately. Also,
>> when you get to smaller objects (cores, threads, ...) there are
>> multiple "closest" objects at each depth.
>> We have one "closest" object at some depth (usually Machine or NUMA
>> node). If you need something higher, you just walk the parent links.
>> If you need something smaller, you look at children.
>> Also, each I/O device isn't directly attached to such a closest
>> object. It's usually attached under some bridge objects. There's a
>> tree of hwloc PCI bus objects exactly like you have a tree of hwloc
>> sockets/cores/threads/etc. At the top of the I/O tree, one (bridge)
>> object is attached to a regular object as explained earlier. So, when
>> you have a random hwloc PCI object, you get its locality by walking
>> up its parent link until you find a non-I/O object (one whose cpuset
>> isn't NULL). hwloc/helper.h gives you hwloc_get_non_io_ancestor_obj()
>> to do that.
>> Brice
>> On 09/02/2012 14:34, Ralph Castain wrote:
>>> Ah, okay - in that case, having the I/O device attached to the
>>> "closest" object at each depth would be ideal from an OMPI perspective.
>>> On Feb 9, 2012, at 6:30 AM, Brice Goglin wrote:
>>>> The BIOS usually tells you which NUMA location is close to each
>>>> host-to-PCI bridge. So the answer is yes.
>>>> Brice
>>>> Ralph Castain <rhc_at_[hidden]> wrote:
>>>> I'm not sure I understand this comment. A PCI device is
>>>> attached to the node, not to any specific location within the
>>>> node, isn't it? Can you really say that a PCI device is
>>>> "attached" to a specific NUMA location, for example?
>>>> On Feb 9, 2012, at 6:15 AM, Jeff Squyres wrote:
>>>>> That doesn't seem too attractive from an OMPI perspective,
>>>>> though. We'd want to know where the PCI devices are actually
>>>>> rooted.
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden] <mailto:devel_at_[hidden]>
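Brice's earlier reply (2:09 PM) makes two points: a PCI device hangs below a chain of bridge objects, and its locality comes from the first ancestor with a non-NULL cpuset. At core depth there is also usually more than one equally "closest" object. The following self-contained C model is a sketch of both points. `node_t`, `attach_point()` and `cores_near()` are hypothetical, a 64-bit mask stands in for an hwloc cpuset, and the topology in the test is invented. With real hwloc, the walk is `hwloc_get_non_io_ancestor_obj()` and the containment check is `hwloc_bitmap_isincluded()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an hwloc object; a NULL cpuset marks an
 * I/O object (host bridge, PCI bridge or PCI device), as in hwloc. */
typedef struct node {
    const char *name;
    const uint64_t *cpuset;   /* one bit per PU; NULL for I/O objects */
    struct node *parent;
} node_t;

/* Climb parent links past any number of bridges until an object with a
 * cpuset is found: the point where the I/O tree attaches to the regular
 * tree. Continuing up ->parent from there reaches larger objects. */
static const node_t *attach_point(const node_t *io)
{
    while (io != NULL && io->cpuset == NULL)
        io = io->parent;
    return io;
}

/* Collect every core whose cpuset lies inside the device's locality.
 * Several cores usually qualify, so the result is a list rather than a
 * single "closest" core. Returns the number of cores written to out. */
static size_t cores_near(const node_t *dev, const node_t *const *cores,
                         size_t ncores, const node_t **out, size_t max)
{
    const node_t *anc = attach_point(dev);
    size_t n = 0;
    if (anc == NULL)
        return 0;
    for (size_t i = 0; i < ncores && n < max; i++)
        if ((*cores[i]->cpuset & ~*anc->cpuset) == 0)
            out[n++] = cores[i];
    return n;
}
```

Walking further up from the attach point answers "something higher". Scanning the cores contained in its cpuset answers "something smaller".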