Sorry for the delay, but my victim with 2 ib devices had been stolen ;-)
So, I ported the patch on the v1.5 branch and finally could test it.
Actually, there is no opal_hwloc_base_get_topology() in v1.5, so I had to
set the hwloc flags in ompi_mpi_init() and orte_odls_base_open() (i.e. the
places where opal_hwloc_topology is initialized).
With the new flag set, hwloc_get_nbobjs_by_type(opal_hwloc_topology,
HWLOC_OBJ_CORE) now sees the actual number of cores on the node (instead
of 1 when the cpuset is a singleton).
Since opal_paffinity_base_get_processor_info() calls into the hwloc
paffinity module (in hwloc/paffinity_hwloc_module.c), which in turn calls
hwloc_get_nbobjs_by_type(), we are now getting the right number of cores
in get_ib_dev_distance(). So we are looping over the exact number of
cores, looking for a potential binding.
So, as a conclusion, there's no need for any other patch: the fix you
committed was the only one needed to fix the issue.
Could you please move it to v1.5 (do I need to file a CMR)?
devel-bounces_at_[hidden] wrote on 02/09/2012 06:00:48 PM:
> From: Jeff Squyres <jsquyres_at_[hidden]>
> To: Open MPI Developers <devel_at_[hidden]>
> Date: 02/09/2012 06:01 PM
> Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
> processes as bound if the job has been launched by srun
> Sent by: devel-bounces_at_[hidden]
> Nadia --
> I committed the fix in the trunk to use HWLOC_WHOLE_SYSTEM and
> Do you want to revise your patch to use hwloc APIs with
> opal_hwloc_topology (instead of paffinity)? We could use that as a
> basis for the other places you identified that are doing similar things.
> On Feb 9, 2012, at 8:34 AM, Ralph Castain wrote:
> > Ah, okay - in that case, having the I/O device attached to the
> "closest" object at each depth would be ideal from an OMPI perspective.
> > On Feb 9, 2012, at 6:30 AM, Brice Goglin wrote:
> >> The BIOS usually tells you which NUMA location is close to each
> host-to-PCI bridge. So the answer is yes.
> >> Brice
> >> Ralph Castain <rhc_at_[hidden]> wrote:
> >> I'm not sure I understand this comment. A PCI device is attached
> to the node, not to any specific location within the node, isn't it?
> Can you really say that a PCI device is "attached" to a specific
> NUMA location, for example?
> >> On Feb 9, 2012, at 6:15 AM, Jeff Squyres wrote:
> >>> That doesn't seem too attractive from an OMPI perspective,
> though. We'd want to know where the PCI devices are actually rooted.
> >> _______________________________________________
> >> devel mailing list
> >> devel_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Jeff Squyres