
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] [RFC] Hierarchical Topology
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-11-15 10:36:55


I think the two efforts (the paffinity work and this one) do overlap somewhat.
I've been writing the local topology discovery code for Jeff, Terry, and
Josh - it uses hwloc (or any other method - it's a framework) to discover what
hardware resources are available on each node in the job so that the info
can be used when mapping the procs.

As part of that work, we are passing the local hardware topology down to the
MPI processes. This is done because of prior complaints when we had each
MPI process discover that info for itself - it creates a bit of a "storm" on
large SMP nodes.
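
For illustration only - a minimal sketch of that single-pass, per-node
discovery using plain hwloc calls (this is not the actual framework code,
just the hwloc API it builds on):

  #include <stdio.h>
  #include <hwloc.h>

  int main(void)
  {
      hwloc_topology_t topo;

      /* Discover the local hardware topology once (e.g. in the daemon),
       * instead of having every MPI process do it. */
      hwloc_topology_init(&topo);
      hwloc_topology_load(topo);

      printf("sockets=%d cores=%d hwthreads=%d\n",
             hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_SOCKET),
             hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE),
             hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU));

      hwloc_topology_destroy(topo);
      return 0;
  }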

Note that what I've written (still to be completed before coming over)
doesn't tell the proc which cores/hyperthreads it is bound to - that's the
part Terry et al. are adding. Nor does it discover the switch topology of the
cluster.

So there is a little overlap that could be resolved. And a concern on my part:
we have previously introduced capabilities that had every MPI process read
local system files to get the node topology, and we got user complaints about
it. We probably shouldn't go back to that practice.

Ralph

On Mon, Nov 15, 2010 at 8:15 AM, Terry Dontje <terry.dontje_at_[hidden]> wrote:

> A few comments:
>
> 1. Have you guys considered using hwloc for level 4-7 detection?
> 2. Is L2 related to the L2 cache? If not, is there some other term you
> could use?
> 3. What do you see if the process is bound to multiple cores/hyperthreads?
> 4. What do you see if the process is not bound to any level 4-7 items?
> 5. What about L1 and L2 cache locality as additional levels? (hwloc exposes
> these, but they are at different depths depending on the platform.)
>
> Note I am working with Jeff Squyres and Josh Hursey on some new paffinity
> code that uses hwloc. Though the paffinity code may not have a direct
> relationship to hitopo, the use of hwloc and standardization of what you call
> levels 4-7 might help avoid some user confusion.
>
> --td
>
>
> On 11/15/2010 06:56 AM, Sylvain Jeaugey wrote:
>
> As a followup to the Stuttgart developers' meeting, here is an RFC for our
> topology detection framework.
>
> WHAT: Add a framework for hardware topology detection to be used by any
> other part of Open MPI to help optimization.
>
> WHY: Collective operations and shared memory algorithms, among others, may
> have optimizations that depend on the hardware relationship between two MPI
> processes. HiTopo is an attempt to provide that information in a unified manner.
>
> WHERE: ompi/mca/hitopo/
>
> WHEN: When wanted.
>
> ==========================================================================
> We developed the HiTopo framework for our collective operation component,
> but it may be useful for other parts of Open MPI, so we'd like to contribute
> it.
>
> A wiki page has been set up:
> https://svn.open-mpi.org/trac/ompi/wiki/HiTopo
>
> and a bitbucket repository:
> http://bitbucket.org/jeaugeys/hitopo/
>
> In a few words, we have 3 steps in HiTopo:
>
> - Detection: each MPI process detects its topology at various levels:
>     - core/socket: through the cpuid component
>     - node: through gethostname
>     - switch/island: through openib (mad) or slurm
>   [Other topology detection components may be added for other
>   resource managers, specific hardware, or whatever we want...]
>
> - Collection: an allgather is performed so that every process has all other
> processes' addresses
>
> - Renumbering: "string" addresses are converted to numbers starting at 0
> (example: nodenames "foo" and "bar" become 0 and 1 - see the sketch below).
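>
> For illustration, here is a rough sketch of the collection and renumbering
> steps, assuming only the node level and plain MPI/POSIX calls (the real
> component code lives in the repository above; the function name below is
> just illustrative):
>
>   #include <stdlib.h>
>   #include <string.h>
>   #include <unistd.h>
>   #include <mpi.h>
>
>   #define HITOPO_NAMELEN 64   /* illustrative size */
>
>   /* Returns this process's node number: distinct nodenames are
>    * numbered 0, 1, 2, ... in order of first appearance by rank. */
>   int hitopo_node_number(MPI_Comm comm)
>   {
>       int size;
>       MPI_Comm_size(comm, &size);
>
>       /* Detection (node level): each process gets its own nodename. */
>       char myname[HITOPO_NAMELEN];
>       gethostname(myname, sizeof(myname));
>       myname[HITOPO_NAMELEN - 1] = '\0';
>
>       /* Collection: allgather so every process sees all addresses. */
>       char *all = malloc((size_t)size * HITOPO_NAMELEN);
>       MPI_Allgather(myname, HITOPO_NAMELEN, MPI_CHAR,
>                     all, HITOPO_NAMELEN, MPI_CHAR, comm);
>
>       /* Renumbering: convert string addresses to numbers starting at 0,
>        * e.g. "foo" -> 0, "bar" -> 1. */
>       int mynum = 0, ndistinct = 0;
>       for (int i = 0; i < size; i++) {
>           int dup = 0;
>           for (int j = 0; j < i; j++)
>               if (!strcmp(all + i * HITOPO_NAMELEN, all + j * HITOPO_NAMELEN)) {
>                   dup = 1;
>                   break;
>               }
>           if (!dup) {
>               if (!strcmp(all + i * HITOPO_NAMELEN, myname))
>                   mynum = ndistinct;
>               ndistinct++;
>           }
>       }
>       free(all);
>       return mynum;
>   }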
>
> Any comments welcome,
> Sylvain
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
>
> --
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.dontje_at_[hidden]
>
>
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>



