
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] [RFC] Hierarchical Topology
From: Sylvain Jeaugey (sylvain.jeaugey_at_[hidden])
Date: 2010-11-15 11:00:00


I already mentioned it when answering Terry's e-mail, but to be sure I'm clear:
don't confuse the full node topology with the MPI job topology. They _are_
different.

And in HiTopo, each process does not get the whole topology, only its own,
which should not cause storms.

On Mon, 15 Nov 2010, Ralph Castain wrote:

> I think the two efforts (the paffinity and this one) do overlap somewhat.
> I've been writing the local topology discovery code for Jeff, Terry, and
> Josh. It uses hwloc (or any other method, since it's a framework) to discover
> what hardware resources are available on each node in the job so that the
> info can be used in mapping the procs.
>
> As part of that work, we are passing the local hardware topology down to
> the MPI processes. This is done because of prior complaints when we had each
> MPI process discover that info for itself: it creates a bit of a "storm" on
> large SMP nodes.
>
> Note that what I've written (still to be completed before coming over)
> doesn't tell the proc what cores/HTs it is bound to - that's the part Terry
> et al are adding. Nor were we discovering the switch topology of the
> cluster.
>
> So there is a little overlap that could be resolved. And a concern on my
> part: we have previously introduced capabilities that had every MPI process
> read local system files to get node topology, and gotten user complaints
> about it. We probably shouldn't go back to that practice.
>
> Ralph
>
>
> On Mon, Nov 15, 2010 at 8:15 AM, Terry Dontje <terry.dontje_at_[hidden]> wrote:
>
>> A few comments:
>>
>> 1. Have you guys considered using hwloc for level 4-7 detection?
>> 2. Is L2 related to L2 cache? If not, is there some other term you
>> could use?
>> 3. What do you see if the process is bound to multiple cores/hyperthreads?
>> 4. What do you see if the process is not bound to any level 4-7 items?
>> 5. What about L1 and L2 cache locality as levels of their own? (hwloc
>> exposes these, but they also sit at different depths depending on the platform.)
>>
>> Note I am working with Jeff Squyres and Josh Hursey on some new paffinity
>> code that uses hwloc. Though the paffinity code may not have a direct
>> relationship to hitopo, using hwloc and standardizing what you call
>> level 4-7 might help avoid some user confusion.
>>
>> --td
>>
>>
>> On 11/15/2010 06:56 AM, Sylvain Jeaugey wrote:
>>
>> As a follow-up to the Stuttgart developers' meeting, here is an RFC for our
>> topology detection framework.
>>
>> WHAT: Add a framework for hardware topology detection that any other part
>> of Open MPI can use to help with optimization.
>>
>> WHY: Collective operations and shared memory algorithms, among others, may
>> have optimizations that depend on the hardware relationship between two MPI
>> processes. HiTopo is an attempt to provide this information in a unified
>> manner.
>>
>> WHERE: ompi/mca/hitopo/
>>
>> WHEN: When wanted.
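As an illustration of the WHY above, here is a minimal C sketch of how an optimization might ask "how close are these two processes?" once every process holds one renumbered integer id per topology level. The level list (core, socket, node, switch), `struct topo_addr` and `closest_shared_level()` are hypothetical names chosen for this example; they are not the HiTopo API or its level numbering.

```c
/* Hypothetical levels, innermost first (not HiTopo's own numbering). */
enum topo_level { LEVEL_CORE = 0, LEVEL_SOCKET, LEVEL_NODE, LEVEL_SWITCH, LEVEL_COUNT };

/* Hypothetical per-process address: one renumbered id per level. */
struct topo_addr {
    int id[LEVEL_COUNT];
};

/*
 * Return the innermost level at which a and b share the same resource,
 * or LEVEL_COUNT if they do not even share a switch.
 * Levels are compared from the outermost inward and the walk stops at the
 * first mismatch, so e.g. "socket 0" on two different nodes is not mistaken
 * for the same socket.
 */
int closest_shared_level(const struct topo_addr *a, const struct topo_addr *b)
{
    int shared = LEVEL_COUNT; /* nothing shared yet */
    for (int lvl = LEVEL_COUNT - 1; lvl >= 0; lvl--) {
        if (a->id[lvl] != b->id[lvl])
            break;
        shared = lvl;
    }
    return shared;
}
```

A collective component could, for instance, use `closest_shared_level() <= LEVEL_NODE` to decide that two ranks can exchange data through shared memory.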
>>
>> ==========================================================================
>> We developed the HiTopo framework for our collective operation component,
>> but it may be useful for other parts of Open MPI, so we'd like to contribute
>> it.
>>
>> A wiki page has been set up:
>> https://svn.open-mpi.org/trac/ompi/wiki/HiTopo
>>
>> and a Bitbucket repository:
>> http://bitbucket.org/jeaugeys/hitopo/
>>
>> In a few words, HiTopo has 3 steps:
>>
>> - Detection: each MPI process detects its topology at various levels:
>>   - core/socket: through the cpuid component
>>   - node: through gethostname
>>   - switch/island: through openib (mad) or slurm
>>   [Other topology detection components may be added for other
>>   resource managers, specific hardware or whatever we want...]
>>
>> - Collection: an allgather is performed so that each process has all
>> other processes' addresses
>>
>> - Renumbering: "string" addresses are converted to numbers starting at 0
>> (Example: nodenames "foo" and "bar" are renamed 0 and 1).
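The renumbering step can be sketched in C as below. This is a minimal illustration under one assumption: after the allgather, every process holds the same array of string addresses in rank order. The function name `renumber_addresses()` is hypothetical and not taken from the HiTopo sources.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: map string addresses (e.g. node names gathered from
 * all ranks) to dense ids starting at 0, in order of first appearance.
 * ids[i] receives the id of addrs[i].
 */
void renumber_addresses(const char *const addrs[], size_t n, int ids[])
{
    int next_id = 0;
    for (size_t i = 0; i < n; i++) {
        ids[i] = -1;
        /* Reuse the id already given to an identical earlier string. */
        for (size_t j = 0; j < i; j++) {
            if (strcmp(addrs[i], addrs[j]) == 0) {
                ids[i] = ids[j];
                break;
            }
        }
        /* First time this string is seen: hand out the next id. */
        if (ids[i] < 0)
            ids[i] = next_id++;
    }
}
```

With the gathered names `{"foo", "bar", "foo", "baz", "bar"}` this yields ids `{0, 1, 0, 2, 1}`, matching the "foo" -> 0, "bar" -> 1 example. Because every process runs the same deterministic pass over the same rank-ordered array, all processes agree on the numbering without further communication. The quadratic scan keeps the sketch short; sorting or hashing would scale better to many ranks.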
>>
>> Any comments welcome,
>> Sylvain
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>>
>> --
>> Terry D. Dontje | Principal Software Engineer
>> Developer Tools Engineering | +1.781.442.2631
>> Oracle - Performance Technologies
>> 95 Network Drive, Burlington, MA 01803
>> Email terry.dontje_at_[hidden]
>>
>>
>>
>>
>>
>