
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] [OMPI devel] processor affinity -- OpenMPI / batchsystem integration
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-10-22 18:05:55


SGE might want to be aware that PLPA has now been deprecated -- we're
doing all future work on "hwloc" (hardware locality). That is, hwloc
represents the merger of PLPA and libtopology from INRIA. The
majority of the initial code base came from libtopology; more
PLPA-like features will come in over time (e.g., embedding
capabilities).

hwloc provides all kinds of topology information about the machine.
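
For example, enumerating that information takes just a few calls. A
minimal sketch against the hwloc API (modern names; details may
differ slightly in the v0.9 series mentioned below):

    /* Sketch: count cores and hardware threads with hwloc.
       Compile with: gcc topo.c -lhwloc */
    #include <stdio.h>
    #include <hwloc.h>

    int main(void)
    {
        hwloc_topology_t topo;
        hwloc_topology_init(&topo);  /* create an empty topology */
        hwloc_topology_load(topo);   /* discover the current machine */

        int ncores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
        int npus   = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);
        printf("%d cores, %d hardware threads\n", ncores, npus);

        hwloc_topology_destroy(topo);
        return 0;
    }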

The first release of hwloc -- v0.9.1 -- will be "soon" (we're in rc
status right now), but it will not include PLPA-like embedding
capabilities. Embedding is slated for v1.0.

Come join our mailing lists if you're interested:

     http://www.open-mpi.org/projects/hwloc/

On Oct 22, 2009, at 11:26 AM, Rayson Ho wrote:

> Yes, on page 14 of the presentation: "Support for OpenMPI and OpenMP
> Through -binding [pe|env] linear|striding" -- SGE performs no binding,
> but instead it outputs the binding decision to OpenMPI.
>
> Support for OpenMPI's binding is part of the "Job to Core Binding"
> project.
>
> Rayson
>
>
>
> On Thu, Oct 22, 2009 at 10:16 AM, Ralph Castain <rhc_at_[hidden]>
> wrote:
> > Hi Rayson
> >
> > You're probably aware: starting with 1.3.4, OMPI will detect and
> > abide by external bindings. So if grid engine sets a binding,
> > we'll follow it.
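
(As an aside, "detect and abide by external bindings" can be made
concrete: one way a Linux process can notice that its launcher bound
it is to compare its affinity mask against the number of online
processors. A hedged sketch, not Open MPI's actual detection code:)

    /* Sketch: has the launcher (e.g., the batch system) already bound
       this process? Linux-specific illustration only. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        cpu_set_t mask;
        long online = sysconf(_SC_NPROCESSORS_ONLN);

        if (sched_getaffinity(0, sizeof(mask), &mask) == 0 &&
            CPU_COUNT(&mask) < online) {
            printf("externally bound to %d of %ld processors\n",
                   CPU_COUNT(&mask), online);
        } else {
            printf("no external binding detected\n");
        }
        return 0;
    }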
> >
> > Ralph
> >
> > On Oct 22, 2009, at 9:03 AM, Rayson Ho wrote:
> >
> >> The code for the Job to Core Binding (a.k.a. thread binding, or
> >> CPU binding) feature was checked into the Grid Engine project
> >> CVS. It uses Open MPI's Portable Linux Processor Affinity (PLPA)
> >> library, and is topology and NUMA aware.
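
(For reference: binding with PLPA mirrors Linux's sched_setaffinity.
A minimal sketch based on the documented PLPA API; the choice of
processor ID 2 is purely illustrative:)

    /* Sketch: bind the calling process to one processor with PLPA.
       Link with -lplpa. */
    #include <stdio.h>
    #include <plpa.h>

    int main(void)
    {
        plpa_cpu_set_t mask;
        PLPA_CPU_ZERO(&mask);
        PLPA_CPU_SET(2, &mask);  /* processor ID 2, for illustration */

        if (plpa_sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            fprintf(stderr, "binding failed\n");
            return 1;
        }
        printf("bound to processor 2\n");
        return 0;
    }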
> >>
> >> The presentation from HPC Software Workshop '09:
> >> http://wikis.sun.com/download/attachments/170755116/job2core.pdf
> >>
> >> The design doc:
> >>
> >> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213897
> >>
> >> Initial support is planned for 6.2 update 5 (current release is
> >> update 4, so update 5 is likely to be released in the next 2 or
> >> 3 months).
> >>
> >> Rayson
> >>
> >>
> >>
> >> On Tue, Sep 30, 2008 at 2:23 PM, Ralph Castain <rhc_at_[hidden]>
> >> wrote:
> >>>
> >>> Note that we would also have to modify OMPI to:
> >>>
> >>> 1. recognize these environmental variables, and
> >>>
> >>> 2. use them to actually set the binding, instead of using
> >>> OMPI-internal directives
> >>>
> >>> Not a big deal to do, but not something currently in the system.
> >>> Since we launch through our own daemons (something that isn't
> >>> likely to change in your time frame), these changes would be
> >>> required.
> >>>
> >>> Otherwise, we could come up with some method by which you could
> >>> provide mapper information we use. While I agree with Jeff that
> >>> having you tell us which cores to use for each rank would
> >>> generally be better, it does raise issues when users want
> >>> specific mapping algorithms that you might not support. For
> >>> example, we are working on mappers that will take input from the
> >>> user regarding comm topology plus system info on network wiring
> >>> topology and generate a near-optimal mapping of ranks. As part
> >>> of that, users may request some number of cores be reserved for
> >>> that rank for threading or other purposes.
> >>>
> >>> So perhaps both options would be best - give us the list of
> >>> cores available to us so we can map and do affinity, and pass in
> >>> your own mapping. Maybe with some logic so we can decide which
> >>> to use based on whether OMPI or GE did the mapping?
> >>>
> >>> Not sure here - just thinking out loud.
> >>> Ralph
> >>>
> >>> On Sep 30, 2008, at 12:58 PM, Jeff Squyres wrote:
> >>>
> >>>> On Sep 30, 2008, at 2:51 PM, Rayson Ho wrote:
> >>>>
> >>>>> Restarting this discussion. A new update version of Grid
> >>>>> Engine 6.2 will come out early next year [1], and I really
> >>>>> hope that we can get at least the interface defined.
> >>>>
> >>>> Great!
> >>>>
> >>>>> At the minimum, is it enough for the batch system to tell
> >>>>> Open MPI via an env variable which core (or virtual core, in
> >>>>> the SMT case) to start binding the first MPI task? I guess an
> >>>>> added bonus would be information about the number of
> >>>>> processors to skip (the stride) between the sibling tasks?
> >>>>> A stride of one is usually the case, but something larger than
> >>>>> one would allow the batch system to control the level of cache
> >>>>> and memory bandwidth sharing between the MPI tasks...
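
(To make the proposed interface concrete: a hedged sketch of how a
launcher's start/stride hint could be consumed. The variable names
SGE_BIND_START and SGE_BIND_STRIDE are invented for illustration and
are not part of any actual SGE release:)

    /* Hypothetical sketch: derive rank N's core from a start core and
       a stride passed via environment variables. The names
       SGE_BIND_START and SGE_BIND_STRIDE are invented, not real. */
    #include <stdio.h>
    #include <stdlib.h>

    static int core_for_rank(int rank)
    {
        const char *s = getenv("SGE_BIND_START");
        const char *t = getenv("SGE_BIND_STRIDE");
        int start  = s ? atoi(s) : 0;
        int stride = t ? atoi(t) : 1;  /* stride 1 is the common case */
        return start + rank * stride;
    }

    int main(void)
    {
        for (int rank = 0; rank < 4; rank++)
            printf("rank %d -> core %d\n", rank, core_for_rank(rank));
        return 0;
    }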
> >>>>
> >>>> Wouldn't it be better to give us a specific list of cores to
> >>>> bind to? As core counts go up in servers, I think we may see a
> >>>> re-emergence of having multiple MPI jobs on a single server.
> >>>> And as core counts go even *higher*, then fragmentation of
> >>>> available cores over time is possible/likely.
> >>>>
> >>>> Would you be giving us a list of *relative* cores to bind to
> >>>> (i.e., "bind to the Nth online core on the machine" -- which
> >>>> may be different than the OS's ID for that processor) or will
> >>>> you be giving us the actual OS virtual processor ID(s) to bind
> >>>> to?
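
(This logical-vs-OS numbering distinction is exactly what hwloc now
exposes: each object carries both a logical index -- the Nth object
as hwloc enumerates it -- and the OS-assigned index. A minimal
sketch, assuming the hwloc 1.x field names:)

    /* Sketch: print each core's logical index vs. the OS's own id;
       the two numberings need not agree. */
    #include <stdio.h>
    #include <hwloc.h>

    int main(void)
    {
        hwloc_topology_t topo;
        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        int n = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
        for (int i = 0; i < n; i++) {
            hwloc_obj_t core =
                hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, i);
            printf("core logical %u, OS id %u\n",
                   core->logical_index, core->os_index);
        }

        hwloc_topology_destroy(topo);
        return 0;
    }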
> >>>>
> >>>> --
> >>>> Jeff Squyres
> >>>> Cisco Systems
> >>>>

-- 
Jeff Squyres
jsquyres_at_[hidden]