Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [OMPI devel] processor affinity -- OpenMPI / batch system integration
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-10-22 11:16:08


Hi Rayson

You're probably aware: starting with 1.3.4, OMPI will detect and abide
by external bindings. So if Grid Engine sets a binding, we'll follow it.
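
For anyone curious how that works underneath: on Linux, an external
binding shows up as a restricted affinity mask that the process can
inspect at startup. A minimal sketch of the idea (not the actual OMPI
code, and untested):

    /* Detect whether something outside this process (e.g., a batch
     * system) has already restricted our CPU affinity. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        cpu_set_t mask;
        long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

        if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_getaffinity");
            return 1;
        }

        /* Bound to fewer CPUs than are online: an external agent
         * set a binding, so respect it rather than remap. */
        if (CPU_COUNT(&mask) < ncpus)
            printf("external binding: %d of %ld CPUs\n",
                   CPU_COUNT(&mask), ncpus);
        else
            printf("no external binding\n");
        return 0;
    }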

Ralph

On Oct 22, 2009, at 9:03 AM, Rayson Ho wrote:

> The code for the Job to Core Binding (a.k.a. thread binding, or CPU
> binding) feature was checked into the Grid Engine project CVS. It
> uses Open MPI's Portable Linux Processor Affinity (PLPA) library,
> and is topology- and NUMA-aware.
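>
> For reference, the binding itself goes through PLPA's drop-in
> affinity wrappers. A minimal sketch (untested; signatures quoted
> from memory, so treat them as approximate):
>
>     /* Bind the calling process to a single core via PLPA. */
>     #include <plpa.h>
>
>     int bind_to_core(int core)
>     {
>         plpa_cpu_set_t mask;
>
>         PLPA_CPU_ZERO(&mask);
>         PLPA_CPU_SET(core, &mask);
>
>         /* pid 0 means "the calling process" */
>         return plpa_sched_setaffinity(0, sizeof(mask), &mask);
>     }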
>
> The presentation from HPC Software Workshop '09:
> http://wikis.sun.com/download/attachments/170755116/job2core.pdf
>
> The design doc:
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213897
>
> Initial support is planned for 6.2 update 5 (the current release is
> update 4, so update 5 is likely to be released in the next two or
> three months).
>
> Rayson
>
>
>
> On Tue, Sep 30, 2008 at 2:23 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>> Note that we would also have to modify OMPI to:
>>
>> 1. recognize these environment variables, and
>>
>> 2. use them to actually set the binding, instead of using
>> OMPI-internal directives.
>>
>> Not a big deal to do, but not something currently in the system.
>> Since we launch through our own daemons (something that isn't
>> likely to change in your time frame), these changes would be
>> required.
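>>
>> To make (1) and (2) concrete, the OMPI side would boil down to
>> something like this sketch (untested; the variable name
>> GE_BINDING_CORE is invented purely for illustration):
>>
>>     /* Honor a per-process core number passed by the batch system. */
>>     #define _GNU_SOURCE
>>     #include <sched.h>
>>     #include <stdlib.h>
>>
>>     int bind_from_env(void)
>>     {
>>         const char *s = getenv("GE_BINDING_CORE");
>>         if (s == NULL)      /* no external directive: fall back */
>>             return -1;      /* to OMPI-internal mapping */
>>
>>         cpu_set_t mask;
>>         CPU_ZERO(&mask);
>>         CPU_SET(atoi(s), &mask);
>>         return sched_setaffinity(0, sizeof(mask), &mask);
>>     }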
>>
>> Otherwise, we could come up with some method by which you could
>> provide mapping information that we then use. While I agree with
>> Jeff that having you tell us which cores to use for each rank would
>> generally be better, it does raise issues when users want specific
>> mapping algorithms that you might not support. For example, we are
>> working on mappers that will take input from the user regarding
>> comm topology plus system info on network wiring topology and
>> generate a near-optimal mapping of ranks. As part of that, users
>> may request that some number of cores be reserved for each rank for
>> threading or other purposes.
>>
>> So perhaps both options would be best: give us the list of cores
>> available to us so we can map and do affinity, and also let you
>> pass in your own mapping. Maybe with some logic so we can decide
>> which to use based on whether OMPI or GE did the mapping?
>>
>> Not sure here - just thinking out loud.
>> Ralph
>>
>> On Sep 30, 2008, at 12:58 PM, Jeff Squyres wrote:
>>
>>> On Sep 30, 2008, at 2:51 PM, Rayson Ho wrote:
>>>
>>>> Restarting this discussion. A new update of Grid Engine 6.2 will
>>>> come out early next year [1], and I really hope that we can get
>>>> at least the interface defined.
>>>
>>> Great!
>>>
>>>> At the minimum, is it enough for the batch system to tell Open MPI
>>>> via an env variable which core (or virtual core, in the SMT case)
>>>> to start binding the first MPI task on? I guess an added bonus
>>>> would be information about the number of processors to skip (the
>>>> stride) between sibling tasks. A stride of one is usually the
>>>> case, but something larger than one would allow the batch system
>>>> to control the level of cache and memory bandwidth sharing between
>>>> the MPI tasks...
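>>>>
>>>> As a sketch of what I mean (env variable names invented here,
>>>> untested):
>>>>
>>>>     #include <stdlib.h>
>>>>
>>>>     /* Core for MPI rank `rank` under a start+stride scheme. */
>>>>     static int core_for_rank(int rank)
>>>>     {
>>>>         const char *s0 = getenv("GE_BIND_START");
>>>>         const char *s1 = getenv("GE_BIND_STRIDE");
>>>>         int start  = s0 ? atoi(s0) : 0;
>>>>         int stride = s1 ? atoi(s1) : 1;  /* 1 is the common case */
>>>>         return start + rank * stride;
>>>>     }
>>>>
>>>> E.g., start=0 and stride=2 would put rank k on core 2k, so that
>>>> (if hardware threads are numbered adjacently) each task gets a
>>>> physical core to itself.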
>>>
>>> Wouldn't it be better to give us a specific list of cores to bind
>>> to? As core counts go up in servers, I think we may see a
>>> re-emergence of having multiple MPI jobs on a single server. And as
>>> core counts go even *higher*, fragmentation of the available cores
>>> over time becomes possible, even likely.
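>>>
>>> With an explicit (possibly fragmented) list, rank k would just take
>>> the k-th entry. A sketch (untested; the env variable name is
>>> invented):
>>>
>>>     #define _GNU_SOURCE
>>>     #include <sched.h>
>>>     #include <stdlib.h>
>>>     #include <string.h>
>>>
>>>     /* Bind rank `rank` to the rank-th core in a comma-separated
>>>      * list such as "2,3,8,9,10,11". */
>>>     int bind_rank_from_list(int rank)
>>>     {
>>>         const char *list = getenv("GE_BINDING_LIST");
>>>         if (list == NULL)
>>>             return -1;
>>>
>>>         char *copy = strdup(list), *save = NULL;
>>>         char *tok = strtok_r(copy, ",", &save);
>>>         while (tok != NULL && rank-- > 0)
>>>             tok = strtok_r(NULL, ",", &save);
>>>
>>>         int rc = -1;
>>>         if (tok != NULL) {
>>>             cpu_set_t mask;
>>>             CPU_ZERO(&mask);
>>>             CPU_SET(atoi(tok), &mask);
>>>             rc = sched_setaffinity(0, sizeof(mask), &mask);
>>>         }
>>>         free(copy);
>>>         return rc;
>>>     }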
>>>
>>> Would you be giving us a list of *relative* cores to bind to
>>> (i.e., "bind to the Nth online core on the machine", which may be
>>> different from the OS's ID for that processor), or will you be
>>> giving us the actual OS virtual processor ID(s) to bind to?
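>>>
>>> The difference matters because online-core numbering and OS IDs can
>>> diverge (offlined CPUs, sparse numbering). Translating "the Nth
>>> online core" to an OS ID might look like this sketch of walking
>>> sysfs on Linux (untested):
>>>
>>>     #include <stdio.h>
>>>     #include <sys/stat.h>
>>>
>>>     /* OS ID of the n-th online CPU (0-based), or -1 if none. */
>>>     int os_id_of_nth_online(int n)
>>>     {
>>>         for (int id = 0, seen = 0; id < 4096; id++) {
>>>             char path[64];
>>>             struct stat st;
>>>             snprintf(path, sizeof(path),
>>>                      "/sys/devices/system/cpu/cpu%d", id);
>>>             if (stat(path, &st) != 0)
>>>                 continue;        /* IDs can be sparse */
>>>
>>>             int online = 1;      /* cpu0 often has no "online" file */
>>>             snprintf(path, sizeof(path),
>>>                      "/sys/devices/system/cpu/cpu%d/online", id);
>>>             FILE *f = fopen(path, "r");
>>>             if (f) {
>>>                 online = (fgetc(f) == '1');
>>>                 fclose(f);
>>>             }
>>>             if (online && seen++ == n)
>>>                 return id;
>>>         }
>>>         return -1;
>>>     }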
>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems