Setting the affinity needs to be done in the launched process itself,
so we'd either have to extend rmaps (from my understanding of rmaps,
that doesn't seem like a good idea) or do something different.
Perhaps the easiest thing to do is to add this to the LANL meeting
agenda...? Then we can have a whiteboard to discuss. :-)
On Jul 17, 2005, at 10:26 AM, Ralph Castain wrote:
> Wouldn't it belong in the rmaps framework? That's where we tell the
> launcher where to put each process - seems like a natural fit.
>
>
> On Jul 17, 2005, at 6:45 AM, Jeff Squyres wrote:
>
>> I'm thinking that we should add some processor affinity code to OMPI,
>> possibly in the orte layer (ORTE is the interface to the back-end
>> launcher, after all). This will really help on systems like Opterons
>> (and others) to prevent processes from bouncing between processors
>> and potentially getting located far from "their" RAM.
>>
>> This has the potential to help even micro-benchmark results (e.g.,
>> ping-pong). It's going to be quite relevant for my shared memory
>> collective work on mauve.
>>
>>
>> General scheme:
>> ---------------
>>
>> I think that somewhere in ORTE, we should actively set processor
>> affinity when:
>> - Supported by the OS
>> - Not disabled by the user (via MCA param)
>> - The node is not over-subscribed with processes from this job
>>
>> Generally speaking, if you launch <=N processes in a job on a node
>> (where N == number of CPUs on that node), then we set processor
>> affinity. We set each process's affinity to the CPU number according
>> to the VPID ordering of the procs in that job on that node. So if you
>> launch VPIDs 5, 6, 7, 8 on a node, 5 would go to processor 0, 6 would
>> go to processor 1, etc. (it's an easy, locally-determined ordering).
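
To make the ordering above concrete, here is a minimal sketch in C. The
function name local_cpu_for_vpid and its arguments are made up for
illustration and are not an existing ORTE interface. It assumes each
process already knows the VPIDs of its job's processes on the same node
and the number of CPUs on that node.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not an ORTE API).
 *
 * local_vpids: VPIDs of this job's processes on this node, in any order
 * nprocs:      number of entries in local_vpids
 * my_vpid:     VPID of the calling process
 * ncpus:       number of CPUs on this node
 *
 * Returns the CPU index this process should bind to, or -1 if affinity
 * should not be set (node over-subscribed, or my_vpid not in the list).
 */
int local_cpu_for_vpid(const unsigned *local_vpids, size_t nprocs,
                       unsigned my_vpid, size_t ncpus)
{
    size_t rank = 0;   /* number of local VPIDs smaller than ours */
    int found = 0;

    /* Over-subscribed: more processes than CPUs, so leave affinity alone. */
    if (nprocs > ncpus) {
        return -1;
    }

    for (size_t i = 0; i < nprocs; ++i) {
        if (local_vpids[i] == my_vpid) {
            found = 1;
        } else if (local_vpids[i] < my_vpid) {
            ++rank;
        }
    }

    return found ? (int) rank : -1;
}
```

With VPIDs 5, 6, 7, 8 on a 4-CPU node, this gives 5 -> 0, 6 -> 1, 7 -> 2,
8 -> 3. On a 2-CPU node the same call returns -1 and no affinity is set.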
>>
>> Someday, we might want to make this scheme universe-aware (i.e., see
>> if any other ORTE jobs are running on that node, and not schedule on
>> any processors already claimed by the processes of those jobs), but I
>> think single-job awareness is sufficient for the moment.
>>
>>
>> Implementation:
>> ---------------
>>
>> We'll need relevant configure tests to figure out if the target system
>> has CPU affinity system calls. Those are simple to add.
>>
>> We could use simple #if statements for the affinity stuff or make it a
>> real framework. Since it's only 1 function call to set the affinity, I
>> tend to lean towards the [simpler] #if solution, but could probably be
>> pretty easily convinced that a framework is the Right solution. I'm on
>> the fence (and if someone convinces me, I'd volunteer for the extra
>> work to set up the framework).
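
For reference, the #if variant could look roughly like the sketch below.
It is Linux-only and uses the three-argument glibc form of
sched_setaffinity(). HAVE_SCHED_SETAFFINITY is a hypothetical macro that
a configure test would define, and bind_self_to_cpu is an invented helper
name. Other operating systems would need their own branch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc: exposes sched_setaffinity() and CPU_* macros;
                         must be defined before any system header */
#endif
#include <sched.h>

/* HAVE_SCHED_SETAFFINITY is a hypothetical configure-time macro.  For
 * this standalone sketch we simply assume Linux provides the call. */
#if !defined(HAVE_SCHED_SETAFFINITY) && defined(__linux__)
#define HAVE_SCHED_SETAFFINITY 1
#endif

/*
 * Hypothetical helper: bind the calling process to a single CPU.
 * Returns 0 on success, -1 on failure or if affinity is unsupported.
 */
int bind_self_to_cpu(int cpu)
{
#if defined(HAVE_SCHED_SETAFFINITY)
    cpu_set_t mask;

    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        return -1;
    }

    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);

    /* A pid of 0 means "the calling process". */
    return sched_setaffinity(0, sizeof(mask), &mask);
#else
    (void) cpu;       /* no affinity support compiled in */
    return -1;
#endif
}
```

Callers can treat -1 as "carry on unbound", which keeps the no-support
and disabled cases on the same code path.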
>>
>> I'm not super-familiar with the processor-affinity stuff (e.g., for
>> best effect, should it be done after the fork and before the exec?), so
>> I'm not sure exactly where this would go in ORTE. Potentially either
>> before new processes are exec'd (we only have control of that in some
>> kinds of systems, like rsh/ssh) or very near the top of orte_init().
>>
>> Comments?
>>
>> --
>> {+} Jeff Squyres
>> {+} The Open MPI Project
>> {+} http://www.open-mpi.org/
>>
>> _______________________________________________
>> devel mailing list
>> devel@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>