
Open MPI Development Mailing List Archives


From: Ralph Castain (rhc_at_[hidden])
Date: 2005-07-18 08:14:34


Did a little digging into this last night, and finally figured out what
you were getting at in your comments here. Yeah, I think an "affinity"
framework would definitely be the best approach - it can handle both CPU
and memory affinity, I imagine. It isn't clear how pressing this is, since
it's mostly an optimization issue, but you're welcome to create the
framework if you like.

On Sun, 2005-07-17 at 09:13, Jeff Squyres wrote:

> It needs to be done in the launched process itself. So we'd either
> have to extend rmaps (from my understanding of rmaps, that doesn't seem
> like a good idea), or do something different.
>
> Perhaps the easiest thing to do is to add this to the LANL meeting
> agenda...? Then we can have a whiteboard to discuss. :-)
>
>
>
> On Jul 17, 2005, at 10:26 AM, Ralph Castain wrote:
>
> > Wouldn't it belong in the rmaps framework? That's where we tell the
> > launcher where to put each process - seems like a natural fit.
> >
> >
> > On Jul 17, 2005, at 6:45 AM, Jeff Squyres wrote:
> >
> >> I'm thinking that we should add some processor affinity code to OMPI
> >> --
> >> possibly in the orte layer (ORTE is the interface to the back-end
> >> launcher, after all). This will really help on systems like opterons
> >> (and others) to prevent processes from bouncing between processors,
> >> and
> >> potentially getting located far from "their" RAM.
> >>
> >> This has the potential to help even micro-benchmark results (e.g.,
> >> ping-pong). It's going to be quite relevant for my shared memory
> >> collective work on mauve.
> >>
> >>
> >> General scheme:
> >> ---------------
> >>
> >> I think that somewhere in ORTE, we should actively set processor
> >> affinity when:
> >> - Supported by the OS
> >> - Not disabled by the user (via MCA param)
> >> - The node is not over-subscribed with processes from this job
> >>
> >> Generally speaking, if you launch <=N processes in a job on a node
> >> (where N == number of CPUs on that node), then we set processor
> >> affinity. We set each process's affinity to the CPU number according
> >> to the VPID ordering of the procs in that job on that node. So if you
> >> launch VPIDs 5, 6, 7, 8 on a node, 5 would go to processor 0, 6 would
> >> go to processor 1, etc. (it's an easy, locally-determined ordering).
> >>
> >> Someday, we might want to make this scheme universe-aware (i.e., see
> >> if
> >> any other ORTE jobs are running on that node, and not schedule on any
> >> processors that are already claimed by the processes on that(those)
> >> job(s)), but I think single-job awareness is sufficient for the
> >> moment.
> >>
> >>
> >> Implementation:
> >> ---------------
> >>
> >> We'll need relevant configure tests to figure out if the target system
> >> has CPU affinity system calls. Those are simple to add.
> >>
> >> We could simply use #if statements for the affinity stuff or make it a
> >> real framework. Since it's only 1 function call to set the affinity, I
> >> tend to lean towards the [simpler] #if solution, but I could probably
> >> be pretty easily convinced that a framework is the Right solution. I'm
> >> on the fence (and if someone convinces me, I'd volunteer for the extra
> >> work to set up the framework).
> >>
> >> I'm not super-familiar with the processor-affinity stuff (e.g., for
> >> best effect, should it be done after the fork and before the exec?),
> >> so
> >> I'm not sure exactly where this would go in ORTE. Potentially either
> >> before new processes are exec'd (where we only have control of that in
> >> some kinds of systems, like rsh/ssh) or right up very very near the
> >> top
> >> of orte_init().
> >>
> >> Comments?
> >>
> >> --
> >> {+} Jeff Squyres
> >> {+} The Open MPI Project
> >> {+} http://www.open-mpi.org/
> >>
> >> _______________________________________________
> >> devel mailing list
> >> devel_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>
> >