On Jul 24, 2009, at 8:38 AM, Sylvain Jeaugey wrote:
> > Is there any way for a process to tell the difference between "I can
> > bind to completely different processors than I'm already bound to"
> > (i.e., someone just happened to bind me to cores X, Y, and Z, but I'm
> > free to bind to cores A, B, and C if I want to) and "I can only bind
> > to cores within the set that I'm already bound to" (i.e., cpuset)?
> It seems to me that no one just "happens" to bind a process to cores
> X, Y or Z. If there is an explicit binding, there must be some reason
> for that and getting "out" of this binding looks like doing something
> wrong to me.
>
Actually, at least in the context of OMPI, the case we were assuming
here is that users might be running taskset or plpa_taskset manually.
> If there is another mechanism that does binding, either disable it or
> take it into account. In my [somewhat utopic] view, placement can't be
> done by two entities (in general, either the launcher or the MPI
> library).
>
Right. But the problem is that at least some subset of MPI users
today use taskset/plpa_taskset to bind their MPI processes because
many launchers do *not* already bind for them.
But keep in mind that PLPA has a wider scope than just MPI. :-)
> > Admittedly, I have not looked at libcpuset yet -- is this something
> > that libcpuset does? If so, we could get that kind of functionality
> > by linking into libcpuset (if it's available).
> I don't really like creating a dependency on libcpuset, because having
> cpuset functionality and using it doesn't mean having libcpuset
> installed. The majority of resource managers use the /dev/cpuset
> interface directly and we are fine with it. In the past, we also had
> our own libcpuset; it was a bit different from the one from SGI (which
> must be a lot better than ours) but we never really released it because
> people (i.e. RMS, SLURM, and, as it looks like, Torque also) were just
> fine with the /dev/cpuset pseudo-filesystem.
>
Ah, ok -- fair enough. If we can implement the functionality we
want/need easily enough manually, then so be it.
> Since cgroups seem to work differently (I've not worked on cpusets for
> quite a long time now, so I'm discovering things a bit here), working
> on /dev/cpuset seems also broken. Which leaves us with just getting the
> current affinity - a simple and universal rule.
>
I guess my point is that there really are 2 different pieces of
information:
- what processors you *can* bind to
- what processors you *are* bound to
Even if these two pieces of information are sometimes the same,
sometimes they're not. Hence, I'm suggesting that PLPA should
differentiate between these two when reporting information upward.
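To make the distinction concrete, here is a rough Python sketch (not
PLPA code, just an illustration). It assumes Linux, and it assumes the
cpuset pseudo-filesystem is mounted at /dev/cpuset with a "cpus" (or
"cpuset.cpus") file per cpuset; the helper names parse_cpu_list,
cpuset_cpus, and binding_report are made up for this example. The only
point is that the two sets are obtained separately and reported
separately:

```python
import os


def parse_cpu_list(text):
    """Parse a kernel-style CPU list such as "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpuset_cpus(mount="/dev/cpuset"):
    """Best-effort read of the CPUs the enclosing cpuset allows.

    Assumption: a cpuset pseudo-filesystem is mounted at `mount`.
    Returns None when that layout is not present or not readable.
    """
    try:
        with open("/proc/self/cpuset") as f:
            rel = f.read().strip().lstrip("/")
    except OSError:
        return None
    # The file name differs depending on how the filesystem was mounted.
    for name in ("cpus", "cpuset.cpus"):
        try:
            with open(os.path.join(mount, rel, name)) as f:
                return parse_cpu_list(f.read())
        except (OSError, ValueError):
            continue
    return None


def binding_report(can_bind, is_bound):
    """Keep the two sets separate and show how they relate."""
    return {
        "can_bind": sorted(can_bind),
        "is_bound": sorted(is_bound),
        # CPUs we could still move to, beyond the current binding.
        "free_to_rebind": sorted(can_bind - is_bound),
    }


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        bound = set(os.sched_getaffinity(0))
        allowed = cpuset_cpus()
        if allowed is None:
            # Unknown: fall back to the current binding (loses information).
            allowed = bound
        print(binding_report(allowed, bound))
```

With taskset-style binding inside a larger cpuset, "free_to_rebind"
would be non-empty; when the cpuset itself is the binding, it would be
empty.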
--
Jeff Squyres
jsquyres_at_[hidden]