----- "Ralph Castain" <rhc_at_[hidden]> wrote:
> Interesting. No, we don't take PLPA cpu sets into account when
> retrieving the allocation.
> Just to be clear: from an OMPI perspective, I don't think this is an
> issue of binding, but rather an issue of allocation. If we knew we had
> been allocated only a certain number of cores on a node, then we would
> only map that many procs to the node. When we subsequently "bind", we
> should then bind those procs to the correct cores (I think).
Hmm, Open MPI should already know this from the PBS TM API when
launching the job. We've never had to get our users to specify
how many procs per node to start (and they will generally have
no idea how many to ask for in advance, as they are at the mercy
of the scheduler unless they select whole nodes with ppn=8).
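As a rough cross-check of what the scheduler handed out, something like the sketch below could be run inside the job. It assumes the usual Torque/PBS convention that the file named by $PBS_NODEFILE lists each hostname once per allocated core. That is a different path from the TM API that Open MPI itself uses, so treat it only as a sanity check; the function name slots_per_node is made up for illustration.

```python
import os
from collections import Counter


def slots_per_node(nodefile_lines):
    """Count how many times each hostname appears in a PBS-style nodefile.

    Assumption: one line per allocated core, as Torque/PBS normally writes it.
    """
    hosts = (line.strip() for line in nodefile_lines)
    return dict(Counter(h for h in hosts if h))


if __name__ == "__main__":
    # PBS_NODEFILE is only set inside a running PBS/Torque job.
    path = os.environ.get("PBS_NODEFILE")
    if path is None:
        print("PBS_NODEFILE is not set; not running under PBS?")
    else:
        with open(path) as fh:
            for host, slots in sorted(slots_per_node(fh).items()):
                print(f"{host}: {slots} core(s) allocated")
```

The per-node counts it prints are the values one would expect to pass as x to -npernode for the test suggested below.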
> Could you check this? You can run a trivial job using the
> -npernode x option, where x matches the number of cores you were
> allocated on the nodes.
> If you do this, do we bind to the correct cores?
I'll give this a shot tomorrow when I'm back in the office
(just checking email late at night here). I'll try it under
strace to see which cores it passes to sched_setaffinity().
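Besides strace, the "trivial job" could simply report its own affinity mask. Here is a minimal sketch, assuming Linux and a Python new enough to have os.sched_getaffinity(); the helper names format_cpu_list and allowed_cpus are hypothetical. Each launched process prints the cores it is allowed to run on, which can then be compared against the cpuset the scheduler gave the job.

```python
import os
import socket


def format_cpu_list(cpus):
    """Compress CPU ids into range notation, e.g. [0, 1, 2, 5] -> "0-2,5"."""
    cpus = sorted(set(cpus))
    parts = []
    i = 0
    while i < len(cpus):
        start = end = cpus[i]
        # Extend the current run while the next id is consecutive.
        while i + 1 < len(cpus) and cpus[i + 1] == end + 1:
            i += 1
            end = cpus[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


def allowed_cpus():
    """Return the sorted CPU ids the calling process may run on (Linux only)."""
    return sorted(os.sched_getaffinity(0))


if __name__ == "__main__":
    print(f"{socket.gethostname()} pid={os.getpid()} "
          f"cpus={format_cpu_list(allowed_cpus())}")
```

Run once per rank under mpirun, this gives one line per process. If binding works, each line should show a single core from within the allocated set.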
> If we do, then that would confirm that we just aren't
> picking up the right number of cores allocated to us.
> If it is wrong, then this is a PLPA issue where it
> isn't binding to the right core.
Interesting, will let you know the test results tomorrow!
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency