
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-07-15 09:18:03


Hmmm... I believe I made a misstatement. Shocking to those who know me, I am
sure! :-)

Just to correct my comments: OMPI knows how many "slots" have been allocated
to us, but not which "cores". So I'll assign the correct number of procs to
each node, but they won't know that we were allocated cores 2 and 4 (for
example), as opposed to some other combination.

When we subsequently bind, we bind to logical cpus based on our node rank -
i.e., what rank I am relative to my local peers on this node. PLPA then
translates that into a physical core.

My guess is that you are correct and PLPA isn't looking to see specifically
-which- cores were allocated to the job, but instead is simply translating
logical cpu=0 to the first physical core in the node.
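As a rough illustration of that difference, here is a minimal Python sketch (Linux-only, using `os.sched_setaffinity`; `bind_by_node_rank` is a made-up helper for this example, not actual OMPI or PLPA code):

```python
import os

def bind_by_node_rank(node_rank, respect_cpuset=True):
    """Pick and bind to a CPU based on a process's rank among its local peers.

    respect_cpuset=True maps the logical index into the CPUs the kernel
    actually allows this process (e.g. a cpuset/cgroup allocation).
    respect_cpuset=False mimics the suspected behavior: logical cpu 0 is
    simply the first physical core on the node, allocated or not.
    """
    if respect_cpuset:
        # CPUs the cpuset/cgroup actually grants us
        allowed = sorted(os.sched_getaffinity(0))
    else:
        # every core on the node, ignoring the allocation
        allowed = list(range(os.cpu_count()))
    cpu = allowed[node_rank % len(allowed)]
    os.sched_setaffinity(0, {cpu})  # bind this process to that single core
    return cpu
```

With `respect_cpuset=False`, a job allocated cores 2 and 4 would see node rank 0 bound to core 0 rather than core 2, which is exactly the symptom described above.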

The test I asked you to run, though, will confirm this. Please do let us
know as this is definitely something we should fix.

Thanks!
Ralph

On Wed, Jul 15, 2009 at 6:11 AM, Chris Samuel <csamuel_at_[hidden]> wrote:

>
> ----- "Ralph Castain" <rhc_at_[hidden]> wrote:
>
> Hi Ralph,
>
> > Interesting. No, we don't take PLPA cpu sets into account when
> > retrieving the allocation.
>
> Understood.
>
> > Just to be clear: from an OMPI perspective, I don't think this is an
> > issue of binding, but rather an issue of allocation. If we knew we had
> > been allocated only a certain number of cores on a node, then we would
> > only map that many procs to the node. When we subsequently "bind", we
> > should then bind those procs to the correct cores (I think).
>
> Hmm, Open MPI should already know this from the PBS TM API when
> launching the job; we've never had to get our users to specify
> how many procs per node to start (and they will generally have
> no idea how many to ask for in advance as they are at the mercy
> of the scheduler, unless they select whole nodes with ppn=8).
>
> > Could you check this? You can run a trivial job using the
> > -npernode x option, where x matches the #cores you were
> > allocated on the nodes.
> >
> > If you do this, do we bind to the correct cores?
>
> I'll give this a shot tomorrow when I'm back in the office
> (just checking email late at night here); I'll try it under
> strace to see what it tries to sched_setaffinity() to.
>
> > If we do, then that would confirm that we just aren't
> > picking up the right number of cores allocated to us.
> > If it is wrong, then this is a PLPA issue where it
> > isn't binding to the right core.
>
> Interesting, will let you know the test results tomorrow!
>
> cheers,
> Chris
> --
> Christopher Samuel - (03) 9925 4751 - Systems Manager
> The Victorian Partnership for Advanced Computing
> P.O. Box 201, Carlton South, VIC 3053, Australia
> VPAC is a not-for-profit Registered Research Agency
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>