Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Heads up on new feature to 1.3.4
From: Chris Samuel (csamuel_at_[hidden])
Date: 2009-08-17 21:18:14


----- "Eugene Loh" <Eugene.Loh_at_[hidden]> wrote:

Hi Eugene,

[...]
> It would be even better to have binding selections adapt to other
> bindings on the system.

Indeed!

This touches on the earlier thread about making OMPI aware
of its cpuset/cgroup allocation on the node (for those sites
that are using it). That might solve this issue quite nicely, as
OMPI would know precisely which cores & sockets were allocated
for its use without having to worry about other HPC processes.
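
As a rough illustration of what "knowing its allocation" could look
like on Linux, here is a minimal Python sketch. It is not how OMPI
does (or would do) it. It assumes a Linux kernel that reports a
"Cpus_allowed_list:" line in /proc/self/status, and the helper names
parse_cpu_list and allowed_cpus are made up for this example.

```python
import os


def parse_cpu_list(text):
    """Parse a kernel-style CPU list such as "0-3,8,10-11" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or stray commas
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def allowed_cpus(status_path="/proc/self/status"):
    """Return the set of CPUs this process may currently run on (Linux only).

    Assumption: the status file contains a "Cpus_allowed_list:" line.
    If it cannot be read or has no such line, fall back to asking the
    scheduler for the current affinity mask.
    """
    try:
        with open(status_path) as f:
            for line in f:
                if line.startswith("Cpus_allowed_list:"):
                    return parse_cpu_list(line.split(":", 1)[1])
    except OSError:
        pass
    return set(os.sched_getaffinity(0))
```

Calling allowed_cpus() inside a job confined to socket 0 of a dual
socket node should then return only the cores of that socket, which
is the information a binding policy would need.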

No idea how to figure that out for processes outside of cpusets. :-(

> In any case, regardless of what the best behavior is, I appreciate
> the point about changing behavior in the middle of a stable release.

Not a problem, and I take Jeff's point about 1.3 not being a
super stable release and thus not being a blocker to changes
such as this.

> Arguably, leaving significant performance on the table in typical
> situations is a bug that warrants fixing even in the middle of a
> release, but I won't try to settle that debate here.

I agree for those cases where there's no downside, and thinking
further on your point of balancing between sockets I can see why
that would limit the impact.

The cases I can think of that would be most adversely
affected come down to other jobs binding to cores naively. If
that's happening outside of cpusets, then the cluster sysadmin
has more to worry about from mixing those applications than
from mixing in OMPI ones, which are just binding to sockets. :-)

So I'll happily withdraw my objection on those grounds.

*But* I would like to test this code out on a cluster with
cpuset support enabled to see whether it will behave itself.

Basically, if I run a 4-core MPI job on a dual-socket system
and it has been allocated only the cores on socket 0, what will
happen when it tries to bind to socket 1, which is outside its
cpuset?
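
To make the question concrete, here is a small Python sketch of one
possible defensive policy: intersect the requested cores with the
allowed ones and, if nothing is left, fall back to the whole
allocation. This is purely a hypothetical policy for illustration,
not what the 1.3.4 patches do, and choose_binding and bind_to are
invented names. As far as I understand sched_setaffinity(2), asking
for a mask with no CPUs permitted by the cpuset fails with EINVAL,
which Python surfaces as an OSError.

```python
import os


def choose_binding(requested, allowed):
    """Pick the CPUs to bind to, given what was requested and what is allowed.

    Hypothetical policy: use the overlap if there is one, otherwise
    fall back to the full allowed set rather than failing.
    """
    usable = set(requested) & set(allowed)
    return usable if usable else set(allowed)


def bind_to(requested):
    """Bind the calling process to a safe subset of `requested` (Linux only)."""
    allowed = os.sched_getaffinity(0)   # CPUs we are currently permitted to use
    target = choose_binding(requested, allowed)
    os.sched_setaffinity(0, target)     # target is never empty, never outside `allowed`
    return target
```

With cores 0-3 on socket 0 and 4-7 on socket 1 (an assumed numbering),
choose_binding({4, 5, 6, 7}, {0, 1, 2, 3}) falls back to {0, 1, 2, 3}
instead of requesting cores the job does not own.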

Is there a 1.3 branch or tarball with these patches applied
that I could test out?

cheers,
Chris

-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency