Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Heads up on new feature to 1.3.4
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-08-17 11:31:44

I don't disagree with your statements. However, I was addressing the
specific question of two OpenMPI programs conflicting on process placement,
not the overall question you are raising.

The issue of when/if to bind has been debated for a long time. I agree that
having more options (bind-to-socket, bind-to-core, etc) makes sense and that
the choice of a default is difficult, for all the reasons that have been
raised in this thread.

At issue for us is that other MPIs -do- bind by default, thus creating an
apparent performance advantage for themselves compared to us on standard
benchmarks run "out-of-the-box". We repeatedly get beaten up in papers and
elsewhere over our performance, when many times the major difference is in
the default binding. If we bind the same way they do, then the performance
gap disappears or is minimal.

So this is why we are wrestling with this issue. I'm not sure of the best
compromise here, but I think people have raised good points on all sides.
Unfortunately, there probably isn't a perfect answer... :-/

Certainly, I have no clue what it would be! Not that smart :-)

On Mon, Aug 17, 2009 at 9:12 AM, N.M. Maclaren <nmm1_at_[hidden]> wrote:

> On Aug 17 2009, Ralph Castain wrote:
>
>> The problem is that the two mpiruns don't know about each other, and
>> therefore the second mpirun doesn't know that another mpirun has already
>> used socket 0.
>>
>> We hope to change that at some point in the future.
>
> It won't help. The problem is less likely to be that two jobs are running
> OpenMPI programs (that have been recently linked!), but that the other tasks
> are not OpenMPI at all. I have mentioned daemons, kernel threads and so on,
> but think of shared-memory parallel programs (OpenMP etc.) and so on; a LOT
> of applications nowadays include some sort of threading.
>
> For the ordinary multi-user system, you don't want any form of binding. The
> scheduler is ricketty enough as it is, without confusing it further. That
> may change as the consequences of serious levels of multiple cores force
> that area to be improved, but don't hold your breath. And I haven't a clue
> which of the many directions scheduler design will go!
>
> I agree that having an option, and having it easy to experiment with, is the
> right way to go. What the default should be is very much less clear.
>
> Regards,
> Nick Maclaren.