Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] slowdown with infiniband and latest CentOS kernel
From: Bernd Dammann (bd_at_[hidden])
Date: 2014-02-28 02:55:45

On 2/27/14 14:06 PM, Noam Bernstein wrote:
> On Feb 27, 2014, at 2:36 AM, Patrick Begou <Patrick.Begou_at_[hidden]> wrote:
>> Bernd Dammann wrote:
>>> The '--bind-to-core' workaround only makes sense for jobs that allocate full nodes, but the majority of our jobs don't do that.
>> Why?
>> We still use this option in OpenMPI (1.6.x, 1.7.x) with OpenFOAM and other applications to pin each process to its own core, because Linux sometimes migrates processes and two processes can end up running on the same core, slowing the application. We do this even when we do not use full nodes.
>> '--bind-to-core' is only inapplicable if you mix OpenMP and MPI, since all the threads of a process will be bound to the same core, but I do not recall OpenFOAM doing this yet.
> But if your jobs don't allocate full nodes and there are two jobs on the same node
> they can end up bound to the same subset of cores.

Exactly, that's our problem!
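
The overlap can be illustrated with a toy model. This is only a sketch, not Open MPI's actual binding code: the function bind_to_core and the 8-core node are hypothetical, and it assumes each job binds its ranks one per core, starting from the first core it is allowed to see.

```python
def bind_to_core(num_ranks, allowed_cores):
    """Toy model (assumption): bind rank i to the i-th core the job may use."""
    if num_ranks > len(allowed_cores):
        raise ValueError("more ranks than available cores")
    return {rank: allowed_cores[rank] for rank in range(num_ranks)}


# Hypothetical node with 8 cores, shared by two 4-rank jobs.
node_cores = list(range(8))

# Without cpusets both jobs see every core, so both start binding at core 0.
job_a = bind_to_core(4, node_cores)
job_b = bind_to_core(4, node_cores)
shared = set(job_a.values()) & set(job_b.values())

# With cpusets the queuing system hands each job a distinct set of cores.
job_a_cs = bind_to_core(4, node_cores[:4])
job_b_cs = bind_to_core(4, node_cores[4:])
shared_cs = set(job_a_cs.values()) & set(job_b_cs.values())

print("cores used by both jobs, no cpusets:", sorted(shared))
print("cores used by both jobs, with cpusets:", sorted(shared_cs))
```

In this model the two jobs without cpusets pile onto cores 0-3 while cores 4-7 sit idle; with disjoint cpusets the same binding rule produces no overlap.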

> Torque cpusets should in
> principle be able to prevent this (queuing system allocates distinct sets of cores to
> distinct jobs), but I've never used them myself.

We started using them at some point, but they had some side effects
(leaving dangling jobs/processes), so we stopped using them. And
certain ISV applications have issues as well.

> Here we've just basically given up on jobs that allocate a non-integer # of
> nodes. In principle they can (and then I turn off bind by core), but hardly anyone
> does it except for some serial jobs. Then again, we have a mix of 8 and 16 core
> nodes. If we had only 32 or 64 core nodes we might be less tolerant of this
> restriction.

We are running a system with a very inhomogeneous workload: in-house
applications and applications which we compile ourselves, but also
third-party applications that are not always designed with a
(multi-user) cluster in mind.