
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] slowdown with infiniband and latest CentOS kernel
From: Noam Bernstein (noam.bernstein_at_[hidden])
Date: 2014-02-27 08:06:02

On Feb 27, 2014, at 2:36 AM, Patrick Begou <Patrick.Begou_at_[hidden]> wrote:

> Bernd Dammann wrote:
>> Using the workaround '--bind-to-core' only makes sense for jobs that allocate full nodes, but the majority of our jobs don't do that.
> Why ?
> We still use this option in OpenMPI (1.6.x, 1.7.x) with OpenFOAM and other applications to pin each process to its core, because Linux sometimes migrates processes and two processes can end up running on the same core, slowing the application. Even if we do not use full nodes.
> '--bind-to-core' is only inapplicable if you mix OpenMP and MPI, as all your threads will be bound to the same core, but I do not think OpenFOAM does this yet.
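To make the options under discussion concrete, here is a sketch of the binding flags as documented in the mpirun man pages for the two Open MPI series mentioned above; the executable name and the 4-threads-per-rank figure are illustrative assumptions, not from the thread:

```shell
# Open MPI 1.6.x: bind each MPI rank to a single core
# (./solver is a hypothetical application name)
mpirun --bind-to-core -np 16 ./solver

# Open MPI 1.7.x and later: the equivalent spelling takes the target
# as an argument to --bind-to
mpirun --bind-to core -np 16 ./solver

# Hybrid MPI+OpenMP case raised in the quote above: bind each rank to a
# group of cores rather than one core, so its OpenMP threads are not all
# pinned together (PE=4 assumes 4 threads per rank)
mpirun --map-by socket:PE=4 --bind-to core -np 4 ./solver
```

These commands require a working Open MPI installation and are shown only to illustrate the flag syntax; exact defaults changed between releases.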

But if your jobs don't allocate full nodes and two jobs land on the same node,
they can end up bound to the same subset of cores. Torque cpusets should in
principle prevent this (the queuing system allocates distinct sets of cores to
distinct jobs), but I've never used them myself.
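One way to check for the overlap described above is to inspect a process's CPU affinity mask on the node; a minimal sketch on Linux, assuming util-linux's taskset is available:

```shell
# Print the allowed-CPU list of the current shell. A rank bound with
# --bind-to-core would show a single core (e.g. "0"); an unbound process
# shows the full range (e.g. "0-15"). Two jobs pinned to the same list
# are competing for the same cores.
taskset -cp $$

# The same information straight from /proc, without util-linux
grep Cpus_allowed_list /proc/self/status
```

Running this inside each job's MPI ranks (e.g. via a wrapper script) would reveal whether two co-located jobs were bound to overlapping core sets.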

Here we've basically given up on jobs that allocate a non-integer number of
nodes. In principle they can run (and then I turn off binding by core), but hardly
anyone does it except for some serial jobs. Then again, we have a mix of 8- and
16-core nodes. If we had only 32- or 64-core nodes we might be less tolerant of this.