Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] slowdown with infiniband and latest CentOS kernel
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-02-27 10:22:56

On Feb 27, 2014, at 5:06 AM, Noam Bernstein <noam.bernstein_at_[hidden]> wrote:

> On Feb 27, 2014, at 2:36 AM, Patrick Begou <Patrick.Begou_at_[hidden]> wrote:
>> Bernd Dammann wrote:
>>> Using the workaround '--bind-to-core' only makes sense for jobs that allocate full nodes, but the majority of our jobs don't do that.
>> Why?
>> We still use this option in Open MPI (1.6.x, 1.7.x) with OpenFOAM and other applications to pin each process to its own core, because Linux sometimes migrates processes and two of them can end up on the same core, slowing the application. We do this even when we do not use full nodes.
>> '--bind-to-core' is only inapplicable if you mix OpenMP and MPI, since all of a process's threads will be bound to the same core, but as far as I remember OpenFOAM does not do this yet.
> But if your jobs don't allocate full nodes and there are two jobs on the same node,
> they can end up bound to the same subset of cores. Torque cpusets should in
> principle be able to prevent this (the queuing system allocates distinct sets of
> cores to distinct jobs), but I've never used them myself.
> Here we've basically given up on jobs that allocate a non-integer number of
> nodes. In principle they can (and then I turn off binding to cores), but hardly
> anyone does it except for some serial jobs. Then again, we have a mix of 8- and
> 16-core nodes. If we had only 32- or 64-core nodes we might be less tolerant of
> this restriction.

I don't know if the original poster is using a resource manager or not, but we can support multi-tenant operations either way. If you are using a resource manager (RM), you can ask it to bind your allocation to a specific number of cores on each node. OMPI will then respect that restriction, binding your processes to cores within it.

If you aren't using a resource manager, or simply want to run multiple jobs on your own dedicated nodes, you can impose the restriction yourself by adding the --cpu-set option to your command line:

mpirun --cpu-set 0-3 ...

will restrict OMPI to using the first four cores on each node. Any comma-delimited set of ranges can be provided.
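
For two jobs sharing a node, the point is to hand each one a disjoint cpu set. The following is only a rough sketch in Python of how one might check that before launching. It assumes the "comma-delimited set of ranges" syntax described above; the helper names (parse_cpu_set, overlapping_cores, mpirun_command) and the program names ./solver_a and ./solver_b are hypothetical, and combining --cpu-set with --bind-to-core and -np on one command line is an assumption about typical usage, not something confirmed in this thread.

```python
def parse_cpu_set(spec):
    """Expand a spec such as "0-3,8,10-11" into a sorted list of core IDs."""
    cores = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in cpu set {spec!r}")
        if "-" in part:
            first_text, last_text = part.split("-", 1)
            first, last = int(first_text), int(last_text)
            if first > last:
                raise ValueError(f"descending range {part!r}")
            cores.update(range(first, last + 1))
        else:
            cores.add(int(part))
    return sorted(cores)


def overlapping_cores(spec_a, spec_b):
    """Return the core IDs that appear in both specs (empty list if disjoint)."""
    return sorted(set(parse_cpu_set(spec_a)) & set(parse_cpu_set(spec_b)))


def mpirun_command(cpu_set, nprocs, program):
    """Build an mpirun argument list restricted to cpu_set (assumed flag combination)."""
    return ["mpirun", "--cpu-set", cpu_set, "--bind-to-core",
            "-np", str(nprocs), program]


# Two hypothetical jobs sharing one 8-core node, four cores each.
job_a, job_b = "0-3", "4-7"
if overlapping_cores(job_a, job_b):
    raise SystemExit("cpu sets overlap; the jobs would compete for cores")
print(" ".join(mpirun_command(job_a, 4, "./solver_a")))
print(" ".join(mpirun_command(job_b, 4, "./solver_b")))
```

The script only prints the two command lines; it does not launch anything, so it can be tried without an MPI installation.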

Even more mapping and binding options are provided in the 1.7 series, so you might want to look at it.


> Noam