On Feb 27, 2014, at 5:06 AM, Noam Bernstein <noam.bernstein_at_[hidden]> wrote:
> On Feb 27, 2014, at 2:36 AM, Patrick Begou <Patrick.Begou_at_[hidden]> wrote:
>> Bernd Dammann wrote:
>>> Using the workaround '--bind-to-core' only makes sense for jobs that allocate full nodes, but the majority of our jobs don't do that.
>> Why?
>> We still use this option in OpenMPI (1.6.x, 1.7.x) with OpenFOAM and other applications to pin each process to its own core, because Linux sometimes migrates processes and two processes can end up running on the same core, slowing the application. We do this even when we do not use full nodes.
>> '--bind-to-core' is only inapplicable if you mix OpenMP and MPI, since all the threads of a process will then be bound to the same core, but I do not recall OpenFOAM doing this yet.
> But if your jobs don't allocate full nodes and there are two jobs on the same node,
> they can end up bound to the same subset of cores. Torque cpusets should in
> principle be able to prevent this (the queuing system allocates distinct sets of cores to
> distinct jobs), but I've never used them myself.
> Here we've just basically given up on jobs that allocate a non-integer # of
> nodes. In principle they can (and then I turn off bind by core), but hardly anyone
> does it except for some serial jobs. Then again, we have a mix of 8 and 16 core
> nodes. If we had only 32 or 64 core nodes we might be less tolerant of this
I don't know if the original poster is using a resource manager or not, but we can support multi-tenant operations regardless. If you are using a resource manager, you can ask the RM to bind your allocation to a specific number of cores on each node. OMPI will then respect that restriction, binding your processes to cores within it.
If you aren't using a resource manager, or simply want to run multiple jobs on your own dedicated nodes, you can impose the restriction yourself by adding the --cpu-set option to your command line:
mpirun --cpu-set 0-3 ...
will restrict OMPI to using the first four cores on each node. Any comma-delimited set of ranges can be provided.
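To make the "comma-delimited set of ranges" syntax concrete, here is a rough Python sketch of how such a specification expands into core IDs, and how one might hand out non-overlapping ranges to several jobs sharing a node. This is only an illustration: expand_cpu_set and disjoint_cpu_sets are hypothetical helper names, not part of OMPI, and OMPI's own parsing may differ in details such as whitespace handling or error reporting.

```python
def expand_cpu_set(spec):
    """Expand a spec such as "0-3,8,10-11" into a sorted list of core IDs.

    Illustrative only; this is not OMPI's parser.
    """
    cores = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"bad range: {part!r}")
            cores.update(range(lo, hi + 1))
        else:
            cores.add(int(part))
    return sorted(cores)


def disjoint_cpu_sets(cores_per_node, cores_per_job):
    """Split cores 0..cores_per_node-1 into consecutive, non-overlapping ranges.

    Returns one cpu-set string per job that fits entirely on the node.
    """
    sets = []
    for start in range(0, cores_per_node - cores_per_job + 1, cores_per_job):
        end = start + cores_per_job - 1
        sets.append(f"{start}-{end}" if end > start else str(start))
    return sets


if __name__ == "__main__":
    # Two 4-core jobs sharing an 8-core node, each with its own cores.
    for cpu_set in disjoint_cpu_sets(8, 4):
        print(f"mpirun --cpu-set {cpu_set} ...")
```

Giving each job a different string from disjoint_cpu_sets keeps two jobs on the same node from being bound to the same cores, which is the overlap problem raised earlier in the thread.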
Even more mapping and binding options are provided in the 1.7 series, so you might want to look at it.