Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Mixing Linux's CPU-shielding with mpirun's bind-to-core
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2013-08-18 03:36:10


On 18/08/2013 05:34, Siddhartha Jana wrote:
> Hi,
>
> My requirement:
> 1. Prevent the OS from scheduling tasks on cores 0-7 allocated to my
> process.
> 2. Avoid rescheduling of processes to other cores.
>
> My solution: I use Linux's CPU-shielding.
> [ Man page:
> http://www.kernel.org/doc/man-pages/online/pages/man7/cpuset.7.html
> ]
> I create a cpuset called "socket1" with cores 8-15 in the dev fs. I
> iterate through all the tasks in /dev/cpuset/tasks and copy them to
> /dev/cpuset/socket1/tasks

Hello,

Most of these existing tasks are system tasks. Some actually *want* to
run on specific cores outside of socket1. For instance some kernel
threads are doing the scheduler load balancing on each core. Others are
doing deferred work in the kernel that your application may need. I
wonder what happens when you move them. The kernel may reject your
request, or it may actually break things.
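
If you do try moving the existing tasks anyway, it helps to see which
ones the kernel refuses. The following is only a rough sketch, not
something I have run on your system: it writes each PID with its own
write() call and counts the failures. The function name move_tasks and
the one-PID-per-write approach are my own choices for illustration, and
the paths are whatever you pass in (for example /dev/cpuset/tasks and
/dev/cpuset/socket1/tasks from your description).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Try to move every PID listed in src_tasks into dst_tasks, one write()
 * per PID. Returns the number of PIDs moved, or -1 if src_tasks cannot
 * be opened. If rejected is non-NULL, it receives the number of PIDs
 * for which the open or write failed. (Hypothetical helper.) */
int move_tasks(const char *src_tasks, const char *dst_tasks, int *rejected)
{
    FILE *src = fopen(src_tasks, "r");
    if (src == NULL)
        return -1;

    int moved = 0, failed = 0;
    long pid;
    while (fscanf(src, "%ld", &pid) == 1) {
        char buf[32];
        int len = snprintf(buf, sizeof(buf), "%ld\n", pid);

        /* Open per PID so that one refused write cannot affect the next. */
        int fd = open(dst_tasks, O_WRONLY | O_APPEND);
        if (fd >= 0 && write(fd, buf, (size_t)len) == (ssize_t)len) {
            moved++;
        } else {
            failed++;
            fprintf(stderr, "could not move PID %ld: %s\n",
                    pid, strerror(errno));
        }
        if (fd >= 0)
            close(fd);
    }
    fclose(src);

    if (rejected != NULL)
        *rejected = failed;
    return moved;
}
```

A non-zero rejected count only tells you that the kernel refused some
tasks; it does not tell you whether the tasks that did move still behave
correctly afterwards.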

Also, most of these tasks do nothing but sleep 99.9% of the time
anyway. If you're worried about having too many system tasks on your
application's cores, just make sure you don't install useless packages
(or disable some services at startup).

If you *really* want to have 100% CPU for your application on cores 0-7,
be aware that other things such as interrupts will be stealing some CPU
cycles anyway. You could move these to cores 8-15 as well, but that
seems overkill to me.

> I create a cpuset called "socket0" with cores 0-7 .
> At the start of the application, (before MPI_Init()), I schedule my
> MPI process on the cpuset as follows:
> ------------------------------------------------------
> sprintf(str,"/bin/echo %d >> /dev/cpuset/socket0/tasks ",mypid);
> system(str);
> ------------------------------------------------------
> In order to ensure that my processes remain bound to the cores, I am
> passing the --bind-to-core option to mpirun. I do this, instead of
> using sched_setaffinity from within the application. Is there a chance
> that mpirun's "binding-to-core" will clash with the above ?

Make sure you have also specified the NUMA node in your cpuset's "mems"
file. That's required before the cpuset can be used (otherwise adding a
task will fail). And make sure that the application can add itself to
the cpuset; usually only root can add tasks to cpusets.

You may also want to open/write/close /dev/cpuset/socket0/tasks
directly and check the return values instead of using this system() call.
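
Something along these lines could replace the sprintf()/system() pair.
Take it as a minimal sketch: the helper names cpuset_write and
cpuset_attach_pid are made up for this example, and which NUMA node to
put in "mems" depends on your machine.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a short string to a cpuset control file with a single write().
 * Returns 0 on success, -1 on failure with errno set. (Hypothetical
 * helper.) */
int cpuset_write(const char *path, const char *value)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    size_t len = strlen(value);
    ssize_t n = write(fd, value, len);
    /* A short write is treated as an I/O error. */
    int err = (n < 0) ? errno : (n != (ssize_t)len ? EIO : 0);

    if (close(fd) != 0 && err == 0)
        err = errno;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}

/* Add pid to the cpuset whose tasks file is tasks_path.
 * Returns 0 on success, -1 on failure with errno set. */
int cpuset_attach_pid(const char *tasks_path, pid_t pid)
{
    char buf[32];
    snprintf(buf, sizeof(buf), "%ld\n", (long)pid);
    return cpuset_write(tasks_path, buf);
}
```

Before MPI_Init() you would then call something like
cpuset_write("/dev/cpuset/socket0/mems", "0") (node 0 is an assumption
here) followed by cpuset_attach_pid("/dev/cpuset/socket0/tasks",
getpid()), and report errno with perror() if either returns -1. A
permission problem then shows up as EACCES instead of being silently
lost inside system().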

If all the above works and does not return errors (you should check that
your application's PID is in /dev/cpuset/socket0/tasks while running),
bind-to-core won't clash with it, at least when using an OMPI that uses
hwloc for binding (v1.5.2 or later if I remember correctly).
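
For the check itself, here is a small sketch of what the application
could do at runtime. The function names are invented for illustration.
The first one looks for a PID in a tasks file; the second one reads the
"Cpus_allowed_list" line from a /proc/<pid>/status-style file, which
should show where the process is actually allowed to run once the cpuset
and the binding are both in place.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Return 1 if pid is listed in the tasks file, 0 if it is not,
 * -1 if the file cannot be opened. (Hypothetical helper.) */
int pid_in_tasks(const char *tasks_path, pid_t pid)
{
    FILE *f = fopen(tasks_path, "r");
    if (f == NULL)
        return -1;

    long entry;
    int found = 0;
    while (fscanf(f, "%ld", &entry) == 1) {
        if (entry == (long)pid) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Copy the value of the "Cpus_allowed_list:" line of a status file into
 * out (without the trailing newline). Returns 0 on success, -1 if the
 * file cannot be opened or the line is missing. (Hypothetical helper.) */
int cpus_allowed_list(const char *status_path, char *out, size_t out_len)
{
    static const char key[] = "Cpus_allowed_list:";
    if (out_len == 0)
        return -1;

    FILE *f = fopen(status_path, "r");
    if (f == NULL)
        return -1;

    char line[512];
    int rc = -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (strncmp(line, key, sizeof(key) - 1) != 0)
            continue;
        const char *val = line + sizeof(key) - 1;
        while (*val == ' ' || *val == '\t')
            val++;
        snprintf(out, out_len, "%s", val);
        out[strcspn(out, "\n")] = '\0';
        rc = 0;
        break;
    }
    fclose(f);
    return rc;
}
```

Calling pid_in_tasks("/dev/cpuset/socket0/tasks", getpid()) and
cpus_allowed_list("/proc/self/status", ...) after MPI_Init() would let
each rank print both results. With bind-to-core I would expect the
allowed list to be a single core inside 0-7, but check it on your
machine and don't take my word for it.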

> While this solution seems to work temporarily, I am not sure whether
> this is a good solution.

Usually the administrator or PBS/Torque/... creates the cpuset and
places tasks in there for you.

Brice