Open MPI User's Mailing List Archives

Subject: [OMPI users] Bindings not detected with slurm (srun)
From: pascal.deveze_at_[hidden]
Date: 2011-08-18 04:25:47


Hi all,

When slurm is configured with the following parameters
   TaskPlugin=task/affinity
   TaskPluginParam=Cpusets
srun binds the processes by placing them into different
cpusets, each containing a single core.

For example, "srun -N 2 -n 4" creates 2 cpusets on each of the two allocated
nodes and places the four ranks in them, so that each rank is constrained to
a single core.

The issue in that case is in the macro OPAL_PAFFINITY_PROCESS_IS_BOUND (in
opal/mca/paffinity/paffinity.h):
  . opal_paffinity_base_get_processor_info() fills in num_processors with 1
(this is the size of each cpuset)
  . num_bound is set to 1 too
The macro requires num_bound < num_processors, which here is 1 < 1, so the
test fails and this implies *bound=false

So the binding is done correctly by slurm but is not detected by Open MPI.

To support the cpuset binding done by slurm, I propose the following patch:

hg diff opal/mca/paffinity/paffinity.h
diff -r 4d8c8a39b06f opal/mca/paffinity/paffinity.h
--- a/opal/mca/paffinity/paffinity.h Thu Apr 21 17:38:00 2011 +0200
+++ b/opal/mca/paffinity/paffinity.h Tue Jul 12 15:44:59 2011 +0200
@@ -218,7 +218,8 @@
                     num_bound++; \
                 } \
             } \
-            if (0 < num_bound && num_bound < num_processors) { \
+            if (0 < num_bound && ((num_processors == 1) || \
+                                  (num_bound < num_processors))) { \
                 *(bound) = true; \
             } \
         } \