Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Bindings not detected with slurm (srun)
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-08-18 08:41:25


Afraid I am confused. I assume this refers to the trunk, yes?

I also assume you are talking about launching an application directly from srun as opposed to using mpirun - yes?

In that case, I fail to understand what difference this proposed change makes. The application process is being bound directly by slurm, so what paffinity thinks is irrelevant, except perhaps for some debugging. Is that what you are concerned about?

I'd just like to know what problem is actually being solved here. I agree that, if there is only one processor in a system, you are effectively "bound".

On Aug 18, 2011, at 2:25 AM, pascal.deveze_at_[hidden] wrote:

> Hi all,
>
> When slurm is configured with the following parameters
> TaskPlugin=task/affinity
> TaskPluginParam=Cpusets
> srun binds the processes by placing them into different
> cpusets, each containing a single core.
>
> e.g. "srun -N 2 -n 4" will create 2 cpusets in each of the two allocated
> nodes and place the four ranks there, each rank constrained to a
> singleton cpuset containing a single core.
>
> The issue in that case is in the macro OPAL_PAFFINITY_PROCESS_IS_BOUND (in
> opal/mca/paffinity/paffinity.h):
> . opal_paffinity_base_get_processor_info() fills in num_processors with 1
> (the size of each cpuset)
> . num_bound is set to 1 as well
> Because the macro requires num_bound < num_processors, the test 1 < 1
> fails, and *bound remains false.
>
> So, the binding is correctly done by slurm but not detected by Open MPI.
>
> To support the cpuset binding done by slurm, I propose the following patch:
>
> hg diff opal/mca/paffinity/paffinity.h
> diff -r 4d8c8a39b06f opal/mca/paffinity/paffinity.h
> --- a/opal/mca/paffinity/paffinity.h Thu Apr 21 17:38:00 2011 +0200
> +++ b/opal/mca/paffinity/paffinity.h Tue Jul 12 15:44:59 2011 +0200
> @@ -218,7 +218,8 @@
> num_bound++; \
> } \
> } \
> - if (0 < num_bound && num_bound < num_processors) { \
> + if (0 < num_bound && ((num_processors == 1) || \
> + (num_bound < num_processors))) { \
> *(bound) = true; \
> } \
> } \
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users