Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2012-02-09 06:20:41


By default, hwloc only shows what's inside the current cpuset. There's
an option to show everything instead (topology flag).

Brice

On 09/02/2012 12:18, Jeff Squyres wrote:
> Just so that I understand this better -- if a process is bound in a cpuset, will tools like hwloc's lstopo only show the Linux processors *in that cpuset*? I.e., does it not have any visibility of the processors outside of its cpuset?
>
>
> On Jan 27, 2012, at 11:38 AM, nadia.derbey wrote:
>
>> Hi,
>>
>> If a job is launched using "srun --resv-ports --cpu_bind:..." and slurm
>> is configured with:
>> TaskPlugin=task/affinity
>> TaskPluginParam=Cpusets
>>
>> each rank of that job is in a cpuset that contains a single CPU.
>>
>> Now, if we use carto on top of this, the following happens in
>> get_ib_dev_distance() (in btl/openib/btl_openib_component.c):
>> . opal_paffinity_base_get_processor_info() is called to get the
>> number of logical processors (we get 1 because of the singleton cpuset)
>> . we loop over that number of processors to check whether our process
>> is bound to one of them. In our case the loop executes only once,
>> so we never get the correct binding information.
>> . if the process is bound, we then get the distance to the device.
>> In our case that part of the code is never executed.
>>
>> The attached patch is a proposal to fix the issue.
>>
>> Regards,
>> Nadia
>> <get_ib_dev_distance.patch>
>
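
The three steps Nadia describes can be illustrated with a small self-contained model. This is not the Open MPI code and not the attached patch. The cpuset is modelled as a 64-bit mask in which bit i stands for logical processor i, and every name below is hypothetical. The sketch assumes the rank's single allowed processor does not have id 0, which is the case where looping over the visible processor count misses the binding.

```c
#include <stdint.h>

#define MAX_CPUS 64  /* assumption: ids 0..63 fit in one 64-bit mask */

/* Number of processors in the cpuset, i.e. what a cpuset-restricted
 * query would report (1 for a singleton cpuset). */
static int count_allowed(uint64_t allowed)
{
    int n = 0;
    while (allowed) {
        n += (int)(allowed & 1u);
        allowed >>= 1;
    }
    return n;
}

/* Pattern described in the report: use the visible processor count as
 * the loop bound. With a singleton cpuset only id 0 is ever tested. */
int bound_cpu_looping_over_count(uint64_t allowed)
{
    int visible = count_allowed(allowed);
    for (int id = 0; id < visible; id++) {
        if (allowed & (UINT64_C(1) << id))
            return id;
    }
    return -1;  /* process looks unbound */
}

/* One possible alternative: test every id the mask can hold,
 * independently of how many processors are visible. */
int bound_cpu_looping_over_ids(uint64_t allowed)
{
    for (int id = 0; id < MAX_CPUS; id++) {
        if (allowed & (UINT64_C(1) << id))
            return id;
    }
    return -1;
}
```

With a mask that contains only processor 5, the first function reports the process as unbound, so the distance computation would be skipped, while the second one finds processor 5.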