Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2012-02-09 06:20:41


By default, hwloc only shows what is inside the current cpuset. There is
a topology flag to show everything instead.
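The distinction above can be seen without hwloc, using the same kernel restriction that a cpuset imposes. The sketch below is Linux-only and relies only on glibc's `sched_getaffinity`, `CPU_COUNT` and `sysconf(_SC_NPROCESSORS_CONF)`. It compares the CPUs the process is allowed to use (the cpuset-restricted view, which is 1 for a rank placed in a single-CPU cpuset) with the CPUs configured in the whole machine. It is an illustration, not hwloc's implementation. In hwloc 1.x the flag in question is, to our knowledge, `HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM`. The helper names here are hypothetical.

```c
#define _GNU_SOURCE /* needed for sched_getaffinity and CPU_COUNT */
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* CPUs this process may run on: the cpuset-restricted view.
 * Inside a single-CPU cpuset this returns 1. */
static int allowed_cpu_count(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1; /* e.g. more CPUs than cpu_set_t can describe */
    return CPU_COUNT(&set);
}

/* CPUs configured in the whole machine, independent of any cpuset. */
static long whole_system_cpu_count(void)
{
    return sysconf(_SC_NPROCESSORS_CONF);
}

/* Print both views; the restricted view can never be the larger one. */
static void report_cpu_views(void)
{
    int allowed = allowed_cpu_count();
    long total = whole_system_cpu_count();
    assert(allowed >= 1 && allowed <= total);
    printf("allowed by cpuset: %d, whole system: %ld\n", allowed, total);
}
```

Run under `srun` with the cpuset configuration described below, `report_cpu_views()` would be expected to print an allowed count of 1 while the whole-system count stays at the machine's full CPU count.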

Brice

On 09/02/2012 12:18, Jeff Squyres wrote:
> Just so that I understand this better -- if a process is bound in a cpuset, will tools like hwloc's lstopo only show the Linux processors *in that cpuset*? I.e., does it not have any visibility of the processors outside of its cpuset?
>
>
> On Jan 27, 2012, at 11:38 AM, nadia.derbey wrote:
>
>> Hi,
>>
>> If a job is launched using "srun --resv-ports --cpu_bind:..." and slurm
>> is configured with:
>> TaskPlugin=task/affinity
>> TaskPluginParam=Cpusets
>>
>> each rank of that job is in a cpuset that contains a single CPU.
>>
>> Now, if we use carto on top of this, the following happens in
>> get_ib_dev_distance() (in btl/openib/btl_openib_component.c):
>> . opal_paffinity_base_get_processor_info() is called to get the
>> number of logical processors (we get 1 due to the singleton cpuset)
>> . we loop over that number of processors to check whether our process
>> is bound to one of them. In our case the loop is executed only once,
>> so we never get the correct binding information.
>> . if the process is bound, we actually get the distance to the device.
>> In our case, that part of the code is never executed.
>>
>> The attached patch is a proposal to fix the issue.
>>
>> Regards,
>> Nadia
>> <get_ib_dev_distance.patch>
>
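The failure mode Nadia describes can be reduced to a small model. The sketch below is not Open MPI's `get_ib_dev_distance()` and not the attached patch, whose contents are not shown here. It assumes a hypothetical 64-bit mask in which bit i means "bound to CPU i". The buggy variant only probes indices 0 to visible_count-1, where visible_count is the processor count reported inside the cpuset (1 here). The fixed variant probes every CPU in the machine. In this model, a rank whose single CPU is anything other than index 0 is wrongly reported as unbound, so the distance computation is skipped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit i of `binding` is set if the process is bound
 * to CPU i. At most 64 CPUs, so the mask fits in a uint64_t. */

/* Buggy pattern: only probe as many CPUs as the cpuset reports
 * (1 for a single-CPU cpuset), starting from index 0. */
static int find_bound_cpu_buggy(uint64_t binding, int visible_count)
{
    for (int i = 0; i < visible_count; i++) {
        if (binding & (UINT64_C(1) << i))
            return i;
    }
    return -1; /* treated as "not bound": the distance code is skipped */
}

/* Fixed pattern: probe every CPU in the whole machine. */
static int find_bound_cpu_fixed(uint64_t binding, int total_cpus)
{
    assert(total_cpus > 0 && total_cpus <= 64);
    for (int i = 0; i < total_cpus; i++) {
        if (binding & (UINT64_C(1) << i))
            return i;
    }
    return -1;
}
```

For a rank bound to CPU 5 on a 16-CPU node, `find_bound_cpu_buggy(UINT64_C(1) << 5, 1)` returns -1 while `find_bound_cpu_fixed(UINT64_C(1) << 5, 16)` returns 5. Taking the loop bound from the whole-system count rather than the cpuset count is what makes the difference in this model.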