devel-bounces@open-mpi.org wrote on 02/09/2012 12:18:20 PM:
> From: Jeff Squyres <jsquyres@cisco.com>
> To: Open MPI Developers <devel@open-mpi.org>
> Date: 02/09/2012 12:18 PM
> Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
> processes as bound if the job has been launched by srun
> Sent by: devel-bounces@open-mpi.org
>
> Just so that I understand this better -- if a process is bound in a
> cpuset, will tools like hwloc's lstopo only show the Linux
> processors *in that cpuset*? I.e., does it not have any visibility
> of the processors outside of its cpuset?
Yes, it looks that way. At least, that is what
opal_paffinity_base_get_processor_info() returns.
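
To make the failure mode from my original report (quoted below) concrete,
here is a small standalone sketch. It is not the Open MPI code and not the
attached patch: the function names, the 8-CPU node and the choice of CPU 5
are all hypothetical, and it assumes that the loop compares indices
0..N-1 against the processor IDs the process is actually bound to.

```python
def is_bound_flawed(num_visible, bound_cpus):
    """Scan only indices 0..num_visible-1, as the loop described below does.

    num_visible stands in for the processor count reported inside the
    cpuset (an assumption for this sketch).
    """
    return any(cpu in bound_cpus for cpu in range(num_visible))


def is_bound_full_scan(bound_cpus, total_cpus):
    """Scan every processor ID on the node, not just the visible count."""
    return any(cpu in bound_cpus for cpu in range(total_cpus))


# Hypothetical rank: its cpuset holds only CPU 5 on an 8-CPU node.
bound_cpus = {5}
num_visible = len(bound_cpus)  # the singleton cpuset reports 1 processor

print(is_bound_flawed(num_visible, bound_cpus))  # False: only index 0 is checked
print(is_bound_full_scan(bound_cpus, 8))         # True: CPU 5 is found
```

With a singleton cpuset the first check only ever looks at index 0, so a
rank bound to any other processor is reported as unbound.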
Regards,
Nadia
>
>
> On Jan 27, 2012, at 11:38 AM, nadia.derbey wrote:
>
> > Hi,
> >
> > If a job is launched using "srun --resv-ports --cpu_bind:..." and slurm
> > is configured with:
> > TaskPlugin=task/affinity
> > TaskPluginParam=Cpusets
> >
> > each rank of that job is in a cpuset that contains a single CPU.
> >
> > Now, if we use carto on top of this, the following happens in
> > get_ib_dev_distance() (in btl/openib/btl_openib_component.c):
> > . opal_paffinity_base_get_processor_info() is called to get the
> >   number of logical processors (we get 1 due to the singleton cpuset).
> > . We loop over that number of processors to check whether our process
> >   is bound to one of them. In our case the loop is executed only
> >   once, and we never get the correct binding information.
> > . If the process is bound, we actually get the distance to the device.
> >   In our case we never execute that part of the code.
> >
> > The attached patch is a proposal to fix the issue.
> >
> > Regards,
> > Nadia
> > <get_ib_dev_distance.patch>
> > _______________________________________________
> > devel mailing list
> > devel@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> --
> Jeff Squyres
> jsquyres@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/