
Hardware Locality Development Mailing List Archives


Subject: Re: [hwloc-devel] specifying I/O devices on the command-line
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2011-04-12 09:26:09

On 12/04/2011 15:14, Jeff Squyres wrote:
> On Apr 12, 2011, at 8:10 AM, Brice Goglin wrote:
>> I am looking for a good way to specify PCI and OS devices on the
>> command-line (for hwloc-calc and hwloc-bind).
>> The trunk currently supports:
>> * os:foobar for the OS device named foobar (eth0, mlx4_0, ...)
>> * pci:0000:00:00.0 or pci:00:00.0 for a given PCI device
>> * pci:aaaa:bbbb:c for the c-th PCI device with vendor ID aaaa and device
>> ID bbbb
>> The idea is basically to make it easy to bind processes near some
>> high-performance devices:
>> hwloc-bind os:mlx4_0 <mympibenchmark>
>> hwloc-bind pci:nvidia:tesla:0 <mycudabenchmark>
> Nifty.
> Can you list multiple devices? E.g.:
> hwloc-bind os:mlx4_0 os:mlx4_1 my_mpi_benchmark

Yes, that works. We've just extended the way we parse a single
"location" on the command line. All existing operations on these
locations (add, subtract, xor, negate) still work.
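To make the location forms from the start of the thread concrete, here is a
minimal Python sketch that classifies os:NAME, pci:DDDD:BB:DD.F, pci:BB:DD.F
and pci:VVVV:DDDD:N strings. It is only an illustration: the function name,
the returned dictionaries and the regular expressions are assumptions, not
hwloc's actual parser, and it only accepts hexadecimal vendor/device IDs
(not names such as "nvidia").

```python
import re

# Hypothetical patterns for the location forms discussed in this thread.
# This is NOT hwloc's real parser, only a sketch of how the forms differ.
_BUSID = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)
_VENDOR_DEVICE = re.compile(r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{4}):(\d+)$")


def parse_location(spec):
    """Classify an I/O location string and return a dict describing it."""
    prefix, sep, rest = spec.partition(":")
    if not sep or not rest:
        raise ValueError(f"not an I/O location: {spec!r}")
    if prefix == "os":
        # os:foobar -> OS device selected by name (eth0, mlx4_0, ...)
        return {"kind": "osdev", "name": rest}
    if prefix == "pci":
        m = _BUSID.match(rest)
        if m:
            # pci:0000:00:00.0 or pci:00:00.0 -> bus ID, domain defaults to 0
            domain, bus, dev, func = m.groups()
            return {
                "kind": "pci-busid",
                "domain": int(domain or "0", 16),
                "bus": int(bus, 16),
                "dev": int(dev, 16),
                "func": int(func),
            }
        m = _VENDOR_DEVICE.match(rest)
        if m:
            # pci:aaaa:bbbb:c -> c-th device with vendor aaaa, device bbbb
            vendor, device, index = m.groups()
            return {
                "kind": "pci-id",
                "vendor": int(vendor, 16),
                "device": int(device, 16),
                "index": int(index),
            }
    raise ValueError(f"unrecognized I/O location: {spec!r}")
```

The bus ID form is told apart from the vendor/device form by the ".function"
suffix, which is why the two pci: notations can coexist.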

> Also, is there a CLI way to retrieve which numa nodes / OS processors are near such devices? I can imagine wanting to script up something like:
> - retrieve a mask / list of processors near OS device <foo>
> - binding N processes, one per processor, to the processors near that device

Once you have a way to specify an I/O device, you can convert it
into anything hwloc-calc can output. For instance:
    hwloc-calc os:mlx4_0 --pulist --po
gives the comma-separated list of physical indexes of the PUs near mlx4_0.
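For the scripting use case Jeff describes (one process per processor near a
device), the comma-separated list could be consumed roughly as follows. This
is a sketch under assumptions: the function names are hypothetical, the
sample string "0,2,4,6" is made up rather than real hwloc-calc output, and
actually launching and binding the processes is left out.

```python
def parse_pulist(output):
    """Turn a comma-separated list of PU indexes into a list of ints."""
    return [int(tok) for tok in output.strip().split(",") if tok.strip()]


def one_pu_per_process(output, nprocs):
    """Pick one distinct PU index for each of nprocs processes."""
    pus = parse_pulist(output)
    if nprocs > len(pus):
        raise ValueError(
            f"need {nprocs} PUs but only {len(pus)} are near the device"
        )
    # Process i gets the i-th PU of the list.
    return pus[:nprocs]


# Illustrative input only; not captured from a real machine.
sample = "0,2,4,6\n"
assignment = one_pu_per_process(sample, 3)
```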

By the way, for this exact case, we should actually support:
     hwloc-distribute <N> --restrict $(hwloc-calc os:mlx4_0)
I'll look at this.

>> Ideally, the os:foobar notation would be enough. But as long as we don't
>> have any OS name associated with (proprietary) GPUs, people will have to
>> identify GPUs by their PCI ids.
>> Other ideas that we may want to support:
>> * PCI devices by name: something like the 2nd PCI device whose name
>> contains "tesla C2070" so that people don't have to dig into lspci
>> manually to find out the vendor/device IDs or busids (mostly useful for
>> GPUs that have no OS names)
> I immediately had that question when I read your 2nd example, above (i.e., where did you get the names from?). Are these names in the lstopo output?

PCI names are only in the verbose output (they are usually very long).
OS names are always shown.

>> * OS devices by class: something like os:net:2 for the 2nd network
>> interface (not sure it's useful)
> I'm not sure it is -- isn't the ordering of PCI devices non-deterministic between cold boots?

As long as you don't plug/unplug anything in between, it should be ok,
but I can't be strictly sure about this.

The ordering won't change, but the OS names may still change because of

>> I/O devices will not be supported through the generic hierarchical
>> notation "socket:1.core:2..." anyway. So we could make their
>> command-line specification totally different from the usual one.
>> It's actually the first time we select objects on something different
>> than just a type or a depth and some indexes. So we could introduce a
>> new syntax here. For instance:
>> <type>[attributename=attributevalue,...]:index
>> <type>[attributename=attributevalue,...]:firstindex:lastindex
>> <type>[attributename=attributevalue,...]:firstindex:amount
>> Not sure it's worth doing this.
> It might be better to just put out basic functionality in 1.3 and *not* do advanced syntax like this (i.e., only do basic syntax). And then see what people ask for.

Then we need to define what "basic syntax" means :)
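For reference, the bracketed attribute syntax floated earlier in the thread
(<type>[attributename=attributevalue,...]:index) could be parsed along these
lines. This is only a sketch of a proposal that was never settled: the
function name and the attribute names in the usage lines are hypothetical.
Since firstindex:lastindex and firstindex:amount look identical on the
command line, the sketch returns the raw numbers and leaves their
interpretation open.

```python
import re

# type, optional [name=value,...] block, then one or two ":number" parts.
_SPEC = re.compile(
    r"^(?P<type>[A-Za-z_]\w*)"
    r"(?:\[(?P<attrs>[^\]]*)\])?"
    r"(?P<nums>(?::\d+){1,2})$"
)


def parse_attr_spec(spec):
    """Split a spec into (type, attribute dict, list of trailing numbers)."""
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError(f"cannot parse {spec!r}")
    attrs = {}
    if m.group("attrs"):
        for pair in m.group("attrs").split(","):
            name, sep, value = pair.partition("=")
            if not sep or not name.strip():
                raise ValueError(f"bad attribute {pair!r} in {spec!r}")
            attrs[name.strip()] = value.strip()
    # ":2:3" splits into ["", "2", "3"]; drop the leading empty piece.
    numbers = [int(n) for n in m.group("nums").split(":")[1:]]
    return m.group("type"), attrs, numbers


# Attribute names below are invented for illustration.
single = parse_attr_spec("pci[vendor=10de,device=06d1]:0")
pair = parse_attr_spec("os[class=net]:2:3")
```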