Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] 1.3.1 and 1.4rc1
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2011-12-13 12:57:49


On 13/12/2011 18:17, Samuel Thibault wrote:
> Brice Goglin, on Tue 13 Dec 2011 14:10:17 +0100, wrote:
>> My main problem is that it's hard to know whether this will look good in
>> two years when we'll have support for AMD APU, Intel MIC and other
>> "strange" architectures. Which types should be common to CPUs and these
>> accelerators? Might be easy to answer for MIC,
> And still. MIC cores are not something you can just bind to.

Yeah, it's more like another host. But once you're on that other host,
you can bind there.

>> Also I don't think the GPU caches should be L2 because they are not very
>> similar to the CPU ones.
> How so?

In the same way that GPU memory is different from NUMA node memory? Why
would caches and cores be similar between CPU and GPU while memory and
PUs would not?

>> Given the libnuma or libpci mess, I can't see how always keeping CUDA
>> enabled will work fine in most cases.
> What do you think can go wrong?

Ugly API changes (like adding PCI info between 4.0rc2 and 4.0), broken
configure detection because the install path isn't very "standard" and
there's no pkg-config, and so on. Things that looked OK at first ended up
breaking with libnuma (broken backward compatibility) and libpci
(pkg-config added late, unclear dependencies on libz and libresolv, ...).
With --disable-cuda there's an easy workaround, so that's OK now.

> Quoting the documentation: “There is no explicit initialization
> function for the runtime; it initializes the first time a runtime
> function is called”.

Good!
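
For the record, that means something like the following is enough to
bring the runtime up; a minimal sketch using the CUDA runtime API, where
the first runtime call (here cudaGetDeviceCount()) does the implicit
initialization:

  #include <stdio.h>
  #include <cuda_runtime.h>

  int main(void)
  {
      int count = 0;
      /* No explicit initialization call: this first runtime API call
       * initializes the CUDA runtime implicitly. */
      cudaError_t err = cudaGetDeviceCount(&count);
      if (err != cudaSuccess) {
          fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                  cudaGetErrorString(err));
          return 1;
      }
      printf("%d CUDA device(s)\n", count);
      return 0;
  }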

>> * About the "tight" attribute, can't you just make a special case when
>> you're inside a GPU?
> I don't like this kind of special-casing: in the future we could very
> well also have a full-fledged core alongside an MP on the GPU.

Wait, I just saw in the code that it's only a *group* attribute! I
thought it was in the generic hwloc_obj structure... Ok then...
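
For readers of the archive, the distinction is roughly the following:
per-type attributes live in the attr union of hwloc_obj, so a field like
"tight" would only exist for Group objects, not on every object. A rough
sketch, where the "tight" field is specific to the proposed GPU branch
and hypothetical here (released hwloc only has a depth field in the
group attributes):

  /* Sketch only, not the exact hwloc headers. */
  union hwloc_obj_attr_u {
      /* ... cache, machine, PCI, ... attributes for other types ... */
      struct hwloc_group_attr_s {
          unsigned depth;
          int tight;  /* hypothetical: marks tightly-coupled groups,
                         e.g. the threads of a GPU multiprocessor */
      } group;
  };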

>> * About decoration, the lstopo output is totally unreadable on machines
>> with several "big" GPUs. I wonder if we actually need to display all GPU
>> threads like this, or just say "16 SMs with 32 threads each" instead?
> Well, we don't do such a summary for very big machines like our 96-core
> machine either...

PU/Core numbers are meaningful there, because we bind things. GPU
threads look anonymous to me, at least in the current lstopo output.
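
(To illustrate what "we bind things" means on the host side, here is a
minimal hwloc sketch that binds the calling thread to a PU chosen by
logical index; error handling mostly omitted.)

  #include <hwloc.h>

  /* Bind the calling thread to the PU with the given logical index.
   * This is why PU/Core numbers in lstopo are meaningful on the host. */
  int bind_to_pu(unsigned logical_index)
  {
      hwloc_topology_t topology;
      hwloc_obj_t pu;
      int err = -1;

      hwloc_topology_init(&topology);
      hwloc_topology_load(topology);

      pu = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, logical_index);
      if (pu)
          err = hwloc_set_cpubind(topology, pu->cpuset, HWLOC_CPUBIND_THREAD);

      hwloc_topology_destroy(topology);
      return err;
  }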

I am just saying that the current lstopo output on hosts with GPUs is
unreadable simply because we show so many "empty" boxes. If people wanted
to (and could) manipulate independent GPU threads, I would understand.

Brice