
Hardware Locality Users' Mailing List Archives


Subject: Re: [hwloc-users] hwloc with ScaleMP
From: Brock Palen (brockp_at_[hidden])
Date: 2010-04-07 16:46:53


Brice Goglin wrote:

> Brock Palen wrote:
>> Has anyone done work with hwloc on ScaleMP systems? They provide
>> their own tool, numabind, but we are looking for a more generic
>> solution to process placement and control that works well inside
>> our MPI library (Open MPI in most cases).
>>
>> Any input on this would be great!
>
> Hello Brock,
>
> From what I remember, ScaleMP uses a hypervisor on each node that
> virtually merges all of them into a fake big shared-memory machine.
> Then a vanilla Linux kernel runs on top of it. So hwloc should just
> see regular cores and NUMA node information, assuming the virtual
> "merged" hardware reports all necessary information to the OS.
>
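
(As a rough illustration, not something from this thread: with a recent
hwloc you could enumerate whatever NUMA nodes and cores the virtualized
OS reports with a few lines of C. This sketch assumes hwloc 1.11 or
newer, where the object type is HWLOC_OBJ_NUMANODE; older releases call
it HWLOC_OBJ_NODE.)

  /* Illustrative sketch only: list the NUMA nodes and cores hwloc
   * discovers from the OS.  Build with, e.g.:
   *   gcc list_numa.c -o list_numa -lhwloc
   */
  #include <stdio.h>
  #include <hwloc.h>

  int main(void)
  {
      hwloc_topology_t topology;
      hwloc_topology_init(&topology);
      hwloc_topology_load(topology);

      int nnodes = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NUMANODE);
      int ncores = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_CORE);
      printf("%d NUMA nodes, %d cores\n", nnodes, ncores);

      for (int i = 0; i < nnodes; i++) {
          hwloc_obj_t node =
              hwloc_get_obj_by_type(topology, HWLOC_OBJ_NUMANODE, i);
          printf("NUMA node L#%u (OS index P#%u)\n",
                 node->logical_index, node->os_index);
      }

      hwloc_topology_destroy(topology);
      return 0;
  }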

Running lstopo 0.9.3, it appears that hwloc does see the extra layer
of complexity:

[brockp_at_nyx0809 INTEL]$ lstopo -
System(79GB)
   Misc0
     Node#0(10GB) + Socket#1 + L3(8192KB)
       L2(256KB) + L1(32KB) + Core#0 + P#0
       L2(256KB) + L1(32KB) + Core#1 + P#1
       L2(256KB) + L1(32KB) + Core#2 + P#2
       L2(256KB) + L1(32KB) + Core#3 + P#3
     Node#1(10GB) + Socket#0 + L3(8192KB)
       L2(256KB) + L1(32KB) + Core#0 + P#4
       L2(256KB) + L1(32KB) + Core#1 + P#5
       L2(256KB) + L1(32KB) + Core#2 + P#6
       L2(256KB) + L1(32KB) + Core#3 + P#7
   Misc0
     Node#2(10GB) + Socket#3 + L3(8192KB)
       L2(256KB) + L1(32KB) + Core#0 + P#8
       L2(256KB) + L1(32KB) + Core#1 + P#9
       L2(256KB) + L1(32KB) + Core#2 + P#10
       L2(256KB) + L1(32KB) + Core#3 + P#11
     Node#3(10GB) + Socket#2 + L3(8192KB)
       L2(256KB) + L1(32KB) + Core#0 + P#12
       L2(256KB) + L1(32KB) + Core#1 + P#13
       L2(256KB) + L1(32KB) + Core#2 + P#14
       L2(256KB) + L1(32KB) + Core#3 + P#15
   Misc0
     Node#4(10GB) + Socket#5 + L3(8192KB)
       L2(256KB) + L1(32KB) + Core#0 + P#16
       L2(256KB) + L1(32KB) + Core#1 + P#17
       L2(256KB) + L1(32KB) + Core#2 + P#18
       L2(256KB) + L1(32KB) + Core#3 + P#19
     Node#5(10GB) + Socket#4 + L3(8192KB)
       L2(256KB) + L1(32KB) + Core#0 + P#20
       L2(256KB) + L1(32KB) + Core#1 + P#21
       L2(256KB) + L1(32KB) + Core#2 + P#22
       L2(256KB) + L1(32KB) + Core#3 + P#23
   Misc0
     Node#6(10GB) + Socket#7 + L3(8192KB)
       L2(256KB) + L1(32KB) + Core#0 + P#24
       L2(256KB) + L1(32KB) + Core#1 + P#25
       L2(256KB) + L1(32KB) + Core#2 + P#26
       L2(256KB) + L1(32KB) + Core#3 + P#27
     Node#7(10GB) + Socket#6 + L3(8192KB)
       L2(256KB) + L1(32KB) + Core#0 + P#28
       L2(256KB) + L1(32KB) + Core#1 + P#29
       L2(256KB) + L1(32KB) + Core#2 + P#30
       L2(256KB) + L1(32KB) + Core#3 + P#31

I don't know why they are all labeled Misc0, but hwloc does see the
extra layer.
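
(Again just a sketch, not part of the original mail: the same grouping
layer can be inspected programmatically by walking the object tree with
the hwloc C API; newer hwloc releases report such grouping objects as
"Group" rather than "Misc".)

  /* Illustrative sketch only: print the whole object tree so the
   * grouping layer shows up explicitly.
   */
  #include <stdio.h>
  #include <hwloc.h>

  static void print_tree(hwloc_obj_t obj, int depth)
  {
      char type[64];
      hwloc_obj_type_snprintf(type, sizeof(type), obj, 0);
      printf("%*s%s L#%u\n", 2 * depth, "", type, obj->logical_index);
      for (unsigned i = 0; i < obj->arity; i++)
          print_tree(obj->children[i], depth + 1);
  }

  int main(void)
  {
      hwloc_topology_t topology;
      hwloc_topology_init(&topology);
      hwloc_topology_load(topology);
      print_tree(hwloc_get_root_obj(topology), 0);
      hwloc_topology_destroy(topology);
      return 0;
  }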

If you want other information let me know.

> There's a bit of ScaleMP code in the Linux kernel, but it does pretty
> much nothing; it does not seem to add anything to /proc or /sys, for
> instance. So I am not sure hwloc could get any specialized knowledge
> of ScaleMP machines. Maybe their custom numabind tool knows that
> ScaleMP only works on machines with some well-defined
> types/counts/numbering of processors and NUMA nodes, and thus uses
> this information to group sockets/NUMA nodes depending on their
> physical distance.
>
> Brice
>
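
(For the kind of generic placement mentioned at the top of the thread,
here is a hypothetical sketch, not anything from numabind or Open MPI:
it uses the hwloc C API, assuming hwloc 1.1 or newer where cpusets are
hwloc_bitmap_t, to bind the current process to the cores local to one
NUMA node.)

  /* Illustrative sketch only: bind this process to the CPUs local to
   * one NUMA node (logical index given on the command line, default 0).
   * Build with, e.g.:  gcc bind_node.c -o bind_node -lhwloc
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <hwloc.h>

  int main(int argc, char *argv[])
  {
      int which = (argc > 1) ? atoi(argv[1]) : 0;

      hwloc_topology_t topology;
      hwloc_topology_init(&topology);
      hwloc_topology_load(topology);

      hwloc_obj_t node =
          hwloc_get_obj_by_type(topology, HWLOC_OBJ_NUMANODE, which);
      if (!node) {
          fprintf(stderr, "no NUMA node with logical index %d\n", which);
          return 1;
      }

      /* Bind the current process to the chosen node's local CPUs. */
      hwloc_cpuset_t cpuset = hwloc_bitmap_dup(node->cpuset);
      if (hwloc_set_cpubind(topology, cpuset, HWLOC_CPUBIND_PROCESS) < 0)
          perror("hwloc_set_cpubind");

      hwloc_bitmap_free(cpuset);
      hwloc_topology_destroy(topology);
      return 0;
  }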