It seems to be working fine for me:

[rhc@bend001 tcp]$ mpirun -np 2 -host bend001 -report-bindings -mca rmaps_lama_bind 1c -mca rmaps lama hostname
bend001
[bend001:17005] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../..][../../../../../..]
[bend001:17005] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../..][../../../../../..]
bend001
[rhc@bend001 tcp]$ 

(I also checked the internals using "-mca rmaps_base_verbose 10".) So it could be your hierarchy inversion causing problems again. Or it could be that you are hitting a connection issue we are seeing in some scenarios in the OOB subsystem - though if you are able to run using a non-lama mapper, that seems unlikely.
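For reference, a debug invocation along these lines should show both what the mapper decides and what the OOB side is doing (bend001 is just the host from my test above, and I'm assuming the usual <framework>_base_verbose parameters are available in your build):

mpirun -np 2 -host bend001 -report-bindings -mca rmaps lama -mca rmaps_lama_bind 1c -mca rmaps_base_verbose 10 -mca oob_base_verbose 10 hostname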


On Dec 20, 2013, at 8:09 PM, tmishima@jcity.maeda.co.jp wrote:



Hi Ralph,

Thank you very much. I tried many variations, such as:

mpirun -np 2 -host node05 -report-bindings -mca rmaps lama -mca rmaps_lama_bind 1c myprog

But every attempt failed. As far as I remember, these command lines were at least accepted by openmpi-1.7.3.
Anyway, please check it when you have time; I am using lama mostly out of curiosity.

Regards,
Tetsuya Mishima


I'll try to take a look at it - my expectation is that lama might get
stuck because you didn't tell it a pattern to map, and I doubt that code
path has seen much testing.
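If I remember the LAMA MCA options correctly (the rmaps_lama_map string is built from tokens such as c = core, s = socket, n = node, b = board, h = hardware thread - please double-check against the 1.7 documentation), giving it an explicit pattern would look something like:

mpirun -np 2 -host node05 -mca rmaps lama -mca rmaps_lama_map csbnh -mca rmaps_lama_bind 1c myprog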


On Dec 20, 2013, at 5:52 PM, tmishima@jcity.maeda.co.jp wrote:



Hi Ralph, I'm glad to hear that, thanks.

By the way, yesterday I tried to check how lama in 1.7.4rc treats NUMA nodes.

Then, even with this simple command line, it froze without any message:

mpirun -np 2 -host node05 -mca rmaps lama myprog

Could you check what happened?

Is it better to open a new thread or to continue with this one?

Regards,
Tetsuya Mishima


I'll make it work so that NUMA can be either above or below socket

On Dec 20, 2013, at 2:57 AM, tmishima@jcity.maeda.co.jp wrote:



Hi Brice,

Thank you for your comment. I understand what you mean.

My suggestion came just from thinking about an easy way to adjust the code for the inversion of the hierarchy in the object tree.

Tetsuya Mishima


I don't think there's any such difference.
Also, all these NUMA architectures are reported the same by hwloc, and therefore used the same in Open MPI.

And yes, L3 and NUMA are topologically-identical on AMD Magny-Cours (and most recent AMD and Intel platforms).

Brice



On 20/12/2013 11:33, tmishima@jcity.maeda.co.jp wrote:

Hi Ralph,

The NUMA node in AMD Magny-Cours/Interlagos is so-called ccNUMA (cache-coherent NUMA), which seems to be a little different from the traditional NUMA assumed in openmpi.

I notice that the ccNUMA object is almost the same as the L3cache object, so "-bind-to l3cache" or "-map-by l3cache" does what I want.
Therefore, "do not touch it" is one possible solution, I think ...

Anyway, mixing up these two types of numa is the problem.

Regards,
Tetsuya Mishima

I can wait until it's fixed in 1.7.5 or later, because putting "-bind-to numa" and "-map-by numa" on the command line at the same time works as a workaround.

Thanks,
Tetsuya Mishima

Yeah, it will impact everything that uses hwloc topology maps, I fear.

One side note: you'll need to add --hetero-nodes to your cmd line. If we don't see that, we assume that all the node topologies are identical - which clearly isn't true here.
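For example (the host list here is only illustrative), something like:

mpirun -np 8 --hetero-nodes -host node03,node05 -report-bindings -bind-to numa myprog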
I'll try to resolve the hier inversion over the holiday - won't be for 1.7.4, but hopefully for 1.7.5.
Thanks
Ralph

On Dec 18, 2013, at 9:44 PM, tmishima@jcity.maeda.co.jp wrote:


I think it's normal for AMD Opterons with 8/16 cores, such as Magny-Cours or Interlagos. Because such a CPU (socket) usually contains 2 NUMA nodes, a NUMA node cannot include a socket; this type of hierarchy is natural for them.

(node03 is a Dell PowerEdge R815, which is probably quite common, I guess.)

By the way, I think this inversion should also affect rmaps_lama mapping.

Tetsuya Mishima

Ick - yeah, that would be a problem. I haven't seen that type of hierarchical inversion before - is node03 a different type of chip?
Might take awhile for me to adjust the code to handle hier inversion... :-(
On Dec 18, 2013, at 9:05 PM, tmishima@jcity.maeda.co.jp wrote:


Hi Ralph,

I found the reason. I attached the relevant part of the output for the 32-core node (node03) and the 8-core node (node05) at the bottom.

From this information, on node03 a socket contains the NUMA nodes, while on node05 a NUMA node contains the socket. The direction of the object tree is the opposite.
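(As a cross-check, the same inversion can be seen directly from hwloc. Assuming the hwloc-info utility is installed alongside lstopo-no-graphics, comparing the depths of the Socket and NUMANode levels on each node shows which one sits above the other:

~/opt/hwloc/bin/hwloc-info | grep -E "Socket|NUMANode"

On node03, Socket is reported at a shallower depth than NUMANode; on node05 it is the other way around.)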

Since "-map-by socket" may be assumed as default,
for node05, "-bind-to numa and -map-by socket" means
upward search. For node03, this should be downward.

I guess that openmpi-1.7.4rc1 will always assume numa-node
includes socket. Is it right? Then, upward search is assumed
in orte_rmaps_base_compute_bindings even for node03 when I
put "-bind-to numa and -map-by socket" option.

[node03.cluster:15508] [[38286,0],0] rmaps:base:compute_usage
[node03.cluster:15508] mca:rmaps: compute bindings for job [38286,1] with policy NUMA
[node03.cluster:15508] mca:rmaps: bind upwards for job [38286,1] with bindings NUMA
[node03.cluster:15508] [[38286,0],0] bind:upward target NUMANode type Machine

That's the cause of this trouble. Therefore, adding "-map-by core" works (although the resulting mapping pattern looks strange ...).

[mishima@node03 demos]$ mpirun -np 8 -bind-to numa -map-by core -report-bindings myprog
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
[node03.cluster:15885] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
[node03.cluster:15885] MCW rank 3 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
[node03.cluster:15885] MCW rank 4 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././B/B/B/B][./././././././.][./././././././.][./././././././.]
[node03.cluster:15885] MCW rank 5 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././B/B/B/B][./././././././.][./././././././.][./././././././.]
[node03.cluster:15885] MCW rank 6 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././B/B/B/B][./././././././.][./././././././.][./././././././.]
[node03.cluster:15885] MCW rank 7 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././B/B/B/B][./././././././.][./././././././.][./././././././.]
[node03.cluster:15885] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
[node03.cluster:15885] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
Hello world from process 6 of 8
Hello world from process 5 of 8
Hello world from process 0 of 8
Hello world from process 7 of 8
Hello world from process 3 of 8
Hello world from process 4 of 8
Hello world from process 2 of 8
Hello world from process 1 of 8

Regards,
Tetsuya Mishima

[node03.cluster:15508] Type: Machine Number of child objects: 4
   Name=NULL
   total=132358820KB
   Backend=Linux
   OSName=Linux
   OSRelease=2.6.18-308.16.1.el5
   OSVersion="#1 SMP Tue Oct 2 22:01:43 EDT 2012"
   Architecture=x86_64
   Cpuset:  0xffffffff
   Online:  0xffffffff
   Allowed: 0xffffffff
   Bind CPU proc:   TRUE
   Bind CPU thread: TRUE
   Bind MEM proc:   FALSE
   Bind MEM thread: TRUE
   Type: Socket Number of child objects: 2
           Name=NULL
           total=33071780KB
           CPUModel="AMD Opteron(tm) Processor 6136"
           Cpuset:  0x000000ff
           Online:  0x000000ff
           Allowed: 0x000000ff
           Type: NUMANode Number of child objects: 1


[node05.cluster:21750] Type: Machine Number of child objects: 2
   Name=NULL
   total=33080072KB
   Backend=Linux
   OSName=Linux
   OSRelease=2.6.18-308.16.1.el5
   OSVersion="#1 SMP Tue Oct 2 22:01:43 EDT 2012"
   Architecture=x86_64
   Cpuset:  0x000000ff
   Online:  0x000000ff
   Allowed: 0x000000ff
   Bind CPU proc:   TRUE
   Bind CPU thread: TRUE
   Bind MEM proc:   FALSE
   Bind MEM thread: TRUE
   Type: NUMANode Number of child objects: 1
           Name=NULL
           local=16532232KB
           total=16532232KB
           Cpuset:  0x0000000f
           Online:  0x0000000f
           Allowed: 0x0000000f
           Type: Socket Number of child objects: 1


Hmm...try adding "-mca rmaps_base_verbose 10 -mca ess_base_verbose 5" to your cmd line and let's see what it thinks it found.
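In other words, something like:

mpirun -np 8 -report-bindings -bind-to numa -mca rmaps_base_verbose 10 -mca ess_base_verbose 5 myprog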

On Dec 18, 2013, at 6:55 PM, tmishima@jcity.maeda.co.jp wrote:


Hi, I'd like to report one more problem with openmpi-1.7.4rc1, which is more serious.

On our 32-core nodes (AMD Magny-Cours based), which have 8 NUMA nodes, "-bind-to numa" does not work; without this option it works. For your information, I added the lstopo output of the node at the bottom of this mail.

Regards,
Tetsuya Mishima

[mishima@manage ~]$ qsub -I -l nodes=1:ppn=32
qsub: waiting for job 8352.manage.cluster to start
qsub: job 8352.manage.cluster ready

[mishima@node03 demos]$ mpirun -np 8 -report-bindings -bind-to numa myprog
[node03.cluster:15316] [[37582,0],0] bind:upward target NUMANode type Machine
--------------------------------------------------------------------------
A request was made to bind to NUMA, but an appropriate target could not
be found on node node03.
--------------------------------------------------------------------------
[mishima@node03 ~]$ cd ~/Desktop/openmpi-1.7/demos/
[mishima@node03 demos]$ mpirun -np 8 -report-bindings myprog
[node03.cluster:15282] MCW rank 2 bound to socket 1[core 8[hwt 0]]: [./././././././.][B/././././././.][./././././././.][./././././././.]
[node03.cluster:15282] MCW rank 3 bound to socket 1[core 9[hwt 0]]: [./././././././.][./B/./././././.][./././././././.][./././././././.]
[node03.cluster:15282] MCW rank 4 bound to socket 2[core 16[hwt 0]]: [./././././././.][./././././././.][B/././././././.][./././././././.]
[node03.cluster:15282] MCW rank 5 bound to socket 2[core 17[hwt 0]]: [./././././././.][./././././././.][./B/./././././.][./././././././.]
[node03.cluster:15282] MCW rank 6 bound to socket 3[core 24[hwt 0]]: [./././././././.][./././././././.][./././././././.][B/././././././.]
[node03.cluster:15282] MCW rank 7 bound to socket 3[core 25[hwt 0]]: [./././././././.][./././././././.][./././././././.][./B/./././././.]
[node03.cluster:15282] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././.][./././././././.][./././././././.][./././././././.]
[node03.cluster:15282] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././.][./././././././.][./././././././.][./././././././.]
Hello world from process 2 of 8
Hello world from process 5 of 8
Hello world from process 4 of 8
Hello world from process 3 of 8
Hello world from process 1 of 8
Hello world from process 7 of 8
Hello world from process 6 of 8
Hello world from process 0 of 8
[mishima@node03 demos]$ ~/opt/hwloc/bin/lstopo-no-graphics
Machine (126GB)
Socket L#0 (32GB)
NUMANode L#0 (P#0 16GB) + L3 L#0 (5118KB)
L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
NUMANode L#1 (P#1 16GB) + L3 L#1 (5118KB)
L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6)
L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7)
Socket L#1 (32GB)
NUMANode L#2 (P#6 16GB) + L3 L#2 (5118KB)
L2 L#8 (512KB) + L1d L#8 (64KB) + L1i L#8 (64KB) + Core L#8 + PU L#8 (P#8)
L2 L#9 (512KB) + L1d L#9 (64KB) + L1i L#9 (64KB) + Core L#9 + PU L#9 (P#9)
L2 L#10 (512KB) + L1d L#10 (64KB) + L1i L#10 (64KB) + Core L#10 + PU L#10 (P#10)
L2 L#11 (512KB) + L1d L#11 (64KB) + L1i L#11 (64KB) + Core L#11 + PU L#11 (P#11)
NUMANode L#3 (P#7 16GB) + L3 L#3 (5118KB)
L2 L#12 (512KB) + L1d L#12 (64KB) + L1i L#12 (64KB) + Core L#12 + PU L#12 (P#12)
L2 L#13 (512KB) + L1d L#13 (64KB) + L1i L#13 (64KB) + Core L#13 + PU L#13 (P#13)
L2 L#14 (512KB) + L1d L#14 (64KB) + L1i L#14 (64KB) + Core L#14 + PU L#14 (P#14)
L2 L#15 (512KB) + L1d L#15 (64KB) + L1i L#15 (64KB) + Core L#15 + PU L#15 (P#15)
Socket L#2 (32GB)
NUMANode L#4 (P#4 16GB) + L3 L#4 (5118KB)
L2 L#16 (512KB) + L1d L#16 (64KB) + L1i L#16 (64KB) + Core L#16 + PU L#16 (P#16)
L2 L#17 (512KB) + L1d L#17 (64KB) + L1i L#17 (64KB) + Core L#17 + PU L#17 (P#17)
L2 L#18 (512KB) + L1d L#18 (64KB) + L1i L#18 (64KB) + Core L#18 + PU L#18 (P#18)
L2 L#19 (512KB) + L1d L#19 (64KB) + L1i L#19 (64KB) + Core L#19 + PU L#19 (P#19)
NUMANode L#5 (P#5 16GB) + L3 L#5 (5118KB)
L2 L#20 (512KB) + L1d L#20 (64KB) + L1i L#20 (64KB) + Core L#20 + PU L#20 (P#20)
L2 L#21 (512KB) + L1d L#21 (64KB) + L1i L#21 (64KB) + Core L#21 + PU L#21 (P#21)
L2 L#22 (512KB) + L1d L#22 (64KB) + L1i L#22 (64KB) + Core L#22 + PU L#22 (P#22)
L2 L#23 (512KB) + L1d L#23 (64KB) + L1i L#23 (64KB) + Core L#23 + PU L#23 (P#23)
Socket L#3 (32GB)
NUMANode L#6 (P#2 16GB) + L3 L#6 (5118KB)
L2 L#24 (512KB) + L1d L#24 (64KB) + L1i L#24 (64KB) + Core L#24 + PU L#24 (P#24)
L2 L#25 (512KB) + L1d L#25 (64KB) + L1i L#25 (64KB) + Core L#25 + PU L#25 (P#25)
L2 L#26 (512KB) + L1d L#26 (64KB) + L1i L#26 (64KB) + Core L#26 + PU L#26 (P#26)
L2 L#27 (512KB) + L1d L#27 (64KB) + L1i L#27 (64KB) + Core L#27 + PU L#27 (P#27)
NUMANode L#7 (P#3 16GB) + L3 L#7 (5118KB)
L2 L#28 (512KB) + L1d L#28 (64KB) + L1i L#28 (64KB) + Core L#28 + PU L#28 (P#28)
L2 L#29 (512KB) + L1d L#29 (64KB) + L1i L#29 (64KB) + Core L#29 + PU L#29 (P#29)
L2 L#30 (512KB) + L1d L#30 (64KB) + L1i L#30 (64KB) + Core L#30 + PU L#30 (P#30)
L2 L#31 (512KB) + L1d L#31 (64KB) + L1i L#31 (64KB) + Core L#31 + PU L#31 (P#31)
HostBridge L#0
PCIBridge
PCI 14e4:1639
  Net L#0 "eth0"
PCI 14e4:1639
  Net L#1 "eth1"
PCIBridge
PCI 14e4:1639
  Net L#2 "eth2"
PCI 14e4:1639
  Net L#3 "eth3"
PCIBridge
PCIBridge
  PCIBridge
    PCI 1000:0072
      Block L#4 "sdb"
      Block L#5 "sda"
PCI 1002:4390
Block L#6 "sr0"
PCIBridge
PCI 102b:0532
HostBridge L#7
PCIBridge
PCI 15b3:6274
  Net L#7 "ib0"
  OpenFabrics L#8 "mthca0"

_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users