Hardware Locality Development Mailing List Archives

Subject: Re: [hwloc-devel] hwloc-distrib --among
From: Jirka Hladky (jhladky_at_[hidden])
Date: 2010-11-18 12:17:52


On Thursday, November 18, 2010 05:40:03 pm Samuel Thibault wrote:
> Jirka Hladky wrote on Thu 18 Nov 2010 15:14:07 +0100:
> > thanks for looking into it! I'm using hwloc_distribute to distribute
> > parallel jobs on multi-socket systems.
> >
> > Usually, it gives nice results: running
> > hwloc-distrib --single <N>
> > on a box with <N> sockets will distribute one job per socket. This is what
> > I want.
> >
> > hwloc-distrib --single <2*N>
> > will distribute 2 jobs per socket, picking the PUs wisely.
> >
> > It breaks however on strange systems. Please check with
> > lstopo --input
> > or hwloc-distrib --input
> > on the topology I sent you in my last e-mail (hp-dl980g7-01.tar.bz2, sent
> > on Tuesday 09:30:37 pm)
>
> Yes, use the --from socket of hwloc-distrib (previously called --among
> socket).
>
> > This is not working. So I have tried various --among and --ignore options
> > to achieve this, but without success.
>
> --among socket is what should be working (renamed to --from after rc2),
> at least it does work for me:
>
> $ ./utils/hwloc-distrib --input /tmp/hp-dl980g7-01 --from socket 8
> 0x000000ff,,0x000000ff
> 0x00ff0000,,0x00ff0000
> 0xff000000,,0xff000000
> 0x000000ff,,0x000000ff,0x0
> 0x0000ff00,,0x0000ff00,0x0
> 0x00ff0000,,0x00ff0000,0x0
> 0xff000000,,0xff000000,0x0
> 0x0000ff00,,0x0000ff00
>
> Isn't this what you want? (with additional --single of course)
>
> Actually, I'm considering just implementing unbalanced distribution
> support for v1.1; it shouldn't be too hard.
>
> Samuel

Hi Samuel,

Yes, I was wrong. I was playing with --among on another box. On the box whose
topology data I sent you, I had only tried the --ignore option.

So this works for me:

$ hwloc-calc --input ../hp-dl980g7-01 --po --proclist \
    $(hwloc-distrib --input ../hp-dl980g7-01 --single --among socket 8)
0,16,24,32,40,48,56,8
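
For the record, this list matches the masks in Samuel's output: with --single,
each job seems to end up on the lowest PU of its mask. Below is a minimal bash
sketch that checks this by hand. The helper name first_pu is my own (it is not
part of hwloc), and it assumes the mask format is comma-separated 32-bit hex
chunks, most significant chunk first, with an empty chunk meaning zero.

```bash
#!/usr/bin/env bash
# first_pu is a hypothetical helper, not an hwloc tool.
# It prints the index of the lowest set bit of an hwloc-style bitmask string.
# ASSUMED format: comma-separated 32-bit hex chunks, most significant first;
# an empty chunk (",,") stands for zero.
first_pu() {
    local IFS=','
    local -a chunks
    read -r -a chunks <<< "$1"
    local n=${#chunks[@]}
    local i chunk bit
    # Walk from the least significant chunk (the last one) upwards.
    for (( i = n - 1; i >= 0; i-- )); do
        chunk=$(( ${chunks[i]:-0} ))
        if (( chunk != 0 )); then
            bit=0
            # Shift right until the lowest set bit reaches position 0.
            while (( (chunk & 1) == 0 )); do
                chunk=$(( chunk >> 1 ))
                bit=$(( bit + 1 ))
            done
            echo "$(( (n - 1 - i) * 32 + bit ))"
            return 0
        fi
    done
    return 1   # no bit set at all
}

# The eight masks printed by "hwloc-distrib ... --from socket 8" above.
masks=(
    "0x000000ff,,0x000000ff"
    "0x00ff0000,,0x00ff0000"
    "0xff000000,,0xff000000"
    "0x000000ff,,0x000000ff,0x0"
    "0x0000ff00,,0x0000ff00,0x0"
    "0x00ff0000,,0x00ff0000,0x0"
    "0xff000000,,0xff000000,0x0"
    "0x0000ff00,,0x0000ff00"
)
result=""
for m in "${masks[@]}"; do
    result+="${result:+,}$(first_pu "$m")"
done
echo "$result"   # prints 0,16,24,32,40,48,56,8
```

In practice hwloc-calc --proclist does this conversion, so the sketch is only
meant as a sanity check of how the two outputs relate.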

I'm going to use --among in my scripts :-)

Thanks for looking into it!!

To summarize:
--among specifies the highest level of the hierarchy at which the
distribution starts, right?

Possible values are:
machine (default), numa, socket, core, pu

Is this correct?

I do understand the following output:
===================================================
$ hwloc-calc --input ../hp-dl980g7-01 --po --proclist \
    $(hwloc-distrib --input ../hp-dl980g7-01 --single 8)
0,16,24,32,8,9,10,11

$ hwloc-calc --input ../hp-dl980g7-01 --po --proclist \
    $(hwloc-distrib --input ../hp-dl980g7-01 --single --among machine 8)
0,16,24,32,8,9,10,11

$ hwloc-calc --input ../hp-dl980g7-01 --po --proclist \
    $(hwloc-distrib --input ../hp-dl980g7-01 --single --among numa 8)
0,16,24,32,8,9,10,11

Starting from socket:
$ hwloc-calc --input ../hp-dl980g7-01 --po --proclist \
    $(hwloc-distrib --input ../hp-dl980g7-01 --single --among socket 8)
0,16,24,32,40,48,56,8

Starting from core:
$ hwloc-calc --input ../hp-dl980g7-01 --po --proclist \
    $(hwloc-distrib --input ../hp-dl980g7-01 --single --among core 8)
0,1,2,3,4,5,6,7

Starting from PU:
$ hwloc-calc --input ../hp-dl980g7-01 --po --proclist \
    $(hwloc-distrib --input ../hp-dl980g7-01 --single --among pu 8)
0,64,1,65,2,66,3,67
==================================================
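
To make the difference between these results easier to see, here is a small
bash sketch that counts the jobs per socket for a given PU list. The helper
jobs_per_socket is my own, not an hwloc tool, and the PU-to-socket mapping is
only an assumption inferred from the outputs in this thread (8 sockets with
8 cores each, PU p and PU p+64 sharing a core), not read from the topology
file.

```bash
#!/usr/bin/env bash
# jobs_per_socket is a hypothetical helper, not part of hwloc.
# It counts how many PUs of a comma-separated list fall on each socket.
# ASSUMPTION (inferred from the outputs in this thread, not verified against
# the topology file): 8 sockets x 8 cores x 2 hardware threads, where PU p
# and PU p+64 share a core, so socket = (p % 64) / 8.
jobs_per_socket() {
    local -a counts=(0 0 0 0 0 0 0 0)
    local -a pus
    local pu s
    IFS=',' read -r -a pus <<< "$1"
    for pu in "${pus[@]}"; do
        s=$(( (pu % 64) / 8 ))
        counts[s]=$(( counts[s] + 1 ))
    done
    # One count per socket, socket 0 first, separated by spaces.
    echo "${counts[*]}"
}

jobs_per_socket "0,16,24,32,8,9,10,11"     # default / machine / numa
jobs_per_socket "0,16,24,32,40,48,56,8"    # --among socket
jobs_per_socket "0,1,2,3,4,5,6,7"          # --among core
jobs_per_socket "0,64,1,65,2,66,3,67"      # --among pu
```

Under that assumption, the default list puts four jobs on one socket and
leaves three sockets empty, --among socket gives exactly one job per socket,
and both --among core and --among pu pack all eight jobs onto the first
socket.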

I have tried {machine, numa, socket, core, pu} with --ignore but I'm not sure
if I understand the results.

==================================================
This is clear - balancing between NUMA nodes:

$ hwloc-calc --input ../hp-dl980g7-01 --po --proclist \
    $(hwloc-distrib --input ../hp-dl980g7-01 --single 8)
0,16,24,32,8,9,10,11

This is not clear to me:
$ hwloc-calc --input ../hp-dl980g7-01 --po --proclist \
    $(hwloc-distrib --input ../hp-dl980g7-01 --single --ignore machine 8)
0,1,16,24,32,40,48,56

{socket, core, pu} give the same output as without --ignore.
==================================================

Could you explain this to me as well?

Thanks a lot (and sorry for asking such basic questions)
Jirka