Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] wrong core binding by openmpi-1.5.5
From: Brice Goglin (Brice.Goglin_at_[hidden])
Date: 2012-04-12 01:46:33


Hello Tetsuya,
I think it's expected that the displayed cpusets are different.
I only converted the code that applies/retrieves the binding; I did not
touch the code that prints it.
Good to know it works.
Brice
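
The distinction at issue can be illustrated with a few lines of standalone
hwloc code. This is only a rough sketch, not Open MPI's paffinity module,
and the choice of logical index 1 is arbitrary: hwloc_get_obj_by_type()
walks PUs by *logical* index, while the cpuset handed to the binding call
must be filled with *physical* (OS) indexes taken from obj->os_index.

    #include <stdio.h>
    #include <hwloc.h>

    int main(void)
    {
        hwloc_topology_t topo;
        hwloc_bitmap_t set;
        hwloc_obj_t pu;
        char buf[64];

        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        /* the PU with logical index 1, in hwloc's enumeration order */
        pu = hwloc_get_obj_by_type(topo, HWLOC_OBJ_PU, 1);
        if (NULL == pu)
            return 1;

        set = hwloc_bitmap_alloc();
        /* the binding cpuset is keyed by the physical (OS) index */
        hwloc_bitmap_set(set, pu->os_index);

        hwloc_bitmap_snprintf(buf, sizeof(buf), set);
        printf("logical %u -> physical %u, cpuset %s\n",
               pu->logical_index, pu->os_index, buf);

        hwloc_set_cpubind(topo, set, 0);

        hwloc_bitmap_free(set);
        hwloc_topology_destroy(topo);
        return 0;
    }

On the node in the quoted lstopo output below, logical PU 1 is physical
PU 4, so a bitmap built from logical indexes binds a different core than
intended.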

On 12/04/2012 05:24, tmishima_at_[hidden] wrote:
> Hi, Brice.
>
> Thank you for sending me a patch. Now, I quickly tested your try2.patch.
>
> Regarding execution speed, it works well.
> But in terms of core-binding reports, it's still different from
> openmpi-1.5.4.
> I'm not sure which is better for a standard user like me: reporting logical
> indexes or physical ones.
>
> The patched openmpi-1.5.5 reports:
> [node03.cluster:09780] [[43552,0],0] odls:default:fork binding child
> [[43552,1],1] to cpus 00f0
> [node03.cluster:09780] [[43552,0],0] odls:default:fork binding child
> [[43552,1],2] to cpus 0f00
> [node03.cluster:09780] [[43552,0],0] odls:default:fork binding child
> [[43552,1],3] to cpus f000
> [node03.cluster:09780] [[43552,0],0] odls:default:fork binding child
> [[43552,1],4] to cpus f0000
> [node03.cluster:09780] [[43552,0],0] odls:default:fork binding child
> [[43552,1],5] to cpus f00000
> [node03.cluster:09780] [[43552,0],0] odls:default:fork binding child
> [[43552,1],6] to cpus f000000
> [node03.cluster:09780] [[43552,0],0] odls:default:fork binding child
> [[43552,1],7] to cpus f0000000
> [node03.cluster:09780] [[43552,0],0] odls:default:fork binding child
> [[43552,1],0] to cpus 000f
>
> Regards,
> Tetsuya Mishima
>
>> Here's a better patch. Still only compile tested :)
>> Brice
>>
>>
>> On 11/04/2012 10:36, Brice Goglin wrote:
>>
>> A quick look at the code seems to confirm my suspicion. get/set_module()
>> callbacks manipulate arrays of logical indexes, and they do not convert
>> them back to physical indexes before binding.
>>
>> Here's a quick patch that may help. Only compile tested...
>>
>> Brice
>>
>>
>>
>> On 11/04/2012 09:49, Brice Goglin wrote:
>>
>> On 11/04/2012 09:06, tmishima_at_[hidden] wrote:
>>
>> Hi, Brice.
>>
>> I installed the latest hwloc-1.4.1.
>> Here is the output of lstopo -p.
>>
>> [root_at_node03 bin]# ./lstopo -p
>> Machine (126GB)
>> Socket P#0 (32GB)
>> NUMANode P#0 (16GB) + L3 (5118KB)
>> L2 (512KB) + L1 (64KB) + Core P#0 + PU P#0
>> L2 (512KB) + L1 (64KB) + Core P#1 + PU P#4
>> L2 (512KB) + L1 (64KB) + Core P#2 + PU P#8
>> L2 (512KB) + L1 (64KB) + Core P#3 + PU P#12
>>
>> OK, so the cpuset of this NUMA node is 1111.
>>
>> [node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
>> [[55518,1],0] to cpus 1111
>>
>> So openmpi 1.5.4 is correct.
>>
>> [node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
>> [[40566,1],0] to cpus 000f
>>
>> And openmpi 1.5.5 is indeed wrong.
>>
>> Random guess: 000f is the bitmask made of hwloc *logical* indexes. hwloc
>> cpusets (used for binding) are internally made of hwloc *physical*
>> indexes (1111 here).
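>>
>> Concretely, for the first NUMA node above (a rough sketch of the mapping,
>> assuming hwloc's usual depth-first logical numbering):
>>
>>   logical PUs 0,1,2,3    -> bits 0-3       -> bitmask 000f
>>   physical PUs 0,4,8,12  -> bits 0,4,8,12  -> bitmask 1111
>>
>> so a mask built from logical indexes (000f) does not land on the intended
>> OS processors when it is used for binding.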
>>
>> Jeff, Ralph:
>> How does OMPI 1.5.5 build hwloc cpusets for binding? Are you doing
>> bitmap operations on hwloc object cpusets?
>> If yes, I don't know what's going wrong here.
>> If no, are you building hwloc cpusets manually by setting individual
>> bits from object indexes? If yes, you must use *physical* indexes to do so.
>> Brice
>>
>> --- opal/mca/paffinity/hwloc/paffinity_hwloc_module.c.old 2012-04-11 10:19:36.766710073 +0200
>> +++ opal/mca/paffinity/hwloc/paffinity_hwloc_module.c 2012-04-11 11:13:52.930438083 +0200
>> @@ -164,9 +164,10 @@
>>
>> static int module_set(opal_paffinity_base_cpu_set_t mask)
>> {
>> -    int i, ret = OPAL_SUCCESS;
>> +    int ret = OPAL_SUCCESS;
>>     hwloc_bitmap_t set;
>>     hwloc_topology_t *t;
>> +    hwloc_obj_t pu;
>>
>>     /* bozo check */
>>     if (NULL == opal_hwloc_topology) {
>> @@ -178,10 +179,11 @@
>>     if (NULL == set) {
>>         return OPAL_ERR_OUT_OF_RESOURCE;
>>     }
>> -    hwloc_bitmap_zero(set);
>> -    for (i = 0; ((unsigned int) i) < OPAL_PAFFINITY_BITMASK_CPU_MAX; ++i) {
>> -        if (OPAL_PAFFINITY_CPU_ISSET(i, mask)) {
>> -            hwloc_bitmap_set(set, i);
>> +    for (pu = hwloc_get_obj_by_type(*t, HWLOC_OBJ_PU, 0);
>> +         pu && pu->logical_index < OPAL_PAFFINITY_BITMASK_CPU_MAX;
>> +         pu = pu->next_cousin) {
>> +        if (OPAL_PAFFINITY_CPU_ISSET(pu->logical_index, mask)) {
>> +            hwloc_bitmap_set(set, pu->os_index);
>>         }
>>     }
>>
>> @@ -196,9 +198,10 @@
>>
>> static int module_get(opal_paffinity_base_cpu_set_t *mask)
>> {
>> -    int i, ret = OPAL_SUCCESS;
>> +    int ret = OPAL_SUCCESS;
>>     hwloc_bitmap_t set;
>>     hwloc_topology_t *t;
>> +    hwloc_obj_t pu;
>>
>>     /* bozo check */
>>     if (NULL == opal_hwloc_topology) {
>> @@ -218,9 +221,11 @@
>>         ret = OPAL_ERR_IN_ERRNO;
>>     } else {
>>         OPAL_PAFFINITY_CPU_ZERO(*mask);
>> -        for (i = 0; ((unsigned int) i) < 8 * sizeof(*mask); i++) {
>> -            if (hwloc_bitmap_isset(set, i)) {
>> -                OPAL_PAFFINITY_CPU_SET(i, *mask);
>> +        for (pu = hwloc_get_obj_by_type(*t, HWLOC_OBJ_PU, 0);
>> +             pu && pu->logical_index < 8 * sizeof(*mask);
>> +             pu = pu->next_cousin) {
>> +            if (hwloc_bitmap_isset(set, pu->os_index)) {
>> +                OPAL_PAFFINITY_CPU_SET(pu->logical_index, *mask);
>>             }
>>         }
>>     }