Hardware Locality Users' Mailing List Archives

Subject: Re: [hwloc-users] Problem getting cpuset of MPI task
From: Hendryk Bockelmann (bockelmann_at_[hidden])
Date: 2011-02-10 04:59:36


Hello Samuel,

Thanks for the hint. I now start my program with:

   hwloc_topology_init(&topology);
   hwloc_topology_set_flags(topology,HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
   hwloc_topology_load(topology);

and can access all information I need to rebind my MPI-tasks or to
rearrange the MPI communicators.

By the way, are there any plans to fully support POWER6 and/or POWER7
running AIX 6.1 in the future? Currently we get the topology right, but
cache sizes are missing.

Hendryk

On 10/02/11 10:40, Samuel Thibault wrote:
> Hello,
>
> Hendryk Bockelmann, on Thu 10 Feb 2011 09:08:11 +0100, wrote:
>> On our clusters the job scheduler binds the MPI tasks, but it is not
>> always clear to which resources. So for us it would be great to know
>> where a task runs so that we can adapt the MPI communicators to
>> increase performance.
>
> Ok, so get_cpubind should be enough to know what binding the job
> scheduler does.
>
>> Maybe just a note on the hwloc output on the cluster: while on my local
>> machine all MPI tasks are able to explore the whole topology, on the
>> cluster each task only sees itself, e.g. for task 7:
>>
>> 7:Machine#0(Backend=AIXOSName=AIXOSRelease=1OSVersion=6HostName=p191Architecture=00C83AC24C00),
>> cpuset: 0x0000c000
>> 7: NUMANode#0, cpuset: 0x0000c000
>> 7: L2Cache#0(0KB line=0), cpuset: 0x0000c000
>> 7: Core#0, cpuset: 0x0000c000
>> 7: PU, cpuset: 0x00004000
>> 7: PU#0, cpuset: 0x00008000
>> 7:--> root_cpuset of process 7 is 0x0000c000
>
> Yes, because by default hwloc restricts itself to what you are allowed
> to use anyway. To see more, use --whole-system.
>
>> Nevertheless, all MPI-tasks have different cpusets and since the nodes
>> are homogeneous one can guess the whole binding using the information of
>> lstopo and the HostName of each task. Perhaps you can tell me whether
>> such a restricted topology is due to hwloc or due to the fixed binding
>> by the job scheduler?
>
> It's because by default hwloc follows the fixed binding :)
>
> Samuel
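
For readers following the thread: the cpuset values quoted above (for example 0x0000c000 for task 7) are bitmasks in which bit i stands for the processing unit with OS index i. The following is only a minimal sketch of how such a mask can be decoded by hand. The helper name cpuset_mask_to_indices is hypothetical and not part of hwloc, and the sketch assumes the mask fits in 32 bits, as in the output shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an hwloc function): store the indices of the bits
 * set in a 32-bit cpuset mask into out[], lowest index first, and return the
 * number of indices stored. At most max_out indices are written. */
size_t cpuset_mask_to_indices(uint32_t mask, unsigned *out, size_t max_out)
{
    size_t n = 0;

    /* A NULL buffer is only acceptable when nothing may be written. */
    assert(out != NULL || max_out == 0);

    for (unsigned bit = 0; bit < 32 && n < max_out; bit++) {
        if (mask & (UINT32_C(1) << bit))
            out[n++] = bit;   /* bit i set: PU with OS index i is in the set */
    }
    return n;
}
```

Applied to the output of task 7, the mask 0x0000c000 decodes to indices 14 and 15, which matches the two PU lines with cpusets 0x00004000 (bit 14) and 0x00008000 (bit 15). In a real program the hwloc bitmap functions would normally be used instead, since they also handle sets wider than 32 bits.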