Just tested on a 96-core shared-memory machine. Here are the execution times of lstopo launched through OpenMPI 1.6 mpiexec (mpiexec launch time is 0.2-0.4s):
 1 rank :  0.2s
 8 ranks:  0.3-0.5s depending on binding (packed or scatter)
24 ranks:  0.8-3.7s depending on binding
48 ranks:  2.8-8.0s depending on binding
96 ranks: 14.2s

96 ranks loading the topology from a single XML file: 0.4s (negligible compared to the mpiexec launch time)
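
For reference, the single-XML-file run above can be reproduced roughly as follows. This is only a sketch: the file name topo.xml and the application name ./app are placeholders, and the -x option assumes OpenMPI's mpiexec (other launchers forward environment variables differently). HWLOC_XMLFILE and HWLOC_THISSYSTEM are the hwloc environment variables that make the library load an XML topology and treat it as describing the local machine.

```shell
# Discover the topology once, outside the MPI job, and save it as XML.
# (Placeholder file name; pick a path that does not exist yet.)
lstopo "$PWD/topo.xml"

# Tell hwloc to load that file instead of rediscovering the topology in
# every rank, and to treat it as the current machine so binding still works.
export HWLOC_XMLFILE="$PWD/topo.xml"
export HWLOC_THISSYSTEM=1

# Forward both variables to all ranks (-x is OpenMPI-specific).
# ./app is a placeholder for the real application.
mpiexec -x HWLOC_XMLFILE -x HWLOC_THISSYSTEM -np 96 ./app
```

This needs no change to the application. If changing the code is preferred, hwloc_topology_set_xml() called between hwloc_topology_init() and hwloc_topology_load() should have the same effect.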


On 05/03/2013 20:23, Simon Hammond wrote:
Hi HWLOC users,

We are seeing some significant performance problems using HWLOC 1.6.2 on Intel's MIC products. In one of our configurations we create 56 MPI ranks, and each rank queries the topology of the MIC card before creating threads. We are noticing that if we run 56 MPI ranks instead of one, the calls that query the topology in HWLOC become very slow: runtime goes from seconds to minutes (and upwards).

We guessed that this might be caused by the kernel serializing access to the /proc filesystem, but this is just a hunch.

Has anyone had this problem and found an easy way to change the library or the calls to HWLOC so that the slowdown does not occur? Would you describe this as a bug?

Thanks for your help.

Simon Hammond

1-(505)-845-7897 / MS-1319
Scalable Computer Architectures
Sandia National Laboratories, NM

hwloc-users mailing list