Just tested on a 96-core shared-memory machine, running "mpiexec lstopo"
with OpenMPI 1.6. Here are the execution times (the mpiexec launch itself
takes 0.2-0.4s):
 1 rank : 0.2s
 8 ranks: 0.3-0.5s depending on binding (packed or scatter)
24 ranks: 0.8-3.7s depending on binding
48 ranks: 2.8-8.0s depending on binding
96 ranks loading from a single XML file: 0.4s (negligible compared to the
mpiexec launch time)
On 05/03/2013 20:23, Simon Hammond wrote:
> Hi HWLOC users,
> We are seeing some significant performance problems using HWLOC 1.6.2
> on Intel's MIC products. In one of our configurations we create 56 MPI
> ranks, each rank then queries the topology of the MIC card before
> creating threads. We are noticing that if we run 56 MPI ranks as
> opposed to one, the calls to query the topology in HWLOC are very slow:
> runtime goes from seconds to minutes (and upwards).
> We guessed that this might be caused by the kernel serializing access
> to the /proc filesystem but this is just a hunch.
> Has anyone had this problem and found an easy way to change the
> library / calls to HWLOC so that the slowdown is not experienced?
> Would you describe this as a bug?
> Thanks for your help.
> Simon Hammond
> 1-(505)-845-7897 / MS-1319
> Scalable Computer Architectures
> Sandia National Laboratories, NM