I don't think anybody ever benchmarked this, but people have complained
at some point about this problem appearing on large machines. I
have a large SGI machine at work; I'll see if I can reproduce it.
One solution is to export the topology to XML once and then have all
your MPI processes read it from the XML file. Basically, run
"lstopo /tmp/foo.xml" and then export HWLOC_XMLFILE=/tmp/foo.xml in the
environment before starting your MPI job.
If the topology doesn't change (which is likely the case), the XML
file could even be stored by the administrator in a "standard" location
(not in /tmp).
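A minimal sketch of this workaround as a POSIX shell helper, assuming the lstopo binary from the same hwloc installation is in PATH. The function name use_cached_topology and the example path below are hypothetical. The helper generates the XML file only if it does not exist yet, so it has to run on the machine (or MIC card) whose topology is being cached. It also sets HWLOC_THISSYSTEM=1: when hwloc loads a topology from XML, it does not assume the file describes the running machine, and its binding calls just return success without binding anything. That would matter if the ranks bind the threads they create.

```shell
#!/bin/sh
# use_cached_topology XMLPATH   (hypothetical helper name)
# Export the local topology to XMLPATH once, then point hwloc at that file.
use_cached_topology() {
    xml="$1"
    if [ ! -f "$xml" ]; then
        # Probe the real machine: drop any HWLOC_XMLFILE already set, inside a
        # subshell so the caller's environment is left untouched.
        # lstopo chooses XML output from the .xml file extension.
        mkdir -p "$(dirname "$xml")" &&
            (unset HWLOC_XMLFILE; lstopo "$xml") || return 1
    fi
    # Processes started from this shell load the topology from the XML file
    # instead of rediscovering it through /proc and /sys.
    export HWLOC_XMLFILE="$xml"
    # Assert that the XML file describes this machine, so that hwloc's binding
    # calls actually bind instead of silently succeeding.
    export HWLOC_THISSYSTEM=1
}
```

In a job script this could be used as follows, where the path and application name are placeholders:

    use_cached_topology /etc/hwloc/topology.xml
    mpirun -np 56 ./your_app

Depending on the launcher, the variables may need to be forwarded to the ranks explicitly (Open MPI's mpirun uses -x HWLOC_XMLFILE -x HWLOC_THISSYSTEM).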
On 05/03/2013 20:23, Simon Hammond wrote:
> Hi HWLOC users,
> We are seeing some significant performance problems using HWLOC 1.6.2
> on Intel's MIC products. In one of our configurations we create 56 MPI
> ranks, each rank then queries the topology of the MIC card before
> creating threads. We are noticing that if we run 56 MPI ranks as
> opposed to one, the calls to query the topology in HWLOC are very slow:
> runtime goes from seconds to minutes (and upwards).
> We guessed that this might be caused by the kernel serializing access
> to the /proc filesystem but this is just a hunch.
> Has anyone had this problem and found an easy way to change the
> library / calls to HWLOC so that the slow down is not experienced?
> Would you describe this as a bug?
> Thanks for your help.
> Simon Hammond
> 1-(505)-845-7897 / MS-1319
> Scalable Computer Architectures
> Sandia National Laboratories, NM
> hwloc-users mailing list