Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] knem/openmpi performance?
From: Elken, Tom (tom.elken_at_[hidden])
Date: 2013-07-15 13:31:39

> I was hoping that someone might have some examples of real application
> behaviour rather than micro benchmarks. It can be crazy hard to get that
> information from users.
I don't have direct performance information on knem, but Intel's (formerly QLogic's) PSM layer, as delivered in our software stack, the Intel True Scale Fabric Suite (IFS), includes a kcopy module that assists shared-memory MPI bandwidth in a way similar to knem.

We ran the SPEC MPI2007 benchmarks quite a while ago, and kcopy showed about a 2% advantage on average over the 13 applications that make up the suite. Some codes did not benefit, but none were slowed down. This was run over 16 nodes at 8 cores per node, so not very fat nodes.

More interestingly, in one of our software revisions a few years ago, a bug crept in that disabled kcopy. A customer filed an issue reporting that one of their apps had slowed down by 30%. Fixing that bug restored the previous performance. The application was proprietary, so I don't even know what it did in general. It was run over multiple nodes, so this was not a single-node performance comparison.

More recently, some customers with large-memory nodes and more than 40 cores per node found that kcopy was important to the performance of their most important app, a finite element code (I don't have a percentage figure).

kcopy works with Open MPI over PSM, so using knem instead of kcopy is not likely to speed up that configuration much. The exception is if you get your PSM from OFED or a Linux distro: that PSM won't include kcopy, because we weren't able to get kcopy accepted upstream. Recent PSM (from OFED 3.5, say) can be built to use knem for kernel-assisted copies. kcopy also works with the other MPIs that support PSM.

Hope these anecdotes are relevant to Open MPI users considering knem.

-Tom Elken

> Unusually for us, we're putting in a second cluster with the same
> architecture, CPUs, memory and OS as the last one. I might be able to use
> this as a bigger stick to get some better feedback. If so, I'll pass it
> on.
> > Darius Buntinas, Brice Goglin, et al. wrote an excellent paper about
> > exactly this set of issues; see
> ...
> I'll definitely take a look - thanks again.
> All the best,
> Mark
> --
> -----------------------------------------------------------------
> Mark Dixon Email : m.c.dixon_at_[hidden]
> HPC/Grid Systems Support Tel (int): 35429
> Information Systems Services Tel (ext): +44(0)113 343 5429
> University of Leeds, LS2 9JT, UK
> -----------------------------------------------------------------