Paul Kapinos <kapinos_at_[hidden]> writes:
> Jeff, I would turn the question the other way around:
> - are there any penalties when using KNEM?
Bull should be able to comment on that -- they turn it on by default in
their proprietary OMPI derivative -- but I doubt I can get much of a
story on it. Mellanox ship it now too, but I don't know if their
distribution defaults to using it.
I expect to use knem on hardware that's essentially the same as Mark's.
If any issues appear in production, I'll be surprised and will report
> We have a couple of Really Big Nodes (128 cores) with non-huge memory
> bandwidth (because they are coupled from 4x standalone nodes with 4 sockets
I was hoping to have some results for just such a setup, but haven't
been able to spend any time on it this week. If there are any
suggestions for OMPI tuning on it I'd be interested.
> So cutting the bandwidth in half on these nodes sounds like a
> Very Good Thing.
> But otherwise we have 1500+ nodes with only 2 sockets and 24 GB of
> memory, and we do not want to disturb production on these nodes... (and
> different MPI versions for different nodes are doofy).
Why would you need that? Our horribly heterogeneous cluster just has a
node group-specific openmpi-mca-params.conf, and SGE parallel
environments keep jobs in specific host groups with basically the same
CPU speed and interconnect.
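
For illustration, a per-group file could look roughly like the sketch
below. This is not our actual configuration: the parameter name
btl_sm_use_knem is what I'd expect from the sm BTL in the 1.5 series, so
treat it as an assumption and check what your build really offers with
"ompi_info --param btl sm" before relying on it.

```
# openmpi-mca-params.conf as installed on the big-node host group
# (parameter name is an assumption; verify with: ompi_info --param btl sm)
btl_sm_use_knem = 1
```

The copy of the file on the small-node group would set the value to 0
instead, so the same Open MPI version serves both groups and only the
default parameters differ.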