A. Austen wrote:
The problem doesn't even need to be embarrassingly parallel. Many MPI
applications depend on computational performance, which is often
sensitive to memory bandwidth. This factor can be more important to
application performance than interprocess communications.
On Fri, 28 Aug 2009 10:16 -0700, "Eugene Loh" <Eugene.Loh@Sun.COM> wrote:
Big topic and actually the subject of much recent discussion. Here are
a few comments:
1) "Optimally" depends on what you're doing. A big issue is making
sure each MPI process gets as much memory bandwidth (and cache and other
shared resources) as possible. This would argue that processes
*should* be spread over as many sockets as possible. And, indeed, some
MPIs default to this behavior. It depends on lots of things, including
how much of the machine you're using.
Yes, you're right. In my case, my processes within a single MPI job are
tightly coupled. These jobs are communication-intensive, and if I want
to use as many of the processors as possible, then minimizing the
cross-processor communication should yield the best overall throughput.
However, I see your point completely -- for an embarrassingly parallel
problem, spreading the processes amongst the different sockets/memory
pools would probably give the best performance.
Yes. Or, pick up the latest/greatest changes in the trunk
(bind-by-core, etc.), but there still is no multi-job awareness.
2) Currently (1.3.2), there is rankfile support. This is probably a
little bit more gruesome than you hope for. E.g., if you have multiple
jobs, you need to custom-tailor the rankfile for each.
So then it would seem like at least for now, I can get the behavior I
want by using rankfiles?
If you use rankfiles, each MPI job will try to bind per the rankfile
specified for it. So, if you're willing to construct a different
rankfile for each job, you'll be set with rankfiles.
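To illustrate the "different rankfile for each job" approach, here is a minimal Python sketch that builds one rankfile per job and gives each job its own socket. The host name node01, the job names, and the two-socket, four-core layout are assumptions for illustration only. The lines use the "rank N=host slot=socket:core" form described in the Open MPI mpirun man page; check it against the version you are running.

```python
def make_rankfile(host, socket, cores, first_rank=0):
    """Return rankfile text that pins one rank to each listed core of a socket."""
    lines = []
    for offset, core in enumerate(cores):
        # Rankfile line format: rank <N>=<host> slot=<socket>:<core>
        lines.append(f"rank {first_rank + offset}={host} slot={socket}:{core}")
    return "\n".join(lines) + "\n"


# Hypothetical layout: one node with two sockets of four cores each.
# Each job gets a different socket, so the jobs never bind to the same cores.
JOB_TO_SOCKET = {"jobA": 0, "jobB": 1}

rankfiles = {
    job: make_rankfile("node01", socket, range(4))
    for job, socket in JOB_TO_SOCKET.items()
}

if __name__ == "__main__":
    for job, text in rankfiles.items():
        print(f"# rankfile for {job}")
        print(text)
```

After saving each text to its own file (say jobA.rank, a hypothetical name), each job would be launched with its own rankfile, e.g. "mpirun -np 4 -rf jobA.rank ./app", where ./app is a placeholder.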
Also, if I use the rankfile to distribute the processes, how about the
affinity issue? Can I still use affinity and expect that it will apply
to the topology specified in the rankfile, or will all the MPI jobs
always try to bind to the same processors in sequence?