Thank you very much for your explanation!
Ralph Castain wrote:
> It is a little bit of both:
> * historical, because most MPI's default to mapping by slot, and
> * performance, because procs that share a node can communicate via
> shared memory, which is faster than sending messages over an
> interconnect, and most apps are communication-bound
> If your app is disk-intensive, then mapping it -bynode may be a better
OK -- so it seems there is no rule saying one is obviously better than the other; it depends on factors such as disk access versus shared-memory communication and which one dominates. So it is worth trying both to see?
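For anyone reading along, the two mappings discussed above can be requested roughly like this (option spellings as in older Open MPI releases; the application name and process count are hypothetical):

```shell
# Map by slot (the usual default): ranks fill one node's slots
# before moving to the next, so neighboring ranks tend to share
# a node and can use shared memory.
mpirun -np 8 -byslot ./my_app

# Map by node: ranks are numbered round-robin across nodes, so
# neighboring ranks tend to land on different nodes.
mpirun -np 8 -bynode ./my_app
```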
> option for you. That's why we provide it. Note, however, that you can
> still wind up with multiple procs on a node. All "bynode" means is that
> the ranks are numbered consecutively bynode - it doesn't mean that there
> is only one proc/node.
I see. But if the number of processes (specified with -np) is less than the number of nodes and -bynode is chosen, is it then guaranteed that only one process will run on each node? Is there a way to write the hostfile to ensure this?
I was also curious whether, if a node has 4 slots, listing it 4 times in the hostfile with 1 slot each has any meaning. That might be a bad idea, since we would be trying to fool mpirun?
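For reference, the usual way to declare slot counts is one hostfile line per node (hostnames here are hypothetical), rather than repeating a node:

```shell
# hostfile: one line per node, with its slot count
node01 slots=4
node02 slots=4
node03 slots=4
```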
> If you truly want one proc/node, then you should use the -pernode
> option. This maps one proc on each node up to either the number of procs
> you specified or the number of available nodes. If you don't specify
> -np, we just put one proc on each node in your allocation/hostfile.
I see ... I was not aware of that option; thank you!
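For the archives, a minimal sketch of the -pernode behavior described above (application name and counts are hypothetical):

```shell
# At most one process per node, regardless of slot counts.
# With -np 3 and 5 nodes available, 3 nodes each get one process:
mpirun -np 3 -pernode ./my_app

# Without -np, one process is started on every node in the
# allocation/hostfile:
mpirun -pernode ./my_app
```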