Ralph brings up some good points here. I have a few thoughts/experiences.
First, I like the way things are behaving now. In fact, I take full
advantage of the fact that different aliases for a node are treated as
different nodes to do some scalability testing. This is how I trick
mpirun into starting multiple daemons on a node. (We had a similar
feature in our old ClusterTools runtime environment to get multiple
daemons running on a single node.)
For example, I do this to get 4 orteds running on "alachua":
mpirun -np 4 -host alachua,alachua-1,alachua-2,alachua-3 hostname
All of the above resolve to the same IP address.
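To make the distinction concrete, here is a rough sketch (in Python, not
Open MPI code) of the difference between comparing names as strings and
comparing them by resolved address. The helper names resolve_ipv4 and
group_by_address are made up for illustration, and the sketch assumes
IPv4 lookups through the standard socket module.

```python
import socket


def resolve_ipv4(name):
    """Return the set of IPv4 addresses for a name (empty if lookup fails)."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    except socket.gaierror:
        return frozenset()
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (address, port).
    return frozenset(info[4][0] for info in infos)


def group_by_address(names, resolver=resolve_ipv4):
    """Group host names that resolve to the same set of addresses.

    Names that cannot be resolved are keyed by the name itself, so they
    stay distinct, which mirrors plain string comparison.
    """
    groups = {}
    for name in names:
        addrs = resolver(name)
        key = addrs if addrs else name
        groups.setdefault(key, []).append(name)
    return groups
```

With string comparison the four names in my example count as four
nodes. Grouped by address, as in this sketch, they would collapse into
one, which is exactly the behaviour that would break my test setup.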
Secondly, I would not want us to make any change that negatively affects
scalability. If we do decide to make a change, then we need a flag to
revert to the original behaviour.
Lastly, I guess I have two questions.
1. Are you sure that Open MPI behaves in "unexpected ways"? This all
worked fine for me, as I stated above.
2. Do you have any more details on the cost of "resolving every name"?
Which API is it that causes the problems? I only ask because I have
been trying to understand some of the NIS traffic I see when running
on my cluster.
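For what it is worth, this is the kind of quick measurement I have in
mind for question 2. It is only a sketch in Python, not anything from
the Open MPI code base: time_lookups is a hypothetical helper, and it
assumes that timing socket.gethostbyname is a fair stand-in for whatever
resolver call is actually being made.

```python
import socket
import time


def time_lookups(names, resolver=socket.gethostbyname):
    """Return a dict mapping each name to the seconds its lookup took."""
    timings = {}
    for name in names:
        start = time.perf_counter()
        try:
            resolver(name)
        except OSError:
            # A failed lookup (socket.gaierror is an OSError) still costs
            # time, so it is recorded like any other.
            pass
        timings[name] = time.perf_counter() - start
    return timings
```

Calling time_lookups on the list of host names from a hostfile gives a
per-name cost. Any name service caching on the node would skew the
numbers, so a first run and a repeat run may differ.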
Ralph Castain wrote:
>A recent email thread on the devel list involved (in part) the question of
>hostname resolution. [Note: I have a fix for the localhost problem described
>in that thread - just need to chase down a memory corruption problem, so it
>won't come into the trunk until next week]
>This is a problem that has troubled us since the beginning, and we have gone
>back-and-forth on solutions. Rather than just throwing another code change
>into the system, Jeff and I thought it might be a good idea to seek input
>from the community.
>The problem is that our system requires a consistent way of identifying
>nodes so we can tell if, for example, we already have a daemon on that node.
>We currently do that via a string hostname. This appears to work just fine
>in managed environments as the allocators are (usually?) consistent in how
>they name a node.
>However, users are frequently not consistent, which causes a problem. For
>example, users can create a hostfile entry for "foo.bar.net", and then put
>"-host foo" on their command line. In Open MPI, these will be treated as two
>completely separate nodes.
>In the past, we attempted to solve this by actually resolving every name
>provided to us. However, resolving names of remote hosts can be a very
>expensive function call, especially at scale. One solution we considered was
>to only do this for non-managed environments - i.e., when provided names in
>a hostfile or via -host. This was rejected on the grounds that it penalized
>people who used those mechanisms and, in many cases, wasn't necessary
>because users were careful to avoid ambiguity.
>But that leaves us with an unsolved problem that can cause Open MPI to
>behave in unexpected ways, including possibly hanging. Of course, we could
>just check names for matches in that first network name field - this would
>solve the "foo" vs "foo.bar.net" problem, but creates a vulnerability (what
>if we have both "foo.bar.net" and "foo.no-bar.net" in our hostfile?) that
>may or may not be acceptable (I'm sure it is at least uncommon for an MPI
>app to cross subnet boundaries, but maybe someone is really doing this in
>some rsh-based cluster).
>Or we could go back to fully resolving names provided via non-managed
>channels. Or we just tell people that "you *must* be consistent in how you
>identify nodes". Or....?
>Any input would be appreciated.