A recent email thread on the devel list involved (in part) the question of
hostname resolution. [Note: I have a fix for the localhost problem described
in that thread. I just need to chase down a memory corruption problem first,
so it won't come into the trunk until next week.]

Hostname resolution is a problem that has troubled us since the beginning,
and we have gone
back-and-forth on solutions. Rather than just throwing another code change
into the system, Jeff and I thought it might be a good idea to seek input
from the community.

The problem is that our system requires a consistent way of identifying
nodes so we can tell if, for example, we already have a daemon on that node.
We currently do that via a string hostname. This appears to work just fine
in managed environments as the allocators are (usually?) consistent in how
they name a node.

However, users are frequently not consistent, which causes a problem. For
example, users can create a hostfile entry for "foo.bar.net", and then put
"-host foo" on their command line. In Open MPI, these will be treated as two
completely separate nodes.
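
To make the failure concrete, here is a minimal Python sketch of what
string-based identity implies. It is an illustration only, not the actual
Open MPI code; the function name count_nodes is made up for this example.

```python
def count_nodes(hostfile_entries, host_args):
    """Count distinct nodes when a node's identity is its raw name string."""
    nodes = set(hostfile_entries)   # names read from the hostfile
    nodes.update(host_args)         # names given via -host
    return len(nodes)


# The same machine named two different ways is counted as two nodes.
print(count_nodes(["foo.bar.net"], ["foo"]))  # 2
```

With this scheme nothing ever connects "foo" to "foo.bar.net", so a check
such as "do we already have a daemon on that node?" answers "no" for the
second spelling.
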
In the past, we attempted to solve this by actually resolving every name
provided to us. However, resolving names of remote hosts can be a very
expensive function call, especially at scale. One solution we considered was
to only do this for non-managed environments - i.e., when provided names in
a hostfile or via -host. This was rejected on the grounds that it penalized
people who used those mechanisms and, in many cases, wasn't necessary
because users were careful to avoid ambiguity.
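
For reference, the "resolve everything" approach looks roughly like the
Python sketch below. This is a hedged illustration, not what Open MPI does
or did: the helper names (resolve_addresses, group_nodes) are invented here,
and treating two names as the same node only when they resolve to the same
set of addresses is my own simplifying assumption. The cost being discussed
is the getaddrinfo call, which can block on a DNS lookup once per unique
name.

```python
import socket


def resolve_addresses(name):
    """Resolve a name to the set of its addresses (may block on DNS)."""
    infos = socket.getaddrinfo(name, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return frozenset(info[4][0] for info in infos)


def group_nodes(names, resolver=resolve_addresses):
    """Group names that resolve to the same address set.

    Assumption for this sketch: an identical address set means "same node".
    """
    cache = {}    # one lookup per unique name
    groups = {}   # address set -> names that share it
    for name in names:
        if name not in cache:
            cache[name] = resolver(name)
        groups.setdefault(cache[name], []).append(name)
    return list(groups.values())


# Demo with a fake lookup table so the example runs without touching DNS.
fake_dns = {
    "foo": frozenset({"10.0.0.1"}),
    "foo.bar.net": frozenset({"10.0.0.1"}),
    "baz": frozenset({"10.0.0.2"}),
}
print(group_nodes(["foo.bar.net", "foo", "baz"], resolver=fake_dns.__getitem__))
```

The cache keeps the number of lookups down to one per unique name, but that
is still one potentially slow call for every distinct node name in a large
hostfile.
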
But that leaves us with an unsolved problem that can cause Open MPI to
behave in unexpected ways, including possibly hanging. Of course, we could
just compare the first field of each name (the part before the first dot).
This would solve the "foo" vs "foo.bar.net" problem, but it creates a
vulnerability (what
if we have both "foo.bar.net" and "foo.no-bar.net" in our hostfile?) that
may or may not be acceptable (I'm sure it is at least uncommon for an MPI
app to cross subnet boundaries, but maybe someone is really doing this in
some rsh-based cluster).
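
The first-field comparison and its weak spot can be sketched in a few lines
of Python. Again, this is only an illustration: short_name and same_node are
hypothetical names, and lowercasing the name is my addition (DNS names are
case-insensitive). Note that this naive split would also mangle a dotted IP
address such as "10.0.0.1", so a real version would need to special-case
those.

```python
def short_name(hostname):
    """Return the first field of a hostname (text before the first dot)."""
    return hostname.split(".", 1)[0].lower()


def same_node(a, b):
    """Treat two names as the same node if their first fields match."""
    return short_name(a) == short_name(b)


print(same_node("foo", "foo.bar.net"))             # True: the case we want
print(same_node("foo.bar.net", "foo.no-bar.net"))  # True: a false match
```

The second result is the vulnerability described above: two different
machines in different domains collapse into one node.
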
Or we could go back to fully resolving names provided via non-managed
channels. Or we just tell people that "you *must* be consistent in how you
identify nodes". Or....?

Any input would be appreciated.