Open MPI Development Mailing List Archives

From: Rolf.Vandevaart_at_[hidden]
Date: 2007-07-20 10:13:54


Greetings:
Ralph brings up some good points here. I have a few thoughts/experiences.
First, I like the way things are behaving now. In fact, I take full
advantage of the fact that the different aliases for a node are treated as
different nodes to do some scalability testing. This is how I fake out
ORTE and have it start multiple daemons on a node. (We had a similar
feature in our old ClusterTools runtime environment to get multiple
daemons running on a single node.)

For example, I do this to get 4 orteds running on "alachua".

mpirun -np 4 -host alachua,alachua-1,alachua-2,alachua-3 hostname

All of the above resolve to the same IP address.
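
To confirm that a set of aliases really does land on one address, a check
along the following lines may help. This is only a sketch in Python, not
anything from Open MPI itself: the helper names resolve_ipv4 and
group_by_address are made up for illustration, and it assumes IPv4 lookups
through the system resolver via socket.gethostbyname.

```python
import socket
from collections import defaultdict


def resolve_ipv4(name):
    """Return the IPv4 address for name, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def group_by_address(names, resolver=resolve_ipv4):
    """Map each resolved address to the list of names that share it.

    resolver is injectable (hypothetical convenience) so the grouping
    logic can be exercised without touching the network.
    """
    groups = defaultdict(list)
    for name in names:
        groups[resolver(name)].append(name)
    return dict(groups)
```

Calling group_by_address(["alachua", "alachua-1", "alachua-2", "alachua-3"])
on a cluster set up as described should return a single key holding all
four names. Names that fail to resolve are collected under None.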

Secondly, I would not want us to make any change that negatively affects
scalability. If we do decide to make a change, then we need a flag to
revert to the original behaviour.

Lastly, I guess I have two questions.
1. Are you sure that Open MPI behaves in "unexpected ways?" This all
worked fine for me as I stated above.
2. Do you have any more details on the cost of "resolving every name"?
Which API is it that causes the problems? I only ask because I have
been trying to understand some of the NIS traffic I see when running
on my cluster.

Thanks,
Rolf

Ralph Castain wrote:

>Yo all
>
>A recent email thread on the devel list involved (in part) the question of
>hostname resolution. [Note: I have a fix for the localhost problem described
>in that thread - just need to chase down a memory corruption problem, so it
>won't come into the trunk until next week]
>
>This is a problem that has troubled us since the beginning, and we have gone
>back-and-forth on solutions. Rather than just throwing another code change
>into the system, Jeff and I thought it might be a good idea to seek input
>from the community.
>
>The problem is that our system requires a consistent way of identifying
>nodes so we can tell if, for example, we already have a daemon on that node.
>We currently do that via a string hostname. This appears to work just fine
>in managed environments as the allocators are (usually?) consistent in how
>they name a node.
>
>However, users are frequently not consistent, which causes a problem. For
>example, users can create a hostfile entry for "foo.bar.net", and then put
>"-host foo" on their command line. In Open MPI, these will be treated as two
>completely separate nodes.
>
>In the past, we attempted to solve this by actually resolving every name
>provided to us. However, resolving names of remote hosts can be a very
>expensive function call, especially at scale. One solution we considered was
>to only do this for non-managed environments - i.e., when provided names in
>a hostfile or via -host. This was rejected on the grounds that it penalized
>people who used those mechanisms and, in many cases, wasn't necessary
>because users were careful to avoid ambiguity.
>
>But that leaves us with an unsolved problem that can cause Open MPI to
>behave in unexpected ways, including possibly hanging. Of course, we could
>just check names for matches in that first network name field - this would
>solve the "foo" vs "foo.bar.net" problem, but creates a vulnerability (what
>if we have both "foo.bar.net" and "foo.no-bar.net" in our hostfile?) that
>may or may not be acceptable (I'm sure it is at least uncommon for an MPI
>app to cross subnet boundaries, but maybe someone is really doing this in
>some rsh-based cluster).
>
>Or we could go back to fully resolving names provided via non-managed
>channels. Or we just tell people that "you *must* be consistent in how you
>identify nodes". Or....?
>
>Any input would be appreciated.
>Ralph
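
The "first network name field" comparison mentioned in the quoted message
can be illustrated with a few lines of Python. This is a sketch of the idea
only, not how ORTE compares node names; short_name and
same_node_by_first_field are hypothetical helpers, and the inputs are
assumed to be host names rather than dotted IP addresses.

```python
def short_name(host):
    """Return the first dot-separated field of a host name, lowercased."""
    return host.split(".", 1)[0].lower()


def same_node_by_first_field(a, b):
    """Treat two names as the same node if their first fields match."""
    return short_name(a) == short_name(b)


# "foo" and "foo.bar.net" now match, which is the intended fix ...
assert same_node_by_first_field("foo", "foo.bar.net")
# ... but so do two hosts in different domains, which is the
# vulnerability raised in the quoted message.
assert same_node_by_first_field("foo.bar.net", "foo.no-bar.net")
```

The second assertion passing is exactly the trade-off under discussion:
the comparison is cheap because it needs no lookup, but it cannot tell
"foo.bar.net" from "foo.no-bar.net".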