
Open MPI Development Mailing List Archives


Subject: [OMPI devel] RFC: VM launch
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-01-03 12:45:09

WHAT: convert orte to start by launching a virtual machine across all allocated nodes

WHY: support topologically-aware mapping methods

WHEN: sometime over the next couple of months

Several of us (including Jeff, Terry, Josh, and Ralph) are working to create topologically-aware mapping modules. This includes modules that correctly map processes to cores/sockets, perhaps take into account NIC proximity and switch connectivity, etc.

To make this work, the rmaps components in mpirun need to know the local topology of the nodes in the allocation. We currently obtain that info from the orteds: each orted samples the local topology via the opal sysinfo framework and reports it back to mpirun. Unfortunately, we currently don't launch the orteds until -after- we map the job, so the topology info cannot be used in the mapping algorithm.

This work will modify the launch procedure to:

1. determine the final "allocation" using the current ras + hostfile + dash-host method.

2. launch a daemon on every node in the final "allocation"

3. each daemon discovers the local resources and reports that info back to mpirun

4. mpirun maps the job against the daemons using the node resource info

5. mpirun sends the launch message to all daemons

6. the daemons launch the job -and- provide a global topology map to all procs for their subsequent use
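
To make the ordering concrete, here is a rough Python sketch of steps 2 through 6. It is only an illustration, not ORTE code: the names (Topology, Daemon, vm_launch) and the simple fill-cores-in-order mapping policy are assumptions made up for this example, and the real rmaps components would apply their own policies.

```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    # Hypothetical stand-in for what a daemon would discover locally.
    sockets: int
    cores_per_socket: int


@dataclass
class Daemon:
    node: str
    topology: Topology
    launched: list = field(default_factory=list)

    def report(self):
        # Step 3: report the local resources back to "mpirun".
        return self.node, self.topology


def vm_launch(allocation, hardware, num_procs):
    """Simulate the revised launch order on an already-final allocation."""
    # Step 2: start a daemon on every allocated node, used or not.
    daemons = {node: Daemon(node, hardware[node]) for node in allocation}

    # Step 3: collect the topology reports.
    topo = dict(d.report() for d in daemons.values())

    # Step 4: map the job using the reported topology.
    # Assumed policy: fill cores in node, socket, core order.
    slots = [
        (node, s, c)
        for node in allocation
        for s in range(topo[node].sockets)
        for c in range(topo[node].cores_per_socket)
    ]
    if num_procs > len(slots):
        raise ValueError("not enough cores in the allocation")
    mapping = dict(enumerate(slots[:num_procs]))

    # Steps 5 and 6: send the map to all daemons; each launches its own procs.
    for rank, (node, _socket, _core) in mapping.items():
        daemons[node].launched.append(rank)

    # The collected topology doubles as the global map handed to the procs.
    return daemons, mapping, topo
```

With three nodes of 2 sockets x 2 cores each and five procs, this places ranks 0-3 on the first node and rank 4 on the second, while the third node still has a daemon running with nothing launched on it.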

Note the significant change here: in the current procedure, we map the job on the nodes-to-be-used and then only launch daemons on nodes that have application procs on them. If the app then calls comm_spawn, we launch any additional daemons as required.

Under this revised procedure, we might launch daemons on nodes that are not used by the initial job. If the app then calls comm_spawn, no additional daemons will be required as we already have daemons on all available nodes. This simplifies comm_spawn, but precludes the ability of an app to dynamically discover and add nodes to the "allocation". There has been sporadic interest in such a feature, but nothing concrete.
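
The difference in daemon placement between the two procedures can be sketched as below. Again this is a toy model under stated assumptions: the function names and the node names are hypothetical, and a job map is reduced to a plain rank-to-node dictionary.

```python
def daemons_current(job_map):
    """Current procedure: daemons only on nodes hosting application procs."""
    return set(job_map.values())


def daemons_vm(allocation):
    """Revised procedure: a daemon on every node in the final allocation."""
    return set(allocation)


def extra_daemons_for_spawn(running, spawn_nodes):
    """Nodes that need a new daemon before a comm_spawn can use them."""
    return sorted(set(spawn_nodes) - set(running))


allocation = ["n0", "n1", "n2", "n3"]
initial_job = {0: "n0", 1: "n0", 2: "n1"}  # rank -> node
spawn_nodes = ["n1", "n2"]                 # nodes targeted by comm_spawn

# Current procedure: n2 has no daemon yet, so one must be launched.
needed_now = extra_daemons_for_spawn(daemons_current(initial_job), spawn_nodes)

# Revised procedure: every allocated node already has a daemon.
needed_vm = extra_daemons_for_spawn(daemons_vm(allocation), spawn_nodes)

# Cost of the revised procedure: daemons on nodes the initial job never uses.
idle = sorted(daemons_vm(allocation) - daemons_current(initial_job))
```

In this example the current procedure needs one extra daemon (on n2) at comm_spawn time, the revised procedure needs none, and the price is two initially idle daemons (on n2 and n3).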