At this week's telecon, we talked about when to branch the 1.3
release. I was asked if I could provide a list of where we stand relative to
promised functionality, at least as far as the RTE is concerned.
Here is what I have compiled, in rough grouping by priority as expressed to
me:
Promised, and needed
* topo mapper - automated mapping that puts ranks on network-nearest
neighbors. Required by several of LANL's more ambitious science
projects. I'll hopefully have a prototype in the system before
leaving on vacation.
* xml output - required for Eclipse PTP support, desired by several
other tools. As per the telecon, there is no way we can get
something meaningful in the system before the proposed code
branch. However, PTP needs this by October; the other tools have a
more lenient timeframe. What we -can- do is get the
output framework created before the branch, and then add the
xml component during the summer - but that requires a change
from our usual policy of no new components in sub-releases.
Requires a new mpirun cmd line flag; -xml is proposed.
* upgrades to the sequential mapper - add the ability to provide
relative sequencing for automated node allocations and to claim
multiple slots for a rank. All of this fits within the existing component.
* local orted spawn - ability for remote orted to locally spawn
a coprocessor process. Required for hybrid RR where MPI procs
are needed on the coprocessor. Basic elements are in system,
but need to be completed now that launch system is stabilizing.
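To make the topo mapper item above a little more concrete, here is a rough sketch of one way "network-nearest" placement could work. This is not the planned implementation, just an illustration. The function name, the slots dictionary, and the hop-count table are all hypothetical; the sketch assumes we can get a hop count between any two nodes from the topology description.

```python
def map_ranks(num_ranks, slots, hops):
    """Greedy placement sketch (hypothetical, not ORTE code).

    slots: dict of node name -> number of free slots
    hops:  dict of dicts, hops[a][b] = network hop count from a to b
    Returns a list where entry i is the node assigned to rank i.
    """
    if num_ranks == 0:
        return []
    free = dict(slots)
    if num_ranks > sum(free.values()):
        raise ValueError("not enough slots for requested ranks")

    placement = []
    current = next(iter(free))  # start on the first listed node
    for _ in range(num_ranks):
        if free[current] == 0:
            # Current node is full: move to the closest node that
            # still has a free slot (ties broken by node name).
            candidates = [n for n in free if free[n] > 0]
            current = min(candidates, key=lambda n: (hops[current][n], n))
        placement.append(current)
        free[current] -= 1
    return placement


if __name__ == "__main__":
    slots = {"n0": 2, "n1": 1, "n2": 2}
    hops = {
        "n0": {"n0": 0, "n1": 3, "n2": 1},
        "n1": {"n0": 3, "n1": 0, "n2": 1},
        "n2": {"n0": 1, "n1": 1, "n2": 0},
    }
    print(map_ranks(5, slots, hops))
```

With this made-up topology, ranks fill n0 first, then spill to n2 (one hop away) before n1 (three hops away). A real mapper would likely need to weigh more than the previous node when picking the next one.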
Promised, could be delayed
* minimizing HNP sockets - everything we need is in the system.
What we need is just to pass to the orteds the nodemap in a
manner that they can decode and use during their startup so
they don't have to call back to the HNP. The scheme has been
designed - just needs to be implemented.
* carto routed - uses the provided network topology to define
RML message routing, thus minimizing message hops during
startup.
* direct/standalone launch - I believe the basic infrastructure
is now present, and indeed at least a couple of systems use
standalone launch methods now. Expanding that to additional
environments will take new PLM/ESS components, perhaps with
supporting utilities. Likely not appropriate for a sub-release.
* static ports - the basic infrastructure for procs and orteds to use
static OOB/TCP ports is in place, but we don't currently take advantage of it.
This shouldn't require any API changes or major restructuring of
code as everything required is already there.
* add-hostfile, add-host - these were included in the hostfile
wiki page description as they had been requested by several users.
If not included in 1.3, we need to update the wiki page and include
that fact in the FAQ section, at the least, since users were
told this would be supported.
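For the HNP sockets item above, the idea of handing each orted a nodemap it can decode at startup might look roughly like the following. This is only a sketch under my own assumptions: the actual scheme that was designed is not described here, and the record layout (daemon vpid plus hostname), the function names, and the use of zlib/base64 are all hypothetical stand-ins.

```python
import base64
import zlib


def encode_nodemap(nodemap):
    """Pack a list of (daemon_vpid, hostname) pairs into one string.

    Hypothetical format: one "vpid:hostname" record per line,
    compressed and base64-encoded so it can travel on a command
    line or in an environment variable.
    """
    text = "\n".join(f"{vpid}:{host}" for vpid, host in nodemap)
    packed = zlib.compress(text.encode("ascii"))
    return base64.b64encode(packed).decode("ascii")


def decode_nodemap(blob):
    """Inverse of encode_nodemap: recover the (vpid, hostname) list."""
    text = zlib.decompress(base64.b64decode(blob)).decode("ascii")
    nodemap = []
    for line in text.splitlines():
        vpid, host = line.split(":", 1)
        nodemap.append((int(vpid), host))
    return nodemap


if __name__ == "__main__":
    original = [(0, "hnp-node"), (1, "node001"), (2, "node002")]
    blob = encode_nodemap(original)
    print(blob)
    print(decode_nodemap(blob) == original)
```

The point is only that a daemon receiving such a blob could rebuild the map locally instead of opening a socket back to the HNP to ask for it.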
Wanted/Requested by various users or developers
* orted sm file - some of our improved behavior depends upon
exclusive use of nodes. We can remove that constraint by
letting jobs from different users that are colocated on a node
have knowledge of each other's existence. It has been
proposed that this be accomplished by creating a shared memory
area that the procs/orteds can access to find out who else
is on a node, what static ports they are using, etc. Design
still to be worked out.
* usage reporting - add appropriate mpirun cmd line option to
request the orteds to report proc resource usage upon proc
termination. Pretty trivial to do. Requested by a few users
and a couple of tool developers.
* tool query support - ability for a tool to interactively
query process/job status, usage stats, etc. The tool comm
library is partially implemented today, but doesn't support
the full range of requested functionality.
* support for recursive mpirun calls - this has come up a few
times on the user list. Basically, it requires adding a new
mpirun cmd line option (--recursive) so mpirun can purge the
environment of mca params set during spawn before calling
orte_init.
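As an illustration of the environment purge in the recursive mpirun item above, a minimal sketch could be as simple as the following. It assumes that the params to be removed are the ones carried in the environment with the OMPI_MCA_ prefix, and it removes all of them, which is a simplification: the real option would presumably purge only those set during spawn. The function name is hypothetical.

```python
import os


def purge_mca_env(env, prefix="OMPI_MCA_"):
    """Return a copy of env without variables that start with prefix.

    Simplified sketch: drops every variable with the prefix rather
    than tracking which ones were set during spawn.
    """
    return {k: v for k, v in env.items() if not k.startswith(prefix)}


if __name__ == "__main__":
    # Show how many variables would be dropped from the current
    # environment before starting a nested run.
    clean = purge_mca_env(dict(os.environ))
    print(len(os.environ) - len(clean), "MCA variables would be purged")
```

The cleaned dictionary could then be passed as the environment of the nested launch, leaving the caller's own environment untouched.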
Future improvements
* reduced launch messaging - put launch information in orted's environment
(for systems that support it) so that orted can determine and launch
its local procs without communicating back to the HNP. We have a design
for this capability, but have purposely held off implementation until
after the 1.3 branch.
* minimized mpirun memory footprint - we currently store a bunch of info
to support various debuggers, c/r, etc. This info isn't actually required
to be stored for operation of the MPI job and/or ORTE, so it could
either be released or simply not created. This plan calls for yet
another option(!) that would tell mpirun to minimize its memory
footprint. Design has been done - implementation has not started.