The Process Management Interface (PMI) has been used for
quite some time as a means of exchanging wireup information
needed for interprocess communication. Two versions (PMI-1 and PMI-2)
have been released as part of the MPICH effort. While PMI-2 demonstrates
better scaling properties than its PMI-1 predecessor, attaining rapid
launch and wireup of the roughly 1M processes executing across 100k nodes
expected for exascale operations remains challenging.
PMI Exascale (PMIx) addresses these challenges by
providing an extended version of the PMI standard specifically designed to support clusters up
to and including exascale sizes. The overall objective of the project is not to
branch the existing pseudo-standard definitions - in fact, PMIx fully supports
both of the existing PMI-1 and PMI-2 APIs - but rather to (a) augment and extend
those APIs to eliminate some current restrictions that impact scalability,
and (b) provide a reference implementation of the PMI-server that demonstrates
the desired level of scalability.
The client-side library is provided as source under Open MPI's New BSD license. Official releases are available from the PMIx Github repository. Client library features include:
- full PMI-1 and PMI-2 compatibility. Both sets of APIs are provided and supported. Calls to these APIs are translated into PMIx, ensuring that they receive the same scalability benefits as the native PMIx functions.
- use of shared memory to minimize footprint at scale. Data retrieved by calls to PMI_Get are stored by the local PMIx server in a shared memory region accessible by all local processes. Thus, once the data is retrieved the first time, all local processes can immediately access it without further communication.
- posting of data as a block. Data "put" by the application will be locally cached by the process until execution of the "commit" API - at that time, all data will be transmitted to the local PMIx server as a single "blob".
- retrieval of data as a block instead of item-by-item. Current PMI implementations return a single data element with each call the requesting process makes to PMI_Get, necessitating repeated communications to obtain all desired data. Although this minimizes the amount of locally stored data, most MPI processes that request any data about a peer will eventually query all of the data posted by that peer. PMIx therefore anticipates these subsequent requests by obtaining, and locally caching in the shared memory region, all data posted by a process upon the first request for data from that peer.
- added functions to support packing/unpacking of binary data. Currently, PMI only supports the transmission of string data. Although binary groupings can be encoded, the encoding process itself consumes both time and memory, thus increasing the volume of data that must be collectively communicated. This was originally done as a means of avoiding the heterogeneous data problem. PMIx, in contrast, provides the required pack/unpack functions to reliably send data between heterogeneous nodes, and a block API for posting and retrieving such blobs.
- addition of non-blocking versions of all APIs so that processes can request operations and continue executing until the request is satisfied. Notification is provided via a user-provided callback function, which includes delivery of any requested data.
- extension of the PMI_Put API to allow the passing of a flag indicating the scope of the data being published:
- PMIX_LOCAL - the data is intended only for other application processes on the same node. Data marked in this way will not be included in data packages sent to remote requestors
- PMIX_REMOTE - the data is intended solely for application processes on remote nodes. Data marked in this way will not be shared with other processes on the same node
- PMIX_GLOBAL - the data is to be shared with all other requesting processes, regardless of location
- support for fork/exec of child processes by applications. The PMIx client will provide dynamic connections to the local server, thereby allowing any child process of an application process to also access PMI on its own behalf, if desired. The responsibility for defining any required unique PMI keys for the child is left to the application developer.
- thread safety and concurrency
We have chosen not to provide a standalone server implementation as the required messaging library would be an unnecessary encumbrance for existing resource managers and MPI implementations. Accordingly, we are providing a reference implementation of the PMIx server system as part of the Open MPI run-time environment (ORTE) and the related Open Resilient Cluster Manager (ORCM).
The PMI standard has been the subject of many papers over the years.
Detailed documentation on the design of PMIx itself, including the API,
is under development on the PMIx wiki page.
Getting and using PMIx
The latest PMIx client releases are available as tarballs on the
project web page. Nightly tarballs of the developer master are also
available, and the Github developer repository itself is accessible.
Questions and bugs
Questions, comments, and bugs should be sent to the PMIx mailing lists and/or
submitted to the PMIx bug tracking system.