Open MPI Development Mailing List Archives

Subject: [OMPI devel] RFC: ORTE state machine
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-03-18 22:47:21


WHY: Enable async progress

WHAT: Restructure ORTE to operate as a completely event-driven state machine

WHEN: ~April 1 (seems appropriate)

SIGNIFICANT CHANGES:
    * grpcomm API has changed
    * routed API has changed
    * state framework has been added to ORTE
    * OPAL SOS has been removed (per IU)
    * --enable-resilient-orte and all epoch code have been removed (per UTK)

KNOWN BREAKAGE:
    * checkpoint/restart is almost certainly broken

This has been discussed several times over the last 6-8 months. Going forward, we need to enable async progress at both the OMPI and ORTE levels. This change deals solely with the ORTE level. All interactions with the ORTE level have been made non-blocking so that the MPI layer can continue making progress on its own. This is reflected in changes made to ompi_mpi_init, ompi_mpi_finalize, and dpm_orte.

The largest change is the introduction of the ORTE "state" framework, which moves the launch of a job through a series of events, each processing one step of the launch procedure. So allocation becomes an event, as does mapping. The state machine is implemented as a linked list, so those wanting to try something different from the base implementation can easily implement variations of the procedure.
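For anyone who wants a picture of the idea before reading the branch, here is a rough sketch of a launch sequence driven by a linked list of states. It is not the ORTE code: it is written in Python for brevity, and every name in it (StateNode, StateMachine, activate, replace, the "launch" step) is made up for illustration. It also assumes the event loop advances to the next list entry on its own, which may not match how the real framework chains states.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Job:
    name: str
    trace: List[str] = field(default_factory=list)  # records the steps that ran


@dataclass
class StateNode:
    name: str
    handler: Callable[[Job], None]
    next: Optional["StateNode"] = None  # singly linked list of launch steps


class StateMachine:
    def __init__(self) -> None:
        self.head: Optional[StateNode] = None
        self.events: deque = deque()  # pending (job, state name) events

    def append(self, name: str, handler: Callable[[Job], None]) -> None:
        """Add a state at the tail of the list."""
        node = StateNode(name, handler)
        if self.head is None:
            self.head = node
            return
        cur = self.head
        while cur.next is not None:
            cur = cur.next
        cur.next = node

    def find(self, name: str) -> StateNode:
        cur = self.head
        while cur is not None:
            if cur.name == name:
                return cur
            cur = cur.next
        raise KeyError(name)

    def replace(self, name: str, handler: Callable[[Job], None]) -> None:
        """Swap the handler for one state; the rest of the list is untouched."""
        self.find(name).handler = handler

    def activate(self, job: Job, name: str) -> None:
        """Queue an event for a state and return immediately (non-blocking)."""
        self.events.append((job, name))

    def run(self) -> None:
        """Drain the event queue, one launch step per event."""
        while self.events:
            job, name = self.events.popleft()
            node = self.find(name)
            node.handler(job)
            if node.next is not None:
                self.activate(job, node.next.name)


def make_step(label: str) -> Callable[[Job], None]:
    def handler(job: Job) -> None:
        job.trace.append(label)
    return handler


def build_default() -> StateMachine:
    sm = StateMachine()
    for name in ("allocate", "map", "launch"):  # hypothetical step names
        sm.append(name, make_step(name))
    return sm


def launch(sm: StateMachine, job_name: str) -> List[str]:
    job = Job(job_name)
    sm.activate(job, sm.head.name)
    sm.run()
    return job.trace
```

In this sketch, someone experimenting with a different mapper would call sm.replace("map", my_handler) on the default list and leave the other steps alone, which is the kind of variation the linked-list layout is meant to make cheap.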

The daemon collectives have also been reworked to remove their "tree" dependency. Non-tree collectives can now be performed, and a few are in the works and should be committed shortly after the state machine is in the trunk.

The ability to run an ORTE progress thread has been included in the configure code (--enable-orte-progress-thread), but is off by default. As Brian noted, the MPI layer is not ready for this feature at this time. However, the ORTE code is fully prepared, so those interested in working on completing the async progress work in the MPI layer can do so.
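To make the difference between the two modes concrete, here is a small sketch of an event engine that can either be polled by the caller or drained by a dedicated progress thread. Again this is an illustration in Python, not the ORTE implementation: ProgressEngine, post, progress, and stop are hypothetical names, and the use_thread flag merely stands in for what the configure option selects at build time.

```python
import queue
import threading


class ProgressEngine:
    """Runs posted callbacks either from a progress thread or when polled."""

    def __init__(self, use_thread: bool) -> None:
        self.events: queue.Queue = queue.Queue()
        self.use_thread = use_thread
        self.thread = None

    def start(self) -> None:
        # Only spawn the thread when asked to; otherwise the caller must poll.
        if self.use_thread:
            self.thread = threading.Thread(target=self._loop, daemon=True)
            self.thread.start()

    def post(self, callback, *args) -> None:
        """Queue a callback and return immediately (non-blocking)."""
        self.events.put((callback, args))

    def progress(self) -> int:
        """Poll mode: run whatever is queued right now; return the count."""
        handled = 0
        while True:
            try:
                callback, args = self.events.get_nowait()
            except queue.Empty:
                return handled
            callback(*args)
            handled += 1

    def _loop(self) -> None:
        # Thread mode: block until an event arrives; None is the stop signal.
        while True:
            item = self.events.get()
            if item is None:
                return
            callback, args = item
            callback(*args)

    def stop(self) -> None:
        """Ask the progress thread (if any) to finish queued work and exit."""
        if self.thread is not None:
            self.events.put(None)
            self.thread.join()
            self.thread = None
```

With use_thread=False nothing happens until the caller invokes progress(), so the upper layer has to keep calling in. With use_thread=True the posted work completes on its own, which is why the upper layer then has to be safe against callbacks firing concurrently.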

The state machine branch is at https://bitbucket.org/rhc/ompi-term. I'm still doing some cleanup there, so don't be surprised if debug messages appear and/or things aren't completely right just yet.