Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: Pineapple Runtime Interposition Project
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2012-06-18 12:46:48


That sounds good.

The approach for this project can be seen as multi-phased.

Phase 0: Baseline
 - Baseline implementation of just the interfaces that OMPI uses.
 - This is what I want to commit to the trunk next week.

Phases 1-N: Interface enhancements
 - There are a number of suggested enhancements for the interface
going forward, from making the API a bit more general to exposing more
of ORTE through the pineapple interface.
 - Per the 'OMPI/ORTE/OPAL stack is king' discussion, interface
modifications needed for projects outside of the OMPI/ORTE/OPAL stack
need to be discussed by the Open MPI community.
 - As Ralph pointed out, this will probably not be an easy discussion at times.
 - No timeline for these phases

I intentionally did not want to blur Phase 0 with Phases 1-N so that we
can get things going. In past attempts at this, the Phase 1-N
discussion tended to sink the conversation, and the good software
development in Phase 0 was lost. So I want to get Phase 0 in the trunk,
then if folks want to talk about interface tweaks we can do so as
needed going forward.

I should note that at this time I, personally, do not have any
interface items to be addressed. This interface is sufficient for what
I need right now.

I'll get back to work finishing up the branch :)

-- Josh

On Mon, Jun 18, 2012 at 9:59 AM, Ralph Castain <rhc_at_[hidden]> wrote:
> No disagreement over the approach - having the interface only cover OMPI as it sits is fine. As I said at the meeting, those of us using ORTE for other purposes have no real reason to need "pineapple" and can just work directly with the ORTE interfaces (GP will commit to this route). Based on that plus your comments, I would leave the interface alone for now.
>
> I remain unconvinced by the "put other RTEs under OMPI" argument, as you know, but I won't belabor it. We'll let time show us just how real that concern is. For now, as we agreed at the meeting, we'll modify the interface as required to meet OMPI's needs for its integration with the ORTE trunk. We know that will mean some near-term changes as we work on modex, but we can adjust as needed.
>
> I don't really care about the name, but just want something usably short. I'm content with the old "Ompi Runtime Services Layer" (ORSL), if you want to go back to it. Not sure we need to spend cycles trying to get the name to reflect the precedence agreement - the precise definition of the API is likely to get argued over multiple times regardless, based on prior history, so a stormy future for this proposal (regardless of name) is a reasonable prediction.
>
> Thanks Josh!
> Ralph
>
> On Jun 18, 2012, at 7:06 AM, Josh Hursey wrote:
>
>> On Sat, Jun 16, 2012 at 10:32 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>> Hi Josh
>>>
>>> I had a chance to review your code this morning, and generally find it is okay with me. I see a couple of things that appear to limit it, though they may be intentional:
>>
>> In this pass of the pineapple interface I included only those
>> interfaces that OMPI uses without extension. So if OMPI did not use a
>> parameter or only used an interface in a single way then I simplified
>> the interface appropriately. The intention in this first pass was to
>> provide only what OMPI needs from ORTE in the interface, and nothing
>> more.
>>
>> Extending this interface to be more flexible and extensible I saw as a
>> secondary discussion that the group can have moving forward from this
>> initial baseline.
>>
>>>
>>> 1. the call to pineapple_init really needs a third flag to define the process type. Locking the underlying orte_init to MPI seems to somewhat defeat your goal of allowing pineapple to be used for non-MPI purposes
>>
>>
>> For OMPI, we only ever pass one type to the orte_init function. As
>> such, I eliminated the third parameter since it was unnecessary for
>> OMPI in this baseline pass.
>>
>> However, it may be useful to have for non-MPI purposes. Maybe an 'int'
>> parameter with pineapple only defining that '0' is OMPI_PROCESS, and
>> all other values are to be defined in the future. However, I am
>> hesitant to extend the interface for what we 'might' do, but rather
>> only extend the interface for what we will support going forward. But
>> if you have a good use case, then we can discuss adding it to support
>> non-MPI layers above. I just did not do so since it is not what OMPI
>> needed, so why make a more complex API in this first pass.
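
To make the 'int' process-type idea above concrete, here is a minimal sketch in C. It is not the actual pineapple code: every name ending in `_sketch`, the `PINEAPPLE_*` constants, and `fake_rte_init` are hypothetical, and the two leading arguments are only assumed to be argc/argv-style pointers. The stand-in backend marks the place where a real layer would presumably call orte_init.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical process-type values: only 0 is defined, as suggested above. */
#define PINEAPPLE_PROC_OMPI      0

/* Hypothetical return codes for this sketch. */
#define PINEAPPLE_SUCCESS        0
#define PINEAPPLE_ERR_BAD_PARAM (-1)

/* Records the type handed to the stand-in runtime, so the sketch can be checked. */
static int last_proc_type = -1;

/* Stand-in for the underlying runtime init (assumption: a real layer
 * would forward to the runtime's own init routine here). */
static int fake_rte_init(int *argc, char ***argv, int proc_type)
{
    (void)argc;
    (void)argv;
    last_proc_type = proc_type;
    return PINEAPPLE_SUCCESS;
}

/* Three-argument init: any type the layer does not yet define is rejected,
 * so all other values stay reserved for future use. */
int pineapple_init_sketch(int *argc, char ***argv, int proc_type)
{
    if (proc_type != PINEAPPLE_PROC_OMPI) {
        return PINEAPPLE_ERR_BAD_PARAM;
    }
    return fake_rte_init(argc, argv, proc_type);
}

/* Usage: the only type accepted today is the OMPI one. */
void pineapple_init_example(void)
{
    assert(pineapple_init_sketch(NULL, NULL, PINEAPPLE_PROC_OMPI) == PINEAPPLE_SUCCESS);
    assert(pineapple_init_sketch(NULL, NULL, 7) == PINEAPPLE_ERR_BAD_PARAM);
}
```

Rejecting undefined values up front keeps the baseline behavior identical for OMPI while leaving room to define further types later without changing the signature again.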
>>
>>>
>>> 2. the barrier and other collectives are locked to the MPI_Init and MPI_Finalize procedures due to hardcoding of the collective id. You might want to consider altering the API to pass a collective id down so these functions can be used in other places.
>>
>>
>> Making this more generic would be useful. I'll see what I can do when
>> I dig back in this week. I hardcoded them since that is exactly how
>> OMPI uses them today. But having them more widely accessible would be
>> better.
>>
>> However, as you allude to in the next paragraph, we do not want to
>> define an ultimate generalized RTE abstraction. So we must be careful
>> when defining more generic interfaces that we are not hindering the
>> OMPI/ORTE/OPAL stack and that we have a testable use case for the
>> interface extension. This case (collective uses) is easier, but I can
>> think of other places where it could get more delicate.
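
One possible shape for the more generic collective interface discussed above, as a hedged C sketch. The id type, the two id values, and all function names are hypothetical and chosen only for illustration; `fake_rte_barrier` stands in for whatever the runtime actually provides. The generic call takes the collective id from the caller, and two thin wrappers keep the MPI_Init and MPI_Finalize call sites as simple as they are with hardcoded ids.

```c
#include <assert.h>

/* Hypothetical collective id type and the two ids a hardcoded
 * version would bake in for MPI_Init and MPI_Finalize. */
typedef int pineapple_coll_id_t;
enum {
    COLL_ID_INIT_BARRIER     = 1,
    COLL_ID_FINALIZE_BARRIER = 2
};

/* Records the id handed to the stand-in runtime, so the sketch can be checked. */
static pineapple_coll_id_t last_barrier_id = -1;

/* Stand-in for the runtime barrier (assumption: the real layer would
 * forward the id to the runtime's collective machinery). */
static int fake_rte_barrier(pineapple_coll_id_t id)
{
    last_barrier_id = id;
    return 0;
}

/* Generic form: the caller supplies the collective id. */
int pineapple_barrier_sketch(pineapple_coll_id_t id)
{
    if (id < 0) {
        return -1;   /* negative ids are treated as invalid in this sketch */
    }
    return fake_rte_barrier(id);
}

/* Convenience wrappers that preserve the two existing uses. */
int pineapple_barrier_init_sketch(void)
{
    return pineapple_barrier_sketch(COLL_ID_INIT_BARRIER);
}

int pineapple_barrier_finalize_sketch(void)
{
    return pineapple_barrier_sketch(COLL_ID_FINALIZE_BARRIER);
}

/* Usage: existing call sites use the wrappers; new ones pass their own id. */
void pineapple_barrier_example(void)
{
    assert(pineapple_barrier_init_sketch() == 0);
    assert(pineapple_barrier_sketch(42) == 0);
}
```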
>>
>>>
>>> Finally, we have to get rid of the "pineapple" name. It seems to me that the primary purpose of this work is to allow ORTE to be used more generally, and to support multiple variants of ORTE within OMPI. So how about calling it "ORte Abstraction Layer", or ORAL? This would emphasize that we are not trying to create the ultimate generalized RTE abstraction, which I think is important for all the reasons raised at the recent meeting.
>>
>>
>> Renaming 'pineapple' is the very last step. So we can discuss that
>> until just before it comes into the trunk.
>>
>> The primary purpose of this effort is twofold. First, to allow ORTE
>> to be used more generally (something other than, or in addition to,
>> OMPI above it). Second, to allow OMPI to be used across different
>> RTEs (something other than, or in addition to, ORTE below it). We are
>> not trying to create the ultimate generalized RTE abstraction layer,
>> but something that serves the primary master of the interaction
>> between OMPI/ORTE.
>>
>> However, if we consider calling it the 'ORte Abstraction Layer' for
>> OMPI then we could just as easily call it the 'OMPI RTE Abstraction
>> Layer' for ORTE. And then we quickly get back to the 'who owns the
>> interface' issue, and which project stack it serves. I think a better
>> name is one that ties OMPI and ORTE together - maybe... OMPI and ORTE
>> Synergistic Abstraction (OOSA) Layer. OMPI/ORTE/OPAL stack is king,
>> and having the name reflect that would be good. I am just not sure
>> what that would be at the moment.
>>
>> -- Josh
>>
>>>
>>> HTH
>>> Ralph
>>>
>>>
>>> On Jun 15, 2012, at 12:55 PM, Josh Hursey wrote:
>>>
>>>> What: A Runtime Interposition Project - Codename Pineapple
>>>>
>>>> Why: Define clear API and semantics for runtime requirements of the OMPI layer.
>>>>
>>>> When:
>>>> - Fri June 22, 2012 - Work completed
>>>> - Tue June 26, 2012 - Discuss on teleconf
>>>> - Thu June 28, 2012 - Commit to trunk
>>>>
>>>> Where: Trunk (development BitBucket branch below)
>>>>  https://bitbucket.org/jjhursey/ompi-pineapple
>>>>
>>>> Attached:
>>>>  PDF of slides presented on the June 12, 2012 teleconf. Note that the
>>>> timeline was slightly adjusted above (work completed date moved
>>>> earlier).
>>>>
>>>>
>>>> Description: Short Version
>>>> --------------------------
>>>> Define, in an 'rte.h', the interfaces and semantics that the OMPI
>>>> layer requires of a runtime environment. Currently this interface
>>>> matches the subset of ORTE functionality that is used by the OMPI
>>>> layer. Runtime symbols (e.g., orte_ess.proc_get_locality) are isolated
>>>> to a framework inside this project to provide linker-level protection
>>>> against accidental breakage of the pineapple interposition layer.
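
As a rough illustration of the isolation described above, the following C sketch shows how a single header-style interface could forward a locality query to whichever runtime backend is active. All names here (`rte_proc_name_t`, `rte_locality_t`, `rte_backend_t`, the `RTE_LOCALITY_*` values, and the demo backend with its toy rule) are hypothetical and do not come from the pineapple branch. In a real build, a component behind this table would be the only place that touches a runtime symbol such as orte_ess.proc_get_locality.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a single 'rte.h' might expose to the layer above (hypothetical). */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} rte_proc_name_t;

typedef uint16_t rte_locality_t;
#define RTE_LOCALITY_UNKNOWN 0x0000
#define RTE_LOCALITY_ON_NODE 0x0001

rte_locality_t rte_proc_get_locality(const rte_proc_name_t *proc);

/* Backend table: one entry per runtime function the upper layer needs.
 * Only code behind this table refers to runtime-specific symbols. */
typedef struct {
    rte_locality_t (*proc_get_locality)(const rte_proc_name_t *proc);
} rte_backend_t;

/* Demo backend with a toy rule: vpids 0-3 are treated as on the same node. */
static rte_locality_t demo_get_locality(const rte_proc_name_t *proc)
{
    return (proc->vpid < 4) ? RTE_LOCALITY_ON_NODE : RTE_LOCALITY_UNKNOWN;
}

static const rte_backend_t demo_backend = { demo_get_locality };
static const rte_backend_t *active_backend = &demo_backend;

/* Public entry point: callers never see the backend's own symbols. */
rte_locality_t rte_proc_get_locality(const rte_proc_name_t *proc)
{
    if (proc == NULL || active_backend == NULL) {
        return RTE_LOCALITY_UNKNOWN;
    }
    return active_backend->proc_get_locality(proc);
}

/* Usage: the upper layer includes one header and calls one function. */
void rte_locality_example(void)
{
    rte_proc_name_t peer = { 1, 2 };
    assert(rte_proc_get_locality(&peer) == RTE_LOCALITY_ON_NODE);
}
```

Swapping the runtime then means pointing `active_backend` at a different table, with no change to callers above the interface.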
>>>>
>>>> The interposition project provides researchers working on side
>>>> projects above and below the 'rte.h' interface a single location in
>>>> the code base to watch for interface and semantic changes that they
>>>> need to be concerned about. Researchers working above the pineapple
>>>> layer might explore something other than (or in addition to) OMPI
>>>> (e.g., Extended OMPI, UPC+OMPI). Researchers working below the
>>>> pineapple layer might explore something other than (or in addition to)
>>>> ORTE under OMPI (e.g., specialized runtimes for specific
>>>> environments).
>>>>
>>>>
>>>> Description: Other notes
>>>> ------------------------
>>>> The pineapple interface provides OMPI developers with a runtime API to
>>>> program against without requiring detailed knowledge of the layout of
>>>> ORTE and its frameworks. In some places in OMPI a single source file
>>>> needs to include >5 (up to 12 in one place) different header files to
>>>> get all of the necessary symbols. Developers must not only know where
>>>> these headers are, but must also understand the differences between
>>>> the various frameworks in ORTE to use ORTE. The developer must also be
>>>> aware that there are certain APIs and data structure fields that are
>>>> not available to the MPI process, so should not be used. The pineapple
>>>> project provides an API representing the small subset of ORTE that is
>>>> used by OMPI. With this API a developer only needs to look at a single
>>>> location in the code base to understand what is provided by the
>>>> runtime for use in the OMPI layer.
>>>>
>>>> A similar statement could be made for runtime developers trying to
>>>> figure out what the OMPI layer requires from a runtime
>>>> environment. Currently they need a deep understanding of the behavior
>>>> of ORTE to understand the semantics of various calls to ORTE from the
>>>> OMPI layer. Then they must develop a custom patch for the OMPI layer
>>>> that extracts the ORTE symbols, and replaces them with their own
>>>> symbols. This process is messy, error-prone, and tedious to say the
>>>> least. Having a single set of interfaces and semantics will allow such
>>>> developers to focus their efforts on supporting the Open MPI
>>>> community-defined API, and not necessarily the evolution of the ORTE or
>>>> OMPI project internals. This is advantageous when porting Open MPI to an
>>>> environment with a full-featured runtime already running on the
>>>> machine, and for researchers exploring radical runtime designs for
>>>> future systems. The pineapple API allows such projects to develop
>>>> beside the mainline Open MPI trunk a little more easily than without
>>>> the pineapple API.
>>>>
>>>>
>>>> FAQ:
>>>> ----
>>>> (1) Why is this a separate project and not a framework of OMPI? or a
>>>> framework of ORTE?
>>>>
>>>> After much deliberation between the developers, from a software
>>>> engineering perspective, making the pineapple rte.h interface a
>>>> separate project was the most flexible solution. So neither the OMPI
>>>> layer nor the ORTE layer 'owns' the interface, but it is 'owned' by the
>>>> Open MPI project primarily to support the interaction between these
>>>> two layers.
>>>>
>>>> Consider that if we decided to place the interface in the OMPI layer
>>>> as a framework then we would be able to place something other than (or
>>>> in addition to) ORTE underneath OMPI, but we would be limited in our
>>>> ability to place something other than (or in addition to) OMPI over
>>>> ORTE. Alternatively, if we decided to place the rte.h interface in the
>>>> ORTE layer then we would be able to place something other than (or in
>>>> addition to) OMPI over ORTE, but we would be limited in our ability to
>>>> place something other than (or in addition to) ORTE under OMPI.
>>>> Defining the interposition layer as a separate project between these
>>>> two layers allows maximal flexibility for the project and researchers
>>>> working on side branches.
>>>>
>>>>
>>>> (2) What if another project outside of Open MPI needs interface
>>>> changes to the pineapple 'rte.h'?
>>>>
>>>> The rule of thumb is that 'The OMPI/ORTE/OPAL stack is king!'. This
>>>> means that the pineapple project should always err on the side of
>>>> supporting the OMPI/ORTE/OPAL stack, as that is the flagship product
>>>> of the Open MPI project. Interface suggestions are always welcome and
>>>> the rte.h may be extended/modified in the future as a result of those
>>>> suggestions. However, if a suggested change negatively impacts the
>>>> OMPI/ORTE/OPAL stack then it is unlikely to be accepted upstream by
>>>> the Open MPI community.
>>>>
>>>>
>>>> --
>>>> Joshua Hursey
>>>> Postdoctoral Research Associate
>>>> Oak Ridge National Laboratory
>>>> http://users.nccs.gov/~jjhursey
>>>> <Pineapple-Teleconf.pdf>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>>
>> --
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> http://users.nccs.gov/~jjhursey

-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey