Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Fwd: OpenMPI changes
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-03-04 16:37:42

Greg --

I admit to being a bit puzzled here. Ralph sent around RFCs about
these changes many months ago. Everyone said they didn't want this
functionality -- it was seen as excess functionality that Open MPI
didn't want or need -- so it was all removed.

As such, I have to agree with Ralph that it is an "enhancement" to re-add
the functionality. That being said, patches are always welcome!
IBM has signed the OMPI third-party contribution agreement, so it could
be contributed directly.

Sidenote: I was also under the impression that PTP was being re-geared
towards STCI and moving away from ORTE anyway. Is this incorrect?

On Mar 4, 2008, at 3:24 PM, Greg Watson wrote:

> Hi all,
> Ralph informs me that significant functionality has been removed from
> ORTE in 1.3. Unfortunately this functionality was being used by PTP to
> provide support for OMPI, and without it, it seems unlikely that PTP
> will be able to work with 1.3. Apparently restoring this lost
> functionality is an "enhancement" for 1.3, and so is something that
> will not necessarily be done. Having worked with OMPI from a very
> early stage to ensure that we were able to provide robust support, I
> must say it is a bit disappointing that this approach is being taken.
> I hope that the community will view this "enhancement" as worthwhile.
> Regards,
> Greg
> Begin forwarded message:
>> On 2/29/08 7:13 AM, "Gregory R Watson" <grw_at_[hidden]> wrote:
>>> Ralph Castain <rhc_at_[hidden]> wrote on 02/29/2008 12:18:39 AM:
>>>> From: Ralph Castain <rhc_at_[hidden]>
>>>> Date: 02/29/08 12:18 AM
>>>> To: Gregory R Watson/Watson/IBM_at_IBMUS
>>>> Subject: Re: OpenMPI changes
>>>> Hi Greg
>>>> All of the prior options (and some new ones) for spawning a job are fully
>>>> supported in the new interface. Instead of setting them with "attributes",
>>>> you create an orte_job_t object and just fill them in. This is precisely how
>>>> mpirun does it - you can look at that code if you want an example, though it
>>>> is somewhat complex. Alternatively, you can look at the way it is done for
>>>> comm_spawn, which may be more analogous to your situation - that code is in
>>>> ompi/mca/dpm/orte.
>>>> All the tools library does is communicate the job object to the target
>>>> persistent daemon so it can do the work. This way, you don't have to open
>>>> all the frameworks, deal directly with the plm interface, etc.
>>>> Alternatively, you are welcome to do a full orte_init and use the frameworks
>>>> yourself - there is no requirement to use the library. I only offer it as an
>>>> alternative.
>>> As far as I can tell, neither API provides the same functionality as that
>>> available in 1.2. While this might be beneficial for OMPI-specific activities,
>>> the changes appear to severely limit the interaction of tools with the
>>> runtime. At this point, I can't see either interface supporting PTP.
>> I went ahead and added a notification capability to the system - took about
>> 30 minutes. I can provide notice of job and process state changes since I
>> see those. Node state changes, however, are different - I can notify on
>> them, but we have no way of seeing them. None of the environments we support
>> tell us when a node fails.
>>>> I know that the tool library works because it uses the identical APIs as
>>>> comm_spawn and mpirun. I have also tested them by building my own tools.
>>> There's a big difference between being on a code path that *must* work
>>> because it is used by core components and being on one that is provided as
>>> an add-on for external tools. I may be worrying needlessly if this new
>>> interface becomes an "officially supported" API. Is that planned? At a
>>> minimum, it seems like it's going to complicate your testing process, since
>>> you're going to need to provide a separate set of tests that exercise this
>>> interface independent of the rest of OMPI.
>> It is an officially supported API. Testing is not as big a problem as you
>> might expect since the library exercises the same code paths as mpirun and
>> comm_spawn. Like I said, I have written my own tools that exercise the
>> library - no problem using them as tests.
>>>> We do not launch an orted for any tool-library query. All we do is
>>>> communicate the query to the target persistent daemon or mpirun. Those
>>>> entities have recv's posted to catch any incoming messages and execute the
>>>> request.
>>>> You are correct that we no longer have event-driven notification in the
>>>> system. I repeatedly asked the community (on both devel and core lists) for
>>>> input on that question, and received no indications that anyone wanted it
>>>> supported. It can be added back into the system, but would require the
>>>> approval of the OMPI community. I don't know how problematic that would be -
>>>> there is a lot of concern over the amount of memory, overhead, and potential
>>>> reliability issues that surround event notification. If you want that
>>>> capability, I suggest we discuss it, come up with a plan that deals with
>>>> those issues, and then take a proposal to the devel list for discussion.
>>>> As for reliability, the objectives of last year's effort were precisely
>>>> scalability and reliability. We did a lot of work to eliminate recursive
>>>> deadlocks and improve the reliability of the code. Our current testing
>>>> indicates we had considerable success in that regard, particularly with the
>>>> recursion elimination commit earlier today.
>>>> I would be happy to work with you to meet PTP's needs - we'll just need to
>>>> work with the OMPI community to ensure everyone buys into the plan. If it
>>>> would help, I could come and review the new architecture with the team (I
>>>> already gave a presentation on it to IBM Rochester, MN) and discuss the
>>>> required enhancements.
>>> PTP's needs have not changed since 1.0. From our perspective, the 1.3 branch
>>> simply removes functionality that is required for PTP to support OMPI. It
>>> seems strange that we need "approval of the OMPI community" to continue to use
>>> functionality that has been available since 1.0. In any case, there are
>>> unfortunately no resources to work on the kind of re-engineering that appears
>>> to be required to support 1.3, even if it did provide the functionality we
>>> need.
>> Afraid I have to be driven by the OMPI community's requirements since they
>> pay my salary :-) What they need is a "lean, mean, OMPI machine" as they
>> say, and (for some reason) they view the debugger community as consisting of
>> folks like TotalView, VampirTrace, etc. - all of whom get involved (either
>> directly or via one of the OMPI members) in the requirements discussions.
>> Can't argue with business decisions, though. I gather there was some mention
>> of PTP at the recent LANL/IBM RR meeting, so I'll let people know that PTP
>> won't be an option on RR.
>> And I'll see if there is any interest here in adding 1.3 support to PTP
>> ourselves - from looking at your code, I think it would take about a day,
>> assuming someone more familiar with PTP will work with me.
>> Take care
>> Ralph
>>> Greg
> _______________________________________________
> devel mailing list
> devel_at_[hidden]

Jeff Squyres
Cisco Systems