Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Commit r19868
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-10-31 19:53:37

Crumby - I referenced the wrong commit. My commit was r19866. My apologies
to George, the author of r19868, which cleaned up a problem created by my commit.


On Oct 31, 2008, at 5:50 PM, Ralph Castain wrote:

> Hi all
> I made a commit a little earlier that contains modifications that
> reduce duplicate data storage and represent a first step towards
> supporting fully routed RML communications, along with a new "radix
> tree" routed component requested by ORNL. There will undoubtedly be
> improvements to these changes over the next few months, but they
> provide an initial platform for us to more thoroughly investigate
> the issues involved in fully routing all out-of-band communications.
> In brief, the commit:
> 1. removes the direct routed component and adds a new "radix"
> component
> 2. shifts storage of nidmap and pidmap info from the odls to the ess
> on daemons - this is where the data is stored for everyone else, so
> it makes no sense to store it someplace different on the daemon.
> This required adding an API to the ess framework so that a pidmap can be
> added to the data in the ess when daemons get a comm_spawn request
> (the ess data store was already set up for this - it just didn't have
> the API yet).
> 3. adds an API to the ess framework to obtain the daemon that hosts
> a specified proc from the ess pidmap. Because this data is now
> obtained here, we don't need to keep calling
> orte_routed.update_route for every proc in our own job - so those
> calls have been removed from the startup procedure. This eliminates
> the hash tables in every routed module that essentially duplicated
> the pidmap already present in the ess - not because anyone was
> stupid, but rather because the first routed modules were originally
> written prior to the ess pidmap being created, and everyone copy/
> pasted from there.
> At the moment, the revised trunk fully routes all communications
> with two exceptions:
> 1. the binomial module still directly routes between all daemons -
> i.e., communications don't flow along the tree, but instead short-
> circuit the tree to go directly to the daemon that hosts the target
> proc. I propose to change this in a later revision, but want to
> leave something constant for the moment.
> 2. all routed modules have daemons sending direct to the HNP itself.
> This was required for two reasons:
> (a) during startup, the daemons need to "phone home", but have no
> knowledge at that moment of the contact info for the other daemons
> in the routing tree. Thus, they have no choice but to send direct to
> the HNP. We hope to change this in a later revision by switching to
> well-known static ports - but for now, we have to go direct.
> (b) in our current shutdown procedure, the outbound message telling
> the orteds to terminate goes out across the module's routing tree.
> This xcast procedure causes the daemon to relay the cmd to the next
> daemons in the tree, and then to execute it. Thus, after relaying
> the cmd, the daemon dutifully terminates. However, we require each
> daemon to send a confirming message to return to the HNP so it knows
> it can exit. That returning message cannot get through because the
> intermediate daemons have already terminated. I am working on
> alternative methods for detecting daemon termination so we can
> eliminate the return "ack" - but for now, we have to send the "ack"
> direct to the HNP to ensure it gets through.
> Some preliminary tests I've conducted indicate that fully routing
> communications had no detrimental impact on either launch speed or IB
> wireup time. I plan to further test this at larger scales, as well
> as continue to develop the new capabilities.
> Please let me know if you encounter any problems, or have any
> comments/suggestions.
> Ralph
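For readers following the routing changes, here is a minimal C sketch under stated assumptions. It models item 3 (resolving the daemon that hosts a proc directly from a pidmap instead of a per-module hash table) and a generic k-ary "radix" tree with the direct-to-HNP exception described in 2(b). The types (`ess_map_t`, `node_entry_t`, `proc_entry_t`), the functions, and the choice of vpid 0 for the HNP are all hypothetical and are not the actual ORTE ess or routed APIs. The parent formula `(v - 1) / radix` is a standard k-ary layout that may differ from how the real radix component builds its tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the ess nidmap/pidmap; these are
 * illustrative types, not the actual ORTE structures or APIs. */
typedef uint32_t vpid_t;
#define HNP_VPID ((vpid_t)0)   /* assumption: the HNP is daemon vpid 0 */

typedef struct { vpid_t daemon; } node_entry_t;     /* one entry per node */
typedef struct { size_t node_index; } proc_entry_t; /* one entry per proc */

typedef struct {
    const node_entry_t *nidmap; size_t num_nodes;
    const proc_entry_t *pidmap; size_t num_procs;   /* indexed by proc vpid */
} ess_map_t;

/* Item 3: find the daemon hosting a proc straight from the pidmap, so no
 * routed module needs its own per-proc hash table. Returns -1 if unknown. */
static long daemon_for_proc(const ess_map_t *map, vpid_t vpid)
{
    if (vpid >= map->num_procs) return -1;
    size_t node = map->pidmap[vpid].node_index;
    if (node >= map->num_nodes) return -1;
    return (long)map->nidmap[node].daemon;
}

/* Parent of daemon v in a k-ary ("radix") tree rooted at the HNP. */
static vpid_t radix_parent(vpid_t v, vpid_t radix)
{
    assert(radix > 0 && v != HNP_VPID);
    return (v - 1) / radix;
}

/* Next daemon on the way from daemon 'me' to daemon 'target'. When
 * hnp_direct is set, traffic bound for the HNP skips the tree. */
static vpid_t radix_next_hop(vpid_t me, vpid_t target, vpid_t radix, int hnp_direct)
{
    if (target == me) return me;
    if (hnp_direct && target == HNP_VPID) return HNP_VPID;
    /* If 'me' is an ancestor of 'target', hand off to the child on that path. */
    for (vpid_t v = target; v != HNP_VPID; ) {
        vpid_t p = radix_parent(v, radix);
        if (p == me) return v;
        v = p;
    }
    /* Otherwise move up toward the root; only reached when me != HNP. */
    return radix_parent(me, radix);
}

/* Exception 2(b): follow next hops from 'origin' to 'target', failing if any
 * intermediate daemon has already exited (alive[v] == 0). */
static int route_reaches(vpid_t origin, vpid_t target, vpid_t radix,
                         const int *alive, int hnp_direct)
{
    for (vpid_t v = origin; v != target; ) {
        v = radix_next_hop(v, target, radix, hnp_direct);
        if (v != target && !alive[v]) return 0;
    }
    return 1;
}
```

With `radix = 2`, an ack from daemon 7 to the HNP travels 7 -> 3 -> 1 -> 0 along the tree, so once daemons 3 and 1 have relayed the terminate command and exited, `route_reaches(7, 0, 2, alive, 0)` returns 0. With `hnp_direct` set, the ack goes straight to daemon 0 and gets through, which is the trade-off the direct-to-HNP exception makes.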