Additionally, I believe that the FT system already does something like what you describe (although perhaps not exactly the same thing) -- there is a phase where the FT system pauses and quiesces all BTLs.
Did you look at that part of the code, perchance, and see if it meets your needs?
On Jan 11, 2010, at 3:53 PM, Christoph Konersmann wrote:
> Thanks a lot for your help! I will give it a try.
> Ralph Castain wrote:
> > You've got this a tad wrong, but that's okay - let me try to clarify a couple of things that may help.
> > First, you don't want to add this as a separate orted command. As you noted, orte has no direct way to tell the OMPI layer to do anything. Instead, you want to pass a message to the process that is received in the OMPI layer. That is easy to do.
> > 1. add a message tag in ompi/mca/dpm/dpm.h - perhaps something like OMPI_RML_TAG_BTL_CTL
> > 2. in the btl, add a call to orte_rml.recv_nb() that identifies the above tag and specifies a callback function to use when such a message arrives
> > 3. in that callback function, toggle your "paused" flag - or you can unpack the buffer to get a flag telling you what value to set. Your choice.
> > Now, when you want to pause the BTL, you do an orte_grpcomm.xcast() to the above message tag. ORTE will deliver that message to every process, which will then have its callback function called.
> > HTH
> > Ralph
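The three steps Ralph lists, plus the xcast, can be sketched as a small standalone C simulation. This is only an illustration of the pattern, not Open MPI code: every demo_* name, the tag value, and the one-byte message layout are assumptions made up for the sketch, and demo_recv_nb() and demo_xcast() merely stand in for orte_rml.recv_nb() and orte_grpcomm.xcast(), whose real signatures are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical stand-ins, not Open MPI or ORTE symbols. */
enum { DEMO_TAG_BTL_CTL = 7, DEMO_MAX_TAGS = 16, DEMO_NPROCS = 4 };

/* Callback invoked when a message with a registered tag arrives. */
typedef void (*demo_cb_t)(int proc, const uint8_t *buf, size_t len);

/* One "paused" flag per simulated process (i.e. per simulated BTL). */
bool demo_paused[DEMO_NPROCS];

/* Per-process table mapping a tag to its callback (NULL = nothing posted). */
demo_cb_t demo_handlers[DEMO_NPROCS][DEMO_MAX_TAGS];

/* Step 2: post a receive for a tag, naming the callback to run on arrival. */
void demo_recv_nb(int proc, int tag, demo_cb_t cb)
{
    assert(proc >= 0 && proc < DEMO_NPROCS);
    assert(tag >= 0 && tag < DEMO_MAX_TAGS);
    demo_handlers[proc][tag] = cb;
}

/* Step 3: unpack a one-byte flag from the buffer and set the paused state. */
void demo_btl_ctl_cb(int proc, const uint8_t *buf, size_t len)
{
    if (len < 1) {
        return; /* malformed message: leave the state unchanged */
    }
    demo_paused[proc] = (buf[0] != 0);
}

/* Simulated xcast: hand the message to every process that posted a receive
 * for this tag. Returns the number of callbacks that were invoked. */
int demo_xcast(int tag, const uint8_t *buf, size_t len)
{
    int delivered = 0;
    assert(tag >= 0 && tag < DEMO_MAX_TAGS);
    for (int p = 0; p < DEMO_NPROCS; ++p) {
        if (demo_handlers[p][tag] != NULL) {
            demo_handlers[p][tag](p, buf, len);
            ++delivered;
        }
    }
    return delivered;
}

/* Helper for inspection: how many simulated processes are currently paused. */
int demo_count_paused(void)
{
    int n = 0;
    for (int p = 0; p < DEMO_NPROCS; ++p) {
        n += demo_paused[p] ? 1 : 0;
    }
    return n;
}
```

To use it, each simulated process calls demo_recv_nb(p, DEMO_TAG_BTL_CTL, demo_btl_ctl_cb) once; sending a one-byte buffer holding 1 through demo_xcast() then pauses all of them, and a buffer holding 0 resumes them. The sketch carries an explicit flag in the buffer rather than toggling, so a duplicated or lost message cannot leave processes in opposite states. Registrations here stay in place after delivery; whether the real receive has to be reposted after each message is something to check against the RML code.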
> Paderborn Center for Parallel Computing - PC2
> University of Paderborn - Germany
> Christoph Konersmann <c_k_at_[hidden]>