Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: Resilient ORTE
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-06-10 16:40:47


So why not have the callback return an int, and your callback returns "go no further"?
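
A rough sketch of that idea, in case it helps the discussion. Nothing here is the actual errmgr interface: the return codes, `proc_name_t` (a stand-in for `orte_process_name_t`), `register_fault_callback` and `report_fault` are all hypothetical names made up for illustration. New registrations go on top of the list, so the default callback registered first is called last, unless an earlier callback returns "stop":

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes; not part of the ORTE errmgr API. */
#define FAULT_CB_CONTINUE 0   /* keep walking the callback list */
#define FAULT_CB_STOP     1   /* "go no further" */

/* Stand-in for orte_process_name_t so the sketch compiles on its own. */
typedef struct { int jobid; int vpid; } proc_name_t;

typedef int (*fault_cb_t)(const proc_name_t *proc);

#define MAX_FAULT_CBS 8
fault_cb_t cb_stack[MAX_FAULT_CBS];
size_t cb_count = 0;

/* Newer registrations go on top, so the first one registered runs last. */
int register_fault_callback(fault_cb_t cb)
{
    assert(cb != NULL);
    if (cb_count == MAX_FAULT_CBS) {
        return -1;                      /* list is full */
    }
    cb_stack[cb_count++] = cb;
    return 0;
}

/* Walk from newest to oldest; returns how many callbacks were invoked. */
int report_fault(const proc_name_t *proc)
{
    int invoked = 0;
    for (size_t i = cb_count; i > 0; --i) {
        ++invoked;
        if (cb_stack[i - 1](proc) == FAULT_CB_STOP) {
            break;                      /* callback asked us to stop here */
        }
    }
    return invoked;
}

int abort_requested = 0;
int failures_handled = 0;

/* Stand-in for the default callback; it sets a flag instead of
 * calling ompi_mpi_abort() so the sketch can be exercised safely. */
int default_abort_callback(const proc_name_t *proc)
{
    (void)proc;
    abort_requested = 1;
    return FAULT_CB_CONTINUE;
}

/* A callback that handles the failure and stops the chain. */
int resilient_callback(const proc_name_t *proc)
{
    (void)proc;
    ++failures_handled;
    return FAULT_CB_STOP;
}
```

With `default_abort_callback` registered first and `resilient_callback` registered second, a call to `report_fault` runs only the resilient callback, so the abort path is never reached.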

On Jun 10, 2011, at 2:06 PM, Josh Hursey wrote:

> Yeah I do not want the default fatal callback in OMPI. I want to
> replace it with something that allows OMPI to continue running when
> there are process failures (if the error handlers associated with the
> communicators permit such an action). So having the default fatal
> callback called after mine would not be useful, since I do not want
> the fatal action.
>
> As long as I can replace that callback, or selectively get rid of it
> then I'm ok.
>
>
> On Fri, Jun 10, 2011 at 3:55 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>> On Jun 10, 2011, at 6:32 AM, Josh Hursey wrote:
>>
>>> On Fri, Jun 10, 2011 at 7:37 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>
>>>> On Jun 9, 2011, at 6:12 PM, Joshua Hursey wrote:
>>>>
>>>>>
>>>>> On Jun 9, 2011, at 6:47 PM, George Bosilca wrote:
>>>>>
>>>>>> Well, you're way too trusting. ;)
>>>>>
>>>>> It's the midwestern boy in me :)
>>>>
>>>> Still need to shake that corn out of your head... :-)
>>>>
>>>>>
>>>>>>
>>>>>> This only works if all components play the game, and even then it is difficult if you want to allow components to deregister themselves in the middle of execution. The problem is that a callback will be the "previous" for some other component, so when you want to remove a callback you have to inform the "next" component on the callback chain to change its previous.
>>>>>
>>>>> This is a fair point. I think hiding the ordering of callbacks in the errmgr could be dangerous since it takes control from the upper layers, but, conversely, trusting the upper layers to 'do the right thing' with the previous callback is probably too optimistic, esp. for layers that are not designed together.
>>>>>
>>>>> To that I would suggest that you leave the code as is - registering a callback overwrites the existing callback. That will allow me to replace the default OMPI callback when I am able to in MPI_Init, and, if I need to, swap back in the default version at MPI_Finalize.
>>>>>
>>>>> Does that sound like a reasonable way forward on this design point?
>>>>
>>>> It doesn't solve the problem that George alluded to - just because you overwrite the callback, it doesn't mean that someone else won't overwrite you when their component initializes. Only the last one wins - the rest of you lose.
>>>>
>>>> I'm not sure how you guarantee that you win, which is why I'm unclear how this callback can really work unless everyone agrees that only one place gets it. Put that callback in a base function of a new error handling framework, and then let everyone create components within that for handling desired error responses?
>>>
>>> Yep, that is a problem, but one that we can deal with in the immediate
>>> case. Since OMPI is the only layer registering the callback, when I
>>> replace it in OMPI I will have to make sure that no other place in
>>> OMPI replaces the callback.
>>>
>>> If at some point we need more than one callback above ORTE then we may
>>> want to revisit this point. But since we only have one layer on top of
>>> ORTE, it is the responsibility of that layer to be internally
>>> consistent with regard to which callback it wants to be triggered.
>>>
>>> If the layers above ORTE want more than one callback I would suggest
>>> that that layer design some mechanism for coordinating these multiple
>>> - possibly conflicting - callbacks (by the way this is policy
>>> management, which can get complex fast as you add more interested
>>> parties). Meaning that if OMPI wanted multiple callbacks to be active
>>> at the same time, then OMPI would create a mechanism for managing
>>> these callbacks, not ORTE. ORTE should just have one callback provided
>>> to the upper layer, and keep it -simple-. If the upper layer wants to
>>> toy around with something more complex it must manage the complexity
>>> instead of artificially pushing it down to the ORTE layer.
>>
>> I was thinking some more about this, and wonder if we aren't over-complicating the question.
>>
>> Do you need to actually control the sequence of callbacks, or just ensure that your callback gets called prior to the default one that calls abort?
>>
>> Meeting the latter requirement is trivial - subsequent calls to register_callback get pushed onto the top of the callback list. Since the default one always gets registered first (which we can ensure since it occurs in MPI_Init), it will always be at the bottom of the callback list and hence called last.
>>
>> Keeping that list in ORTE is simple and probably the right place to do it.
>>
>> However, if you truly want to control the callback order in detail - then yeah, that should go up in OMPI. I sure don't want to write all that code :-)
>>
>>
>>>
>>> -- Josh
>>>
>>>>>
>>>>> -- Josh
>>>>>
>>>>>>
>>>>>> george.
>>>>>>
>>>>>> On Jun 9, 2011, at 13:21 , Josh Hursey wrote:
>>>>>>
>>>>>>> So the "Resilient ORTE" patch has a registration in ompi_mpi_init.c:
>>>>>>> -------------
>>>>>>> orte_errmgr.set_fault_callback(&ompi_errhandler_runtime_callback);
>>>>>>> -------------
>>>>>>>
>>>>>>> Which is a callback that just calls abort (which is what we want to do
>>>>>>> by default):
>>>>>>> -------------
>>>>>>> void ompi_errhandler_runtime_callback(orte_process_name_t *proc) {
>>>>>>>     ompi_mpi_abort(MPI_COMM_WORLD, 1, false);
>>>>>>> }
>>>>>>> -------------
>>>>>>>
>>>>>>> This is what I want to replace. I do -not- want ompi to abort just
>>>>>>> because a process failed. So I need a way to replace or remove this
>>>>>>> callback, and put in my own callback that 'does the right thing'.
>>>>>>>
>>>>>>> The current patch allows me to overwrite the callback when I call:
>>>>>>> -------------
>>>>>>> orte_errmgr.set_fault_callback(&my_callback);
>>>>>>> -------------
>>>>>>> Which is fine with me.
>>>>>>>
>>>>>>> At the point I do not want my_callback to be active any more (say in
>>>>>>> MPI_Finalize) I would like to replace it with the old callback. To do
>>>>>>> so, with the patch's interface, I would have to know what the previous
>>>>>>> callback was and do:
>>>>>>> -------------
>>>>>>> orte_errmgr.set_fault_callback(&ompi_errhandler_runtime_callback);
>>>>>>> -------------
>>>>>>>
>>>>>>> This comes at a slight maintenance burden since now there will be two
>>>>>>> places in the code that must explicitly reference
>>>>>>> 'ompi_errhandler_runtime_callback' - if it ever changed then both
>>>>>>> sites would have to be updated.
>>>>>>>
>>>>>>>
>>>>>>> If you use the 'sigaction-like' interface then upon registration I
>>>>>>> would get the previous handler back (which would point to
>>>>>>> 'ompi_errhandler_runtime_callback'), and I can store it for later:
>>>>>>> -------------
>>>>>>> orte_errmgr.set_fault_callback(&my_callback, &prev_callback);
>>>>>>> -------------
>>>>>>>
>>>>>>> And when it comes time to deregister my callback all I need to do is
>>>>>>> replace it with the previous callback - which I have a reference to,
>>>>>>> but do not need the explicit name of (passing NULL as the second
>>>>>>> argument tells the registration function that I don't care about the
>>>>>>> current callback):
>>>>>>> -------------
>>>>>>> orte_errmgr.set_fault_callback(prev_callback, NULL);
>>>>>>> -------------
>>>>>>>
>>>>>>>
>>>>>>> So the API in the patch is fine, and I can work with it. I just
>>>>>>> suggested that it might be slightly better to return the previous
>>>>>>> callback (as is done in other standard interfaces - e.g., sigaction)
>>>>>>> in case we wanted to do something with it later.
>>>>>>>
>>>>>>>
>>>>>>> What seems to be proposed now is making the errmgr keep a list of all
>>>>>>> registered callbacks and call them in some order. This seems odd, and
>>>>>>> definitely more complex. Maybe it was just not well explained.
>>>>>>>
>>>>>>> Maybe that is just the "computer scientist" in me :)
>>>>>>>
>>>>>>> -- Josh
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 9, 2011 at 1:05 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>>>>> You mean you want the abort API to point somewhere else, without using a new
>>>>>>>> component?
>>>>>>>> Perhaps a telecon would help resolve this quicker? I'm available tomorrow or
>>>>>>>> anytime next week, if that helps.
>>>>>>>>
>>>>>>>> On Thu, Jun 9, 2011 at 11:02 AM, Josh Hursey <jjhursey_at_[hidden]> wrote:
>>>>>>>>>
>>>>>>>>> As long as there is the ability to remove and replace a callback I'm
>>>>>>>>> fine. I personally think that forcing the errmgr to track ordering of
>>>>>>>>> callback registration makes it a more complex solution, but as long as
>>>>>>>>> it works.
>>>>>>>>>
>>>>>>>>> In particular I need to replace the default 'abort' errmgr call in
>>>>>>>>> OMPI with something else. If both are called, then this does not help
>>>>>>>>> me at all - since the abort behavior will be activated either before
>>>>>>>>> or after my callback. So can you explain how I would do that with the
>>>>>>>>> current or the proposed interface?
>>>>>>>>>
>>>>>>>>> -- Josh
>>>>>>>>>
>>>>>>>>> On Thu, Jun 9, 2011 at 12:54 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>>>>>>> I agree - let's not get overly complex unless we can clearly
>>>>>>>>>> articulate a requirement to do so.
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 9, 2011 at 10:45 AM, George Bosilca <bosilca_at_[hidden]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> This will require exactly opposite registration and de-registration
>>>>>>>>>>> order, or no de-registration at all (aka no way to unload a
>>>>>>>>>>> component). Or some even more complex code to deal with internally.
>>>>>>>>>>>
>>>>>>>>>>> If the error manager handles the callbacks it can use the registration
>>>>>>>>>>> ordering (which will be what the approach can do), and can enforce
>>>>>>>>>>> that all callbacks will be called. I would prefer this approach.
>>>>>>>>>>>
>>>>>>>>>>> george.
>>>>>>>>>>>
>>>>>>>>>>> On Jun 9, 2011, at 08:36 , Josh Hursey wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I would prefer returning the previous callback instead of relying on
>>>>>>>>>>>> the errmgr to get the ordering right. Additionally, when I want to
>>>>>>>>>>>> unregister (or replace) a callback it is easier to do that with a
>>>>>>>>>>>> single interface than to introduce a new one to remove a particular
>>>>>>>>>>>> callback.
>>>>>>>>>>>> Register:
>>>>>>>>>>>> ompi_errmgr.set_fault_callback(my_callback, prev_callback);
>>>>>>>>>>>> Deregister:
>>>>>>>>>>>> ompi_errmgr.set_fault_callback(prev_callback, old_callback);
>>>>>>>>>>>> or to eliminate all callbacks (if you needed that for some reason):
>>>>>>>>>>>> ompi_errmgr.set_fault_callback(NULL, old_callback);
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> devel mailing list
>>>>>>>>>>> devel_at_[hidden]
>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Joshua Hursey
>>>>>>>>> Postdoctoral Research Associate
>>>>>>>>> Oak Ridge National Laboratory
>>>>>>>>> http://users.nccs.gov/~jjhursey
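
For reference, the 'sigaction-like' registration Josh describes further down the thread could look roughly like the sketch below. This is only an illustration under assumptions: `proc_name_t` stands in for `orte_process_name_t`, the free function `set_fault_callback` stands in for the `orte_errmgr.set_fault_callback` entry, and the two callbacks just count calls instead of aborting or recovering.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for orte_process_name_t so the sketch compiles on its own. */
typedef struct { int jobid; int vpid; } proc_name_t;

typedef void (*fault_cb_t)(proc_name_t *proc);

/* The single callback slot kept by the (hypothetical) error manager. */
static fault_cb_t current_cb = NULL;

/* Install new_cb. If prev_cb is non-NULL, the callback being replaced
 * is handed back through it, as sigaction() does with its oldact
 * argument. Passing NULL means the caller does not care. */
void set_fault_callback(fault_cb_t new_cb, fault_cb_t *prev_cb)
{
    if (prev_cb != NULL) {
        *prev_cb = current_cb;
    }
    current_cb = new_cb;
}

/* Invoke whichever callback is currently installed, if any. */
void report_fault(proc_name_t *proc)
{
    if (current_cb != NULL) {
        current_cb(proc);
    }
}

int abort_calls = 0;
int recover_calls = 0;

/* Stand-in for ompi_errhandler_runtime_callback; counts instead of aborting. */
void default_abort_callback(proc_name_t *proc)
{
    (void)proc;
    ++abort_calls;
}

/* Stand-in for the replacement callback that lets the job continue. */
void my_callback(proc_name_t *proc)
{
    (void)proc;
    ++recover_calls;
}
```

The caller saves the old handler at registration time (`set_fault_callback(my_callback, &prev)`) and later restores it with `set_fault_callback(prev, NULL)`, without ever naming the default callback explicitly.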