Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC 2/2: merge the OPAL SOS development branch into trunk
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-05-18 09:34:12


Indeed. Nice job yesterday, Abhishek. You did it better than my hwloc merge into the trunk! :-)

On May 18, 2010, at 9:20 AM, Josh Hursey wrote:

> Abhishek and Jeff,
>
> Awesome! Thanks for all your hard work maintaining and shepherding
> this branch into the trunk.
>
> -- Josh
>
> On May 17, 2010, at 9:20 PM, Abhishek Kulkarni wrote:
>
> >
> > On May 14, 2010, at 12:24 PM, Josh Hursey wrote:
> >
> >>
> >> On May 12, 2010, at 1:07 PM, Abhishek Kulkarni wrote:
> >>
> >>> Updated RFC (w/ discussed changes):
> >>>
> >>> ======================================================================
> >>> [RFC 2/2] merge the OPAL SOS development branch into trunk
> >>> ======================================================================
> >>>
> >>> WHAT: Merge the OPAL SOS development branch into the OMPI trunk.
> >>>
> >>> WHY: Bring over some of the work done to enhance error reporting
> >>> capabilities.
> >>>
> >>> WHERE: opal/util/ and a few changes in the ORTE notifier.
> >>>
> >>> TIMEOUT: May 17, Monday, COB.
> >>>
> >>> REFERENCE BRANCHES: http://bitbucket.org/jsquyres/opal-sos-fixed/
> >>>
> >>> ======================================================================
> >>>
> >>> BACKGROUND:
> >>>
> >>> The OPAL SOS framework tries to meet the following objectives:
> >>>
> >>> - Reduce the cascading error messages and the amount of code
> >>> needed to print an error message.
> >>> - Build and aggregate stacks of encountered errors and associate
> >>> related individual errors with each other.
> >>> - Allow registration of custom callbacks to intercept error events.
> >>>
> >>> The SOS system provides an interface to log events of varying
> >>> severities. These events are associated with an "encoded" error
> >>> code which can be used to refer to stacks of SOS events. When
> >>> logging events, they can also be transparently relayed to all the
> >>> activated notifier components.
> >>>
> >>> The SOS system is described in detail on this wiki page:
> >>>
> >>> http://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
> >>> https://svn.open-mpi.org/trac/ompi/attachment/wiki/ErrorMessages/OPAL_SOS.pdf
> >>>
> >>> CHANGES (since the last RFC):
> >>>
> >>> * Wrapped all hard-coded error-code checks (OMPI_ERR_* == ret) with
> >>> OPAL_SOS_GET_ERR_CODE(ret). There were about 30-40 such checks
> >>> each in the OMPI and ORTE layers and about 15 in the OPAL layer.
> >>> Since OPAL_SUCCESS is preserved by SOS, also changed calls of
> >>> the form (OPAL_SUCCESS != ret) to (OPAL_ERROR == ret).
> >>
> >> You mean the other way around, right?
> >> You changed code that previously looked like (OPAL_ERROR == ret) to
> >> (OPAL_SUCCESS != ret) where appropriate.
> >>
> >
> >
> > Yes, thanks for the correction! This (and ORTE WDC) is all in trunk
> > now -- I've split the changes into smaller patches (see commits
> > r23155 - r23164) so that they are easier to sift through.
> >
> > Abhishek
> >
> >
> >>>
> >>> * If the error is an SOS-encoded error, ORTE_ERROR_LOG decodes
> >>> the error, prints out the error stack and frees the errors.
> >>>
> >>> ======================================================================
> >>>
> >>>
> >>> On Mar 29, 2010, at 10:58 AM, Abhishek Kulkarni wrote:
> >>>
> >>>>
> >>>> ======================================================================
> >>>> [RFC 2/2]
> >>>> ======================================================================
> >>>>
> >>>> WHAT: Merge the OPAL SOS development branch into the OMPI trunk.
> >>>>
> >>>> WHY: Bring over some of the work done to enhance error reporting
> >>>> capabilities.
> >>>>
> >>>> WHERE: opal/util/ and a few changes in the ORTE notifier.
> >>>>
> >>>> TIMEOUT: April 6, Wednesday, COB.
> >>>>
> >>>> REFERENCE BRANCHES: http://bitbucket.org/jsquyres/opal-sos-fixed/
> >>>>
> >>>> ======================================================================
> >>>>
> >>>> BACKGROUND:
> >>>>
> >>>> The OPAL SOS framework tries to meet the following objectives:
> >>>>
> >>>> - Reduce the cascading error messages and the amount of code
> >>>> needed to print an error message.
> >>>> - Build and aggregate stacks of encountered errors and associate
> >>>> related individual errors with each other.
> >>>> - Allow registration of custom callbacks to intercept error events.
> >>>>
> >>>> The SOS system provides an interface to log events of varying
> >>>> severities. These events are associated with an "encoded" error
> >>>> code which can be used to refer to stacks of SOS events. When
> >>>> logging events, they can also be transparently relayed to all the
> >>>> activated notifier components.
> >>>>
> >>>> The SOS system is described in detail on this wiki page:
> >>>>
> >>>> http://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
> >>>>
> >>>> Feel free to comment and/or provide suggestions.
> >>>>
> >>>> ======================================================================
> >>>> _______________________________________________
> >>>> devel mailing list
> >>>> devel_at_[hidden]
> >>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
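
Illustrative note on the CHANGES item quoted above (wrapping error-code checks, and Josh's correction about comparing against OPAL_SUCCESS): the sketch below shows why a direct comparison such as (OMPI_ERR_* == ret) can stop matching once a return value carries extra encoded information, and why (OPAL_SUCCESS != ret) remains safe when success is left unencoded. Everything here is hypothetical: the sketch_* names, the 8-bit layout and the constant values are assumptions made for illustration and are not the actual OPAL SOS encoding or API.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; NOT the Open MPI definitions. */
enum {
    SKETCH_SUCCESS             =  0,
    SKETCH_ERROR               = -1,
    SKETCH_ERR_OUT_OF_RESOURCE = -2
};

/* Assumed layout: the low 8 bits of the magnitude hold the native code. */
#define SKETCH_CODE_BITS 8
#define SKETCH_CODE_MASK ((1 << SKETCH_CODE_BITS) - 1)

/* Pack a native (negative) error code together with an event index
 * (assumed non-negative and small enough not to overflow an int).
 * Success is never encoded, so it stays directly comparable. */
static int sketch_encode(int native, int event_index)
{
    if (SKETCH_SUCCESS == native) {
        return SKETCH_SUCCESS;
    }
    return -((event_index << SKETCH_CODE_BITS) | (-native & SKETCH_CODE_MASK));
}

/* Recover the native error code from a possibly encoded value. */
static int sketch_get_err_code(int ret)
{
    if (ret >= 0) {
        return ret;
    }
    return -(-ret & SKETCH_CODE_MASK);
}

static void sketch_self_check(void)
{
    int ret = sketch_encode(SKETCH_ERR_OUT_OF_RESOURCE, 5);

    /* A direct comparison no longer matches the encoded value... */
    assert(SKETCH_ERR_OUT_OF_RESOURCE != ret);
    /* ...but the wrapped comparison does. */
    assert(SKETCH_ERR_OUT_OF_RESOURCE == sketch_get_err_code(ret));

    /* An encoded generic error is not equal to SKETCH_ERROR, so a check
     * of the form (SKETCH_ERROR == ret) would miss this failure. */
    ret = sketch_encode(SKETCH_ERROR, 7);
    assert(SKETCH_ERROR != ret);
    assert(SKETCH_SUCCESS != ret);

    /* Success is preserved by the encoding. */
    assert(SKETCH_SUCCESS == sketch_encode(SKETCH_SUCCESS, 9));
}
```

Calling sketch_self_check() from a small main() exercises all three cases. The design point is only that "not success" is a stable test under any encoding that leaves the success value untouched, whereas equality against a specific error constant needs the decoding step first.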