Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC 2/2: merge the OPAL SOS development branch into trunk
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2010-05-14 12:24:35


On May 12, 2010, at 1:07 PM, Abhishek Kulkarni wrote:

> Updated RFC (w/ discussed changes):
>
> ======================================================================
> [RFC 2/2] merge the OPAL SOS development branch into trunk
> ======================================================================
>
> WHAT: Merge the OPAL SOS development branch into the OMPI trunk.
>
> WHY: Bring over some of the work done to enhance error reporting capabilities.
>
> WHERE: opal/util/ and a few changes in the ORTE notifier.
>
> TIMEOUT: May 17, Monday, COB.
>
> REFERENCE BRANCHES: http://bitbucket.org/jsquyres/opal-sos-fixed/
>
> ======================================================================
>
> BACKGROUND:
>
> The OPAL SOS framework tries to meet the following objectives:
>
> - Reduce the cascading error messages and the amount of code needed to
> print an error message.
> - Build and aggregate stacks of encountered errors and associate
> related individual errors with each other.
> - Allow registration of custom callbacks to intercept error events.
>
> The SOS system provides an interface to log events of varying
> severities. These events are associated with an "encoded" error code
> which can be used to refer to stacks of SOS events. When logging
> events, they can also be transparently relayed to all the activated
> notifier components.
>
> The SOS system is described in detail on this wiki page:
>
> http://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
> https://svn.open-mpi.org/trac/ompi/attachment/wiki/ErrorMessages/OPAL_SOS.pdf
>
> CHANGES (since the last RFC):
>
> * Wrapped all hard-coded error-code checks (OMPI_ERR_* == ret) in
> OPAL_SOS_GET_ERR_CODE(ret). There were about 30-40 such checks
> each in the OMPI and ORTE layers and about 15 in the OPAL layer.
> Since OPAL_SUCCESS is preserved by SOS, also changed calls of
> the form (OPAL_SUCCESS != ret) to (OPAL_ERROR == ret).

You mean the other way around, right?
You changed code that previously looked like (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret) where appropriate.

>
> * If the error is an SOS-encoded error, ORTE_ERROR_LOG decodes
> the error, prints out the error stack and frees the errors.
>
> ======================================================================
>
>
> On Mar 29, 2010, at 10:58 AM, Abhishek Kulkarni wrote:
>
>>
>> ======================================================================
>> [RFC 2/2]
>> ======================================================================
>>
>> WHAT: Merge the OPAL SOS development branch into the OMPI trunk.
>>
>> WHY: Bring over some of the work done to enhance error reporting capabilities.
>>
>> WHERE: opal/util/ and a few changes in the ORTE notifier.
>>
>> TIMEOUT: April 6, Wednesday, COB.
>>
>> REFERENCE BRANCHES: http://bitbucket.org/jsquyres/opal-sos-fixed/
>>
>> ======================================================================
>>
>> BACKGROUND:
>>
>> The OPAL SOS framework tries to meet the following objectives:
>>
>> - Reduce the cascading error messages and the amount of code needed to
>> print an error message.
>> - Build and aggregate stacks of encountered errors and associate
>> related individual errors with each other.
>> - Allow registration of custom callbacks to intercept error events.
>>
>> The SOS system provides an interface to log events of varying
>> severities. These events are associated with an "encoded" error code
>> which can be used to refer to stacks of SOS events. When logging
>> events, they can also be transparently relayed to all the activated
>> notifier components.
>>
>> The SOS system is described in detail on this wiki page:
>>
>> http://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
>>
>> Feel free to comment and/or provide suggestions.
>>
>> ======================================================================
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel