I propose that we keep the rest of the changeset but revert the removal of the OMPI-prefixed constants, restoring the OMPI equivalents of the ORTE constants. We clearly should scrub them to ensure each one is both used and current, but it seems to me we lose more than we gain by removing them.
On Oct 19, 2011, at 12:09 PM, Jeff Squyres wrote:
> Oy, yes, that is bad -- we cannot have overlapping ORTE and OMPI error codes. That seems like a very bad idea (in addition to the mixing of + and -).
> For one thing, that breaks opal_strerror(). That, in itself, seems like a dealbreaker.
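To illustrate the opal_strerror() concern, here is a minimal sketch of a layered error-string lookup. It is not the actual opal_strerror() code: every `demo_` name and the value -101 are made up for illustration. It only shows that once two layers claim the same integer, a lookup keyed on that integer can report just one of them.

```c
#include <stddef.h>

/* Hypothetical per-layer converters; the shared value -101 is invented. */
typedef const char *(*demo_converter_fn)(int rc);

const char *demo_orte_msg(int rc)
{
    return rc == -101 ? "ORTE: receive less than posted" : NULL;
}

const char *demo_ompi_msg(int rc)
{
    return rc == -101 ? "OMPI: request error" : NULL;
}

/* Converters are tried in order; the first non-NULL answer wins. */
const char *demo_strerror(int rc)
{
    demo_converter_fn converters[] = { demo_orte_msg, demo_ompi_msg };
    size_t count = sizeof(converters) / sizeof(converters[0]);

    for (size_t i = 0; i < count; ++i) {
        const char *msg = converters[i](rc);
        if (msg != NULL) {
            return msg;
        }
    }
    return "unknown error";
}
```

With overlapping codes, demo_strerror(-101) always yields the ORTE text here, so the OMPI message for the same integer is unreachable regardless of which layer produced the error.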
> On Oct 19, 2011, at 1:51 PM, Barrett, Brian W wrote:
>> I actually think it's worse than that. An ORTE error code can now have
>> the same error code as an OMPI error. OMPI_ERR_REQUEST and
>> ORTE_ERR_RECV_LESS_THAN_POSTED now share the same integer return code.
>> Or, they should, if George hadn't made a mistake (see below). The sharing
>> of return codes seems... bad.
>> Also, there's a bug in George's patch. Error codes are all negative, so
>> OMPI_ERR_REQUEST should be OMPI_ERR_BASE - 1 and OMPI_ERR_MAX should be
>> OMPI_ERR_BASE - 1, not plus 2.
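The sign issue can be sketched with made-up numbers. The bases and range limits below are assumptions, not the real OPAL/ORTE/OMPI values, and the `demo_` helpers are hypothetical. The point is only that with negative codes, subtracting from the base stays in the upper layer's range, while adding moves back toward zero into the lower layer's range.

```c
#include <stdbool.h>

/* Hypothetical layout: each layer's codes grow more negative from its base. */
enum {
    DEMO_ORTE_ERR_BASE = -100,              /* assumed start of ORTE codes */
    DEMO_ORTE_ERR_MAX  = -200,              /* assumed end of ORTE codes */
    DEMO_OMPI_ERR_BASE = DEMO_ORTE_ERR_MAX  /* OMPI starts where ORTE ends */
};

/* True if rc lies in the assumed ORTE range [-200, -100). */
bool demo_in_orte_range(int rc)
{
    return rc < DEMO_ORTE_ERR_BASE && rc >= DEMO_ORTE_ERR_MAX;
}

/* True if rc lies strictly below the assumed OMPI base. */
bool demo_in_ompi_range(int rc)
{
    return rc < DEMO_OMPI_ERR_BASE;
}
```

Under these assumptions, DEMO_OMPI_ERR_BASE - 1 (that is, -201) is a valid OMPI code, whereas DEMO_OMPI_ERR_BASE + 2 (that is, -198) lands inside the ORTE range and would collide with whatever ORTE defines there.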
>> On 10/19/11 1:32 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>>> I've been wrestling with something from this commit, and I'm unsure of
>>> the right answer. So please consider this a general design question for
>>> the community.
>>> This commit removes all the OMPI <-> ORTE equivalent constants - i.e., we
>>> used to declare OMPI-prefixed equivalents to every ORTE-prefixed
>>> constant. I understand the thinking (or at least, what I suspect was the
>>> thought), but it creates an issue.
>>> Suppose I have an ompi-level function (A) that calls another ompi-level
>>> function (B). Invisible to A is that B calls an orte-level function. B
>>> dutifully checks the error return from the orte-level function against an
>>> ORTE-prefixed constant.
>>> However, if that return isn't "success", what does B return up to A? It
>>> cannot return the OMPI equivalent to the orte error constant because it
>>> no longer exists. It could return the orte error code, but A has no way
>>> of knowing it is going to get a non-OMPI constant, and therefore won't be
>>> able to understand it - it will be an "unrecognized error".
>>> I guess one option is to require that B "translate" the return code and
>>> pass some OMPI error up the chain, but this prevents anything upwards
>>> from understanding the nature of the problem and potentially taking
>>> corrective and/or alternative action. Seems awfully limiting, as most of
>>> the time the only option will be the vanilla "OMPI_ERROR".
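The two options for B can be sketched as follows. All names and values are hypothetical (`DEMO_`/`demo_` prefixes), and the timeout error is just an example of a failure the caller might be able to act on. The sketch assumes an OMPI-prefixed alias that shares the ORTE value, as the removed constants did.

```c
enum {
    DEMO_SUCCESS          = 0,
    DEMO_OMPI_ERROR       = -1,                    /* generic failure */
    DEMO_ORTE_ERR_TIMEOUT = -101,                  /* invented value */
    DEMO_OMPI_ERR_TIMEOUT = DEMO_ORTE_ERR_TIMEOUT  /* OMPI alias of the ORTE code */
};

/* Stand-in for the orte-level function that B calls; always fails here. */
int demo_orte_call(void)
{
    return DEMO_ORTE_ERR_TIMEOUT;
}

/* Option 1: B collapses every ORTE failure into the generic OMPI error. */
int demo_b_translate(void)
{
    int rc = demo_orte_call();
    return (rc == DEMO_SUCCESS) ? DEMO_SUCCESS : DEMO_OMPI_ERROR;
}

/* Option 2: with OMPI aliases available, B passes the code through. */
int demo_b_passthrough(void)
{
    return demo_orte_call();
}

/* A can take corrective action only if it recognizes the specific failure. */
int demo_a_can_recover(int rc)
{
    return rc == DEMO_OMPI_ERR_TIMEOUT;
}
```

In this sketch A can react to the result of demo_b_passthrough() using only OMPI-prefixed names, while the result of demo_b_translate() tells it nothing beyond "something failed".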
>> Brian W. Barrett
>> Dept. 1423: Scalable System Software
>> Sandia National Laboratories
> Jeff Squyres