
Open MPI Development Mailing List Archives


Subject: [OMPI devel] RFC: MCA param registration errors
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-10-31 16:16:37


WHAT: what to do if registering an MCA param results in an error?

WHERE: opal/mca/base/mca_base_param.c

WHY: MCA param re-registration issues should be treated as OMPI developer errors

WHEN: COB Friday, 4 Nov 2011

-----------------

Short version:

Re-registering an MCA param as a different type (e.g., it was initially registered as a string, but was later re-registered as an int) should be treated as an OMPI developer error, and the MCA param system should call opal_finalize() and then exit(1).

More details:

A mistaken MCA param re-registration recently caused an orted segv.

The MCA param subsystem was fixed to avoid this segv, but it now silently converts the MCA param to the newly registered type. Upon reflection and some discussion, this seems to be a bad idea. Instead, we should complain loudly via a show_help message and then exit(1).

Specifically: this kind of re-registration is clearly a bug and should be fixed at its source. Unfortunately, in most cases we don't check the return value from the MCA param registration functions, so if we simply change them to return a non-OPAL_SUCCESS status, it's unlikely that anyone will notice until some code tries to read the param value, which will likely still result in a segv.
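To make the proposed fail-fast behavior concrete, here is a minimal, self-contained sketch. Everything in it (param_type_t, param_register(), the fixed-size table) is hypothetical and is not Open MPI's actual mca_base_param API; it also substitutes fprintf() for show_help and omits opal_finalize(), which the real change would call before exiting.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types and names for illustration only; NOT the real
 * Open MPI mca_base_param API. */
typedef enum { PARAM_TYPE_INT, PARAM_TYPE_STRING } param_type_t;

typedef struct {
    char name[64];      /* longer names are truncated in this sketch */
    param_type_t type;
} param_entry_t;

#define MAX_PARAMS 128

static param_entry_t params[MAX_PARAMS];
static int num_params = 0;

static const char *type_name(param_type_t t)
{
    return (t == PARAM_TYPE_INT) ? "int" : "string";
}

/* Register a param and return its index.
 * - Same name, same type: harmless; return the existing index.
 * - Same name, different type: developer error; print a loud message
 *   and exit(1) instead of returning a status code that callers are
 *   likely to ignore. (A real implementation would presumably use
 *   show_help and call opal_finalize() before exiting.) */
int param_register(const char *name, param_type_t type)
{
    for (int i = 0; i < num_params; ++i) {
        if (strcmp(params[i].name, name) == 0) {
            if (params[i].type != type) {
                fprintf(stderr,
                        "ERROR: MCA param \"%s\" was registered as %s but "
                        "is being re-registered as %s (developer error)\n",
                        name, type_name(params[i].type), type_name(type));
                exit(1);
            }
            return i;
        }
    }
    if (num_params >= MAX_PARAMS) {
        fprintf(stderr, "ERROR: too many MCA params registered\n");
        exit(1);
    }
    snprintf(params[num_params].name, sizeof(params[num_params].name),
             "%s", name);
    params[num_params].type = type;
    return num_params++;
}
```

With this sketch, registering a hypothetical "example_param" as PARAM_TYPE_STRING and then again as PARAM_TYPE_STRING returns the same index, while a later registration of "example_param" as PARAM_TYPE_INT prints the error and terminates the process with status 1. The point of exiting rather than returning an error is that the failure surfaces immediately at the offending registration, not later as a segv when some code reads the param.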

Does anyone have heartburn if I change the error behavior to opal_finalize()/exit(1)?

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/