Open MPI Development Mailing List Archives

Subject: [OMPI devel] RFC: ABI break between 1.4 and 1.5 / .so versioning
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-02-17 13:23:26


WHAT: Break ABI between 1.4 and 1.5 series.

WHY: To settle the ABI and .so versioning issues once and for all.

WHERE: Open MPI's .so versions and the opal_wrapper compiler.

WHEN: For 1.5[.0]. This is only meaningful if we do it for the *entire* v1.5 series.

TIMEOUT: Next Tuesday teleconf, 23 Feb 2010

=======================================================

BACKGROUND / REQUIRED READING:
------------------------------

 * Ticket 2092: https://svn.open-mpi.org/trac/ompi/ticket/2092
 * Libtool .so versioning rules: https://svn.open-mpi.org/trac/ompi/wiki/ReleaseProcedures

Libtool .so version numbers are expressed as c:r:a (current:revision:age). libmpi is currently versioned "correctly", meaning that we advance the c:r:a triple as necessary for each release. libopen-pal and libopen-rte, however, are currently fixed at 0:0:0, which is Wrong. The reasons why they are fixed at 0:0:0 are explained in #2092.

SHORT VERSION OF THIS PROPOSAL:
-------------------------------

 * For v1.5.0, set c:r:a of libmpi to 1:0:0.
 * Starting with v1.5.0, set c:r:a for libopen-rte and libopen-pal properly.
 * This means a break in ABI between v1.4.x and v1.5.x, but the ABI will remain constant for all of 1.5.x/1.6.x.
 * The wrapper compilers will need to be updated to recognize the difference between static and dynamic linking.

LONGER VERSION / MORE DETAILS AND RATIONALE:
--------------------------------------------

The fix for these issues involves several dominos falling in order. You need to read this whole proposal to understand the full scope, sorry. :-\

1. We need to fix the wrapper compilers to recognize the difference between shared library linking and static linking. Right now, the MPI wrappers always do this:

    -lmpi -lopen-rte -lopen-pal

2. Listing all three libraries is only necessary when linking statically. When linking dynamically, only the top-level library should be listed (e.g., -lmpi for MPI applications). The implicit linker dependencies of libmpi.so will automatically pull in libopen-rte.so. Likewise, the implicit dependencies of libopen-rte.so will automatically pull in libopen-pal.so. More specifically, when linking dynamically, MPI a.out applications will only explicitly depend on libmpi.so (not libopen-rte.so and not libopen-pal.so).

3. Hence, the wrappers need to learn the difference between static and dynamic linking: when linking dynamically, only list "-lmpi". When linking statically, list all 3 libraries. This allows minimization of explicit library dependencies in dynamic linking, and is arguably the Right way to do it.

--> More below about how to make the wrappers understand the difference between static/shared linking.

4. Once MPI applications depend only on libmpi, we can properly version libopen-rte.so and libopen-pal.so. Hence, for v1.5.0, we will have non-0:0:0 .so versions for these two libraries.

5. Since MPI application executables (a.out's) created with the v1.4 series wrappers will have explicit dependencies on all 3 libraries, they will be ABI incompatible with Open MPI v1.5's ORTE and OPAL libraries (as opposed to MPI applications created with the updated v1.5 wrappers, which will only depend on libmpi when linking dynamically).

6. The question then remains: what should libmpi.so's c:r:a values be in v1.5.0? I say they should be 1:0:0. Here's why:
  * Recall that we have added some new MPI-2.2 functions in v1.5. Hence, libmpi.so's "c" needs to increase to 1 and "r" needs to be set to 0. The question is what to do with the "a" value.
  * By extension of #5, we should also make libmpi.so be ABI incompatible between v1.4.x and v1.5.x (to prevent some needless confusion -- rather than have libmpi be ABI compatible and libopen-rte and libopen-pal *not* be ABI compatible, I think it would be better to make *all 3* be ABI incompatible). This means setting the libmpi.so "a" value to 0 (as opposed to setting it to 1).

Crystal clear? I thought so. :-)

------

Here's my proposal on how to change the wrapper compilers to understand the difference between static and dynamic linking:

*** FIRST: give the wrapper the ability to link one library or all libraries
- wrapper data text files grow a new option: libs_private (a la pkg-config(1) files)
- wrapper data text files list -l<top_lib> in libs, and everything else in libs_private. For example, for mpicc:
  libs=-lmpi
  libs_private=-lopen-rte -lopen-pal

*** NEXT: give the wrappers the ability to switch between just ${libs} or ${libs}+${libs_private}. Pseudocode:
- wrapper always adds ${libs} to the argv
- wrapper examines each argv[x]:
  --ompi:shared) found_in_argv=1 ;;
  --ompi:static) add ${libs_private} ; found_in_argv=1 ;;
- if (!found_in_argv)
  - if default set via configure, add ${libs_private} (SEE BELOW)

*** LAST: give sysadmin ability to set wrapper behavior defaults
- if --disable-shared is set in OMPI's configure, wrappers default to adding both ${libs} and ${libs_private}
- new configure option: --enable-wrapper-static-link-by-default (or some better name), which makes the wrappers add both ${libs} and ${libs_private} by default (--disable... does the opposite)

Note that per above, wrapper command line options always override configure defaults.

This is not entirely perfect, for the following reasons:

1. sysadmins may have to specify a new option to configure (only if they build both static and shared and want users to default to static)
2. two new options to the wrappers
3. you can still get in a situation where the wrapper will fail (e.g., wrapper only uses ${libs}, but only the .a's exist, and therefore the link fails)

I think #1 and #2 are tolerable.

I can't think of a reasonable case where #3 can occur without someone mucking with an already-installed OMPI (e.g., "rm $prefix/lib/libmpi.so"). The only case I can think of where this *might* happen is with RPMs -- ompi (which has libmpi.so) and ompi-devel (which has libmpi.a). ompi-devel depends on ompi, so you couldn't remove the ompi RPM (libmpi.so) and only leave the ompi-devel RPM (libmpi.a). Hence, I even think #3 is tolerable.

Thoughts? Opinions? Need caffeine? WAKE UP! The proposal's over. ;-)

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/