Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: ABI break between 1.4 and 1.5 / .so versioning
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-02-17 17:51:42


On Feb 17, 2010, at 3:05 PM, Ralf Wildenhues wrote:

> > The issue is that if the user has to specify -static to their linker,
> > they *also* have to specify --ompi:static, or Bad Things will happen.
> > Or, if they don't specify -static but *only* specify --ompi:static,
> > Bad Things will happen. In short: it seems like adding yet another
> > wrapper-compiler-specific flag to the MPI ecosystem will cause
> > confusion, fear, and possibly the death of some cats.
>
> Do you care for omitting -lopen-pal and -lorte only for capable Linux
> systems? With new-enough binutils, you should be able to use
> -Wl,--as-needed -Wl,--no-as-needed around these two libs.

Mmmm. Good point. But I don't think it helps us on Solaris or OS X, does it? (maybe it does on OS X?) Or do all linkers have some kind of option like this? (this *might* be a way out, but I would probably need to be convinced :-) )
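
For readers unfamiliar with the flags Ralf mentions, here is a minimal sketch of how a wrapper compiler could assemble the link line. With GNU ld, `--as-needed` makes the linker record a dependency only on those following libraries that actually resolve a symbol, and `--no-as-needed` restores the default. The function name `link_flags` and its parameter are hypothetical and are not taken from Open MPI's wrapper compilers.

```python
from typing import List


def link_flags(use_as_needed: bool) -> List[str]:
    """Return the library portion of a link line (hypothetical helper).

    Assumption: the linker is GNU ld from a new-enough binutils; other
    linkers (e.g. on Solaris or OS X) may not accept these options.
    """
    internal = ["-lopen-rte", "-lopen-pal"]
    if use_as_needed:
        # Wrap only the internal libraries so that a dependency on them is
        # recorded only if a.out really references their symbols.
        return ["-lmpi", "-Wl,--as-needed", *internal, "-Wl,--no-as-needed"]
    return ["-lmpi", *internal]


if __name__ == "__main__":
    print(" ".join(link_flags(use_as_needed=True)))
    print(" ".join(link_flags(use_as_needed=False)))
```

The same link line then serves both cases: a static link still sees all three libraries, while a shared link drops the explicit dependency on the two internal ones when nothing in the application uses them.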

> I'm not entirely sure I understand your argumentation for why libmpi
> from 1.5.x has to be binary incompatible, but I haven't fully thought
> through this yet.

The context for this issue is so long that much was left out of my mail. Here's this particular issue in a nutshell:

- Open MPI v1.4.1 has libmpi at 0:1:0 and libopen-rte and libopen-pal both at 0:0:0.
- Open MPI v1.4.1 links MPI apps against -lmpi -lopen-rte -lopen-pal.
- If we start .so versioning properly in v1.5, it's likely that libopen-rte and libopen-pal will both be 1:0:0.
  --> Note that these are both internal libraries; there are no symbols in these libraries that are used in the MPI applications.
- Open MPI v1.5 libmpi *could* be 1:0:1.
- Hence, an a.out created for OMPI v1.4.1 would work fine with v1.5 libmpi.
- But that a.out would not work with v1.5 libopen-rte and libopen-pal.
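
The arithmetic behind the list above can be sketched as follows. On Linux, libtool turns a `current:revision:age` triple into the soname `lib.so.(current - age)` and the file name `lib.so.(current - age).(age).(revision)`; other platforms use different mappings, so this is only an illustration for the Linux case. The helper names (`parse`, `linux_soname`, `still_loads`) are made up for this example.

```python
from typing import NamedTuple


class VersionInfo(NamedTuple):
    current: int
    revision: int
    age: int


def parse(triple: str) -> VersionInfo:
    """Parse a libtool -version-info string such as '1:0:1'."""
    current, revision, age = (int(part) for part in triple.split(":"))
    return VersionInfo(current, revision, age)


def linux_soname(lib: str, v: VersionInfo) -> str:
    # The soname major number is the oldest interface still supported.
    return f"{lib}.so.{v.current - v.age}"


def linux_filename(lib: str, v: VersionInfo) -> str:
    return f"{lib}.so.{v.current - v.age}.{v.age}.{v.revision}"


def still_loads(lib: str, old: VersionInfo, new: VersionInfo) -> bool:
    """True if a binary linked against `old` finds the soname it needs in `new`."""
    return linux_soname(lib, old) == linux_soname(lib, new)


if __name__ == "__main__":
    # Version triples as given in the list above (v1.4.1 vs. possible v1.5).
    cases = {
        "libmpi": ("0:1:0", "1:0:1"),
        "libopen-rte": ("0:0:0", "1:0:0"),
        "libopen-pal": ("0:0:0", "1:0:0"),
    }
    for lib, (old, new) in cases.items():
        ok = still_loads(lib, parse(old), parse(new))
        print(lib, linux_soname(lib, parse(old)), "->",
              linux_soname(lib, parse(new)), "loads" if ok else "BREAKS")
```

With these numbers libmpi keeps the soname `libmpi.so.0`, while the two internal libraries move from `.so.0` to `.so.1`, which is exactly the dependency an old a.out can no longer satisfy.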

The problem is that our internal APIs change fairly often, and potentially in incompatible ways. This shouldn't (and doesn't) matter to MPI applications, but because we link with "-lmpi -lopen-rte -lopen-pal" even for shared library linking, the linker thinks that it *does* matter: we've established an explicit dependency from a.out to all 3 libraries.

My initial idea was to add special flags to the wrapper compilers that the user would use to indicate whether it should be "-lmpi" (shared link) or "-lmpi -lopen-rte -lopen-pal" (static link). Brian hates this. :-)

Brian's idea is to make libmpi.la slurp up libopen-rte.la as a convenience library. Similarly, have libopen-rte.la slurp up libopen-pal.la as a convenience library. Hence, only -lmpi is needed regardless of whether you're linking statically or dynamically.

Regardless of which way we go, if we start .so versioning libopen-rte and libopen-pal in v1.5, the ABI will break between v1.4 and v1.5. We *do* need to fix the .so versioning issues of libopen-rte and libopen-pal; if we don't do it for v1.5.0, our next opportunity will be v1.7 (which is quite a long time off), because I refuse to make a change of this size in the middle of a release series. If we wait, all we'll have done is put off the pain until later.

Hopefully, that made sense. :-)

-- 
Jeff Squyres
jsquyres_at_[hidden]