Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Continued functionality across a SLES10 to SLES11 upgrade ...
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-09-22 09:07:53


On Sep 20, 2010, at 1:20 PM, Richard Walsh wrote:

> I was not expecting things to work, and find that codes compiled using
> OpenMPI 1.4.1 commands under SLES 10.2 produce the following message
> when run under SLES11:
>
> mca: base: component_find: unable to open /share/apps/openmpi-intel/1.4.1/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
>
> This file is in position and is NOT the result of a faulty mixed-release over-build
> (things work great under SLES10.2).
>
> The message indicates that (as the default is to build OpenMPI dynamically
> with shared objects) in loading this required IB-related library there must
> be a format incompatibility. However, I find that if I force the use of GE with:
>
> -mca btl tcp,self
>
> things seem to run OK under SLES 11.
>
> Could someone add some detail here on what, if anything, I can expect to
> work when we try to run old SLES 10.2-built OpenMPI 1.4.1 binaries under
> SLES 11? I would have thought NOTHING, but maybe that is not quite right.

I do not have any experience with SLES, so I can't comment for sure. But I'd *guess* that there was a symbol change between 10.2 and 11 in the OpenFabrics libraries such that the openib BTL is unable to find a symbol that it needs. Another possibility is that the dependent libraries of libibverbs.so changed (e.g., perhaps libibverbs.so required -lsysfs in 10.2, but then libsysfs.so doesn't exist in 11...?). Do the SLES release notes say anything about binary compatibility (particularly of the OpenFabrics libraries) between SLES 10.2 and 11?
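
If it helps, here is a rough Python sketch for checking the "dependent library went away" theory. It assumes ldd is available on the SLES 11 node and that unresolved dependencies appear in its output as "name => not found". The function names are my own invention, not part of Open MPI.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the shared libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # An unresolved dependency is printed as: "libfoo.so.1 => not found"
        if "=>" in line and "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_component(path):
    """Run ldd on a shared object (hypothetical helper) and list what is missing."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return missing_libraries(result.stdout)
```

Calling check_component() with the full path to mca_btl_openib.so on a SLES 11 node should return an empty list if every dependency resolves. Anything it does return is a candidate for the library that disappeared between releases.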

I'm quite sure that recompiling all of OMPI should make it work -- I'd be very surprised if the OpenFabrics libraries in SLES 11 were inconsistent such that you couldn't just rebuild and have it work.

You may be able to recompile *just the openib BTL module* on SLES 11, drop it in your OMPI 1.4.1 installation, and have it work again. But that's not a guarantee -- other things may have changed such that a recompile may change some struct sizes or somesuch.

Probably your best bet would be:

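- investigate if there's a missing symbol or library in the current mca_btl_openib.so (e.g., run ldd on mca_btl_openib.so to ensure that all of its dependent libraries are present in SLES 11, and run nm -D on it to see which undefined symbols it expects those libraries to provide)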
    - if it's a missing library, see if you can supply a dummy library to make it work (that may involve a little trickery)
- recompile OMPI 1.4.1 under SLES 11
    - copy in the mca_btl_openib.so from that install to your old OMPI install
    - run some apps and see if it works
    - if it does, relax, have a beer^H^H^H^Hnon-caffeinated tea
- if it does not work, you may have to go the recompile-everything route
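
For the missing-symbol half of the first step, something along these lines might save some eyeballing. It is only a sketch: it assumes GNU nm is installed, that "nm -D" prints undefined symbols as "U name" and defined ones as "address type name", and that the interesting symbols start with "ibv_". The helper names and that prefix are my assumptions.

```python
import subprocess


def symbol_names(nm_output, want_undefined):
    """Collect symbol names from `nm -D` output, either undefined or defined."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 2:        # undefined symbols have no address column
            kind, name = fields
        elif len(fields) == 3:      # defined symbols: address, type, name
            _, kind, name = fields
        else:
            continue
        is_undefined = kind == "U"
        # Weak undefined symbols (w, v) count as neither needed nor provided.
        if want_undefined == is_undefined and kind not in ("w", "v"):
            # Drop any version suffix such as "@IBVERBS_1.1"
            names.add(name.split("@")[0])
    return names


def unresolved(component_nm, library_nm, prefix="ibv_"):
    """Symbols the component needs (matching prefix) that the library lacks."""
    needed = {
        s for s in symbol_names(component_nm, True) if s.startswith(prefix)
    }
    provided = symbol_names(library_nm, False)
    return sorted(needed - provided)


def nm_dynamic(path):
    """Return the dynamic symbol table of a shared object as text."""
    result = subprocess.run(
        ["nm", "-D", path], capture_output=True, text=True, check=True
    )
    return result.stdout
```

You would feed unresolved() the nm_dynamic() output of the old mca_btl_openib.so and of the libibverbs.so that SLES 11 ships. A non-empty result points at a symbol change, while an empty one suggests looking at missing libraries instead.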

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/