Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Continued functionality across a SLES10 to SLES11 upgrade ...
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-09-22 09:07:53

On Sep 20, 2010, at 1:20 PM, Richard Walsh wrote:

> I was not expecting things to work, and find that codes compiled using
> OpenMPI 1.4.1 commands under SLES 10.2 produce the following message
> when run under SLES11:
> mca: base: component_find: unable to open /share/apps/openmpi-intel/1.4.1/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
> This file is in position and is NOT the result of a faulty mixed-release over-build
> (things work great under SLES10.2).
> The message indicates that (as the default is to build OpenMPI dynamically
> with shared objects) in loading this required IB-related library there must
> be a format incompatibility. However, I find that if I force the use of GE with:
> -mca btl tcp,self
> things seem to run OK under SLES 11.
> Could someone add some detail here on what, if anything, I can expect to
> work when we try to run old SLES 10.2-built OpenMPI 1.4.1 binaries under
> SLES 11. I would have thought NOTHING, but maybe that is not quite right.

I do not have any experience with SLES, so I can't comment for sure. But I'd *guess* that there was a symbol change between 10.2 and 11 in the OpenFabrics libraries such that the openib BTL is unable to find a symbol that it needs. Another possibility is that the dependent libraries of the openib BTL changed (e.g., perhaps it required -lsysfs in 10.2, but that library doesn't exist in 11...?). Do the SLES release notes say anything about binary compatibility (particularly of the OpenFabrics libraries) between SLES 10.2 and 11?
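
To check the "dependent libraries changed" theory, one rough approach is to run ldd on the openib BTL plugin on the SLES 11 machine and look for "not found" entries. Here is a minimal Python sketch of that idea. The helper names are made up for illustration, and the plugin path you pass in is assumed to be the mca_btl_openib shared object under your install's lib/openmpi directory:

```python
import subprocess
import sys
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved dependencies are printed as "<name> => not found".
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_plugin(path: str) -> List[str]:
    """Run ldd on a shared object and list its unresolved dependencies."""
    result = subprocess.run(["ldd", path], capture_output=True,
                            text=True, check=False)
    return missing_libraries(result.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_deps.py /path/to/plugin.so
    for lib in check_plugin(sys.argv[1]):
        print("missing:", lib)
```

If this prints nothing, all dependent libraries were located, and a changed or missing symbol becomes the more likely explanation.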

I'm quite sure that recompiling all of OMPI should make it work -- I'd be very surprised if the OpenFabrics libraries in SLES 11 were inconsistent such that you couldn't just rebuild and have it work.

You may be able to recompile *just the openib BTL module* on SLES 11, drop it in your OMPI 1.4.2 installation, and have it work again. But that's not a guarantee -- other things may have changed such that a recompile may change some struct sizes or somesuch.

Probably your best bet would be:

- investigate if there's a missing symbol or library in the current openib BTL module (e.g., run nm and ldd on it and ensure that all the libraries it needs are present in SLES 11)
    - if it's a missing library, see if you can supply a dummy library to make it work (that may involve a little trickery)
- recompile OMPI 1.4.2 under SLES 11
    - copy in the openib BTL module from that install to your old OMPI install
    - run some apps and see if it works
    - if it does, relax, have a beer^H^H^H^Hnon-caffeinated tea
- if it does not work, you may have to go the recompile-everything route
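
For the "missing symbol" part of the first step, something like the following Python sketch could help. It compares the undefined dynamic symbols of the plugin (from `nm -D --undefined-only`) against the symbols exported by the libraries it links to (from `nm -D --defined-only`). The function names are hypothetical, and you would need to feed it the nm output of every library that ldd lists for the plugin (libc, the Open MPI libraries, the OpenFabrics libraries, and so on), otherwise it will report false positives:

```python
import subprocess
from typing import Iterable, List, Set


def nm_dynamic(path: str, which: str) -> str:
    """Run `nm -D` on a shared object.

    `which` is '--undefined-only' or '--defined-only'.
    """
    result = subprocess.run(["nm", "-D", which, path],
                            capture_output=True, text=True, check=False)
    return result.stdout


def strong_symbols(nm_output: str) -> Set[str]:
    """Extract symbol names from nm output, skipping weak undefined ones."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        sym_type, name = fields[-2], fields[-1]
        if sym_type in ("w", "v"):
            continue  # weak undefined references may stay unresolved
        names.add(name.split("@")[0])  # drop any symbol-version suffix
    return names


def unresolved(plugin_undefined: str,
               provider_outputs: Iterable[str]) -> List[str]:
    """Symbols the plugin needs that none of the given libraries export."""
    needed = strong_symbols(plugin_undefined)
    provided: Set[str] = set()
    for out in provider_outputs:
        provided |= strong_symbols(out)
    return sorted(needed - provided)
```

Anything this reports is only a candidate for the symbol that fails to resolve at load time. Symbol versioning is ignored here, so a version mismatch on a symbol that still exists would not show up.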

Jeff Squyres