Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problems in 1.3 loading shared libs when using VampirServer
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-02-23 20:59:33

Err... I'm a little confused. We've been emailing about this exact
issue for a week or two (off list); you just re-started the
conversation from the beginning, moved it to the user's list, and
dropped all the CC's (which include several people who are not on this
list). Why did you do that?

Here's what I said in my last mail on that thread (just a few hours
ago); it was in response to a mail from Thomas:

I am totally confused by your explanation; you are throwing around
terms like VampirServer, vgnd, driver, ... I don't know what these
things are nor do I understand your explanation of how they relate to
each other. You seem to be using terms to define other terms that
then are used to define the original terms. This is where I get lost.

Can you send a simple example that doesn't work, preferably outside of
the whole Vampir system? Perhaps something that effectively mimics
Vampir's behavior?
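
A reproducer of that kind can be very small. The sketch below is only a guess at what such a test could look like: it assumes (nothing in this thread confirms it) that the tool loads the MPI library at run time via dlopen(), and it uses Python's ctypes purely as a convenient dlopen() wrapper. The library name `libmpi.so` and the helper `try_mpi_init` are illustrative and are not taken from Vampir.

```python
import ctypes
import sys


def try_mpi_init(libname="libmpi.so", mode=ctypes.RTLD_LOCAL):
    """dlopen() the MPI library with the given mode, then call
    MPI_Init and MPI_Finalize. Returns a short status string.

    libname is an assumption; adjust it to the actual install.
    """
    try:
        lib = ctypes.CDLL(libname, mode=mode)
    except OSError as exc:
        return f"dlopen failed: {exc}"
    # MPI_Init(NULL, NULL) is allowed by the MPI standard.
    rc = lib.MPI_Init(None, None)
    if rc != 0:
        return f"MPI_Init returned {rc}"
    lib.MPI_Finalize()
    return "ok"


if __name__ == "__main__":
    # MPI_Init may only be called once per process, so pick one mode per run.
    use_global = "global" in sys.argv[1:]
    mode = ctypes.RTLD_GLOBAL if use_global else ctypes.RTLD_LOCAL
    print(try_mpi_init(mode=mode))
```

Run it once without arguments and once with `global`, with LD_LIBRARY_PATH pointing at the shared 1.3 install. If only the default (RTLD_LOCAL) run shows the "undefined symbol" messages, symbol visibility in the loading process would be a plausible suspect; if both runs behave the same, this sketch does not mimic the failing setup.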

On Feb 4, 2009, at 12:03 PM, Kiril Dichev wrote:

> Hi guys,
> sorry for the long e-mail.
> I have been trying for some time now to run VampirServer with shared
> libs for Open MPI 1.3.
> First of all: The "--enable-static --disable-shared" version works.
> Also, the 1.2 series worked fine with the shared libs.
> But here is the story for the shared libraries with OMPI 1.3:
> Compilation of OMPI went fine and also the VampirServer guys compiled
> the MPI driver they need against OMPI. The driver just refers to the
> shared libraries of Open MPI.
> However, on launching the server I got errors of the type "undefined
> symbol":
> error: /home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-MT-shared/lib/openmpi/
> undefined symbol: mca_base_param_reg_int
> It seemed to me that my LD_LIBRARY_PATH probably did not include
> <MPI_INSTALL>/lib/openmpi, but I exported it and ran "mpirun -x
> LD_LIBRARY_PATH ..." and nothing changed.
> Then I started rebuilding each component that complained about an
> "undefined symbol" with "--enable-mca-static"; for example, the above
> message disappeared after I did --enable-mca-static paffinity. I don't
> know why this worked, but it seemed to help. However, each time it was
> replaced by an error message from another component.
> After a few components, a different error came up:
> mca: base: component_find: unable to open /home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-MT-shared/lib/openmpi/mca_rml_oob: file not found (ignored)
> (full output attached)
> I was unsure what to do, but again, when I compiled the complaining
> component statically, things went a step further. One thing that
> struck me is that the directory does contain such a file, but with an
> extra ".so" at the end. Maybe dlopen also accepts file names without
> the ".so"; I don't know.
> Anyway, I have now included about 20 components statically while
> still building shared objects for the OMPI libs, and things seem to
> work.
> Does anyone have any idea why these dozens of errors happen when
> loading shared libs? Like I said, I never had this in the 1.2 series.
> Thanks,
> Kiril
> <mpirun-vngd.out>
> _______________________________________________
> users mailing list
> users_at_[hidden]
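
On the question in the quoted mail about file names without the ".so": on Linux, plain dlopen() opens the name it is given and does not append a suffix on its own. Whether Open MPI's component loader adds one before calling it is not shown in this thread, so the sketch below only illustrates the plain dlopen() behavior, again through Python's ctypes. The helper `can_dlopen` is hypothetical, and the C library is used only because it is present on practically every system.

```python
import ctypes
import ctypes.util


def can_dlopen(name):
    """Return True if dlopen() (via ctypes.CDLL) can load `name` as given."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # find_library returns the name the linker would use, which on Linux
    # normally carries a ".so" suffix plus a version.
    full = ctypes.util.find_library("c")
    if full is not None:
        stem = full.split(".so")[0]  # same name with the suffix cut off
        print(full, can_dlopen(full))
        print(stem, can_dlopen(stem))
```

If the suffix-less name fails to load here while the full name succeeds, then a "file not found" for a path without ".so" is not by itself surprising at the dlopen() level, and the real cause may lie elsewhere (for example in an unresolved symbol of the component).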

Jeff Squyres
Cisco Systems