Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] unresolved symbol mca_base_param_reg_int
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-04-26 14:43:35

On Apr 24, 2010, at 10:14 PM, Nev wrote:

> void * const result = dlopen(libName, RTLD_LAZY | RTLD_LOCAL);

This line is the problem: change RTLD_LOCAL to RTLD_GLOBAL and it'll work. There's another option, too -- keep reading...
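For reference, here is a minimal sketch of that one-flag change wrapped in a helper. The name open_global is hypothetical (it is not from the original code); the only real calls are dlopen() and dlerror() from <dlfcn.h>.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not from the original post): open a library so that
 * its symbols join the global scope, where DSOs loaded later can resolve
 * against them. Passing NULL yields a handle for the main program. */
void *open_global(const char *lib_name)
{
    dlerror();  /* clear any stale error string */
    void *handle = dlopen(lib_name, RTLD_LAZY | RTLD_GLOBAL);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n",
                lib_name ? lib_name : "(main program)",
                err ? err : "unknown error");
    }
    return handle;
}
```

In the quoted code this would be used as `void * const result = open_global(libName);`. On older glibc systems you may also need to link with -ldl.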

<highly complex linker voodoo>

Before discussing why this happens, know that Open MPI plugins call functions back up in the main Open MPI libraries. As a crass-and-not-really-correct-but-close-enough example, consider that OMPI plugins are created (sorta) like this:

    gcc my_plugin_source.c ... -L<dir> -lmpi --shared -o <plugin>.so

where libmpi (found in <dir>) is a shared library. These plugins are not making MPI standardized API function calls; they're calling internal functions inside libmpi (i.e., OMPI's internal implementation API). This is because libmpi (and friends) have a whole lotta infrastructure that the plugins need in order to be able to do their work.

It's a fun use of the intelligence of linkers -- a normal MPI app is linked against OMPI's libmpi, but so is the plugin. When your app calls MPI_Init, the normal run-time linker semantics take over, resolve the symbol, and then call it. Later, the plugin is dlopen()'ed. The run-time linker sees that the plugin needs libmpi, but realizes that libmpi is already loaded -- so it doesn't load it again. When the plugin calls OMPI_do_something(), the same run-time resolution occurs, and (this is key) it calls the function in the same instance of libmpi that your app is using.
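The "already loaded, so not loaded again" behavior can be observed directly. This is only a sketch: the function name loaded_only_once is made up for illustration, and it assumes a Linux/glibc-style loader, where opening the same object twice is documented to return the same handle.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical check: returns 1 if two dlopen() calls on the same name
 * yield the same handle (one shared instance), 0 if the handles differ,
 * and -1 if the library cannot be opened at all. */
int loaded_only_once(const char *lib_name)
{
    void *first = dlopen(lib_name, RTLD_LAZY);
    if (first == NULL) {
        return -1;
    }
    void *second = dlopen(lib_name, RTLD_LAZY);
    if (second == NULL) {
        dlclose(first);
        return -1;
    }
    int same = (first == second);
    /* Each successful dlopen() takes a reference; release both. */
    dlclose(second);
    dlclose(first);
    return same;
}
```

Passing NULL checks the main program itself; passing a library path checks that library.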

Nifty. Without this concept, OMPI's plugin concept wouldn't work.

Your code is dlopening liba2lib as LOCAL. The run-time linker pulls in libmpi at the same time as liba2lib (because MPI_Init needs it) -- and therefore libmpi is loaded into the same private space as liba2lib. But then later, the innards of Open MPI dlopen() a plugin. This plugin is loaded into a DIFFERENT symbol space than libmpi. The key point here is that LOCAL is not "inherited", so to speak. If you dlopen() libfoo as LOCAL, and libfoo then dlopen()s more DSOs, those newly-opened DSOs are in a different space than libfoo.

The best I can guess is that when the plugin is dlopen()'ed, the linker says "ya, we have libmpi loaded" and it allows the load to complete successfully. But later when it tries to actually resolve OMPI_do_something(), it fails -- because OMPI_do_something() is in the private/LOCAL symbol space. And therefore OMPI_do_something has a value of 0. And it segv's when we try to call through it. (This paragraph may not be exactly right, but it's probably close -- every time I think I understand linkers, I find out that I don't understand them at all...)
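If you want to check this guess on your own system, one rough diagnostic is to ask whether a given symbol is visible in the process's global scope. The sketch below is an illustration, not anything Open MPI provides: symbol_is_globally_visible is a made-up name, and it relies on the POSIX rule that dlopen(NULL, ...) returns a handle for the global symbol table.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical diagnostic: returns 1 if `name` resolves in the global
 * scope (main program, its dependencies, and anything opened with
 * RTLD_GLOBAL), 0 if it does not, and -1 if no global handle is available. */
int symbol_is_globally_visible(const char *name)
{
    void *global = dlopen(NULL, RTLD_LAZY);
    if (global == NULL) {
        return -1;
    }
    dlerror();                       /* clear any stale error string */
    (void)dlsym(global, name);       /* the address itself is not needed */
    int found = (dlerror() == NULL); /* no error means the lookup succeeded */
    dlclose(global);
    return found;
}
```

Calling symbol_is_globally_visible("mca_base_param_reg_int") right after your dlopen() of liba2lib should, if the explanation above is right, give 0 with RTLD_LOCAL and 1 with RTLD_GLOBAL. I have not verified that against every Open MPI build, so treat it as a probe rather than a proof.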

It works for you in the static case because Open MPI slurps up all the components *into* libmpi in that case. Hence, all the components *and* all the internal libmpi symbols are loaded into the same LOCAL symbol space. There's no dlopen'ing of plugins in this case. And it all works fine because everything can resolve nicely, yadda yadda yadda.

So I think your options are 1) change that LOCAL to GLOBAL, 2) configure Open MPI with "--enable-static --disable-shared", or 3) configure Open MPI with --disable-dlopen. #2 builds libmpi.a *and* slurps all of OMPI's components up into libmpi.a. #3 builds a shared libmpi *and* slurps all of OMPI's components up into it. So you get the benefits of a shared library, but all the components are physically inside libmpi as opposed to being standalone DSO's.

</highly complex linker voodoo>

I hope that made sense!

Jeff Squyres