Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] undefined symbol error when built as a sharedlibrary
From: Reuti (reuti_at_[hidden])
Date: 2009-09-11 07:26:12


On 11.09.2009 at 12:14, Jeff Squyres wrote:

> On Sep 10, 2009, at 9:42 PM, Ashika Umanga Umagiliya wrote:
>
>> That fixed the problem!
>> You are indeed a voodoo master... could you explain the spell behind
>> your magic :)
>>
>
> The problem has to do with how plugins (aka dynamic shared objects,
> DSO's) are loaded. When a DSO is loaded into a Linux process, it
> has the option of making all the public symbols in that DSO public
> to the rest of the process or private within its own scope.
>
> Let's back up. Remember that Open MPI is based on plugins
> (DSO's). It loads lots and lots of plugins during execution
> (mostly during MPI_INIT). These plugins call functions in OMPI's
> public libraries (e.g., they call functions in libmpi.so). Hence,
> when the plugin DSO's are loaded, they need to be able to resolve
> these symbols into actual code that can be invoked. If the symbols
> cannot be resolved, the DSO load fails.
>
> If libParallel.so is loaded into a private scope, then its linked
> libraries (e.g., libmpi.so) are also loaded into that same private
> scope. Hence, all of libmpi.so's public symbols are only public
> within that single, private scope. Then, when OMPI goes to load
> its own DSOs, since libmpi.so's public symbols are in a private
> scope, OMPI's DSOs can't find them -- and therefore they refuse to
> load. (Private scopes are not inherited -- a new DSO load cannot
> "see" libParallel.so/libmpi.so's private scope.)
>
> It's an educated guess from your description that this is what was
> happening.
>
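
The public-versus-private scope described above corresponds to the RTLD_GLOBAL and RTLD_LOCAL modes of dlopen(). The thread does not say how the web service loads libParallel.so, so the following is only a minimal sketch, assuming a dlopen-style load and using Python's ctypes as a stand-in for the loader. The helper names `dlopen_flags` and `load_library` are hypothetical.

```python
import ctypes


def dlopen_flags(global_scope: bool) -> int:
    """Return the dlopen() mode for a public (global) or private (local) load."""
    # RTLD_GLOBAL: symbols become visible to DSOs that are loaded later.
    # RTLD_LOCAL:  symbols stay private to this load (the ctypes default).
    return ctypes.RTLD_GLOBAL if global_scope else ctypes.RTLD_LOCAL


def load_library(path: str, global_scope: bool = True) -> ctypes.CDLL:
    """dlopen() the library at `path` with the requested symbol scope."""
    return ctypes.CDLL(path, mode=dlopen_flags(global_scope))


# Hypothetical usage (the path is an assumption, not taken from the thread):
#   lib = load_library("./libParallel.so", global_scope=True)
```

With a global load, plugins opened afterwards should be able to resolve symbols from the library and from the libraries it pulls in, which is the situation the private load prevents. Whether this is an option depends on who controls the loading code.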
> OMPI's --disable-dlopen configure option has Open MPI build in a
> different way.

Aha, this might also explain what I faced some time ago. I tried to
build an application called Molpro against GlobalArrays, which I had
compiled with Open MPI. I hit similar errors: the compilation worked
without any problem, but I couldn't run the application, as it failed
with a similar error. In the end I gave up and stayed with MPICH (1)
for this.

I will try to build it with this switch in the next few days; maybe
it will also solve this issue.

-- Reuti

> Instead of building all of OMPI's plugins as DSOs, they are
> "slurped" up into libmpi.so (etc.). So there's no "loading" of
> DSOs at MPI_INIT time -- the plugin code actually resides *in*
> libmpi.so itself. Hence, resolution of all symbols is done when
> libParallel.so loads libmpi.so. Additionally, there's no secondary
> private scope created when DSOs are loaded -- they're all
> self-contained within libmpi.so (etc.). And therefore all the
> libmpi.so symbols that are required for the plugins can be
> found/resolved at load time.
>
> Does that make sense?
>
>
>
>> Regards,
>> umanga
>>
>>
>> Jeff Squyres wrote:
>> > I'm guessing that this has to do with deep, dark voodoo involved
>> > with the run-time linker.
>> >
>> > Can you try configuring/building Open MPI with the --disable-dlopen
>> > configure option, and rebuilding your libParallel.so against the
>> > new libmpi.so?
>> >
>> > See if that fixes the problem for you. If it does, I can explain
>> > in more detail (if you care).
>> >
>> >
>> > On Sep 10, 2009, at 3:24 AM, Ashika Umanga Umagiliya wrote:
>> >
>> >> Greetings all,
>> >>
>> >> My parallel application is built as a shared library
>> >> (libParallel.so). (I use Debian Lenny 64-bit.)
>> >> A web service dynamically loads libParallel.so and in turn
>> >> executes the parallel process.
>> >>
>> >> But at run time I get this error:
>> >>
>> >> webservicestub: symbol lookup error:
>> >> /usr/local/lib/openmpi/mca_paffinity_linux.so: undefined symbol:
>> >> mca_base_param_reg_int
>> >>
>> >> which I cannot figure out. I checked everything with 'ldd' and
>> >> 'nm', and everything seems fine.
>> >> So I compiled and tested my parallel code as an executable, and
>> >> then it worked fine.
>> >>
>> >> What could be the reason for this?
>> >>
>> >> Thanks in advance,
>> >> umanga
>> >> _______________________________________________
>> >> users mailing list
>> >> users_at_[hidden]
>> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >>
>> >
>> >
>>
>>
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
>
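
The original report notes that 'ldd' and 'nm' looked fine even though the plugin failed to load. Those tools inspect files on disk, not the symbol scopes of a running process. As a rough sketch, assuming Linux and Python's ctypes, the check below asks whether a symbol can be resolved through the process's global scope, which is where a newly loaded plugin looks. The function name `visible_in_global_scope` is hypothetical.

```python
import ctypes


def visible_in_global_scope(symbol: str) -> bool:
    """Return True if `symbol` resolves through the process's global scope."""
    # CDLL(None) corresponds to dlopen(NULL): a handle for the main program.
    # Lookups through it cover the executable, its dependencies, and anything
    # loaded with RTLD_GLOBAL, but not libraries loaded into a private scope.
    process = ctypes.CDLL(None)
    try:
        getattr(process, symbol)
    except AttributeError:
        return False
    return True


# Hypothetical usage inside the process that loaded libParallel.so:
#   visible_in_global_scope("mca_base_param_reg_int")
```

If the symbol named in the error message is not visible this way after libParallel.so has been loaded, that would be consistent with the private-scope explanation given in this thread.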