On 11.09.2009 at 12:14, Jeff Squyres wrote:
> On Sep 10, 2009, at 9:42 PM, Ashika Umanga Umagiliya wrote:
>> That fixed the problem!
>> You are indeed a voodoo master... could you explain the spell behind
>> your magic :)
> The problem has to do with how plugins (aka dynamic shared objects,
> DSOs) are loaded. When a DSO is loaded into a Linux process, the
> code that loads it can choose either to make all the public symbols
> in that DSO visible to the rest of the process or to keep them
> private within the DSO's own scope.
> Let's back up. Remember that Open MPI is based on plugins
> (DSOs). It loads lots and lots of plugins during execution
> (mostly during MPI_INIT). These plugins call functions in OMPI's
> public libraries (e.g., they call functions in libmpi.so). Hence,
> when the plugin DSOs are loaded, they need to be able to resolve
> these symbols into actual code that can be invoked. If the symbols
> cannot be resolved, the DSO load fails.
> If libParallel.so is loaded into a private scope, then its linked
> libraries (e.g., libmpi.so) are also loaded into that same private
> scope. Hence, all of libmpi.so's public symbols are only public
> within that single, private scope. Then, when OMPI goes to load
> its own DSOs, those DSOs can't find libmpi.so's public symbols
> because they live in a private scope -- and therefore the DSOs
> refuse to load. (Private scopes are not inherited: a newly loaded
> DSO cannot "see" libParallel.so/libmpi.so's private scope.)
> From your description, my educated guess is that this is what was
> happening.
> OMPI's --disable-dlopen configure option has Open MPI build in a
> different way.
Aha, this might also explain what I ran into some time ago. I tried to
compile an application called Molpro against GlobalArrays, which I had
compiled with Open MPI. I hit similar errors: the compilation worked
without any problem, but I couldn't run the application, because it
failed with a similar error. Finally I gave up and stayed with MPICH-1
for this.
I will try to build it with this switch in the next few days; maybe it
will also solve this issue.
> Instead of being built as separate DSOs, all of OMPI's plugins are
> "slurped" up into libmpi.so (etc.). So there's no "loading" of
> DSOs at MPI_INIT time -- the plugin code actually resides *in*
> libmpi.so itself. Hence, all symbols are resolved when
> libParallel.so loads libmpi.so. Additionally, no secondary private
> scope is created for the plugins, because they are not loaded as
> separate DSOs -- they're all self-contained within libmpi.so (etc.).
> Therefore all the libmpi.so symbols that the plugins require can be
> found and resolved at load time.
> Does that make sense?
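One way to check this reading in the running process: ldd and nm only inspect the files on disk and say nothing about which scope a symbol ended up in at run time. POSIX defines the handle returned by dlopen(NULL, ...) as covering the program, its start-up libraries, and every object loaded with RTLD_GLOBAL, so looking a symbol up through that handle shows whether later-loaded DSOs can find it in the global scope. A minimal sketch; symbol_is_global() is a hypothetical helper, not an Open MPI function:

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if `symbol` is visible through the
 * global symbol table handle, i.e. the program, its start-up libraries,
 * and objects loaded with RTLD_GLOBAL; return 0 otherwise.
 * Objects loaded with RTLD_LOCAL are not searched. */
int symbol_is_global(const char *symbol)
{
    void *global = dlopen(NULL, RTLD_NOW);   /* global symbol table handle */
    if (global == NULL) {
        fprintf(stderr, "dlopen(NULL) failed: %s\n", dlerror());
        return 0;
    }
    (void) dlerror();                         /* clear any stale error */
    int found = dlsym(global, symbol) != NULL;
    dlclose(global);
    return found;
}
```

Called from the webservice right after libParallel.so has been loaded, symbol_is_global("mca_base_param_reg_int") would be expected to return 0 in the failing setup (the MPI libraries sit in a private scope) and 1 if libParallel.so had been opened with RTLD_GLOBAL.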
>> Jeff Squyres wrote:
>> > I'm guessing that this has to do with deep, dark voodoo involving
>> > the run-time linker.
>> > Can you try configuring/building Open MPI with the --disable-dlopen
>> > configure option, and rebuilding your libParallel.so against the
>> > resulting libmpi.so?
>> > See if that fixes the problem for you. If it does, I can explain in
>> > more detail (if you care).
>> > On Sep 10, 2009, at 3:24 AM, Ashika Umanga Umagiliya wrote:
>> >> Greetings all,
>> >> My parallel application is built as a shared library
>> >> (I use Debian Lenny 64-bit).
>> >> A webservice is used to dynamically load libParallel.so and
>> >> execute the parallel process.
>> >> But at run time I get this error:
>> >> webservicestub: symbol lookup error:
>> >> /usr/local/lib/openmpi/mca_paffinity_linux.so: undefined symbol:
>> >> mca_base_param_reg_int
>> >> and I cannot figure out why. I checked everything with 'ldd' and
>> >> 'nm', and everything seems fine.
>> >> So I compiled and tested my parallel code as an executable, and
>> >> then it worked fine.
>> >> What could be the reason for this?
>> >> Thanks in advance,
>> >> umanga
>> >> _______________________________________________
>> >> users mailing list
>> >> users_at_[hidden]
>> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> Jeff Squyres