Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Missing Symbol
From: Leonardo Fialho (leonardofialho_at_[hidden])
Date: 2010-03-06 04:03:35


Terry, mca_pml_v is declared in a .so, and it should export the symbol at load time. However, this component loads other modules, such as mca_vprotocol_pessimist or mca_vprotocol_receiver (in my case). The symbol is declared in pml_v.c, which acts as a pseudo-framework that loads other components, vprotocol_pessimist for example.

As George said, the problem is that mca_pml_v is dynamically loaded and then loads mca_vprotocol_receiver, which uses the problematic symbol. The symbol should be available among the global symbols. I don't know why, but that is not happening.
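
To make the visibility question concrete, here is a minimal sketch in plain POSIX C. It is not Open MPI's loader code (which goes through libltdl); the function name open_component_global is hypothetical and only for illustration. It opens a shared object with RTLD_GLOBAL, so that its symbols can satisfy references from objects loaded later, and it keeps the dynamic loader's own error text, which usually names the missing file or symbol.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: open a shared object with global symbol visibility.
 * On failure, copy the dynamic loader's message into errbuf.
 * Returns the handle from dlopen(), or NULL on failure. */
void *open_component_global(const char *path, char *errbuf, size_t errlen)
{
    void *handle;
    const char *msg;

    assert(path != NULL && errbuf != NULL && errlen > 0);
    errbuf[0] = '\0';

    dlerror(); /* clear any stale error state */

    /* RTLD_NOW:    resolve every undefined symbol at load time, so a
     *              missing symbol fails here rather than at first use.
     * RTLD_GLOBAL: make this object's symbols available for resolving
     *              references in objects loaded afterwards. */
    handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        msg = dlerror();
        snprintf(errbuf, errlen, "%s", msg != NULL ? msg : "unknown dlopen error");
    }
    return handle;
}
```

If the parent object were opened with RTLD_LOCAL instead, a child loaded later could not resolve its undefined references against the parent's symbols unless the child is explicitly linked against the parent. On older glibc systems this needs -ldl at link time.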

Jeff, it would be really good to have better output for this kind of error, but that does not change the problem. I think that vprotocol is the only component that loads other components in this way. But all components are loaded by libopal in the same way, no?

Leonardo

On Mar 6, 2010, at 12:27 AM, Jeff Squyres wrote:

> We already use global symbols; mca_base_component_repository.c invokes:
>
> if (lt_dladvise_global(&opal_mca_dladvise)) {
>     return OPAL_ERROR;
> }
>
>
> On Mar 5, 2010, at 6:18 PM, George Bosilca wrote:
>
>> Unfortunately this will not fix his issues ;( I'm pretty sure that his problem is related to the fact that mca_pml_v is exported by another dynamic module, and is therefore not available via dlsym. I don't think there is a simple solution for this problem, except going back to GLOBAL symbols.
>>
>> george.
>>
>> On Mar 5, 2010, at 18:02 , Jeff Squyres wrote:
>>
>>> Ick.
>>>
>>> I wondered aloud on IM to Terry after your earlier emails if we should just custom-patch ltdl in OMPI to fix this issue. The problem is that libltdl is effectively reporting the "wrong" error back to OMPI, so the error string that we get to print out ends up not being very useful (e.g., not showing which symbol was missing, or what the problem was with the dlopen). Fixing this properly in libltdl is actually somewhat tricky -- which is why it hasn't been fixed yet. But given that OMPI's use of libltdl is pretty specific, we might be able to get away with a simple fix that works just for OMPI (but wouldn't necessarily be suitable for all other libltdl users).
>>>
>>> Hmmm...
>>>
>>> This looks do-able. I'll commit in a bit.
>>>
>>>
>>>
>>> On Mar 5, 2010, at 1:27 PM, Leonardo Fialho wrote:
>>>
>>>> I see... but it is really strange because this module is clean; it does not use anything. This is the output of the nm command, and I can't see any symbol that is not available.
>>>>
>>>> [lfialho_at_aoclsb-clus openmpi]$ nm mca_vprotocol_receiver.so
>>>> 0000000000201208 a _DYNAMIC
>>>> 0000000000201408 a _GLOBAL_OFFSET_TABLE_
>>>> w _Jv_RegisterClasses
>>>> 00000000002011e0 d __CTOR_END__
>>>> 00000000002011d8 d __CTOR_LIST__
>>>> 00000000002011f0 d __DTOR_END__
>>>> 00000000002011e8 d __DTOR_LIST__
>>>> 00000000000011d0 r __FRAME_END__
>>>> 00000000002011f8 d __JCR_END__
>>>> 00000000002011f8 d __JCR_LIST__
>>>> 0000000000201640 A __bss_start
>>>> w __cxa_finalize@@GLIBC_2.2.5
>>>> 0000000000000d40 t __do_global_ctors_aux
>>>> 00000000000007c0 t __do_global_dtors_aux
>>>> 0000000000201200 d __dso_handle
>>>> w __gmon_start__
>>>> 0000000000201640 A _edata
>>>> 0000000000201648 A _end
>>>> 0000000000000d78 T _fini
>>>> 0000000000000750 T _init
>>>> 00000000000007a0 t call_gmon_start
>>>> 0000000000201640 b completed.6115
>>>> 0000000000000810 t frame_dummy
>>>> U mca_pml_v
>>>> 0000000000201460 D mca_vprotocol_receiver
>>>> 0000000000000c71 t mca_vprotocol_receiver_add_comm
>>>> 0000000000000a5f t mca_vprotocol_receiver_add_procs
>>>> 0000000000201540 D mca_vprotocol_receiver_component
>>>> 0000000000000cc3 t mca_vprotocol_receiver_component_close
>>>> 0000000000000d18 t mca_vprotocol_receiver_component_finalize
>>>> 0000000000000cce t mca_vprotocol_receiver_component_init
>>>> 0000000000000cb8 t mca_vprotocol_receiver_component_open
>>>> 0000000000000c93 t mca_vprotocol_receiver_del_comm
>>>> 0000000000000a89 t mca_vprotocol_receiver_del_procs
>>>> 000000000000083c t mca_vprotocol_receiver_dump
>>>> 0000000000000d23 t mca_vprotocol_receiver_enable
>>>> 00000000000009e7 t mca_vprotocol_receiver_iprobe
>>>> 0000000000000b9a t mca_vprotocol_receiver_irecv
>>>> 0000000000000ab3 t mca_vprotocol_receiver_isend
>>>> 0000000000000a29 t mca_vprotocol_receiver_probe
>>>> 0000000000000c00 t mca_vprotocol_receiver_recv
>>>> 0000000000000b21 t mca_vprotocol_receiver_send
>>>> 00000000000009bd T mca_vprotocol_receiver_start
>>>> 0000000000000864 t mca_vprotocol_receiver_test
>>>> 0000000000000896 t mca_vprotocol_receiver_test_all
>>>> 00000000000008d0 t mca_vprotocol_receiver_test_any
>>>> 0000000000000950 t mca_vprotocol_receiver_test_some
>>>> 0000000000000916 t mca_vprotocol_receiver_wait_any
>>>> 000000000000098a t mca_vprotocol_receiver_wait_some
>>>> U ompi_request_null
>>>> U opal_output
>>>> 0000000000201440 d p.6113
>>>> [lfialho_at_aoclsb-clus openmpi]$
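
A small sketch of how the undefined entries in a listing like the one above could be picked out programmatically. It assumes the default nm text format, where an undefined symbol has no address column and is shown as "U name". The function name nm_undefined_symbol is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if `line` is an undefined-symbol entry from nm
 * ("U name", optionally preceded by whitespace), copy the symbol name
 * into `name` (truncating if necessary) and return 1; otherwise return 0. */
int nm_undefined_symbol(const char *line, char *name, size_t namelen)
{
    size_t n;

    assert(line != NULL && name != NULL && namelen > 0);

    /* Skip the blank address column. */
    while (*line == ' ' || *line == '\t')
        line++;

    /* The type letter must be exactly 'U', followed by whitespace. */
    if (line[0] != 'U' || (line[1] != ' ' && line[1] != '\t'))
        return 0;
    line += 2;
    while (*line == ' ' || *line == '\t')
        line++;

    /* The symbol name runs up to the next whitespace or end of line. */
    n = strcspn(line, " \t\r\n");
    if (n == 0)
        return 0;
    if (n >= namelen)
        n = namelen - 1;
    memcpy(name, line, n);
    name[n] = '\0';
    return 1;
}
```

Applied line by line to the listing above, this would report mca_pml_v, ompi_request_null and opal_output: the symbols that some already-loaded object must provide when the component is opened.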
>>>>
>>>> On Mar 5, 2010, at 7:00 PM, Terry Dontje wrote:
>>>>
>>>>> Sorry, I meant to add this: you might be able to find the symbol causing the issue by twiddling with LD_DEBUG.
>>>>>
>>>>> --td
>>>>> Terry Dontje wrote:
>>>>>> Possibly there is an external symbol in the .so that is being loaded that cannot be resolved.
>>>>>> --td
>>>>>> Leonardo Fialho wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I know that libtool does not help us find the source of this error, but what can generate the following error?
>>>>>>>
>>>>>>> [aoclsb-clus.uab.es:11724] mca: base: component_find: unable to open /home/lfialho/lib/openmpi/mca_vprotocol_receiver: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
>>>>>>>
>>>>>>> 1) yes, the file exists
>>>>>>> 2) yes, it was compiled along with all the other components
>>>>>>> 3) yes, it is the same Open MPI version
>>>>>>> 4) this component is a copy of the pessimist component implemented by Aurelien
>>>>>>> 5) Aurelien's component presents the same error
>>>>>>>
>>>>>>> The question is: what mistake could generate an error during module loading?
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>> Leonardo
>>>>>>> _______________________________________________
>>>>>>> devel mailing list
>>>>>>> devel_at_[hidden]
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>
>>>
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/