Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Missing Symbol
From: Terry Dontje (terry.dontje_at_[hidden])
Date: 2010-03-05 14:40:53


Have you found the symbol being exposed by another .so (i.e., have you run
nm on that .so and seen the symbol listed)? And are you sure that .so is
loaded by the time your .so is being loaded?
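
As a rough way to answer both questions from inside the process, something along these lines might help. It is only a sketch using plain dlopen/dlsym rather than the libltdl calls Open MPI actually uses, and the helper names symbol_in_global_scope and try_load are made up for illustration:

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if `name` resolves from the process's
 * global symbol scope (the main program, its startup dependencies, and
 * anything already loaded with RTLD_GLOBAL), 0 otherwise. */
int symbol_in_global_scope(const char *name)
{
    /* dlopen(NULL, ...) returns a handle for the main program. */
    void *self = dlopen(NULL, RTLD_NOW);
    if (self == NULL) {
        fprintf(stderr, "dlopen(NULL): %s\n", dlerror());
        return 0;
    }

    dlerror();                         /* clear any stale error state */
    (void)dlsym(self, name);           /* the address itself is not needed */
    int found = (dlerror() == NULL);   /* no error means the lookup succeeded */

    dlclose(self);
    return found;
}

/* Hypothetical helper: try to load a shared object the way a plugin
 * loader would, printing the dynamic linker's own message on failure.
 * Pass global != 0 to make the object's symbols visible to later loads. */
void *try_load(const char *path, int global)
{
    int flags = RTLD_NOW | (global ? RTLD_GLOBAL : RTLD_LOCAL);
    void *handle = dlopen(path, flags);
    if (handle == NULL) {
        fprintf(stderr, "%s\n", dlerror());
    }
    return handle;
}
```

Calling symbol_in_global_scope("mca_pml_v") just before the failing component is opened would tell you whether the symbol is reachable at that point. On older glibc you need to link with -ldl.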

--td
Leonardo Fialho wrote:
> No George, this trick does not change the problem. I'm looking for the problem in the mca_pml_v declaration, but I still can't figure out the reason why it doesn't work.
>
> Leonardo
>
> On Mar 5, 2010, at 8:12 PM, George Bosilca wrote:
>
>
>> I would first try the Open MPI configure option --disable-visibility. If this doesn't fix it, you should make sure that dlopen is called with the GLOBAL flag on (I don't remember where exactly in the code, and unfortunately I can't check right now). Use gdb and set a breakpoint on dlopen and you will find it.
>>
>> george.
>>
>> On Mar 5, 2010, at 14:00 , Leonardo Fialho wrote:
>>
>>
>>> Yeah, probably ompi_request_null and opal_output are not good candidates. I'm trying with mca_pml_v, but I'm not familiar with this framework, although it is really small.
>>>
>>> George, you said to change this (opal/mca/base/mca_base_component_find.c):
>>>
>>> #if OPAL_HAVE_LTDL_ADVISE
>>> component_handle = lt_dlopenadvise(target_file->filename, opal_mca_dladvise);
>>> #else
>>> component_handle = lt_dlopenext(target_file->filename);
>>> #endif
>>>
>>> to use lt_dladvise_global instead of lt_dladvise_local?
>>>
>>> Leonardo
>>>
>>> On Mar 5, 2010, at 7:47 PM, Terry Dontje wrote:
>>>
>>>
>>>> I would also start nm'ing the .so's you think the U symbols are resolved in to make sure they are exposed. Luckily you only have 3 symbols to look for.
>>>>
>>>> --td
>>>>
>>>> Ralph Castain wrote:
>>>>
>>>>> It's probably a visibility issue - check for an OMPI_DECLSPEC missing from the declaration of a symbol.
>>>>>
>>>>> On Mar 5, 2010, at 11:40 AM, Leonardo Fialho wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Yes,
>>>>>>
>>>>>> I renamed all references to Aurelien's component name and removed all code related to the component itself. There are only functions which return OMPI_SUCCESS. No other function is called.
>>>>>>
>>>>>> I'm debugging with LD_DEBUG=symbols, but the output is really huge! Probably the error is in the mca_pml_v symbol:
>>>>>>
>>>>>> 19643: /home/lfialho/lib/openmpi/mca_vprotocol_receiver.so: error: symbol lookup error: undefined symbol: mca_pml_v (fatal)
>>>>>>
>>>>>> Leonardo
>>>>>>
>>>>>> On Mar 5, 2010, at 7:35 PM, Ralph Castain wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> You said this component was a copy of Aurelien's component? Did you rename the critical elements (e.g., component, module) inside it to avoid name confusion?
>>>>>>>
>>>>>>> On Mar 5, 2010, at 11:27 AM, Leonardo Fialho wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I see... but it is really strange because this module is clean; it does not use anything. This is the output of the nm command, and I can't see any symbol which is not available.
>>>>>>>>
>>>>>>>> [lfialho_at_aoclsb-clus openmpi]$ nm mca_vprotocol_receiver.so
>>>>>>>> 0000000000201208 a _DYNAMIC
>>>>>>>> 0000000000201408 a _GLOBAL_OFFSET_TABLE_
>>>>>>>> w _Jv_RegisterClasses
>>>>>>>> 00000000002011e0 d __CTOR_END__
>>>>>>>> 00000000002011d8 d __CTOR_LIST__
>>>>>>>> 00000000002011f0 d __DTOR_END__
>>>>>>>> 00000000002011e8 d __DTOR_LIST__
>>>>>>>> 00000000000011d0 r __FRAME_END__
>>>>>>>> 00000000002011f8 d __JCR_END__
>>>>>>>> 00000000002011f8 d __JCR_LIST__
>>>>>>>> 0000000000201640 A __bss_start
>>>>>>>> w __cxa_finalize@@GLIBC_2.2.5
>>>>>>>> 0000000000000d40 t __do_global_ctors_aux
>>>>>>>> 00000000000007c0 t __do_global_dtors_aux
>>>>>>>> 0000000000201200 d __dso_handle
>>>>>>>> w __gmon_start__
>>>>>>>> 0000000000201640 A _edata
>>>>>>>> 0000000000201648 A _end
>>>>>>>> 0000000000000d78 T _fini
>>>>>>>> 0000000000000750 T _init
>>>>>>>> 00000000000007a0 t call_gmon_start
>>>>>>>> 0000000000201640 b completed.6115
>>>>>>>> 0000000000000810 t frame_dummy
>>>>>>>> U mca_pml_v
>>>>>>>> 0000000000201460 D mca_vprotocol_receiver
>>>>>>>> 0000000000000c71 t mca_vprotocol_receiver_add_comm
>>>>>>>> 0000000000000a5f t mca_vprotocol_receiver_add_procs
>>>>>>>> 0000000000201540 D mca_vprotocol_receiver_component
>>>>>>>> 0000000000000cc3 t mca_vprotocol_receiver_component_close
>>>>>>>> 0000000000000d18 t mca_vprotocol_receiver_component_finalize
>>>>>>>> 0000000000000cce t mca_vprotocol_receiver_component_init
>>>>>>>> 0000000000000cb8 t mca_vprotocol_receiver_component_open
>>>>>>>> 0000000000000c93 t mca_vprotocol_receiver_del_comm
>>>>>>>> 0000000000000a89 t mca_vprotocol_receiver_del_procs
>>>>>>>> 000000000000083c t mca_vprotocol_receiver_dump
>>>>>>>> 0000000000000d23 t mca_vprotocol_receiver_enable
>>>>>>>> 00000000000009e7 t mca_vprotocol_receiver_iprobe
>>>>>>>> 0000000000000b9a t mca_vprotocol_receiver_irecv
>>>>>>>> 0000000000000ab3 t mca_vprotocol_receiver_isend
>>>>>>>> 0000000000000a29 t mca_vprotocol_receiver_probe
>>>>>>>> 0000000000000c00 t mca_vprotocol_receiver_recv
>>>>>>>> 0000000000000b21 t mca_vprotocol_receiver_send
>>>>>>>> 00000000000009bd T mca_vprotocol_receiver_start
>>>>>>>> 0000000000000864 t mca_vprotocol_receiver_test
>>>>>>>> 0000000000000896 t mca_vprotocol_receiver_test_all
>>>>>>>> 00000000000008d0 t mca_vprotocol_receiver_test_any
>>>>>>>> 0000000000000950 t mca_vprotocol_receiver_test_some
>>>>>>>> 0000000000000916 t mca_vprotocol_receiver_wait_any
>>>>>>>> 000000000000098a t mca_vprotocol_receiver_wait_some
>>>>>>>> U ompi_request_null
>>>>>>>> U opal_output
>>>>>>>> 0000000000201440 d p.6113
>>>>>>>> [lfialho_at_aoclsb-clus openmpi]$
>>>>>>>>
>>>>>>>> On Mar 5, 2010, at 7:00 PM, Terry Dontje wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Sorry, I meant to add this: you might be able to find the symbol causing the issue by experimenting with LD_DEBUG.
>>>>>>>>>
>>>>>>>>> --td
>>>>>>>>> Terry Dontje wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Possibly there is an external symbol in the .so that is being loaded that cannot be resolved.
>>>>>>>>>> --td
>>>>>>>>>> Leonardo Fialho wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I know that libtool does not help us to find the source of this error, but, what can generate the following error?
>>>>>>>>>>>
>>>>>>>>>>> [aoclsb-clus.uab.es:11724] mca: base: component_find: unable to open /home/lfialho/lib/openmpi/mca_vprotocol_receiver: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
>>>>>>>>>>>
>>>>>>>>>>> 1) yes, the file exists
>>>>>>>>>>> 2) yes, it has been compiled among all other components
>>>>>>>>>>> 3) yes, it is the same Open MPI version
>>>>>>>>>>> 4) this component is a copy of the pessimist component implemented by Aurelien
>>>>>>>>>>> 5) Aurelien's component presents the same error
>>>>>>>>>>>
>>>>>>>>>>> The question is: what kind of mistake could generate an error during module loading?
>>>>>>>>>>>
>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>> Leonardo
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> devel mailing list
>>>>>>>>>>> devel_at_[hidden]
>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel