Open MPI Development Mailing List Archives

This page is part of a frozen web archive of this mailing list. You can still navigate around it, but no new mails have been added since July 2016.

Subject: Re: [OMPI devel] Missing Symbol
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-03-05 18:02:09


Ick.

After your earlier emails, I wondered aloud to Terry on IM whether we should just custom-patch libltdl in OMPI to fix this issue. The problem is that libltdl effectively reports the "wrong" error back to OMPI, so the error string we print ends up not being very useful (e.g., it doesn't show which symbol was missing or what went wrong with the dlopen). Fixing this properly in libltdl is actually somewhat tricky, which is why it hasn't been fixed yet. But given that OMPI's use of libltdl is pretty specific, we might be able to get away with a simple fix that works just for OMPI (but wouldn't necessarily be suitable for other libltdl users).
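Until libltdl reports the underlying error, one way to see the dynamic linker's own message (which normally names the unresolved symbol or missing file) is to bypass libltdl and load the component directly with dlopen(3). The following is a minimal sketch, not OMPI or libltdl code; the helper name probe_component is hypothetical. It passes RTLD_NOW so that every undefined symbol must resolve at load time, which is the failure mode Terry suspected.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical diagnostic helper (not part of Open MPI or libltdl).
 * Tries to load the shared object at `path` with dlopen(3).
 * Returns 0 on success; on failure returns -1 and copies the dynamic
 * linker's own message from dlerror(3) into `errbuf`. */
int probe_component(const char *path, char *errbuf, size_t errlen)
{
    assert(errbuf != NULL && errlen > 0);
    errbuf[0] = '\0';

    /* Clear any stale error state left by an earlier dl* call. */
    (void)dlerror();

    /* RTLD_NOW: resolve every undefined symbol immediately, so a missing
     * symbol makes dlopen() itself fail instead of failing on first use.
     * RTLD_LOCAL: do not make this object's symbols available to later loads. */
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *msg = dlerror();
        snprintf(errbuf, errlen, "%s",
                 msg != NULL ? msg : "unknown dlopen() error");
        return -1;
    }

    /* The object loaded cleanly; unload it again. */
    dlclose(handle);
    return 0;
}
```

A small main() can call probe_component() on the component's full path and print errbuf on failure. Note that the component's undefined symbols (the U entries in the nm output below: mca_pml_v, ompi_request_null, opal_output) are normally supplied by Open MPI code already loaded into the MPI process, so a standalone probe has to make them available first (for example by linking against the relevant Open MPI libraries, or dlopen()ing them with RTLD_GLOBAL). Otherwise those symbols will be reported as missing even though they might resolve inside a real job. On glibc older than 2.34, link the probe with -ldl. Running it with LD_DEBUG=symbols,bindings set, as Terry suggested, gives a complementary trace from the dynamic linker.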

Hmmm...

This looks do-able. I'll commit in a bit.

On Mar 5, 2010, at 1:27 PM, Leonardo Fialho wrote:

> I see... but it is really strange, because this module is clean; it does not use anything. This is the output of the nm command, and I can't see any symbol that is not available.
>
> [lfialho_at_aoclsb-clus openmpi]$ nm mca_vprotocol_receiver.so
> 0000000000201208 a _DYNAMIC
> 0000000000201408 a _GLOBAL_OFFSET_TABLE_
>                  w _Jv_RegisterClasses
> 00000000002011e0 d __CTOR_END__
> 00000000002011d8 d __CTOR_LIST__
> 00000000002011f0 d __DTOR_END__
> 00000000002011e8 d __DTOR_LIST__
> 00000000000011d0 r __FRAME_END__
> 00000000002011f8 d __JCR_END__
> 00000000002011f8 d __JCR_LIST__
> 0000000000201640 A __bss_start
>                  w __cxa_finalize@@GLIBC_2.2.5
> 0000000000000d40 t __do_global_ctors_aux
> 00000000000007c0 t __do_global_dtors_aux
> 0000000000201200 d __dso_handle
>                  w __gmon_start__
> 0000000000201640 A _edata
> 0000000000201648 A _end
> 0000000000000d78 T _fini
> 0000000000000750 T _init
> 00000000000007a0 t call_gmon_start
> 0000000000201640 b completed.6115
> 0000000000000810 t frame_dummy
>                  U mca_pml_v
> 0000000000201460 D mca_vprotocol_receiver
> 0000000000000c71 t mca_vprotocol_receiver_add_comm
> 0000000000000a5f t mca_vprotocol_receiver_add_procs
> 0000000000201540 D mca_vprotocol_receiver_component
> 0000000000000cc3 t mca_vprotocol_receiver_component_close
> 0000000000000d18 t mca_vprotocol_receiver_component_finalize
> 0000000000000cce t mca_vprotocol_receiver_component_init
> 0000000000000cb8 t mca_vprotocol_receiver_component_open
> 0000000000000c93 t mca_vprotocol_receiver_del_comm
> 0000000000000a89 t mca_vprotocol_receiver_del_procs
> 000000000000083c t mca_vprotocol_receiver_dump
> 0000000000000d23 t mca_vprotocol_receiver_enable
> 00000000000009e7 t mca_vprotocol_receiver_iprobe
> 0000000000000b9a t mca_vprotocol_receiver_irecv
> 0000000000000ab3 t mca_vprotocol_receiver_isend
> 0000000000000a29 t mca_vprotocol_receiver_probe
> 0000000000000c00 t mca_vprotocol_receiver_recv
> 0000000000000b21 t mca_vprotocol_receiver_send
> 00000000000009bd T mca_vprotocol_receiver_start
> 0000000000000864 t mca_vprotocol_receiver_test
> 0000000000000896 t mca_vprotocol_receiver_test_all
> 00000000000008d0 t mca_vprotocol_receiver_test_any
> 0000000000000950 t mca_vprotocol_receiver_test_some
> 0000000000000916 t mca_vprotocol_receiver_wait_any
> 000000000000098a t mca_vprotocol_receiver_wait_some
>                  U ompi_request_null
>                  U opal_output
> 0000000000201440 d p.6113
> [lfialho_at_aoclsb-clus openmpi]$
>
> On Mar 5, 2010, at 7:00 PM, Terry Dontje wrote:
>
> > Sorry, meant to add this: you might be able to find the symbol causing the issue by twiddling with LD_DEBUG.
> >
> > --td
> > Terry Dontje wrote:
> >> Possibly the .so being loaded references an external symbol that cannot be resolved.
> >> --td
> >> Leonardo Fialho wrote:
> >>> Hi,
> >>>
> >>> I know that libtool does not help us find the source of this error, but what could cause the following error?
> >>>
> >>> [aoclsb-clus.uab.es:11724] mca: base: component_find: unable to open /home/lfialho/lib/openmpi/mca_vprotocol_receiver: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
> >>>
> >>> 1) yes, the file exists
> >>> 2) yes, it has been compiled along with all other components
> >>> 3) yes, it is the same Open MPI version
> >>> 4) this component is a copy of the pessimist component implemented by Aurelien
> >>> 5) Aurelien's component produces the same error
> >>>
> >>> The question is: what kind of mistake would cause an error during module loading?
> >>>
> >>> Thanks in advance,
> >>> Leonardo
> >>> _______________________________________________
> >>> devel mailing list
> >>> devel_at_[hidden]
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>>

-- 
Jeff Squyres
jsquyres_at_[hidden]