Brian,

I think I have figured this one out.  By default, ctypes calls dlopen() with mode RTLD_LOCAL (except on Mac OS X 10.3).  When I tell ctypes to use RTLD_GLOBAL instead, it works fine on 10.4.  From the dlopen man page:

     RTLD_GLOBAL   Symbols exported from this image (dynamic library or
                   bundle) will be available to any images built with
                   -flat_namespace option to ld(1) or to calls to dlsym()
                   when using a special handle.

     RTLD_LOCAL    Symbols exported from this image (dynamic library or
                   bundle) are generally hidden and only available to
                   dlsym() when directly using the handle returned by this
                   call to dlopen().  If neither RTLD_GLOBAL nor RTLD_LOCAL
                   is specified, the default is RTLD_GLOBAL.

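Given this, the behavior makes sense: with RTLD_LOCAL, the symbols in libmpi are hidden from the MCA components it later dlopen()s, and those components expect to find them in the flat namespace.  Thus the following works on 10.4: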

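from ctypes import *

# Load libmpi with RTLD_GLOBAL so the components it dlopen()s can see its symbols
mpi = CDLL('libmpi.0.dylib', RTLD_GLOBAL)

# Fetch the interpreter's original argc/argv to hand to MPI_Init
f = pythonapi.Py_GetArgcArgv
argc = c_int()
argv = POINTER(c_char_p)()
f(byref(argc), byref(argv))

mpi.MPI_Init(byref(argc), byref(argv))
mpi.MPI_Finalize()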

So I am not sure this is a defect in OpenMPI, but it sure is a subtle aspect of using it.  I will probably document this somewhere in the package I am creating.  
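
For the documentation note, here is a minimal sketch of how the RTLD_GLOBAL workaround above could be wrapped.  The helper name load_global is hypothetical (it is not part of ctypes), and the library name in the usage comment assumes libmpi.0.dylib is on the loader path.  ctypes.DEFAULT_MODE is the mode ctypes uses when none is given, so the sketch also reports whether that default is RTLD_LOCAL on the current platform.

```python
import ctypes


def load_global(name):
    """Load a shared library with RTLD_GLOBAL.

    Hypothetical helper: the library's exported symbols become visible to
    images that are dlopen()ed afterwards, such as plugin components.
    """
    return ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)


# Report whether ctypes would use RTLD_LOCAL when no mode is passed.
default_is_local = (ctypes.DEFAULT_MODE == ctypes.RTLD_LOCAL)
print("ctypes default mode is RTLD_LOCAL:", default_is_local)

# Usage (assumes libmpi.0.dylib can be found by the loader):
# mpi = load_global('libmpi.0.dylib')
```

Passing the mode explicitly, rather than relying on the default, keeps the behavior the same across OS versions where the ctypes default differs.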

Thanks

Brian







On Sep 6, 2006, at 9:00 AM, Brian Barrett wrote:

Thanks for the information.  I've filed a bug in our bug tracker on this
issue.  It appears that, for some reason, when libmpi is dlopen()ed by
Python, the objects it then dlopens are unable to find symbols in
libmpi.  It will probably take me a bit of time to track this issue
down, but you will be notified by the bug tracker when the issue is
resolved.

Brian


On Thu, 2006-08-31 at 17:27 -0600, Brian E Granger wrote:
Brian,


Sure, but my example will probably seem a little odd.  I am calling
the mpi shared library from Python using ctypes.  


The dependencies for doing things this way are:


1. Python built with --enable-shared
2. The ctypes python package
3. OpenMPI configured with --enable-shared
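
The first two dependencies can be checked from Python itself.  The following is a minimal sketch rather than part of the original report: the function names are hypothetical, and it assumes a build where sysconfig exposes the Py_ENABLE_SHARED configuration variable (it may be missing on some platforms, in which case the check reports False).  It does not check how OpenMPI was configured.

```python
import sysconfig


def python_is_shared():
    """Return True if this interpreter was configured with --enable-shared."""
    # Py_ENABLE_SHARED is 1 for shared builds and 0 otherwise; it can be
    # None where the variable is not recorded, which is treated as False.
    return bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))


def ctypes_available():
    """Return True if the ctypes package can be imported."""
    try:
        import ctypes  # noqa: F401
    except ImportError:
        return False
    return True


print("shared libpython:", python_is_shared())
print("ctypes importable:", ctypes_available())
```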


Once you have this, the following python script will cause the problem
on Mac OS X:


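from ctypes import *

# Fetch the interpreter's original argc/argv to hand to MPI_Init
f = pythonapi.Py_GetArgcArgv
argc = c_int()
argv = POINTER(c_char_p)()
f(byref(argc), byref(argv))

# Load libmpi with the default ctypes mode, then initialize MPI
mpi = cdll.LoadLibrary('libmpi.0.dylib')
mpi.MPI_Init(byref(argc), byref(argv))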


I will try this on Linux as well to see if I get the same error.  One
important piece of the puzzle is that if I configure openmpi with the
--disable-dlopen flag, I don't have the problem.  I will do some
further testing on different systems and get back to you.  


Thanks for looking at this.


Brian



On Aug 31, 2006, at 4:20 PM, Brian Barrett wrote:

This is quite strange, and we're having some trouble figuring out
exactly why the opening is failing.  Do you have a (somewhat?) easy
list of instructions so that I can try to reproduce this?


Thanks,


Brian


On Tue, 2006-08-22 at 20:58 -0600, Brian Granger wrote:
Hi,


I am trying to dynamically load mpi.dylib on Mac OS X (using ctypes in
python).  It seems to load fine, but when I call MPI_Init(), I get the
error shown below.  I can call other functions just fine (like
MPI_Initialized).


Also, my mpi install is seeing all the needed components and I can
load them myself without error using dlopen.  I can also compile and
run mpi programs, and I built openmpi with shared library support.


[localhost:00973] mca: base: component_find: unable to open: dlopen(/usr/local/openmpi-1.1/lib/openmpi/mca_allocator_basic.so, 9): Symbol not found: _ompi_free_list_item_t_class
  Referenced from: /usr/local/openmpi-1.1/lib/openmpi/mca_allocator_basic.so
  Expected in: flat namespace
  (ignored)
[localhost:00973] mca: base: component_find: unable to open: dlopen(/usr/local/openmpi-1.1/lib/openmpi/mca_rcache_rb.so, 9): Symbol not found: _ompi_free_list_item_t_class
  Referenced from: /usr/local/openmpi-1.1/lib/openmpi/mca_rcache_rb.so
  Expected in: flat namespace
  (ignored)
[localhost:00973] mca: base: component_find: unable to open: dlopen(/usr/local/openmpi-1.1/lib/openmpi/mca_mpool_sm.so, 9): Symbol not found: _mca_allocator_base_components
  Referenced from: /usr/local/openmpi-1.1/lib/openmpi/mca_mpool_sm.so
  Expected in: flat namespace
  (ignored)
[localhost:00973] mca: base: component_find: unable to open: dlopen(/usr/local/openmpi-1.1/lib/openmpi/mca_pml_ob1.so, 9): Symbol not found: _ompi_free_list_item_t_class
  Referenced from: /usr/local/openmpi-1.1/lib/openmpi/mca_pml_ob1.so
  Expected in: flat namespace
  (ignored)
[localhost:00973] mca: base: component_find: unable to open: dlopen(/usr/local/openmpi-1.1/lib/openmpi/mca_coll_basic.so, 9): Symbol not found: _mca_pml
  Referenced from: /usr/local/openmpi-1.1/lib/openmpi/mca_coll_basic.so
  Expected in: flat namespace
  (ignored)
[localhost:00973] mca: base: component_find: unable to open: dlopen(/usr/local/openmpi-1.1/lib/openmpi/mca_coll_hierarch.so, 9): Symbol not found: _ompi_mpi_op_max
  Referenced from: /usr/local/openmpi-1.1/lib/openmpi/mca_coll_hierarch.so
  Expected in: flat namespace
  (ignored)
[localhost:00973] mca: base: component_find: unable to open: dlopen(/usr/local/openmpi-1.1/lib/openmpi/mca_coll_sm.so, 9): Symbol not found: _ompi_mpi_local_convertor
  Referenced from: /usr/local/openmpi-1.1/lib/openmpi/mca_coll_sm.so
  Expected in: flat namespace
  (ignored)
[localhost:00973] mca: base: component_find: unable to open: dlopen(/usr/local/openmpi-1.1/lib/openmpi/mca_coll_tuned.so, 9): Symbol not found: _mca_pml
  Referenced from: /usr/local/openmpi-1.1/lib/openmpi/mca_coll_tuned.so
  Expected in: flat namespace
  (ignored)
[localhost:00973] mca: base: component_find: unable to open: dlopen(/usr/local/openmpi-1.1/lib/openmpi/mca_osc_pt2pt.so, 9): Symbol not found: _ompi_request_t_class
  Referenced from: /usr/local/openmpi-1.1/lib/openmpi/mca_osc_pt2pt.so
  Expected in: flat namespace
  (ignored)
--------------------------------------------------------------------------
No available pml components were found!


This means that there are no components of this type installed on your
system or all the components reported that they could not be used.

This is a fatal error; your MPI process is likely to abort.  Check the
output of the "ompi_info" command and ensure that components of this
type are available on your system.  You may also wish to check the
value of the "component_path" MCA parameter and ensure that it has at
least one directory that contains valid MCA components.


--------------------------------------------------------------------------
[localhost:00973] PML ob1 cannot be selected


Any ideas?


Thanks


Brian Granger
_______________________________________________
users mailing list



Brian E Granger, Ph.D.
Research Scientist
Tech-X Corporation
phone:  720-974-1850
bgranger@txcorp.com






_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
