Open MPI User's Mailing List Archives

From: Dirk Eddelbuettel (edd_at_[hidden])
Date: 2007-10-10 19:40:39


On 10 October 2007 at 15:27, Brian Granger wrote:
| I am seeing the same error, but I am using mpi4py (Lisandro Dalcin's
| Python MPI bindings). I don't think that libmpi.so is being dlopen'd
| directly at runtime, but the shared library that is linked at compile
| time to libmpi.so is probably being loaded at runtime. The odd thing
| is that mpi4py has been tested extensively with Open MPI and this is
| the first version of Open MPI in which we have seen this issue. I tried
| 1.2.3 again yesterday and it works fine. What changed with 1.2.4?
|
| The problem with our case is that the code that is doing the dlopen is
| deep inside Python itself (not just mpi4py). It is the same code that

That's the same for R. We don't touch the inner guts of module loading for
this. What Hao realized after looking at the corresponding FAQ item was that
right before calling MPI_Init, one can load libmpi explicitly and -- that's
the important bit -- set the proper RTLD_GLOBAL argument.

So you could adapt the patch we used:

   a) add an include for dlfcn.h

   b) explicitly call dlopen on libmpi.so with RTLD_GLOBAL

That should be reasonably easy to test as you only need to rebuild mpi4py.

--- rmpi-0.5-4.orig/src/Rmpi.c
+++ rmpi-0.5-4/src/Rmpi.c
@@ -16,6 +16,7 @@
  */
 
 #include "Rmpi.h"
+#include <dlfcn.h>
 
 static MPI_Comm *comm;
 static MPI_Status *status;
@@ -32,7 +33,9 @@
 if (flag)
                 return AsInt(1);
         else {
- MPI_Init((void *)0,(void *)0);
+ char *libm="libmpi.so";
+ dlopen(libm,RTLD_GLOBAL);
+ MPI_Init((void *)0,(void *)0);
                 MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
                 MPI_Errhandler_set(MPI_COMM_SELF, MPI_ERRORS_RETURN);
                 comm=(MPI_Comm *)Calloc(COMM_MAXSIZE, MPI_Comm);
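
For anyone adapting the patch, here is a slightly more defensive sketch of
step b). It is only an illustration, not the code shipped in Rmpi or mpi4py:
the helper name preload_global is made up for this example. POSIX requires
the dlopen mode to include either RTLD_LAZY or RTLD_NOW, so the sketch ORs
RTLD_NOW with RTLD_GLOBAL, and it checks the return value so that a missing
library is reported instead of being silently ignored.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of Rmpi, mpi4py, or Open MPI):
 * load a shared library so that its symbols join the global scope and
 * can be used to resolve symbols in libraries loaded afterwards, such
 * as plugin components opened later by the MPI library itself.
 *
 * Returns 0 on success and -1 if the library could not be loaded.
 */
int preload_global(const char *name)
{
    assert(name != NULL);

    /* RTLD_NOW selects the binding mode; RTLD_GLOBAL exports the symbols. */
    void *handle = dlopen(name, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        const char *msg = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n",
                name, msg != NULL ? msg : "unknown error");
        return -1;
    }

    /* The handle is deliberately never dlclose()d: the library has to
     * stay loaded for the rest of the process. */
    return 0;
}
```

The intended use is to call preload_global("libmpi.so") immediately before
MPI_Init and to decide there whether a failure should be fatal. The library
name is taken from the patch above; on some systems only a versioned file
name may be installed, so treat the exact name as an assumption to check
locally. On older glibc systems the program also needs to be linked with
-ldl.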

| is responsible for loading _everything_ into Python, and I am pretty
| sure that there is no way that people would be willing to change it.
| I am cc'ing this to Lisandro - maybe he has some ideas on this front.

Actually, it looks like you didn't CC him.

Hth, Dirk

|
| Thanks
|
| Brian
|
| On 10/10/07, Brian Barrett <brbarret_at_[hidden]> wrote:
| > On Oct 10, 2007, at 1:27 PM, Dirk Eddelbuettel wrote:
| > > | Does this happen for all MPI programs (potentially only those that
| > > | use the MPI-2 one-sided stuff), or just your R environment?
| > >
| > > This is the likely winner.
| > >
| > > It seems indeed due to R's Rmpi package. Running a simple mpitest.c
| > > shows no error message. We will look at the Rmpi initialization to
| > > see what could cause this.
| >
| > Does Rmpi link in libmpi.so or dynamically load it at run-time? The
| > pt2pt one-sided component uses the MPI-1 point-to-point calls for
| > communication (hence, the pt2pt name). If those symbols were
| > unavailable (say, because libmpi.so was dynamically loaded) I could
| > see how this would cause problems.
| >
| > The pt2pt component (rightly) does not have a -lmpi in its link
| > line. The other components that use symbols in libmpi.so (wrongly)
| > do have a -lmpi in their link line. This can cause some problems on
| > some platforms (Linux tends to do dynamic linking / dynamic loading
| > better than most). That's why only the pt2pt component fails.
| >
| > My guess is that Rmpi is dynamically loading libmpi.so, but not
| > specifying the RTLD_GLOBAL flag. This means that libmpi.so is not
| > available to the components the way it should be, and all goes
| > downhill from there. It only mostly works because we do something
| > silly with how we link most of our components, and Linux is just
| > smart enough to cover our rears (thankfully).
| >
| > Solutions:
| >
| >   - Someone could make the pt2pt osc component link in libmpi.so
| >     like the rest of the components and hope that no one ever
| >     tries this on a non-friendly platform.
| >   - Debian (and all Rmpi users) could configure Open MPI with the
| >     --disable-dlopen flag and ignore the problem.
| >   - Someone could fix Rmpi to dlopen libmpi.so with the RTLD_GLOBAL
| >     flag and fix the problem properly.
| >
| > I think it's clear I'm in favor of Option 3.
| >
| > Brian
| > _______________________________________________
| > users mailing list
| > users_at_[hidden]
| > http://www.open-mpi.org/mailman/listinfo.cgi/users
| >

-- 
Three out of two people have difficulties with fractions.