
Open MPI User's Mailing List Archives


From: Brian Granger (ellisonbg.net_at_[hidden])
Date: 2007-10-12 15:44:56


> That's the same for R. We don't touch the inner guts of module loading for
> this. What Hao realized after looking at the corresponding FAQ item was that
> right before calling MPI_Init, one can load libmpi explicitly, and -- and
> that's the important bit -- set the proper RTLD_GLOBAL argument.
>
> So you could adapt the patch we used:
>
> a) add an include for dlfcn.h
>
> b) explicitly call dlopen on libmpi.so with RTLD_GLOBAL
>
> That should be reasonably easy to test as you only need to rebuild mpi4py,

I don't like this solution one bit. Here is why. When someone needs
to use a shared library in a given piece of code there are 2 options:

1. Link in the shared library at compile time.

2. Load it using dlopen.

What you are telling me is that to use libmpi, I need to do both of
these!! Am I not correct that this is an abuse of dlopen?
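
For readers less familiar with option 2, here is a minimal sketch of run-time loading with dlopen and dlsym. It is only an illustration, not code from mpi4py or Rmpi: the helper name runtime_strlen is hypothetical, and it looks up strlen through the handle of the running program (dlopen(NULL, ...)) so that it does not depend on any particular library being installed. With option 1 the same call would simply be resolved by the linker.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Function-pointer type matching strlen's signature. */
typedef size_t (*strlen_fn)(const char *);

/* Hypothetical helper: call strlen through a symbol resolved at run time.
 * Returns (size_t)-1 if the lookup fails. */
size_t runtime_strlen(const char *s)
{
    /* A NULL filename yields a handle for the main program and the
     * libraries that were loaded with it. */
    void *handle = dlopen(NULL, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return (size_t)-1;
    }

    /* POSIX idiom for storing dlsym's void * result in a function pointer. */
    strlen_fn fn = NULL;
    *(void **)(&fn) = dlsym(handle, "strlen");

    size_t n = (fn != NULL) ? fn(s) : (size_t)-1;
    dlclose(handle);
    return n;
}
```

On older glibc versions this needs -ldl on the link line.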

Anyone should be able to link to libmpi at compile time and things
should "just work" - regardless of how my binary file is being used
(my binary file could be linked in at compile time or itself loaded
using dlopen).

While I agree that the hack would probably solve the problem for
mpi4py, I don't think this is a true solution to the problem.

Brian

>
> --- rmpi-0.5-4.orig/src/Rmpi.c
> +++ rmpi-0.5-4/src/Rmpi.c
> @@ -16,6 +16,7 @@
> */
>
> #include "Rmpi.h"
> +#include <dlfcn.h>
>
> static MPI_Comm *comm;
> static MPI_Status *status;
> @@ -32,7 +33,9 @@
> if (flag)
> return AsInt(1);
> else {
> - MPI_Init((void *)0,(void *)0);
> + char *libm="libmpi.so";
> + dlopen(libm,RTLD_GLOBAL);
> + MPI_Init((void *)0,(void *)0);
> MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
> MPI_Errhandler_set(MPI_COMM_SELF, MPI_ERRORS_RETURN);
> comm=(MPI_Comm *)Calloc(COMM_MAXSIZE, MPI_Comm);
>
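
A note on the flags in the patch above: POSIX requires the mode passed to dlopen to include either RTLD_LAZY or RTLD_NOW, and the patch passes RTLD_GLOBAL alone and never checks the result. The sketch below shows one way the preload step might be written with that in mind. The helper name preload_global is hypothetical, and nothing here is taken from the actual Rmpi or mpi4py sources.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load a shared library so that its symbols become
 * visible to libraries loaded later (for example, dlopen'ed plugins).
 * Returns the handle, or NULL if loading failed. */
void *preload_global(const char *libname)
{
    /* RTLD_NOW resolves all symbols immediately; RTLD_GLOBAL makes them
     * available for symbol resolution in subsequently loaded objects. */
    void *handle = dlopen(libname, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s) failed: %s\n", libname, dlerror());
    return handle;
}
```

In the situation discussed in this thread, the call would presumably be preload_global("libmpi.so") placed right before MPI_Init, with the caller deciding what to do if NULL comes back.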
>
> | is responsible for loading _everything_ into Python, and I am pretty
> | sure that there is no way that people would be willing to change it.
> | I am cc'ing this to Lisandro - maybe he has some ideas on this front.
>
> Actually, looked like you didn't CC him.
>
> Hth, Dirk
>
> |
> | Thanks
> |
> | Brian
> |
> | On 10/10/07, Brian Barrett <brbarret_at_[hidden]> wrote:
> | > On Oct 10, 2007, at 1:27 PM, Dirk Eddelbuettel wrote:
> | > > | Does this happen for all MPI programs (potentially only those that
> | > > | use the MPI-2 one-sided stuff), or just your R environment?
> | > >
> | > > This is the likely winner.
> | > >
> | > > It seems indeed due to R's Rmpi package. Running a simple mpitest.c
> | > > shows no error message. We will look at the Rmpi initialization to
> | > > see what could cause this.
> | >
> | > Does rmpi link in libmpi.so or dynamically load it at run-time? The
> | > pt2pt one-sided component uses the MPI-1 point-to-point calls for
> | > communication (hence, the pt2pt name). If those symbols were
> | > unavailable (say, because libmpi.so was dynamically loaded) I could
> | > see how this would cause problems.
> | >
> | > The pt2pt component (rightly) does not have a -lmpi in its link
> | > line. The other components that use symbols in libmpi.so (wrongly)
> | > do have a -lmpi in their link line. This can cause some problems on
> | > some platforms (Linux tends to do dynamic linking / dynamic loading
> | > better than most). That's why only the pt2pt component fails.
> | >
> | > My guess is that Rmpi is dynamically loading libmpi.so, but not
> | > specifying the RTLD_GLOBAL flag. This means that libmpi.so is not
> | > available to the components the way it should be, and all goes
> | > downhill from there. It only mostly works because we do something
> | > silly with how we link most of our components, and Linux is just
> | > smart enough to cover our rears (thankfully).
> | >
> | > Solutions:
> | >
> | > - Someone could make the pt2pt osc component link in libmpi.so
> | > like the rest of the components and hope that no one ever
> | > tries this on a non-friendly platform.
> | > - Debian (and all Rmpi users) could configure Open MPI with the
> | > --disable-dlopen flag and ignore the problem.
> | > - Someone could fix Rmpi to dlopen libmpi.so with the RTLD_GLOBAL
> | > flag and fix the problem properly.
> | >
> | > I think it's clear I'm in favor of Option 3.
> | >
> | > Brian
> | > _______________________________________________
> | > users mailing list
> | > users_at_[hidden]
> | > http://www.open-mpi.org/mailman/listinfo.cgi/users
> | >
>
> --
> Three out of two people have difficulties with fractions.