On Oct 10, 2007, at 1:27 PM, Dirk Eddelbuettel wrote:
> | Does this happen for all MPI programs (potentially only those that
> | use the MPI-2 one-sided stuff), or just your R environment?
> This is the likely winner.
> It seems indeed due to R's Rmpi package. Running a simple mpitest.c
> shows no error message. We will look at the Rmpi initialization to
> see what causes this.
Does Rmpi link against libmpi.so or dynamically load it at run-time? The
pt2pt one-sided component uses the MPI-1 point-to-point calls for
communication (hence, the pt2pt name). If those symbols were
unavailable (say, because libmpi.so was dynamically loaded) I could
see how this would cause problems.
The pt2pt component (rightly) does not have a -lmpi in its link
line. The other components that use symbols in libmpi.so (wrongly)
do have a -lmpi in their link line, which can cause problems on
some platforms (Linux tends to do dynamic linking / dynamic loading
better than most). That's why only the pt2pt component fails: it is
the one that relies on libmpi.so's symbols already being visible.
My guess is that Rmpi is dynamically loading libmpi.so, but not
specifying the RTLD_GLOBAL flag. This means that libmpi.so is not
available to the components the way it should be, and all goes
downhill from there. It only mostly works because we do something
silly with how we link most of our components, and Linux is just
smart enough to cover our rears (thankfully).
There are three options:
1. Someone could make the pt2pt osc component link in libmpi.so
   like the rest of the components and hope that no one ever
   tries this on a non-friendly platform.
2. Debian (and all Rmpi users) could configure Open MPI with the
   --disable-dlopen flag and ignore the problem.
3. Someone could fix Rmpi to dlopen libmpi.so with the RTLD_GLOBAL
   flag and fix the problem properly.
I think it's clear I'm in favor of Option 3.
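
For reference, a minimal sketch of what the RTLD_GLOBAL fix could look
like, assuming the loader calls dlopen() directly. The helper name
open_global() is hypothetical and is not taken from the Rmpi sources.
How R itself loads package shared objects may differ, so treat this as
an illustration of the flag rather than a patch. On older glibc
versions it needs -ldl at link time.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not from Rmpi): open a shared library so that
 * its symbols join the global namespace. Libraries loaded later, such
 * as Open MPI components, can then resolve symbols against it even if
 * they were not linked with -lmpi themselves. */
static void *open_global(const char *path)
{
    /* RTLD_NOW: resolve all symbols immediately, so failures show up here.
     * RTLD_GLOBAL: make this library's symbols available for symbol
     * resolution of subsequently loaded libraries. */
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n",
                path ? path : "(main program)", dlerror());
    }
    return handle;
}
```

A caller would use something like open_global("libmpi.so") before
MPI_Init is reached. The exact file name (libmpi.so, libmpi.so.0, ...)
depends on the installation and is an assumption here. With the default
RTLD_LOCAL behaviour, the same dlopen() call succeeds, but the
components opened afterwards cannot see libmpi.so's symbols.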