Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mpi.isend still not working (was trying to use personal copy of 1.7.4--solved)
From: Ross Boylan (ross_at_[hidden])
Date: 2014-03-13 16:13:28


I changed the calls to dlopen in Rmpi.c so that it tried libmpi.so
before libmpi.so.0. I also rebuilt MPI, R, and Rmpi as suggested
earlier by Bennet Fauber
(http://www.open-mpi.org/community/lists/users/2014/03/23823.php).
Thanks Bennet!

My theory is that the change to dlopen was sufficient by itself. The
rebuilding done before (by others) may have worked because it made the
load of libmpi.so.0 fail. That's not a great theory, since a) if there
were no libmpi.so.0 on the system the load would fail anyway, and b)
dlopen could probably find libmpi.so.0 in standard system locations
regardless of how R was built or how LD_LIBRARY_PATH was set up
(assuming it didn't find it in a custom place first).
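
For anyone who wants to try the same reordering, here is a minimal sketch of the idea. It is not the actual Rmpi.c code; the helper name open_first and its signature are made up for illustration. It tries each candidate name in order with the same flags Rmpi.c uses and stops at the first one that loads.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper, not part of Rmpi: try each library name in order
 * and return the handle of the first one that dlopen can load.
 * On success *chosen (if non-NULL) points at the name that worked;
 * on failure the function returns NULL and *chosen is set to NULL. */
static void *open_first(const char *const *names, size_t n,
                        const char **chosen)
{
    for (size_t i = 0; i < n; i++) {
        /* RTLD_GLOBAL makes the library's symbols visible to plugins
         * that are loaded later, which is why Rmpi sets it. */
        void *handle = dlopen(names[i], RTLD_GLOBAL | RTLD_LAZY);
        if (handle != NULL) {
            if (chosen != NULL)
                *chosen = names[i];
            return handle;
        }
    }
    if (chosen != NULL)
        *chosen = NULL;
    return NULL;
}
```

Called with the list { "libmpi.so", "libmpi.so.0" }, this prefers the unversioned name, which normally resolves to whatever library the package was linked against, and only falls back to libmpi.so.0 if that fails. Depending on the C library, the program may need to be linked with -ldl.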

Which brings me back to my original problem: mpi.isend.Robj (or possibly
mpi.recv.Robj on the other end) did not seem to be working properly. I
had hoped switching to a newer MPI library (1.7.4) would fix this; if
anything, it made it worse. I am sending to a fake receiver (at rank 1)
that does nothing but print a message when it gets a message. r is a
list. The mpi.isend.Robj R function serializes the object and then
mpi.isend's it.
> length(serialize(r, NULL))
[1] 599499 # ~ 0.5 MB
> mpi.send.Robj(1, 1, 4) # send of number works
Fake Assembler: 0 4 numeric
> mpi.send.Robj(r, 1, 4) # send of r works
NULL
> Fake Assembler: 0 4 list
mpi.isend.Robj(1, 1, 4) # isend of number works
> Fake Assembler: 0 4 numeric
mpi.isend.Robj(r, 1, 4) # sometimes this used to work the first time
> mpi.send.Robj(r, 1, 4) # sometimes used to get previous message unstuck
# never get the command prompt back
# presumably mpi.send, the C function, does not return.

I might just switch to mpi.send, though the fact that something is going
wrong makes me nervous.

Obviously given the involvement of R it's not clear the problem lies
with the MPI layer, but that seems at least a possibility.

Ross
On Thu, 2014-03-13 at 12:15 -0700, Ross Boylan wrote:
> On Wed, 2014-03-12 at 10:52 -0400, Bennet Fauber wrote:
> > My experience with Rmpi and OpenMPI is that it doesn't seem to do well
> > with the dlopen or dynamic loading. I recently installed R 3.0.3, and
> > Rmpi, which failed when built against our standard OpenMPI but
> > succeeded using the following 'secret recipe'. Perhaps there is
> > something here that will be helpful for you.
> >
> I have a couple of things to report. First,
> http://www.stats.uwo.ca/faculty/yu/Rmpi/changelogs.htm says
> It looks like that the option --disable-dlopen is not necessary to
> install Open MPI 1.6, at least on Debian. This might be R's .onLoad
> correctly loading dynamic libraries and Open MPI is not required to be
> compiled with static libraries enabled.
>
> Second, I tried rebuilding MPI with --disable-dlopen WITHOUT any of the
> changes to R or Rmpi. The behavior didn't change. Nobody said it
> would, but I thought it was worth a try.
>
> Third, the source of the double-load of mpi-related libraries looks like
> this code in Rmpi.c:
> if (!dlopen("libmpi.so.0", RTLD_GLOBAL | RTLD_LAZY)
> && !dlopen("libmpi.so", RTLD_GLOBAL | RTLD_LAZY)){
> So libmpi.so.1 is loaded because it's linked to Rmpi.so, and libmpi.so.0
> is loaded because the code does so explicitly.
>
> The motivation is given in the notes at
> http://www.stats.uwo.ca/faculty/yu/Rmpi/changelogs.htm:
> ----------------------------------
> 2007-10-24, version 0.5-5:
>
> dlopen has been used to load libmpi.so explicitly. This is mainly useful
> for Rmpi under OpenMPI where one might see many error messages:
> mca: base: component_find: unable to open osc pt2pt: file not found
> (ignored)
> if libmpi.so is not loaded with RTLD_GLOBAL flag.
> -------------------------------------
>
> I think I'll try changing it to try libmpi.so first so that it picks up
> libmpi.so.1 if available. I've already rebuilt R, though it looks as if
> Rmpi may have been the source of the problems.
>
> Ross
> > ### Install openmpi 1.6.5
> >
> > export PREFIX=/scratch/support_flux/bennet/local
> > COMPILERS='CC=gcc CXX=g++ FC=gfortran F77=gfortran'
> > CONFIGURE_FLAGS='--disable-dlopen --enable-static'
> > cd openmpi-1.6.5
> > ./configure --prefix=${PREFIX} \
> > --mandir=${PREFIX}/man \
> > --with-tm=/usr/local/torque \
> > --with-openib --with-psm \
> > --with-io-romio-flags='--with-file-system=testfs+ufs+nfs+lustre' \
> > $CONFIGURE_FLAGS \
> > $COMPILERS
> > make
> > make check
> > make install
> >
> > ### Install R 3.0.3
> >
> > wget http://cran.case.edu/src/base/R-3/R-3.0.3.tar.gz
> > tar xzvf R-3.0.3.tar.gz
> > cd R-3.0.3
> >
> > export MPI_HOME=/scratch/support_flux/bennet/local
> > export LD_LIBRARY_PATH=$MPI_HOME/lib:${LD_LIBRARY_PATH}
> > export LD_LIBRARY_PATH=$MPI_HOME/openmpi:${LD_LIBRARY_PATH}
> > export PATH=${PATH}:${MPI_HOME}/bin
> > export LDFLAGS='-Wl,-O1'
> > export R_PAPERSIZE=letter
> > export R_INST=${PREFIX}
> > export FFLAGS='-O3 -mtune=native'
> > export CFLAGS='-O3 -mtune=native'
> > ./configure --prefix=${R_INST} --mandir=${R_INST}/man \
> > --enable-R-shlib --without-x
> > make
> > make check
> > make install
> > wget http://www.stats.uwo.ca/faculty/yu/Rmpi/download/linux/Rmpi_0.6-3.tar.gz
> > R CMD INSTALL Rmpi_0.6-3.tar.gz \
> > --configure-args="--with-Rmpi-include=$MPI_HOME/include
> > --with-Rmpi-libpath=$MPI_HOME/lib --with-Rmpi-type=OPENMPI"
> >
> > Make sure environment variables and paths are set
> >
> > MPI_HOME=/home/software/rhel6/openmpi-1.6.5/gcc-4.4.7-static
> > PATH=/home/software/rhel6/openmpi-1.6.5/gcc-4.4.7-static/bin
> > LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/home/software/rhel6/openmpi-1.6.5/gcc-4.4.7-static/lib
> > LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/home/software/rhel6/openmpi-1.6.5/gcc-4.4.7-static/lib/openmpi
> > PATH=/home/software/rhel6/R/3.0.3/bin:${PATH}
> > LD_LIBRARY_PATH=/home/software/rhel6/R/3.0.3/lib64/R/lib:${LD_LIBRARY_PATH}
> >
> > ## Then install snow with
> > R
> > > install.packages('snow')
> > [ . . . .
> >
> >
> > I think the key thing is the --disable-dlopen, though it might require
> > both. Jeff Squyres had a post about this quite a while ago that gives
> > more detail about what's happening:
> >
> > http://www.open-mpi.org/community/lists/devel/2012/04/10840.php
> >
> > -- bennet