I remember having a conversation with someone from R at Supercomputing last year, and this was one of the issues we discussed. The problem is that you have to ensure that R is built against the OMPI you are going to use, and it is usually better to have configured OMPI with --disable-dlopen --enable-static to avoid library confusion when you later run R.
I'd give that a try and see if it solves your problems. The "recipe" given by Bennet looked right to me.
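For concreteness, here is a rough sketch of what that build could look like. It only prints the two commands rather than running them. The prefix ($HOME/install, matching the paths quoted below), the tarball name, and the helper function names are assumptions for illustration, and the Rmpi configure arguments are given as I recall them from Rmpi's install notes, so check them against your Rmpi version.

```shell
#!/bin/sh
# Sketch only: PREFIX and RMPI_TARBALL are assumed values, not taken from a real setup.
PREFIX="${PREFIX:-$HOME/install}"
RMPI_TARBALL="${RMPI_TARBALL:-Rmpi.tar.gz}"   # hypothetical file name

# Print the configure line for a static Open MPI build without dlopen'ed plugins.
ompi_configure_cmd() {
    printf './configure --prefix=%s --disable-dlopen --enable-static\n' "$1"
}

# Print the R command that builds Rmpi against the Open MPI installed in $1.
rmpi_install_cmd() {
    printf 'R CMD INSTALL %s --configure-args="--with-Rmpi-type=OPENMPI --with-Rmpi-include=%s/include --with-Rmpi-libpath=%s/lib"\n' "$2" "$1" "$1"
}

# Dry run: show the commands, to be run by hand in the respective source directories.
ompi_configure_cmd "$PREFIX"
rmpi_install_cmd "$PREFIX" "$RMPI_TARBALL"
```

After the configure step you would still run make and make install for Open MPI before building Rmpi.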
On Mar 12, 2014, at 12:32 PM, Ross Boylan <ross_at_[hidden]> wrote:
> On Wed, 2014-03-12 at 11:50 +0100, Reuti wrote:
>> Am 12.03.2014 um 11:39 schrieb Jeff Squyres (jsquyres):
>>> Generally, all you need to ensure that your personal copy of OMPI is used is to set the PATH and LD_LIBRARY_PATH to point to your new Open MPI installation. I do this all the time on my development cluster (where I have something like 6 billion different installations of OMPI available... mmm... should probably clean that up...)
>>> export LD_LIBRARY_PATH=path-to-my-ompi/lib:$LD_LIBRARY_PATH
>>> export PATH=path-to-my-ompi/bin:$PATH
> I believe I've already done that. The script that launches everything is
> (all one line originally)
> R_PROFILE_USER=~/KHC/sunbelt/Rmpiprofile \
> LD_LIBRARY_PATH=/home/ross/install/lib:$LD_LIBRARY_PATH \
> PATH=/home/ross/install/bin:$PATH orterun -x R_PROFILE_USER \
> -x LD_LIBRARY_PATH -x PATH -hostfile ~/KHC/sunbelt/hosts \
> -np 7 R --no-save -q
> There is a complication with R; it sticks stuff in front of
> LD_LIBRARY_PATH. However, the startup script Rmpiprofile fixes that,
> though I'm not entirely sure that is effective. However, the old
> libraries that are being loaded are not from any directories R added to
> LD_LIBRARY_PATH; instead they are from /usr/lib, which is a standard
> place for the dynamic loader to look.
>>> It should be noted that:
>>> 1. you need to *prefix* your PATH and LD_LIBRARY_PATH with these values
>>> 2. you need to set these values in a way that will be picked up on all servers that you use in your job. The safest way to do this is in your shell startup files (e.g., $HOME/.bashrc or whatever is relevant for your shell).
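The two points above could be sketched as a small helper for a shell startup file such as $HOME/.bashrc. The function name prepend_path and the OMPI_PREFIX variable are hypothetical, and the default of $HOME/install is an assumption taken from the paths quoted earlier in the thread.

```shell
#!/bin/sh
# Prepend DIR to the colon-separated list stored in the variable named VAR,
# then export it. Does nothing if DIR is already the first entry.
prepend_path() {
    var=$1
    dir=$2
    eval "old=\${$var:-}"
    case "$old" in
        "$dir"|"$dir":*) ;;               # already first: leave unchanged
        "") eval "$var=\$dir" ;;          # empty: avoid a trailing ':' entry
        *)  eval "$var=\$dir:\$old" ;;    # otherwise put DIR in front
    esac
    export "$var"
}

# Assumed install location of the personal Open MPI build.
OMPI_PREFIX="${OMPI_PREFIX:-$HOME/install}"
prepend_path PATH "$OMPI_PREFIX/bin"
prepend_path LD_LIBRARY_PATH "$OMPI_PREFIX/lib"
```

The empty-variable case is handled separately because an empty entry in LD_LIBRARY_PATH is treated as the current directory, which is rarely what you want.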
>> I see "libtorque" in the output below - were these jobs running inside a queuing system? The set paths might be different therein, and need to be set in the job script in this case.
> No batch system (see script above for launch mechanism). We threw a lot
> of stuff MPI configure was looking for onto the system. AFAIK torque
> isn't even installed.
> One possible issue is that the Rmpi module for R is not compiled by
> mpicc; R has its own notions of proper options for the compiler and its
> own infrastructure for building things. I did pass the location of my
> local libraries into the build process.
> This seems more like an issue with the dynamic loader, or with whatever
> system R is using when it loads Rmpi.so.
>> -- Reuti
>>> See http://www.open-mpi.org/faq/?category=running#run-prereqs, http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path, and http://www.open-mpi.org/faq/?category=running#mpirun-prefix.
>>> Note the --prefix option that is described in the 3rd FAQ item I cited -- that can be a bit easier, too.
>>> On Mar 12, 2014, at 2:51 AM, Ross Boylan <ross_at_[hidden]> wrote:
>>>> I took the advice here and built a personal copy of the current openmpi,
>>>> to see if the problems I was having with Rmpi were a result of the old
>>>> version on the system.
>>>> When I do ldd on the relevant libraries (Rmpi.so is loaded dynamically
>>>> by R) everything looks fine; path references that should be local are.
>>>> But when I run the program and do lsof it shows that both the system and
>>>> personal versions of key libraries are opened.
>>>> First, does anyone know which library will actually be used, or how to
>>>> tell which library is actually used, in this situation? I'm running on
>>>> Linux (Debian squeeze).
>>>> Second, is there some way to prevent the wrong/old/system libraries from
>>>> being loaded?
>>>> FWIW I'm still seeing the old misbehavior when I run this way, but, as I
>>>> said, I'm really not sure which libraries are being used. Since Rmpi
>>>> was built against the new/local ones, I think the fact that it doesn't
>>>> crash means I really am using the new ones.
>>>> Here are highlights of lsof on the process running R:
>>>> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
>>>> R 17634 ross cwd DIR 254,2 12288 150773764 /home/ross/KHC/sunbelt
>>>> R 17634 ross rtd DIR 8,1 4096 2 /
>>>> R 17634 ross txt REG 8,1 5648 3058294 /usr/lib/R/bin/exec/R
>>>> R 17634 ross DEL REG 8,1 2416718 /tmp/openmpi-sessions-ross_at_n100_0/60429/1/shared_mem_pool.n100
>>>> R 17634 ross mem REG 8,1 335240 3105336 /usr/lib/openmpi/lib/libopen-pal.so.0.0.0
>>>> R 17634 ross mem REG 8,1 304576 3105337 /usr/lib/openmpi/lib/libopen-rte.so.0.0.0
>>>> R 17634 ross mem REG 8,1 679992 3105332 /usr/lib/openmpi/lib/libmpi.so.0.0.2
>>>> R 17634 ross mem REG 8,1 93936 2967826 /usr/lib/libz.so.1.2.3.4
>>>> R 17634 ross mem REG 8,1 10648 3187256 /lib/libutil-2.11.3.so
>>>> R 17634 ross mem REG 8,1 32320 2359631 /usr/lib/libpciaccess.so.0.10.8
>>>> R 17634 ross mem REG 8,1 33368 2359338 /usr/lib/libnuma.so.1
>>>> R 17634 ross mem REG 254,2 979113 152045740 /home/ross/install/lib/libopen-pal.so.6.1.0
>>>> R 17634 ross mem REG 8,1 183456 2359592 /usr/lib/libtorque.so.2.0.0
>>>> R 17634 ross mem REG 254,2 1058125 152045781 /home/ross/install/lib/libopen-rte.so.7.0.0
>>>> R 17634 ross mem REG 8,1 49936 2359341 /usr/lib/libibverbs.so.1.0.0
>>>> R 17634 ross mem REG 254,2 2802579 152045867 /home/ross/install/lib/libmpi.so.1.3.0
>>>> R 17634 ross mem REG 254,2 106626 152046481 /home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so
>>>> So libmpi, libopen-pal, and libopen-rte all are opened in two versions and two locations.
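As an alternative to scanning lsof output by eye, a minimal sketch along these lines lists the distinct library files mapped into a process, reading a file in /proc/<pid>/maps format. The function name mapped_libs and the R_PID variable are hypothetical. It shows only what is mapped, not which copy wins symbol resolution; for that, glibc's LD_DEBUG=bindings output may help.

```shell
#!/bin/sh
# List the distinct mapped files whose path matches PATTERN (an awk regex),
# given a file in /proc/<pid>/maps format, where the path is the 6th field.
# Assumes paths contain no spaces.
mapped_libs() {
    maps_file=$1
    pattern=$2
    awk -v pat="$pattern" '$6 ~ pat { print $6 }' "$maps_file" | sort -u
}

# Usage: set R_PID to the PID of the running R process (hypothetical variable).
if [ -n "${R_PID:-}" ] && [ -r "/proc/$R_PID/maps" ]; then
    mapped_libs "/proc/$R_PID/maps" 'libmpi|libopen-(pal|rte)'
fi
```

If both /usr/lib/openmpi/lib and the personal install directory show up in the output, two copies really are mapped into the same process.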
>>>> Ross Boylan
>>>> users mailing list
>>> Jeff Squyres
>>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/