
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] shared libraries issue compiling 1.3.1/intel10.1.022
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-04-10 12:24:18


See this FAQ entry:

     http://www.open-mpi.org/faq/?category=running#intel-compilers-static
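
In short, the usual fixes for this kind of "libimf.so not found"
problem are either to make sure the Intel runtime libraries are
visible to non-interactive shells on every node, or to rebuild Open
MPI with the Intel runtime linked in statically.  A rough sketch of
the latter (the flag name depends on your compiler version: -i-static
is the 10.x spelling, newer compilers use -static-intel; double-check
against the FAQ and add whatever other options you already use, e.g.
--with-libnuma):

     ./configure CC=icc CXX=icpc F77=ifort FC=ifort \
         LDFLAGS=-i-static --prefix=/usr/local
     make all install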

On Apr 10, 2009, at 12:16 PM, Francesco Pietra wrote:

> Hi Gus:
>
> If you feel that the observations below are not relevant to openmpi,
> please disregard the message. You have already kindly devoted so much
> time to my problems.
>
> The "limits.h" issue is solved with 10.1.022 intel compilers: as I
> felt, the problem was with the pre-10.1.021 version of the intel C++
> and ifort compilers, a subtle bug observed also by gentoo people (web
> intel). There remains an orted issue.
>
> The openmpi 1.3.1 installation was able to compile connectivity_c.c
> and hello_c.c; running them with mpirun, however, fails (output below
> between ===):
>
> =================
> /usr/local/bin/mpirun -host -n 4 connectivity_c 2>&1 | tee connectivity.out
> /usr/local/bin/orted: error while loading shared libraries: libimf.so:
> cannot open shared object file: No such file or directory
> --------------------------------------------------------------------------
> A daemon (pid 8472) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
> =============
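>
> (A quick way to see what a non-interactive shell on deb64 -- the
> machine mpirun runs on -- actually picks up would be something like:
>
>      ssh deb64 'echo $LD_LIBRARY_PATH'
>
> since the startup files read by such a shell are not necessarily the
> ones read by a login shell.)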
>
> At this point, serial Amber10 compiled nicely (all with Intel, like
> Open MPI), but the parallel version, as expected, ran into the same
> problem as above:
>
> =================
> export TESTsander=/usr/local/amber10/exe/sander.MPI; make test.sander.BASIC
> make[1]: Entering directory `/usr/local/amber10/test'
> cd cytosine && ./Run.cytosine
> orted: error while loading shared libraries: libimf.so: cannot open
> shared object file: No such file or directory
> --------------------------------------------------------------------------
> A daemon (pid 8371) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
>
> ./Run.cytosine: Program error
> make[1]: *** [test.sander.BASIC] Error 1
> make[1]: Leaving directory `/usr/local/amber10/test'
> make: *** [test.sander.BASIC.MPI] Error 2
> =====================
>
> Relevant info:
>
> The failing daemon was not ssh (so my hypothesis that a firewall on the
> router was killing ssh does not hold). During these procedures there
> were only deb64 and deb32 on the local network. The single-processor
> deb32 (i386) machine has no openmpi or amber installed, only ssh, so
> its .bashrc cannot match the one on deb64 as far as libraries are
> concerned.
>
> echo $LD_LIBRARY_PATH
> /opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1.022/lib:/opt/intel/fce/10.1.022/lib:/usr/local/lib
>
> # dpkg --search libimf.so
> intel-iforte101022: /opt/intel/fce/10.1.022/lib/libimf.so
> intel-icce101022: /opt/intel/cce/10.1.022/lib/libimf.so
>
> i.e., libimf.so is on the library search path, yet mpirun/orted still
> fails to find it.
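>
> (For what it is worth, one can also check directly which libraries
> orted fails to resolve, e.g.:
>
>      ldd /usr/local/bin/orted | grep "not found"
>
> which should list libimf.so if the runtime linker cannot see it.)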
>
> Before compiling I tried to carefully check all environment variables
> and paths. In particular, as to MPI:
>
> mpif90 -show
> /opt/intel/fce/10.1.022//bin/ifort -I/usr/local/include
> -pthread -I/usr/local/lib -L/usr/local/lib -lmpi_f90 -lmpi_f77 -lmpi
> -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil
>
> thanks
> francesco
>
>
>
> On Thu, Apr 9, 2009 at 9:29 PM, Gus Correa <gus_at_[hidden]>
> wrote:
> > Hi Francesco
> >
> > Francesco Pietra wrote:
> >>
> >> Hi:
> >> As the failure to find "limits.h" in my attempted compilations of
> >> Amber over the past few days (amd64 lenny, openmpi 1.3.1, intel
> >> compilers 10.1.015) is probably (or so I hope) a bug in the version
> >> of the intel compilers used (on debian I made the same observations
> >> reported for gentoo,
> >> http://software.intel.com/en-us/forums/intel-c-compiler/topic/59886/),
> >>
> >> I made a deb package of 10.1.022, icc and ifort.
> >>
> >> ./configure CC icc, CXX icp,
> >
> > The Intel C++ compiler is called icpc, not icp.
> > Is this a typo on your message, or on the actual configure options?
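> > (In any case, the configure line would normally be written with '='
> > signs, e.g.:
> >
> >      ./configure CC=icc CXX=icpc F77=ifort FC=ifort --with-libnuma=/usr
> >
> > assuming that is what was actually typed.)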
> >
> >> F77 and FC ifort --with-libnuma=/usr (not
> >> /usr/lib so that the numa.h issue is not raised), "make clean",
> >
> > If you really did "make clean" you may have removed useful things.
> > What did you do, "make" or "make clean"?
> >
> >> and "make install" gave no error signals. However, the compilation
> >> tests in the examples did not pass and I really don't understand why.
> >>
> >
> > Which compilation tests are you talking about?
> > From Amber or from the OpenMPI example programs (connectivity_c and
> > hello_c), or both?
> >
> >> The error, with both connectivity_c and hello_c (I was operating on
> >> the parallel computer deb64 directly and have access to everything
> >> there) was failure to find a shared library, libimf.so
> >>
> >
> > To get the right Intel environment, you need to put these commands
> > inside your login files (.bashrc or .cshrc) to set up the Intel
> > environment variables correctly:
> >
> > source /path/to/your/intel/cce/bin/iccvars.sh
> > source /path/to/your/intel/cce/bin/ifortvars.sh
> >
> > and perhaps a similar one for mkl.
> > (I don't use MKL, I don't know much about it).
> >
> > If your home directory is NFS mounted to all the computers you
> > use to run parallel programs,
> > then the same .bashrc/.cshrc will work on all computers.
> > However, if you have a separate home directory on each computer,
> > then you need to do this on each of them.
> > I.e., you have to include the "source" commands above
> > in the .bashrc/.cshrc files on your home directory in EACH computer.
> >
> > Also I presume you use bash/sh not tcsh/csh, right?
> > Otherwise you need to source iccvars.csh instead of iccvars.sh.
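> >
> > With your 10.1.022 packages that would presumably be something like
> > (paths guessed from the dpkg output below -- adjust to your actual
> > install):
> >
> >      source /opt/intel/cce/10.1.022/bin/iccvars.sh
> >      source /opt/intel/fce/10.1.022/bin/ifortvars.sh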
> >
> >
> >> # dpkg --search libimf.so
> >> /opt/intel/fce/10.1.022/lib/libimf.so (the same for cce)
> >>
> >> which path seems to be correctly set by iccvars.sh and
> >> ifortvars.sh (incidentally, both files are -rw-r--r-- root root;
> >> is it correct that they are not executable?)
> >>
> >
> > The permissions here are not a problem.
> > You are supposed to *source* the files, not to execute them.
> > If you execute them instead of sourcing the files,
> > your Intel environment will *NOT* be set up.
> >
> > BTW, the easy way to check your environment is to type "env" on the
> > shell command prompt.
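> > For instance:
> >
> >      env | grep -iE 'intel|mkl|ld_library_path'
> >
> > shows at a glance whether the Intel directories actually made it
> > into your environment.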
> >
> >> echo $LD_LIBRARY_PATH
> >> returned, inter alia,
> >>
> >> /opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1.022/lib:/opt/intel/fce/10.1.022/lib
> >> (why twice the mkl?)
> >>
> >
> > Hard to tell which computer you were on when you did this,
> > and hence what it affects.
> >
> > You may have sourced the MKL shell script that sets up the MKL
> > environment variables twice, which would write its library path
> > more than once.
> >
> > When the environment variables get this much confused,
> > with duplicate paths and so on, you may want to logout
> > and login again, to start fresh.
> >
> > Do you need MKL for Amber?
> > If you don't use it, keep things simple, and don't bother about it.
> >
> >
> >> I am surely missing something fundamental. I hope other eyes can
> >> see better.
> >>
> >
> > Jody helped you run the hello_c program successfully.
> > Try to carefully repeat the same steps.
> > You should get the same result:
> > the OpenMPI test programs should run.
> >
> >> A kind person elsewhere suggested in passing: "The use of -rpath
> >> during linking is highly recommended as opposed to setting
> >> LD_LIBRARY_PATH at run time, not least because it hardcodes the
> >> paths to the "right" library files in the executables themselves."
> >> Should this be relevant to the present issue, where can I learn
> >> about -rpath linking?
> >>
> >
> > If you are talking about Amber,
> > you would have to tweak the Makefiles to set the linker -rpath.
> > And we don't know much about Amber's Makefiles,
> > so this may be a very tricky approach.
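> >
> > Just to illustrate the general idea (an example, not Amber-specific):
> > adding something like
> >
> >      -Wl,-rpath,/opt/intel/fce/10.1.022/lib
> >
> > to the link line hardcodes that directory into the executable's
> > runtime search path, so it no longer depends on LD_LIBRARY_PATH to
> > find libimf.so.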
> >
> > If you are talking about the OpenMPI test programs,
> > I think it is just a matter of setting the Intel environment variables
> > right, sourcing ifortvars.sh and iccvars.sh properly,
> > to get the right runtime LD_LIBRARY_PATH.
> >
> >> thanks
> >> francesco pietra
> >
> > I hope this helps.
> > Gus Correa
> >
> >
> > ---------------------------------------------------------------------
> > Gustavo Correa
> > Lamont-Doherty Earth Observatory - Columbia University
> > Palisades, NY, 10964-8000 - USA
> >
> > ---------------------------------------------------------------------
> >
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems