Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] shared libraries issue compiling 1.3.1/intel 10.1.022
From: Mostyn Lewis (Mostyn.Lewis_at_[hidden])
Date: 2009-04-10 12:27:56


If you want to find libimf.so, which is a shared Intel library,
pass the library path along with -x on mpirun:

mpirun .... -x LD_LIBRARY_PATH ....
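
For example (a sketch only; the host names below are placeholders, and
-x exports the variable from your current environment to the launched
processes):

mpirun -np 4 -host node1,node2 -x LD_LIBRARY_PATH ./connectivity_c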

DM

On Fri, 10 Apr 2009, Francesco Pietra wrote:

> Hi Gus:
>
> If you feel that the observations below are not relevant to openmpi,
> please disregard the message. You have already kindly devoted so much
> time to my problems.
>
> The "limits.h" issue is solved with 10.1.022 intel compilers: as I
> felt, the problem was with the pre-10.1.021 version of the intel C++
> and ifort compilers, a subtle bug observed also by gentoo people (web
> intel). There remains an orted issue.
>
> The openmpi 1.3.1 installation was able to compile connectivity_c.c
> and hello_c.c; running them with mpirun, though, fails (output below
> between ===):
>
> =================
> /usr/local/bin/mpirun -host -n 4 connectivity_c 2>&1 | tee connectivity.out
> /usr/local/bin/orted: error while loading shared libraries: libimf.so:
> cannot open shared object file: No such file or directory
> --------------------------------------------------------------------------
> A daemon (pid 8472) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
> =============
>
> At this point, Amber10 serial compiled nicely (all intel, like
> openmpi), but the parallel tests, as expected, ran into the same
> problem as above:
>
> =================
> export TESTsander=/usr/local/amber10/exe/sander.MPI; make test.sander.BASIC
> make[1]: Entering directory `/usr/local/amber10/test'
> cd cytosine && ./Run.cytosine
> orted: error while loading shared libraries: libimf.so: cannot open
> shared object file: No such file or directory
> --------------------------------------------------------------------------
> A daemon (pid 8371) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
>
> ./Run.cytosine: Program error
> make[1]: *** [test.sander.BASIC] Error 1
> make[1]: Leaving directory `/usr/local/amber10/test'
> make: *** [test.sander.BASIC.MPI] Error 2
> =====================
>
> Relevant info:
>
> The daemon was not ssh (so my hypothesis that a firewall on the
> router was killing ssh does not hold). During these procedures,
> only deb64 and deb32 were on the local network. The monoprocessor
> deb32 (i386) has nothing of openmpi or amber installed, only ssh.
> Thus, my .bashrc on deb32 cannot match that of deb64 as far as
> libraries are concerned.
>
> echo $LD_LIBRARY_PATH
> /opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1..022/lib:/opt/intel/fce/10.1.022/lib:/usr/local/lib
>
> # dpkg --search libimf.so
> intel-iforte101022: /opt/intel/fce/10.1.022/lib/libimf.so
> intel-icce101022: /opt/intel/cce/10.1.022/lib/libimf.so
>
> i.e., libimf.so is on LD_LIBRARY_PATH, yet still not found by mpirun.
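>
> (One way to check whether orted itself can resolve the library on a
> node, assuming it is installed at /usr/local/bin/orted as in the error
> above, would be:
>
> ssh deb64 ldd /usr/local/bin/orted | grep libimf
>
> If that prints "not found", the non-interactive remote shell is not
> picking up the Intel library path.)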
>
> Before compiling I tried to check carefully all environment variables
> and paths. In particular, for MPI:
>
> mpif90 -show
> /opt/intel/fce/10.1.022//bin/ifort -I/usr/local/include
> -pthread -I/usr/local/lib -L/usr/local/lib -lmpi_f90 -lmpi_f77 -lmpi
> -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil
>
> thanks
> francesco
>
>
>
> On Thu, Apr 9, 2009 at 9:29 PM, Gus Correa <gus_at_[hidden]> wrote:
>> Hi Francesco
>>
>> Francesco Pietra wrote:
>>>
>>> Hi:
>>> The failure to find "limits.h" in my attempted compilations of Amber
>>> over the past few days (amd64 lenny, openmpi 1.3.1, intel compilers
>>> 10.1.015) is probably (or so I hope) a bug in that version of the
>>> intel compilers (with debian I made the same observations reported
>>> for gentoo,
>>> http://software.intel.com/en-us/forums/intel-c-compiler/topic/59886/).
>>>
>>> I made a deb package of 10.1.022, icc and ifort.
>>>
>>> ./configure CC icc, CXX icp,
>>
>> The Intel C++ compiler is called icpc, not icp.
>> Is this a typo on your message, or on the actual configure options?
>>
>>> F77 and FC ifort --with-libnuma=/usr (not
>>>
>>> /usr/lib so that the numa.h issue is not raised), "make clean",
>>
>> If you really did "make clean" you may have removed useful things.
>> What did you do, "make" or "make clean"?
>>
>>> and
>>>
>>> "mak install" gave no error signals. However, the compilation tests in
>>> the examples did not pass and I really don't understand why.
>>>
>>
>> Which compilation tests are you talking about?
>> From Amber or from the OpenMPI example programs (connectivity_c and
>> hello_c), or both?
>>
>>> The error, with both connectivity_c and hello_c (I was operating on
>>> the parallel computer deb64 directly and have access to everything
>>> there) was failure to find a shared library, libimf.so
>>>
>>
>> To get the right Intel environment,
>> you need to put these commands inside your login files
>> (.bashrc or .cshrc), to setup the Intel environment variables correctly:
>>
>> source /path/to/your/intel/cce/bin/iccvars.sh
>> source /path/to/your/intel/cce/bin/ifortvars.sh
>>
>> and perhaps a similar one for mkl.
>> (I don't use MKL, I don't know much about it).
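>>
>> For instance (a sketch only; the exact paths are guessed from the
>> library locations you posted, so adjust them if iccvars.sh and
>> ifortvars.sh live elsewhere), the .bashrc lines might look like:
>>
>> source /opt/intel/cce/10.1.022/bin/iccvars.sh
>> source /opt/intel/fce/10.1.022/bin/ifortvars.sh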
>>
>> If your home directory is NFS mounted to all the computers you
>> use to run parallel programs,
>> then the same .bashrc/.cshrc will work on all computers.
>> However, if you have a separate home directory on each computer,
>> then you need to do this on each of them.
>> I.e., you have to include the "source" commands above
>> in the .bashrc/.cshrc files on your home directory in EACH computer.
>>
>> Also I presume you use bash/sh not tcsh/csh, right?
>> Otherwise you need to source iccvars.csh instead of iccvars.sh.
>>
>>
>>> # dpkg --search libimf.so
>>>   /opt/intel/fce/10.1.022/lib/libimf.so  (the same for cce)
>>>
>>> which path seems to be correctly set by iccvars.sh and
>>> ifortvars.sh (incidentally, both files are -rw-r--r-- root root;
>>> is it correct that they are not executable?)
>>>
>>
>> The permissions here are not a problem.
>> You are supposed to *source* the files, not to execute them.
>> If you execute them instead of sourcing the files,
>> your Intel environment will *NOT* be setup.
>>
>> BTW, the easy way to check your environment is to type "env" on the
>> shell command prompt.
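>>
>> For example, to see whether the Intel library directories made it
>> into your environment:
>>
>> env | grep LD_LIBRARY_PATH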
>>
>>> echo $LD_LIBRARY_PATH
>>> returned, inter alia,
>>>
>>> /opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1.022/lib:/opt/intel/fce/10.1.022/lib
>>> (why twice the mkl?)
>>>
>>
>> Hard to tell which computer you were on when you did this,
>> and hence what it affects.
>>
>> You may have sourced twice the mkl shell script that sets up the MKL
>> environment variables, which would write its library path more than
>> once.
>>
>> When the environment variables get this much confused,
>> with duplicate paths and so on, you may want to logout
>> and login again, to start fresh.
>>
>> Do you need MKL for Amber?
>> If you don't use it, keep things simple, and don't bother about it.
>>
>>
>>> I surely miss to understand something fundamental. Hope other eyes see
>>> better
>>>
>>
>> Jody helped you run the hello_c program successfully.
>> Try to repeat the same steps carefully.
>> You should get the same result:
>> the OpenMPI test programs should run.
>>
>>> A kind person elsewhere suggested in passing: "The use of -rpath
>>> during linking is highly recommended as opposed to setting
>>> LD_LIBRARY_PATH at run time, not least because it hardcodes the
>>> paths to the "right" library files in the executables themselves."
>>> Should this be relevant to the present issue, where can I learn about
>>> -rpath linking?
>>>
>>
>> If you are talking about Amber,
>> you would have to edit its Makefiles to add the linker -rpath flag.
>> And we don't know much about Amber's Makefiles,
>> so this may be a very tricky approach.
>>
>> If you are talking about the OpenMPI test programs,
>> I think it is just a matter of setting the Intel environment variables
>> right, i.e. sourcing ifortvars.sh and iccvars.sh properly,
>> to get the right runtime LD_LIBRARY_PATH.
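>>
>> (For reference, since you asked where to learn about -rpath: a sketch
>> of what an -rpath link line could look like for one of the test
>> programs, using the Intel lib directory from your LD_LIBRARY_PATH
>> above, is:
>>
>> mpicc -o connectivity_c connectivity_c.c -Wl,-rpath,/opt/intel/cce/10.1.022/lib
>>
>> The same -Wl,-rpath,<dir> flag can be appended to any link command.)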
>>
>>> thanks
>>> francesco pietra
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> I hope this helps.
>> Gus Correa
>>
>> ---------------------------------------------------------------------
>> Gustavo Correa
>> Lamont-Doherty Earth Observatory - Columbia University
>> Palisades, NY, 10964-8000 - USA
>> ---------------------------------------------------------------------
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>