Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Fwd: shared libraries issue compiling 1.3.1/intel 10.1.022
From: Francesco Pietra (chiendarret_at_[hidden])
Date: 2009-04-10 13:56:51


Hi Gus:
Please see below, while I go and study what Jeff suggested.

On Fri, Apr 10, 2009 at 6:51 PM, Gus Correa <gus_at_[hidden]> wrote:
> Hi Francesco
>
> Let's concentrate on the Intel shared libraries problem for now.
>
> The FAQ Jeff sent you summarizes what I told you before.
>
> You need to set up your Intel environment (on deb64) to work with mpirun.
> You need to insert these commands in your .bashrc (most likely you use bash)
> or .cshrc (if you use csh/tcsh) file.
> These files sit in your home directory.
> They are hidden files, to see them do "ls -a".
> Edit this file and insert these commands there:
>
> source /path/to/your/intel/cce/bin/iccvars.sh
> source /path/to/your/intel/fce/bin/ifortvars.sh
>
> Did you do this?

my .bashrc contained

#For intel Fortran and C++ compilers

. /opt/intel/fce/10.1.022/bin/ifortvars.sh
. /opt/intel/cce/10.1.022/bin/iccvars.sh

echo $LD_LIBRARY_PATH
/opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1.022/lib:/opt/intel/fce/10.1.022/lib:/usr/local/lib
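
Having a directory listed in LD_LIBRARY_PATH and having the library actually present in it are two separate things, so it can help to check both at once. The following is only a minimal sketch in POSIX sh; the function name find_in_ld_path is made up for this illustration and is not part of Open MPI or the Intel tools.

```shell
# find_in_ld_path NAME [PATHLIST]
# Print every directory of the colon-separated PATHLIST (default:
# $LD_LIBRARY_PATH) that contains a file called NAME.
# The body runs in a subshell so that changing IFS stays local.
find_in_ld_path() (
    name=$1
    pathlist=${2-${LD_LIBRARY_PATH-}}
    IFS=:
    for dir in $pathlist; do
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir"
        fi
    done
)

# Library name taken from the error message in this thread.
find_in_ld_path libimf.so
```

If this prints nothing in the shell that launches mpirun, the variable is either unset there or points at directories that do not hold the library.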

Because I understand that I am messing something up, I saved a copy of
the original .bashrc and replaced the dot with "source". Of course,
everything came out as above.

I sincerely apologize for bothering the list.

francesco

>
> This Intel environment **cannot** be set up on the
> shell command prompt **only**,
> otherwise it will **only work for your interactive session**,
> but **not** for mpirun.
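
One possible reason why lines that are already in .bashrc have no effect for mpirun: some distributions ship a default ~/.bashrc that returns early when the shell is not interactive, so anything placed below that test is skipped for the shells that launch orted. Whether that is the case here is an assumption, not something the thread confirms. A hedged sketch of a fragment meant for the very top of ~/.bashrc follows; the helper name source_if_readable is invented for this example, and the two paths are the ones quoted in this thread.

```shell
# source_if_readable FILE
# Source FILE into the current shell if it is readable, else return 1.
source_if_readable() {
    script=$1
    [ -r "$script" ] || return 1
    set --          # do not pass our own argument on to the sourced script
    . "$script"     # "." is the portable spelling of bash's "source"
}

# Place these before any "return if not interactive" test in ~/.bashrc.
# Paths as reported in this thread; adjust to the local installation.
source_if_readable /opt/intel/fce/10.1.022/bin/ifortvars.sh || true
source_if_readable /opt/intel/cce/10.1.022/bin/iccvars.sh || true
```

The "|| true" keeps a missing compiler installation from turning into an error each time a shell starts.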
>
> Edit your .bashrc file, and try to run connectivity_c again.
> We can talk about Amber after you get the Intel shared libraries problem
> behind you.
>
> (OK, I was about to say you forgot deb64 after -host,
> but you sent the fix below.)
>
> I hope this helps.
>
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
> Francesco Pietra wrote:
>>
>> Sorry, the first line of the output below (copied manually) should read
>>
>> /usr/local/bin/mpirun -host deb64 -n 4 connectivity_c 2>&1 | tee
>> connectivity.ou
>>
>>
>> ---------- Forwarded message ----------
>> From: Francesco Pietra <chiendarret_at_[hidden]>
>> Date: Fri, Apr 10, 2009 at 6:16 PM
>> Subject: Re: [OMPI users] shared libraries issue compiling 1.3.1/intel
>> 10.1.022
>> To: Open MPI Users <users_at_[hidden]>
>>
>>
>> Hi Gus:
>>
>> If you feel that the observations below are not relevant to openmpi,
>> please disregard the message. You have already kindly devoted so much
>> time to my problems.
>>
>> The "limits.h" issue is solved with 10.1.022 intel compilers: as I
>> felt, the problem was with the pre-10.1.021 version of the intel C++
>> and ifort compilers, a subtle bug observed also by gentoo people (web
>> intel). There remains an orted issue.
>>
>> The openmpi 1.3.1 installation was able to compile connectivity_c.c
>> and hello_c.c, though, running mpirun (output below between ===):
>>
>> =================
>> /usr/local/bin/mpirun -host deb64 (see above) -n 4 connectivity_c 2>&1
>> | tee connectivity.out
>> /usr/local/bin/orted: error while loading shared libraries: libimf.so:
>> cannot open shared object file: No such file or directory
>> --------------------------------------------------------------------------
>> A daemon (pid 8472) died unexpectedly with status 127 while attempting
>> to launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>> =============
>>
>> At this point, Amber10 serial compiled nicely (all intel, like
>> openmpi), but parallel compilation, as expected, returned the same
>> problem above:
>>
>> =================
>> export TESTsander=/usr/local/amber10/exe/sander.MPI; make
>> test.sander.BASIC
>> make[1]: Entering directory `/usr/local/amber10/test'
>> cd cytosine && ./Run.cytosine
>> orted: error while loading shared libraries: libimf.so: cannot open
>> shared object file: No such file or directory
>> --------------------------------------------------------------------------
>> A daemon (pid 8371) died unexpectedly with status 127 while attempting
>> to launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>>
>>  ./Run.cytosine:  Program error
>> make[1]: *** [test.sander.BASIC] Error 1
>> make[1]: Leaving directory `/usr/local/amber10/test'
>> make: *** [test.sander.BASIC.MPI] Error 2
>> =====================
>>
>> Relevant info:
>>
>> The daemon was not ssh (thus my hypothesis that a firewall on the
>> router was killing ssh is not the case). During these procedures,
>> there were only deb64 and deb32 on the local network. On monoprocessor
>> deb32 (i386) there is nothing of openmpi or amber. Only ssh. Thus, my
>> .bashrc on deb32 can't correspond to that of deb 64 as far as
>> libraries are concerned.
>>
>> echo $LD_LIBRARY_PATH
>>
>> /opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1.022/lib:/opt/intel/fce/10.1.022/lib:/usr/local/lib
>>
>> # dpkg --search libimf.so
>> intel-iforte101022: /opt/intel/fce/10.1.022/lib/libimf.so
>> intel-icce101022: /opt/intel/cce/10.1.022/lib/libimf.so
>>
>> i.e., libimf.so is on the unix path, still not found by mpirun.
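
To see which shared libraries the dynamic loader fails to resolve for a given binary, ldd can be asked directly. The following is a small sketch around it; the wrapper name missing_libs is hypothetical, and the orted path is the one shown in the error output above.

```shell
# missing_libs BINARY
# Print the "not found" lines that ldd reports for BINARY in the current
# environment. Prints nothing when every library resolves, or when ldd
# cannot inspect the file at all.
missing_libs() {
    ldd "$1" 2>/dev/null | grep 'not found' || true
}

# orted location as reported in this thread.
missing_libs /usr/local/bin/orted
```

Running the same check with the variable removed, for example ( unset LD_LIBRARY_PATH; missing_libs /usr/local/bin/orted ), approximates what a shell that never sourced the Intel scripts would see.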
>>
>> Before compiling I tried to carefully check all env variables and
>> paths. In particular, as to mpi:
>>
>> mpif90 -show
>> /opt/intel/fce/10.1.022//bin/ifort -I/usr/local/include
>> -pthread -I/usr/local/lib -L/usr/local/lib -lmpi_f90 -lmpi_f77 -lmpi
>> -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil
>>
>> thanks
>> francesco
>>
>>
>>
>> On Thu, Apr 9, 2009 at 9:29 PM, Gus Correa <gus_at_[hidden]> wrote:
>>>
>>> Hi Francesco
>>>
>>> Francesco Pietra wrote:
>>>>
>>>> Hi:
>>>> As the failure to find "limits.h" in my attempted compilations of Amber over
>>>> the past few days (amd64 lenny, openmpi 1.3.1, intel compilers
>>>> 10.1.015) is probably (or I hope so) a bug of the version used of
>>>> intel compilers (I made with debian the same observations reported
>>>> for gentoo,
>>>> http://software.intel.com/en-us/forums/intel-c-compiler/topic/59886/).
>>>>
>>>> I made a deb package of 10.1.022, icc and ifort.
>>>>
>>>> ./configure CC icc, CXX icp,
>>>
>>> The Intel C++ compiler is called icpc, not icp.
>>> Is this a typo on your message, or on the actual configure options?
>>>
>>> F77 and FC ifort --with-libnuma=/usr (not
>>>>
>>>> /usr/lib so that the numa.h issue is not raised), "make clean",
>>>
>>> If you really did "make clean" you may have removed useful things.
>>> What did you do, "make" or "make clean"?
>>>
>>> and
>>>>
>>>> "make install" gave no error signals. However, the compilation tests in
>>>> the examples did not pass and I really don't understand why.
>>>>
>>> Which compilation tests are you talking about?
>>> From Amber or from the OpenMPI example programs (connectivity_c and
>>> hello_c), or both?
>>>
>>>> The error, with both connectivity_c and hello_c (I was operating on
>>>> the parallel computer deb64 directly and have access to everything
>>>> there) was failure to find a shared library, libimf.so
>>>>
>>> To get the right Intel environment,
>>> you need to put these commands inside your login files
>>> (.bashrc or .cshrc), to set up the Intel environment variables correctly:
>>>
>>> source /path/to/your/intel/cce/bin/iccvars.sh
>>> source /path/to/your/intel/fce/bin/ifortvars.sh
>>>
>>> and perhaps a similar one for mkl.
>>> (I don't use MKL, I don't know much about it).
>>>
>>> If your home directory is NFS mounted to all the computers you
>>> use to run parallel programs,
>>> then the same .bashrc/.cshrc will work on all computers.
>>> However, if you have a separate home directory on each computer,
>>> then you need to do this on each of them.
>>> I.e., you have to include the "source" commands above
>>> in the .bashrc/.cshrc files on your home directory in EACH computer.
>>>
>>> Also I presume you use bash/sh not tcsh/csh, right?
>>> Otherwise you need to source iccvars.csh instead of iccvars.sh.
>>>
>>>
>>>> # dpkg --search libimf.so
>>>>  /opt/intel/fce/10.1.022/lib/libimf.so  (the same for cce)
>>>>
>>>> which path seems to be correctly sourced by iccvars.sh and
>>>> ifortvars.sh (incidentally, both files are -rw-r--r-- root root;
>>>> correct that non executable?)
>>>>
>>> The permissions here are not a problem.
>>> You are supposed to *source* the files, not to execute them.
>>> If you execute them instead of sourcing the files,
>>> your Intel environment will *NOT* be set up.
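
The difference between executing and sourcing can be shown with a throwaway script: an executed script runs in a child process, whose variables disappear when it exits, whereas a sourced script runs inside the current shell. This is only an illustrative sketch; DEMO_LIB_PATH and the temporary file are made up for the demonstration and stand in for what iccvars.sh and ifortvars.sh do.

```shell
# Create a tiny stand-in for a vendor environment script.
demo=$(mktemp)
printf 'DEMO_LIB_PATH=/opt/demo/lib\nexport DEMO_LIB_PATH\n' > "$demo"

unset DEMO_LIB_PATH
sh "$demo"                           # executed: runs in a child process
after_exec=${DEMO_LIB_PATH-unset}    # the current shell saw no change

. "$demo"                            # sourced: runs in the current shell
after_source=${DEMO_LIB_PATH-unset}

echo "after executing: $after_exec"
echo "after sourcing:  $after_source"
rm -f "$demo"
```

The first line reports "unset" and the second reports /opt/demo/lib, which is also why the scripts do not need the executable bit.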
>>>
>>> BTW, the easy way to check your environment is to type "env" on the
>>> shell command prompt.
>>>
>>>> echo $LD_LIBRARY_PATH
>>>> returned, inter alia,
>>>>
>>>>
>>>> /opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1.022/lib:/opt/intel/fce/10.1.022/lib
>>>> (why twice the mkl?)
>>>>
>>> Hard to tell in which computer you were when you did this,
>>> and hence what it should affect.
>>>
>>> You may have sourced the MKL shell script that sets up the MKL environment
>>> variables, which would write its library path more than
>>> once.
>>>
>>> When the environment variables get this confused,
>>> with duplicate paths and so on, you may want to logout
>>> and login again, to start fresh.
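
Duplicate entries in a path variable are harmless to the loader, but they make the value harder to read. Besides logging out and in again, the list can be cleaned up in place. The sketch below assumes plain POSIX sh; the function name dedupe_path is invented for this example.

```shell
# dedupe_path PATHLIST
# Print the colon-separated PATHLIST with empty and repeated entries
# removed, keeping the first occurrence of each directory in order.
# The body runs in a subshell so that IFS and "set -f" stay local.
dedupe_path() (
    set -f          # no filename expansion while splitting the list
    IFS=:
    result=
    for dir in $1; do
        [ -n "$dir" ] || continue
        case ":$result:" in
            *":$dir:"*) ;;                           # already listed
            *) result=${result:+$result:}$dir ;;
        esac
    done
    printf '%s\n' "$result"
)

# Value quoted earlier in this message, with the MKL directory twice.
dedupe_path "/opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1.022/lib:/opt/intel/fce/10.1.022/lib"
```

To apply it to the live variable one could write LD_LIBRARY_PATH=$(dedupe_path "$LD_LIBRARY_PATH") followed by export LD_LIBRARY_PATH.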
>>>
>>> Do you need MKL for Amber?
>>> If you don't use it, keep things simple, and don't bother about it.
>>>
>>>
>>>> I surely miss to understand something fundamental. Hope other eyes see
>>>> better
>>>>
>>> Jody helped you run the hello_c program successfully.
>>> Try to repeat carefully the same steps.
>>> You should get the same result,
>>> the OpenMPI test programs should run.
>>>
>>>> A kind person elsewhere suggested to me in passing "The use of -rpath
>>>> during linking is highly recommended as opposed to setting
>>>> LD_LIBRARY_PATH at run time, not the least because it hardcodes the
>>>> paths to the "right" library files in the executables themselves"
>>>> Should this be relevant to the present issue, where to learn about
>>>> -rpath linking?
>>>>
>>> If you are talking about Amber,
>>> you would have to tweak the Makefiles to tweak the linker -rpath.
>>> And we don't know much about Amber's Makefiles,
>>> so this may be a very tricky approach.
>>>
>>> If you are talking about the OpenMPI test programs,
>>> I think it is just a matter of setting the Intel environment variables
>>> right, sourcing the ifortvars.sh iccvars.sh properly,
>>> to get the right runtime LD_LIBRARY_PATH.
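
For reference on the -rpath question: with GCC-style compiler drivers, a run-time search path is usually passed to the linker as -Wl,-rpath,DIR. The helper below only builds such a flag string from a list of directories; it is a sketch, the name rpath_flags is hypothetical, and it assumes the compiler wrapper in use forwards -Wl options to the linker unchanged.

```shell
# rpath_flags DIR...
# Print one "-Wl,-rpath,DIR" linker flag per argument, space separated.
rpath_flags() {
    flags=
    for dir in "$@"; do
        flags="${flags:+$flags }-Wl,-rpath,$dir"
    done
    printf '%s\n' "$flags"
}

# Intel library directories as reported in this thread.
rpath_flags /opt/intel/cce/10.1.022/lib /opt/intel/fce/10.1.022/lib
```

A link line might then look like mpicc hello_c.c -o hello_c $(rpath_flags /opt/intel/cce/10.1.022/lib), and readelf -d hello_c shows whether an RPATH or RUNPATH entry was recorded. This only affects binaries linked that way; an orted that is already installed is unchanged unless Open MPI itself is rebuilt with such flags.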
>>>
>>>> thanks
>>>> francesco pietra
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>> I hope this helps.
>>> Gus Correa
>>>
>>> ---------------------------------------------------------------------
>>> Gustavo Correa
>>> Lamont-Doherty Earth Observatory - Columbia University
>>> Palisades, NY, 10964-8000 - USA
>>> ---------------------------------------------------------------------
>>>
>>
>
>