
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] shared libraries issue compiling 1.3.1/intel10.1.022
From: Francesco Pietra (chiendarret_at_[hidden])
Date: 2009-04-13 12:07:03


I knew that, but I have considered it again. I wonder whether the
information at the end of this mail suggests how to proceed, from the
Open MPI point of view, in compiling the code.

When trying to compile openmpi-1.3.1 on Debian amd64 lenny, the Intel
10.1.022 compilers do not see their own library libimf.so, although it
is on the Unix path as required by your reference. A mixed compilation
with gcc, g++, and ifort only succeeded on a Tyan S2895 board, not on
the four-socket Supermicro boards that I need.
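
(A quick way to check whether the dynamic linker actually resolves
libimf.so - a sketch only, using the Intel paths of my installation,
which may differ elsewhere:

  export LD_LIBRARY_PATH=/opt/intel/fce/10.1.022/lib:/opt/intel/cce/10.1.022/lib:$LD_LIBRARY_PATH
  ldd /usr/local/bin/orted | grep libimf

If ldd reports "not found", the loader cannot see the library even
though the directory is on the path.)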

The problem was solved by compiling with gcc, g++, and gfortran. The
openmpi-1.3.1 examples run correctly, and Amber10 sander.MPI could be
built without trouble.

What remains unfulfilled - along similar lines - is the compilation of
Amber9 sander.MPI, which I need. Installing bison satisfied the
requirement for yacc, and the serial compilation passed.

The info alluded to above is:

"make clean" after serial compilation, ended with (between ======):
=======
Making `clean' in directory /usr/local/amber9/src/netcdf/src/cxx

make[3]: Entering directory `/usr/local/amber9/src/netcdf/src/cxx'
rm -f *.o *.a *.so *.sl *.i *.Z core nctst test.out example.nc *.cps
*.dvi *.fns *.log *~ *.gs *.aux *.cp *.fn *.ky *.pg *.toc *.tp *.vr
make[3]: Leaving directory `/usr/local/amber9/src/netcdf/src/cxx'

Returning to directory /usr/local/amber9/src/netcdf/src

make[2]: Leaving directory `/usr/local/amber9/src/netcdf/src'
rm -f *.o *.a *.so *.sl *.i *.Z core
make[1]: Leaving directory `/usr/local/amber9/src/netcdf/src'
cd netcdf/lib && rm -f libnetcdf.a
/bin/sh: line 0: cd: netcdf/lib: No such file or directory
make: [clean] Error 1 (ignored)
cd netcdf/include && rm -f *.mod
/bin/sh: line 0: cd: netcdf/include: No such file or directory
make: [clean] Error 1 (ignored)
========

./configure -openmpi gfortran
gave no error.

"make parallel" returned, in full (between xxxxxxxx)
xxxxxxxxxxx
Starting installation of Amber9 (parallel) at Mon Apr 13 17:36:19 CEST 2009.
cd sander; make parallel
make[1]: Entering directory `/usr/local/amber9/src/sander'
./checkparconf
cpp -traditional -I/usr/local/include -P -DMPI -xassembler-with-cpp
-Dsecond=ambsecond evb_vars.f > _evb_vars.f
gfortran -c -O0 -fno-second-underscore -march=nocona -ffree-form -o
evb_vars.o _evb_vars.f
cpp -traditional -I/usr/local/include -P -DMPI -xassembler-with-cpp
-Dsecond=ambsecond evb_input.f > _evb_input.f
gfortran -c -O0 -fno-second-underscore -march=nocona -ffree-form -o
evb_input.o _evb_input.f
cpp -traditional -I/usr/local/include -P -DMPI -xassembler-with-cpp
-Dsecond=ambsecond evb_init.f > _evb_init.f
gfortran -c -O0 -fno-second-underscore -march=nocona -ffree-form -o
evb_init.o _evb_init.f
Error: Can't open included file 'mpif-common.h'
_evb_init.f:372.67:

         call mpi_bcast ( xdat_dia(n)% filename, 512, MPI_CHARACTER, 0, commwor
                                                                  1
Error: Symbol 'mpi_character' at (1) has no IMPLICIT type
_evb_init.f:367.68:

         call mpi_bcast ( xdat_dia(n)% q, ndim, MPI_DOUBLE_PRECISION, 0, commwo
                                                                   1
Error: Symbol 'mpi_double_precision' at (1) has no IMPLICIT type
_evb_init.f:327.40:

   call mpi_bcast ( ndim, 1, MPI_INTEGER, 0, commworld, ierr )
                                       1
Error: Symbol 'mpi_integer' at (1) has no IMPLICIT type
make[1]: *** [evb_init.o] Error 1
make[1]: Leaving directory `/usr/local/amber9/src/sander'
make: *** [parallel] Error 2
xxxxxxxxxxxxxxxxxx
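
If I read the log correctly, cpp (which is given -I/usr/local/include)
finds mpif.h, but the mpif.h of openmpi-1.3.1 contains a Fortran-level
include 'mpif-common.h', and the gfortran line carries no -I flag at
all, so that include fails. A possible workaround - only a sketch,
assuming openmpi is installed under /usr/local and that Amber9 reads
its Fortran flags from src/config.h (variable names hypothetical) -
would be to add the include path to the compile flags:

  FFLAGS = -O0 -fno-second-underscore -march=nocona -ffree-form -I/usr/local/include

or to let the openmpi wrapper supply paths and libraries itself:

  FC = mpif90

Does that sound plausible?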

I can't turn to the Amber site because they have declined interest in
adapting Amber9 to present-day software. Unfortunately, I don't have
two sufficiently powerful computers, one for present-day and one for
vintage software.

Thanks a lot for considering my mail

francesco pietra

On Fri, Apr 10, 2009 at 6:24 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> See this FAQ entry:
>
>    http://www.open-mpi.org/faq/?category=running#intel-compilers-static
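>
> For example - a sketch only, assuming the Intel 10.1 flag -i-static,
> which links the Intel-provided runtime libraries statically (later
> releases call it -static-intel):
>
>    ./configure CC=icc CXX=icpc F77=ifort FC=ifort LDFLAGS=-i-static
>
> so that orted and mpirun no longer need libimf.so at run time.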
>
>
>
> On Apr 10, 2009, at 12:16 PM, Francesco Pietra wrote:
>
>> Hi Gus:
>>
>> If you feel that the observations below are not relevant to openmpi,
>> please disregard the message. You have already kindly devoted so much
>> time to my problems.
>>
>> The "limits.h" issue is solved with 10.1.022 intel compilers: as I
>> felt, the problem was with the pre-10.1.021 version of the intel C++
>> and ifort compilers, a subtle bug observed also by gentoo people (web
>> intel). There remains an orted issue.
>>
>> The openmpi 1.3.1 installation was able to compile connectivity_c.c
>> and hello_c.c; running mpirun, though, failed (output below between
>> ===):
>>
>> =================
>> /usr/local/bin/mpirun -host -n 4 connectivity_c 2>&1 | tee
>> connectivity.out
>> /usr/local/bin/orted: error while loading shared libraries: libimf.so:
>> cannot open shared object file: No such file or directory
>> --------------------------------------------------------------------------
>> A daemon (pid 8472) died unexpectedly with status 127 while attempting
>> to launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>> =============
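>>
>> (One quick check - a sketch, assuming ssh is the launcher - is whether
>> a non-interactive shell on the node sees the path, since orted is
>> started through such a shell:
>>
>>    ssh deb64 'echo $LD_LIBRARY_PATH'
>>
>> If this prints an empty or truncated path, the export/source commands
>> in .bashrc are not reached for non-interactive logins.)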
>>
>> At this point, Amber10 compiled nicely in serial (all Intel, like
>> openmpi), but the parallel compilation, as expected, returned the
>> same problem as above:
>>
>> =================
>> export TESTsander=/usr/local/amber10/exe/sander.MPI; make
>> test.sander.BASIC
>> make[1]: Entering directory `/usr/local/amber10/test'
>> cd cytosine && ./Run.cytosine
>> orted: error while loading shared libraries: libimf.so: cannot open
>> shared object file: No such file or directory
>> --------------------------------------------------------------------------
>> A daemon (pid 8371) died unexpectedly with status 127 while attempting
>> to launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>>
>>  ./Run.cytosine:  Program error
>> make[1]: *** [test.sander.BASIC] Error 1
>> make[1]: Leaving directory `/usr/local/amber10/test'
>> make: *** [test.sander.BASIC.MPI] Error 2
>> =====================
>>
>> Relevant info:
>>
>> The daemon was not ssh (so my hypothesis that a firewall on the
>> router was killing ssh does not hold). During these procedures, only
>> deb64 and deb32 were on the local network. The monoprocessor deb32
>> (i386) has nothing of openmpi or amber, only ssh. Thus, my .bashrc on
>> deb32 cannot correspond to that of deb64 as far as libraries are
>> concerned.
>>
>> echo $LD_LIBRARY_PATH
>>
>> /opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1..022/lib:/opt/intel/fce/10.1.022/lib:/usr/local/lib
>>
>> # dpkg --search libimf.so
>> intel-iforte101022: /opt/intel/fce/10.1.022/lib/libimf.so
>> intel-icce101022: /opt/intel/cce/10.1.022/lib/libimf.so
>>
>> i.e., libimf.so is on the Unix path, yet still not found by mpirun.
>>
>> Before compiling I tried to carefully check all env variables and
>> paths. In particular, as to MPI:
>>
>> mpif90 -show
>> /opt/intel/fce/10.1.022//bin/ifort -I/usr/local/include
>> -pthread -I/usr/local/lib -L/usr/local/lib -lmpi_f90 -lmpi_f77 -lmpi
>> -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil
>>
>> thanks
>> francesco
>>
>>
>>
>> On Thu, Apr 9, 2009 at 9:29 PM, Gus Correa <gus_at_[hidden]> wrote:
>> > Hi Francesco
>> >
>> > Francesco Pietra wrote:
>> >>
>> >> Hi:
>> >> As the failure to find "limits.h" in my attempted compilations of
>> >> Amber over the past few days (amd64 lenny, openmpi 1.3.1, intel
>> >> compilers 10.1.015) is probably (or so I hope) a bug in that
>> >> version of the Intel compilers (with Debian I made the same
>> >> observations reported for Gentoo,
>> >> http://software.intel.com/en-us/forums/intel-c-compiler/topic/59886/),
>> >>
>> >> I made a deb package of 10.1.022, icc and ifort.
>> >>
>> >> ./configure CC icc, CXX icp,
>> >
>> > The Intel C++ compiler is called icpc, not icp.
>> > Is this a typo on your message, or on the actual configure options?
>> >
>> >> F77 and FC ifort --with-libnuma=/usr (not
>> >>
>> >> /usr/lib so that the numa.h issue is not raised), "make clean",
>> >
>> > If you really did "make clean" you may have removed useful things.
>> > What did you do, "make" or "make clean"?
>> >
>> >> and
>> >>
>> >> "mak install" gave no error signals. However, the compilation tests in
>> >> the examples did not pass and I really don't understand why.
>> >>
>> >
>> > Which compilation tests are you talking about?
>> > From Amber or from the OpenMPI example programs (connectivity_c and
>> > hello_c), or both?
>> >
>> >> The error, with both connectivity_c and hello_c (I was operating on
>> >> the parallel computer deb64 directly and have access to everything
>> >> there) was failure to find a shared library, libimf.so
>> >>
>> >
>> > To get the right Intel environment,
>> > you need to put these commands inside your login files
>> > (.bashrc or .cshrc), to set up the Intel environment variables correctly:
>> >
>> > source /path/to/your/intel/cce/bin/iccvars.sh
>> > source /path/to/your/intel/cce/bin/ifortvars.sh
>> >
>> > and perhaps a similar one for mkl.
>> > (I don't use MKL, I don't know much about it).
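>> >
>> > For example, with the paths quoted earlier in this thread:
>> >
>> > source /opt/intel/cce/10.1.022/bin/iccvars.sh
>> > source /opt/intel/fce/10.1.022/bin/ifortvars.sh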
>> >
>> > If your home directory is NFS mounted to all the computers you
>> > use to run parallel programs,
>> > then the same .bashrc/.cshrc will work on all computers.
>> > However, if you have a separate home directory on each computer,
>> > then you need to do this on each of them.
>> > I.e., you have to include the "source" commands above
>> > in the .bashrc/.cshrc files on your home directory in EACH computer.
>> >
>> > Also I presume you use bash/sh not tcsh/csh, right?
>> > Otherwise you need to source iccvars.csh instead of iccvars.sh.
>> >
>> >
>> >> # dpkg --search libimf.so
>> >>   /opt/intel/fce/10.1.022/lib/libimf.so  (the same for cce)
>> >>
>> >> which path seems to be correctly sourced by iccvars.sh and
>> >> ifortvars.sh (incidentally, both files are -rw-r--r-- root root; is
>> >> it correct that they are not executable?)
>> >>
>> >
>> > The permissions here are not a problem.
>> > You are supposed to *source* the files, not to execute them.
>> > If you execute them instead of sourcing the files,
>> > your Intel environment will *NOT* be setup.
>> >
>> > BTW, the easy way to check your environment is to type "env" on the
>> > shell command prompt.
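>> >
>> > For instance:
>> >
>> > env | grep -E 'PATH|intel'
>> >
>> > shows at a glance whether the Intel directories made it into PATH
>> > and LD_LIBRARY_PATH.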
>> >
>> >> echo $LD_LIBRARY_PATH
>> >> returned, inter alia,
>> >>
>> >>
>> >> /opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1.022/lib:/opt/intel/fce/10.1.022/lib
>> >> (why twice the mkl?)
>> >>
>> >
>> > Hard to tell on which computer you were when you did this,
>> > and hence what it affects.
>> >
>> > You may have sourced the mkl shell script that sets up the MKL
>> > environment variables twice, which would write its library path
>> > more than once.
>> >
>> > When the environment variables get this confused,
>> > with duplicate paths and so on, you may want to log out
>> > and log in again, to start fresh.
>> >
>> > Do you need MKL for Amber?
>> > If you don't use it, keep things simple, and don't bother about it.
>> >
>> >
>> >> I surely fail to understand something fundamental. I hope other
>> >> eyes see better.
>> >>
>> >
>> > Jody helped you run the hello_c program successfully.
>> > Try to repeat carefully the same steps.
>> > You should get the same result,
>> > and the OpenMPI test programs should run.
>> >
>> >> A kind person elsewhere suggested to me in passing: "The use of
>> >> -rpath during linking is highly recommended as opposed to setting
>> >> LD_LIBRARY_PATH at run time, not least because it hardcodes the
>> >> paths to the "right" library files in the executables themselves."
>> >> Should this be relevant to the present issue, where can I learn
>> >> about -rpath linking?
>> >>
>> >
>> > If you are talking about Amber,
>> > you would have to tweak the Makefiles to set the linker -rpath.
>> > And we don't know much about Amber's Makefiles,
>> > so this may be a very tricky approach.
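>> >
>> > (For reference, such a tweak would amount to something like the
>> > following - a sketch only, with the Intel library path from this
>> > thread:
>> >
>> > mpif90 -o sander.MPI *.o -Wl,-rpath,/opt/intel/fce/10.1.022/lib
>> >
>> > The -Wl,-rpath,DIR option records DIR inside the executable, so it
>> > is searched at run time regardless of LD_LIBRARY_PATH.)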
>> >
>> > If you are talking about the OpenMPI test programs,
>> > I think it is just a matter of setting the Intel environment variables
>> > right, sourcing ifortvars.sh and iccvars.sh properly,
>> > to get the right runtime LD_LIBRARY_PATH.
>> >
>> >> thanks
>> >> francesco pietra
>> >
>> > I hope this helps.
>> > Gus Correa
>> >
>> > ---------------------------------------------------------------------
>> > Gustavo Correa
>> > Lamont-Doherty Earth Observatory - Columbia University
>> > Palisades, NY, 10964-8000 - USA
>> > ---------------------------------------------------------------------
>> >
>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>