Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Different CC for orte and opmi?
From: Ashley Pittman (apittman_at_[hidden])
Date: 2008-06-10 07:28:44


Sorry, I'll try to fill in the background. I'm attempting to package
openmpi for a number of our customers. Whenever possible, we use modules
on our clusters to give users a choice of MPI environment.

I'm using the 1.2.6 stable release and have built the code twice, once
to /opt/openmpi-1.2.6/gnu and once to /opt/openmpi-1.2.6/intel. I have
created two module environments called openmpi-gnu and openmpi-intel and
am also using an existing one called intel-compiler. The build was
successful in both cases.

If I load the openmpi-gnu module I can compile and run code using
mpicc/mpirun as expected. If I load openmpi-intel and intel-compiler I
can compile code, but I get an error about a missing libimf.so when
I try to run it (reproduced below).

The application *will* run if I add the line "module load
intel-compiler" to my bashrc, as this allows orted to find the Intel
libraries when it starts. What I think I want to do is compile the
actual library with icc but compile orted with gcc, so that I don't
need to load the Intel environment by default. I'm assuming that the
link problem only exists with orted and not with the actual
application, as LD_LIBRARY_PATH is set correctly in the shell which is
launching the program.
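
The assumption above can be checked by comparing LD_LIBRARY_PATH in the
launching shell with what a non-interactive ssh session sees on a compute
node. The following is only a rough sketch: path_contains is a
hypothetical helper written for this illustration, the Intel directory is
the one shown in the ldd output in this message, and the ssh lines are
left commented out because they need a real cluster.

```shell
#!/bin/sh
# path_contains LIST DIR
# Hypothetical helper: succeeds if DIR is one of the entries in the
# colon-separated LIST (for example the value of LD_LIBRARY_PATH).
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Directory holding libimf.so, taken from the ldd output in this message.
INTEL_LIB=/opt/intel/compiler_10.1/x86_64/lib

# Check the shell that launches mpirun.
if path_contains "${LD_LIBRARY_PATH:-}" "$INTEL_LIB"; then
    echo "local: Intel lib directory is on LD_LIBRARY_PATH"
else
    echo "local: Intel lib directory is NOT on LD_LIBRARY_PATH"
fi

# Check a non-interactive remote shell, which is the kind of shell orted
# is started from when daemons are launched over ssh (uncomment on a cluster):
# remote=$(ssh comp00 'echo "$LD_LIBRARY_PATH"')
# if path_contains "$remote" "$INTEL_LIB"; then
#     echo "comp00: present"
# else
#     echo "comp00: missing"
# fi
```

If the local check reports the directory as present and the remote one
does not, that would be consistent with orted failing while the
application itself resolves its libraries.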

Ashley Pittman.

sccomp_at_demo4-sles-10-1-fe:~/benchmarks/IMB_3.0/src> mpirun -H comp00,comp01 ./IMB-MPI1
/opt/openmpi-1.2.6/intel/bin/orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
/opt/openmpi-1.2.6/intel/bin/orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
[demo4-sles-10-1-fe:29303] ERROR: A daemon on node comp01 failed to start as expected.
[demo4-sles-10-1-fe:29303] ERROR: There may be more information available from
[demo4-sles-10-1-fe:29303] ERROR: the remote shell (see above).
[demo4-sles-10-1-fe:29303] ERROR: The daemon exited unexpectedly with status 127.
[demo4-sles-10-1-fe:29303] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[demo4-sles-10-1-fe:29303] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166
[demo4-sles-10-1-fe:29303] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[demo4-sles-10-1-fe:29303] ERROR: A daemon on node comp00 failed to start as expected.
[demo4-sles-10-1-fe:29303] ERROR: There may be more information available from
[demo4-sles-10-1-fe:29303] ERROR: the remote shell (see above).
[demo4-sles-10-1-fe:29303] ERROR: The daemon exited unexpectedly with status 127.
[demo4-sles-10-1-fe:29303] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[demo4-sles-10-1-fe:29303] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1198
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------

$ ldd /opt/openmpi-1.2.6/intel/bin/orted
        linux-vdso.so.1 => (0x00007fff877fe000)
        libopen-rte.so.0 => /opt/openmpi-1.2.6/intel/lib/libopen-rte.so.0 (0x00007fe97f3ac000)
        libopen-pal.so.0 => /opt/openmpi-1.2.6/intel/lib/libopen-pal.so.0 (0x00007fe97f239000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fe97f135000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x00007fe97f01f000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007fe97ef1c000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fe97edc7000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fe97ecba000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe97eba3000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fe97e972000)
        libimf.so => /opt/intel/compiler_10.1/x86_64/lib/libimf.so (0x00007fe97e610000)
        libsvml.so => /opt/intel/compiler_10.1/x86_64/lib/libsvml.so (0x00007fe97e489000)
        libintlc.so.5 => /opt/intel/compiler_10.1/x86_64/lib/libintlc.so.5 (0x00007fe97e350000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fe97f525000)
$ ssh comp00 ldd /opt/openmpi-1.2.6/intel/bin/orted
        libopen-rte.so.0 => /opt/openmpi-1.2.6/intel/lib/libopen-rte.so.0 (0x00002b1f0c0c5000)
        libopen-pal.so.0 => /opt/openmpi-1.2.6/intel/lib/libopen-pal.so.0 (0x00002b1f0c23e000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00002b1f0c3bc000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x00002b1f0c4c0000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00002b1f0c5d7000)
        libm.so.6 => /lib64/libm.so.6 (0x00002b1f0c6da000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b1f0c82f000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b1f0c93d000)
        libc.so.6 => /lib64/libc.so.6 (0x00002b1f0ca54000)
        /lib64/ld-linux-x86-64.so.2 (0x00002b1f0bfa9000)
        libimf.so => not found
        libsvml.so => not found
        libintlc.so.5 => not found
        libimf.so => not found
        libsvml.so => not found
        libintlc.so.5 => not found
$ ldd ./IMB-MPI1
        linux-vdso.so.1 => (0x00007fff2cbfe000)
        libmpi.so.0 => /opt/openmpi-1.2.6/intel/lib/libmpi.so.0 (0x00007f1624821000)
        libopen-rte.so.0 => /opt/openmpi-1.2.6/intel/lib/libopen-rte.so.0 (0x00007f16246a8000)
        libopen-pal.so.0 => /opt/openmpi-1.2.6/intel/lib/libopen-pal.so.0 (0x00007f1624535000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f1624431000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f162431b000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007f1624218000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f16240c3000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f1623fb6000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1623e9f000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f1623c6e000)
        libimf.so => /opt/intel/compiler_10.1/x86_64/lib/libimf.so (0x00007f162390c000)
        libsvml.so => /opt/intel/compiler_10.1/x86_64/lib/libsvml.so (0x00007f1623785000)
        libintlc.so.5 => /opt/intel/compiler_10.1/x86_64/lib/libintlc.so.5 (0x00007f162364c000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f16249e0000)
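
To make the comparison between the front end and a compute node easier
to read, the ldd output can be reduced to just the unresolved libraries.
This is a small sketch, not part of the original report; missing_libs is
a hypothetical helper name, and it assumes ldd reports unresolved entries
in the "name => not found" form seen above.

```shell
#!/bin/sh
# missing_libs
# Hypothetical helper: reads ldd output on stdin and prints each library
# reported as "not found" once, sorted by name.
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }' |
        LC_ALL=C sort -u
}

# Example use on a cluster (hostname and path are the ones from this message):
#   ssh comp00 ldd /opt/openmpi-1.2.6/intel/bin/orted | missing_libs
#   ldd /opt/openmpi-1.2.6/intel/bin/orted | missing_libs
```

An empty result on the front end and a non-empty one on the compute node
would match the behaviour described in this message.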

On Mon, 2008-06-09 at 13:02 -0700, Doug Reeder wrote:
> Ashley,
>
> I am confused. In your first post you said orted fails, with link
> errors, when you try to launch a job. From this I inferred that the
> build and install steps for creating openmpi were successful. Was the
> build/install step successful? If so, what dynamic libraries does ldd
> say that orted is using?
>
> Doug Reeder
> On Jun 9, 2008, at 12:54 PM, Ashley Pittman wrote:
>
> >
> > Putting aside any religious views I might have about static linking,
> > how would that help in this case? It appears to be orted itself that
> > fails to link. I'm assuming that the application would actually run,
> > either because LD_LIBRARY_PATH is set correctly on the front end or
> > because of the --prefix option to mpirun.
> >
> > Or do you mean static linking of the tools? I could go for that if
> > there is a configure option for it.
> >
> > Ashley Pittman.
> >
> > On Mon, 2008-06-09 at 08:27 -0700, Doug Reeder wrote:
> >> Ashley,
> >>
> >> It could work but I think you would be better off to try and
> >> statically link the intel libraries.
> >>
> >> Doug Reeder
> >> On Jun 9, 2008, at 4:34 AM, Ashley Pittman wrote:
> >>
> >>>
> >>> Is there a way to use a different compiler for the orte component and
> >>> the shared library component when using openmpi? We are finding that if
> >>> we use icc to compile openmpi then orted fails with link errors when I
> >>> try and launch a job, as the intel environment isn't loaded by default.
> >>>
> >>> We use the module command heavily and have modules for openmpi-gnu and
> >>> openmpi-intel as well as an intel_compiler module. To use openmpi-intel
> >>> we have to load intel_compiler by default on the compute nodes, which
> >>> isn't ideal. Is it possible to compile the orte component with gcc and
> >>> the library component with icc?
> >>>
> >>> Yours,
> >>>
> >>> Ashley Pittman,
> >>>
> >>> _______________________________________________
> >>> users mailing list
> >>> users_at_[hidden]
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users