Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] After OS Update MPI_Init fails on one host
From: Kevin H. Hobbs (hobbsk_at_[hidden])
Date: 2013-07-20 09:09:41


On 07/19/2013 08:27 PM, Jeff Squyres (jsquyres) wrote:
> Not offhand. The error you're seeing *typically* indicates
> that you've got a mismatch of OMPI version somewhere.

So now the fun part for me is to try to find the mismatch or, failing
that, to rule out the multiple-versions theory.

> Are you running on multiple machines with different Open MPI
> versions, perchance?

Just one machine right now.

> If you're running only on a single machine, try completely
> uninstalling the Open MPI package, re-installing it,
> recompiling your trivial app, and see what happens.

That's easy enough :

"yum list openmpi*" says I have openmpi.x86_64,
openmpi-debuginfo.x86_64, and openmpi-devel.x86_64 installed.

I did :

  sudo yum remove \
    openmpi.x86_64 \
    openmpi-debuginfo.x86_64 \
    openmpi-devel.x86_64

followed by :

  sudo yum install \
    openmpi.x86_64 \
    openmpi-debuginfo.x86_64 \
    openmpi-devel.x86_64

Then I compiled and ran the program :

  mpicc -g -o mpi_simple mpi_simple.c
  mpirun -n 1 mpi_simple

and got the now familiar error.

> Also, you might want to check the output of "mpicc yourapp.c
> --showme" and see if it's pointing to the right libraries, etc.

  mpicc --showme -g -o mpi_simple mpi_simple.c
  gcc -g -o mpi_simple mpi_simple.c \
    -I/usr/include/openmpi-x86_64 -pthread -m64 \
    -L/usr/lib64/openmpi/lib -lmpi
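
The include and library directories that get inspected next can be pulled out of the wrapper's output rather than copied by hand. This is only a sketch: showme_dirs is a hypothetical helper name, and it assumes each -I or -L flag is joined to its path with no space, as in the output above.

```shell
# Read a compiler command line on stdin and print the directories
# named by its -I and -L flags, one per line.
# Assumes the flag and its path are a single token (-I/path, not -I /path).
showme_dirs() {
    tr -s '[:space:]' '\n' | sed -n 's/^-[IL]\(..*\)$/\1/p'
}
```

It could be used as "mpicc --showme -g -o mpi_simple mpi_simple.c | showme_dirs". The lowercase -lmpi flag is deliberately not matched, since it names a library rather than a directory.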

Is anything hiding there that doesn't belong?

  find /usr/include/openmpi-x86_64/ \
    -exec rpm -q --whatprovides {} \; | sort -u

  openmpi-devel-1.6.3-7.fc18.x86_64

  find /usr/lib64/openmpi/lib \
    -exec rpm -q --whatprovides {} \; | sort -u

  openmpi-1.6.3-7.fc18.x86_64
  openmpi-devel-1.6.3-7.fc18.x86_64

What is the program actually linked to?

  ldd mpi_simple
    linux-vdso.so.1 => (0x00007fff34151000)
    libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x00007f079fa92000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003c53e00000)
    libc.so.6 => /lib64/libc.so.6 (0x0000003c53200000)
    libdl.so.2 => /lib64/libdl.so.2 (0x0000003c53a00000)
    librt.so.1 => /lib64/librt.so.1 (0x0000003c54200000)
    libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003c6c200000)
    libutil.so.1 => /lib64/libutil.so.1 (0x0000003c6de00000)
    libm.so.6 => /lib64/libm.so.6 (0x0000003c53600000)
    libhwloc.so.5 => /lib64/libhwloc.so.5 (0x0000003c57600000)
    libltdl.so.7 => /lib64/libltdl.so.7 (0x0000003c77000000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003c54a00000)
    /lib64/ld-linux-x86-64.so.2 (0x0000003c52e00000)
    libnuma.so.1 => /lib64/libnuma.so.1 (0x0000003c57200000)
    libpci.so.3 => /lib64/libpci.so.3 (0x0000003c55e00000)
    libxml2.so.2 => /lib64/libxml2.so.2 (0x0000003c5d600000)
    libresolv.so.2 => /lib64/libresolv.so.2 (0x0000003c55a00000)
    libz.so.1 => /lib64/libz.so.1 (0x0000003c54600000)
    liblzma.so.5 => /lib64/liblzma.so.5 (0x0000003c59600000)

What packages provide them?

  rpm -q --whatprovides \
    /usr/lib64/openmpi/lib/libmpi.so.1 \
    /lib64/libpthread.so.0 \
    /lib64/libc.so.6 \
    /lib64/libdl.so.2 \
    /lib64/librt.so.1 \
    /lib64/libnsl.so.1 \
    /lib64/libutil.so.1 \
    /lib64/libm.so.6 \
    /lib64/libhwloc.so.5 \
    /lib64/libltdl.so.7 \
    /lib64/libgcc_s.so.1 \
    /lib64/libnuma.so.1 \
    /lib64/libpci.so.3 \
    /lib64/libxml2.so.2 \
    /lib64/libresolv.so.2 \
    /lib64/libz.so.1 \
    /lib64/liblzma.so.5 | sort -u

  glibc-2.16-33.fc18.x86_64
  hwloc-1.4.2-2.fc18.x86_64
  libgcc-4.7.2-8.fc18.x86_64
  libtool-ltdl-2.4.2-7.fc18.x86_64
  libxml2-2.9.1-1.fc18.1.x86_64
  numactl-libs-2.0.7-7.fc18.x86_64
  openmpi-1.6.3-7.fc18.x86_64
  pciutils-libs-3.1.10-2.fc18.x86_64
  xz-libs-5.1.2-2alpha.fc18.x86_64
  zlib-1.2.7-9.fc18.x86_64
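
The list of paths given to rpm above was typed out from the ldd output. A minimal sketch of generating it instead follows; ldd_paths is a hypothetical helper name, and it assumes the "name => /path (address)" layout that ldd printed here.

```shell
# Read ldd output on stdin and print the resolved library paths.
# Lines without "=>" (the dynamic loader) and entries with no path
# (linux-vdso) are skipped, matching the hand-written list.
ldd_paths() {
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}
```

With that, the query becomes "ldd mpi_simple | ldd_paths | xargs rpm -q --whatprovides | sort -u". Only the third field is read, so an address that has wrapped onto the following line does no harm.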

I don't see any Fedora 17 stragglers or anything weird.
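
Scanning the package lists by eye works at this size, but the same check can be made mechanical. The filter below is a sketch under the assumption that every expected package name contains the dist tag between dots (".fc18."); stragglers is a hypothetical helper name.

```shell
# Print every input line that does not carry the expected dist tag.
# Messages from rpm about files not owned by any package also pass
# through, since those are worth seeing too.
stragglers() {
    tag=${1:?usage: stragglers DIST_TAG}
    grep -v -F ".${tag}." || true
}
```

Appending "| stragglers fc18" to any of the "rpm -q --whatprovides ... | sort -u" pipelines above should print nothing when every file comes from a Fedora 18 package.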