
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] After OS Update MPI_Init fails on one host
From: Kevin H. Hobbs (hobbsk_at_[hidden])
Date: 2013-07-23 14:12:19


On 07/23/2013 09:54 AM, Jeff Squyres (jsquyres) wrote:
>
> I don't know if Fedora RPMs include -g in their builds, or if Fedora
> includes a debuginfo RPM that you could install such that you can attach
> a debugger and be able to dig into OMPI's internals yourself.
>

There is a debuginfo package.

Since I had removed all of Fedora's openmpi packages and installed from
source into /opt/openmpi-1.6.5 and /opt/openmpi-1.6.5_hwloc-1.4.3 to
narrow down this problem, I first had to re-install the RPMs with yum:

sudo yum install openmpi openmpi-devel openmpi-debuginfo

These don't add anything to my PATH or LD_LIBRARY_PATH, so I have to run:

module load mpi/openmpi-x86_64

I compiled my simple program with:

mpicc -g -o mpi_simple mpi_simple.c

The program links to Fedora's copies of the libraries of interest:

mpirun -n 1 ldd mpi_simple | grep hwloc
  libhwloc.so.5 => /lib64/libhwloc.so.5 (0x0000003c57600000)
mpirun -n 1 ldd mpi_simple | grep mpi
  libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x00007f7207e29000)
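When switching between the Fedora packages and the /opt builds, the two grep checks above could be wrapped in a small helper that prints only the resolved path. This is a hypothetical convenience function (the name resolved_lib is made up, not part of Open MPI or ldd), and it assumes the usual "soname => path (address)" layout of ldd output:

```shell
# resolved_lib PREFIX  (hypothetical helper, not an Open MPI tool)
# Reads ldd output on stdin and prints the path that the first library
# whose soname begins with PREFIX resolves to. Prints "not found" when
# ldd reports the library as missing, and nothing if it is not listed.
resolved_lib() {
  awk -v prefix="$1" '$2 == "=>" && index($1, prefix) == 1 {
    print ($3 == "not" ? "not found" : $3)
    exit
  }'
}
```

With the output shown above, `mpirun -n 1 ldd mpi_simple | resolved_lib libhwloc` would print /lib64/libhwloc.so.5, which makes it quick to confirm whether a given build picks up the system hwloc or a bundled one.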

I started the debugger with:

mpirun -n 1 gdb mpi_simple

When run in the debugger, the program failed with the error I described.

I reran it and, in gdb, did:

set breakpoint pending on
break util/nidmap.c:146
run
step

This took me into 'opal_dss_unpack'. Then I did 'next' until I got past
'opal_dss_unpack_buffer', which returned the -1 we see outside.
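The same breakpoint session could be replayed non-interactively, which makes it easier to repeat against each build. A sketch, assuming gdb's standard -batch and -x options; the file name nidmap.gdb is arbitrary, and the trailing bt and finish commands are additions not in the session above. If 'step' lands in opal_dss_unpack as described, 'finish' runs until that function returns and prints its return value:

```shell
# Write the breakpoint steps from the session above to a gdb command file.
# The file name is arbitrary; bt and finish are extra steps added here.
cat > nidmap.gdb <<'EOF'
set breakpoint pending on
break util/nidmap.c:146
run
step
bt
finish
EOF

# Replay it under mpirun only when both mpirun and the test binary exist.
# -batch makes gdb exit once the commands in the file have been executed.
if command -v mpirun >/dev/null 2>&1 && [ -x ./mpi_simple ]; then
  mpirun -n 1 gdb -batch -x nidmap.gdb ./mpi_simple
fi
```

Keeping the commands in a file means the identical steps can be run after each 'module load' or PATH change, so differences between builds show up in the output rather than in how the session was typed.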