Open MPI User's Mailing List Archives

Subject: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2010-10-20 09:38:09


Hi,

I have built Open MPI 1.5 on Linux x86_64 with the Oracle/Sun Studio C
compiler. Unfortunately, "mpiexec" breaks when I run a small program:
both processes print their output and then die with a segmentation
fault in MPI_Finalize.

linpc4 small_prog 106 cc -V
cc: Sun C 5.10 Linux_i386 2009/06/03
usage: cc [ options] files. Use 'cc -flags' for details

linpc4 small_prog 107 uname -a
Linux linpc4 2.6.27.45-0.1-default #1 SMP 2010-02-22 16:49:47 +0100 x86_64
x86_64 x86_64 GNU/Linux

linpc4 small_prog 108 mpicc -show
cc -I/usr/local/openmpi-1.5_32_cc/include -mt
  -L/usr/local/openmpi-1.5_32_cc/lib -lmpi -ldl -Wl,--export-dynamic -lnsl
  -lutil -lm -ldl

linpc4 small_prog 109 mpicc -m32 rank_size.c
linpc4 small_prog 110 mpiexec -np 2 a.out
I'm process 0 of 2 available processes running on linpc4.
MPI standard 2.1 is supported.
I'm process 1 of 2 available processes running on linpc4.
MPI standard 2.1 is supported.
[linpc4:11564] *** Process received signal ***
[linpc4:11564] Signal: Segmentation fault (11)
[linpc4:11564] Signal code: (128)
[linpc4:11564] Failing at address: (nil)
[linpc4:11565] *** Process received signal ***
[linpc4:11565] Signal: Segmentation fault (11)
[linpc4:11565] Signal code: (128)
[linpc4:11565] Failing at address: (nil)
[linpc4:11564] [ 0] [0xffffe410]
[linpc4:11564] [ 1] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
  (mca_base_components_close+0x8c) [0xf774ccd0]
[linpc4:11564] [ 2] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
  (mca_btl_base_close+0xc5) [0xf76bd255]
[linpc4:11564] [ 3] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
  (mca_bml_base_close+0x32) [0xf76bd112]
[linpc4:11564] [ 4] /usr/local/openmpi-1.5_32_cc/lib/openmpi/
  mca_pml_ob1.so [0xf73d971f]
[linpc4:11564] [ 5] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
  (mca_base_components_close+0x8c) [0xf774ccd0]
[linpc4:11564] [ 6] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
  (mca_pml_base_close+0xc1) [0xf76e4385]
[linpc4:11564] [ 7] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
  [0xf76889e6]
[linpc4:11564] [ 8] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
  (PMPI_Finalize+0x3c) [0xf769dd4c]
[linpc4:11564] [ 9] a.out(main+0x98) [0x8048a18]
[linpc4:11564] [10] /lib/libc.so.6(__libc_start_main+0xe5) [0xf749c705]
[linpc4:11564] [11] a.out(_start+0x41) [0x8048861]
[linpc4:11564] *** End of error message ***
[linpc4:11565] [ 0] [0xffffe410]
[linpc4:11565] [ 1] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
  (mca_base_components_close+0x8c) [0xf76bccd0]
[linpc4:11565] [ 2] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
  (mca_btl_base_close+0xc5) [0xf762d255]
[linpc4:11565] [ 3] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
  (mca_bml_base_close+0x32) [0xf762d112]
[linpc4:11565] [ 4] /usr/local/openmpi-1.5_32_cc/lib/openmpi/
  mca_pml_ob1.so [0xf734971f]
[linpc4:11565] [ 5] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
  (mca_base_components_close+0x8c) [0xf76bccd0]
[linpc4:11565] [ 6] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
  (mca_pml_base_close+0xc1) [0xf7654385]
[linpc4:11565] [ 7] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
  [0xf75f89e6]
[linpc4:11565] [ 8] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
  (PMPI_Finalize+0x3c) [0xf760dd4c]
[linpc4:11565] [ 9] a.out(main+0x98) [0x8048a18]
[linpc4:11565] [10] /lib/libc.so.6(__libc_start_main+0xe5) [0xf740c705]
[linpc4:11565] [11] a.out(_start+0x41) [0x8048861]
[linpc4:11565] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 11564 on node linpc4 exited
  on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
2 total processes killed (some possibly by mpiexec during cleanup)
linpc4 small_prog 111

"make check" shows that one test failed.

linpc4 openmpi-1.5-Linux.x86_64.32_cc 114 grep FAIL
  log.make-check.Linux.x86_64.32_cc
FAIL: opal_path_nfs
linpc4 openmpi-1.5-Linux.x86_64.32_cc 115 grep PASS
  log.make-check.Linux.x86_64.32_cc
PASS: predefined_gap_test
PASS: dlopen_test
PASS: atomic_barrier
PASS: atomic_barrier_noinline
PASS: atomic_spinlock
PASS: atomic_spinlock_noinline
PASS: atomic_math
PASS: atomic_math_noinline
PASS: atomic_cmpset
PASS: atomic_cmpset_noinline
decode [PASSED]
PASS: opal_datatype_test
PASS: checksum
PASS: position
decode [PASSED]
PASS: ddt_test
decode [PASSED]
PASS: ddt_raw
linpc4 openmpi-1.5-Linux.x86_64.32_cc 116

I used the following command to build the package.

../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc \
  CFLAGS="-m32" CXXFLAGS="-m32" FFLAGS="-m32" FCFLAGS="-m32" \
  CXXLDFLAGS="-m32" CPPFLAGS="" \
  LDFLAGS="-m32" \
  C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
  OBJC_INCLUDE_PATH="" MPICHHOME="" \
  CC="cc" CXX="CC" F77="f95" FC="f95" \
  --without-udapl --with-threads=posix --enable-mpi-threads \
  --enable-shared --enable-heterogeneous --enable-cxx-exceptions \
  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.32_cc

I have also built the package with gcc-4.2.0, and it seems to work,
although the opal_path_nfs test failed there as well. Therefore I'm
not sure whether the failing test has anything to do with the crash
of the cc version.

../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_gcc \
  CFLAGS="-m32" CXXFLAGS="-m32" FFLAGS="-m32" FCFLAGS="-m32" \
  CXXLDFLAGS="-m32" CPPFLAGS="" \
  LDFLAGS="-m32" \
  C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
  OBJC_INCLUDE_PATH="" MPIHOME="" \
  CC="gcc" CPP="cpp" CXX="g++" CXXCPP="cpp" F77="gfortran" \
  --without-udapl --with-threads=posix --enable-mpi-threads \
  --enable-shared --enable-heterogeneous --enable-cxx-exceptions \
  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.32_gcc

linpc4 small_prog 107 gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.2.0/configure --prefix=/usr/local/gcc-4.2.0
  --enable-languages=c,c++,java,fortran,objc --enable-java-gc=boehm
  --enable-nls --enable-libgcj --enable-threads=posix
Thread model: posix
gcc version 4.2.0

linpc4 small_prog 109 mpicc -show
gcc -I/usr/local/openmpi-1.5_32_gcc/include -fexceptions -pthread
  -L/usr/local/openmpi-1.5_32_gcc/lib -lmpi -ldl -Wl,--export-dynamic
  -lnsl -lutil -lm -ldl

linpc4 small_prog 110 mpicc -m32 rank_size.c
linpc4 small_prog 111 mpiexec -np 2 a.out
I'm process 0 of 2 available processes running on linpc4.
MPI standard 2.1 is supported.
I'm process 1 of 2 available processes running on linpc4.
MPI standard 2.1 is supported.

linpc4 small_prog 112 grep FAIL /.../log.make-check.Linux.x86_64.32_gcc
FAIL: opal_path_nfs
linpc4 small_prog 113 grep PASS /.../log.make-check.Linux.x86_64.32_gcc
PASS: predefined_gap_test
PASS: dlopen_test
PASS: atomic_barrier
PASS: atomic_barrier_noinline
PASS: atomic_spinlock
PASS: atomic_spinlock_noinline
PASS: atomic_math
PASS: atomic_math_noinline
PASS: atomic_cmpset
PASS: atomic_cmpset_noinline
decode [PASSED]
PASS: opal_datatype_test
PASS: checksum
PASS: position
decode [PASSED]
PASS: ddt_test
decode [NOT PASSED]
PASS: ddt_raw
linpc4 small_prog 114
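
The two result lists can also be compared mechanically instead of by
eye. The following is only a minimal shell sketch and not part of the
build itself: the function names check_results and compare_checks are
hypothetical, and it assumes that every result line in a make-check
log starts with "PASS: ", "FAIL: " or "decode [", as in the listings
above.

```shell
#!/bin/sh
# Hypothetical helper: print only the result lines of one make-check log.
# Assumes result lines start with "PASS: ", "FAIL: " or "decode [".
check_results() {
    grep -E '^(PASS|FAIL): |^decode \[' "$1"
}

# Hypothetical helper: show the result lines that differ between two logs.
# Prints nothing and returns 0 if both logs contain the same result lines,
# returns 1 (the exit status of diff) if they differ.
compare_checks() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    # A log without any result line is not treated as an error here.
    check_results "$1" > "$tmp_a" || :
    check_results "$2" > "$tmp_b" || :
    status=0
    diff "$tmp_a" "$tmp_b" || status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```

Called as "compare_checks <cc log> <gcc log>", it prints only the
result lines that differ. A plain "grep PASS" also matches a line such
as "decode [NOT PASSED]", so a difference of that kind is easy to
overlook in the grep output, while the diff makes it stand out.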

I used the following small test program.

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  int ntasks, /* number of parallel tasks */
       mytid, /* my task id */
       version, subversion, /* version of MPI standard */
       namelen; /* length of processor name */
  char processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
  MPI_Get_processor_name (processor_name, &namelen);
  printf ("I'm process %d of %d available processes running on %s.\n",
          mytid, ntasks, processor_name);
  MPI_Get_version (&version, &subversion);
  printf ("MPI standard %d.%d is supported.\n", version, subversion);
  MPI_Finalize ();
  return EXIT_SUCCESS;
}

Thank you very much in advance for any help in solving the problem
with the Oracle/Sun compiler.

Best regards

Siegmar