Open MPI Development Mailing List Archives

Subject: [OMPI devel] v1.7.5a1: mpirun failure on ppc/linux (regression vs 1.7.4)
From: Paul Hargrove (phhargrove_at_[hidden])
Date: 2014-02-09 03:32:13


I have tried building the current v1.7 tarball (1.7.5a1r30639) with gcc on
two ppc64/linux machines and one ppc32/linux. All three die in MPI_Init
when I try to run ring_c.

I've retested 1.7.4 on both ppc64 machines, and thankfully the problem is
not present.

Each of them at least dies with what looks like a potentially useful
backtrace. The output below is from the two ppc64 machines first, then the
ppc32 machine:

$ mpirun -mca btl sm,self -np 2 examples/ring_c
*** glibc detected *** examples/ring_c: double free or corruption (fasttop): 0x000001003f1d5ce0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x80c9b8f4f4]
/home/hargrov1/OMPI/openmpi-1.7-latest-linux-ppc64-gcc/INST/lib/libmpi.so.1(mca_btl_sm_add_procs-0x2db2c8)[0xfffa29e59a8]
/home/hargrov1/OMPI/openmpi-1.7-latest-linux-ppc64-gcc/INST/lib/libmpi.so.1(+0x2311bc)[0xfffa29711bc]
/home/hargrov1/OMPI/openmpi-1.7-latest-linux-ppc64-gcc/INST/lib/libmpi.so.1(mca_pml_ob1_add_procs-0x14f514)[0xfffa2b7df3c]
/home/hargrov1/OMPI/openmpi-1.7-latest-linux-ppc64-gcc/INST/lib/libmpi.so.1(ompi_mpi_init-0x421ff0)[0xfffa28911f0]
/home/hargrov1/OMPI/openmpi-1.7-latest-linux-ppc64-gcc/INST/lib/libmpi.so.1(MPI_Init-0x3d7120)[0xfffa28df670]
examples/ring_c[0x100009ac]
/lib64/libc.so.6[0x80c9b2bcd8]
/lib64/libc.so.6(__libc_start_main-0x184e00)[0x80c9b2bed0]
[bd-login:51140] WARNING: common_sm_module_unlink failed.
[bd-login:51140] WARNING: common_sm_module_unlink failed.
[bd-login:51140] WARNING: unlink failed.
[bd-login:51140] WARNING: unlink failed.
[bd-login:51141] *** Process received signal ***
[bd-login:51141] Signal: Aborted (6)
[bd-login:51141] Signal code: (-6)
[bd-login:51141] [ 0] [0xfffa2da0418]
[bd-login:51141] [ 1] /lib64/libc.so.6(abort-0x16b278)[0x80c9b46ed8]
[bd-login:51141] [ 2] /lib64/libc.so.6[0x80c9b87568]
[bd-login:51141] [ 3] /lib64/libc.so.6[0x80c9b8f4f4]
[bd-login:51141] [ 4]
/home/hargrov1/OMPI/openmpi-1.7-latest-linux-ppc64-gcc/INST/lib/libmpi.so.1(mca_btl_sm_add_procs-0x2db2c8)[0xfffa29e59a8]
[bd-login:51141] [ 5]
/home/hargrov1/OMPI/openmpi-1.7-latest-linux-ppc64-gcc/INST/lib/libmpi.so.1(+0x2311bc)[0xfffa29711bc]
[bd-login:51141] [ 6]
/home/hargrov1/OMPI/openmpi-1.7-latest-linux-ppc64-gcc/INST/lib/libmpi.so.1(mca_pml_ob1_add_procs-0x14f514)[0xfffa2b7df3c]
[bd-login:51141] [ 7]
/home/hargrov1/OMPI/openmpi-1.7-latest-linux-ppc64-gcc/INST/lib/libmpi.so.1(ompi_mpi_init-0x421ff0)[0xfffa28911f0]
[bd-login:51141] [ 8]
/home/hargrov1/OMPI/openmpi-1.7-latest-linux-ppc64-gcc/INST/lib/libmpi.so.1(MPI_Init-0x3d7120)[0xfffa28df670]
[bd-login:51141] [ 9] examples/ring_c[0x100009ac]
[bd-login:51141] [10] /lib64/libc.so.6[0x80c9b2bcd8]
[bd-login:51141] [11]
/lib64/libc.so.6(__libc_start_main-0x184e00)[0x80c9b2bed0]
[bd-login:51141] *** End of error message ***

$ mpirun -mca btl sm,self -np 2 examples/ring_c
[fc6:27829] *** Process received signal ***
[fc6:27829] Signal: Segmentation fault (11)
[fc6:27829] Signal code: Address not mapped (1)
[fc6:27829] Failing at address: 0x805aa7c9e0
[fc6:27829] [ 0] [0x100428]
[fc6:27829] [ 1] /lib64/ld64.so.1(_rtld_global+0x0)[0x804a7d19b8]
[fc6:27829] [ 2] /lib64/libc.so.6[0x804a888f34]
[fc6:27829] [ 3] /lib64/libc.so.6(__libc_malloc-0xf95c4)[0x804a88aab4]
[fc6:27829] [ 4]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc64/INST/lib/libmpi.so.1(mca_pml_ob1_comm_init_size-0x124550)[0x40000393078]
[fc6:27829] [ 5]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc64/INST/lib/libmpi.so.1(mca_pml_ob1_add_comm-0x1276dc)[0x4000038fc1c]
[fc6:27829] [ 6]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc64/INST/lib/libmpi.so.1(ompi_mpi_init-0x36fb38)[0x40000130c30]
[fc6:27829] [ 7]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc64/INST/lib/libmpi.so.1(MPI_Init-0x3289fc)[0x4000017b34c]
[fc6:27829] [ 8] examples/ring_c[0x100009b0]
[fc6:27829] [ 9] /lib64/libc.so.6[0x804a829734]
[fc6:27829] [10] /lib64/libc.so.6(__libc_start_main-0x15730c)[0x804a8299b4]
[fc6:27829] *** End of error message ***

$ mpirun -mca btl sm,self -np 2 examples/ring_c
*** glibc detected *** examples/ring_c: double free or corruption (!prev): 0x101b5560 ***
======= Backtrace: =========
/lib/libc.so.6(+0xfe74dd4)[0x480d0dd4]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc32/INST/lib/libmpi.so.1(mca_btl_sm_add_procs+0x66c)[0xfc720c0]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc32/INST/lib/libmpi.so.1(+0x15f7a0)[0xfc5e7a0]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc32/INST/lib/libmpi.so.1(mca_pml_ob1_add_procs+0x14c)[0xfdecc50]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc32/INST/lib/libmpi.so.1(ompi_mpi_init+0xcec)[0xfb96eb4]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc32/INST/lib/libmpi.so.1(MPI_Init+0x1e4)[0xfbdd878]
examples/ring_c[0x10000724]
/lib/libc.so.6(+0xfe0e0fc)[0x4806a0fc]
/lib/libc.so.6(+0xfe0e2a0)[0x4806a2a0]
[pcp-k-421:02574] *** Process received signal ***
[pcp-k-421:02574] Signal: Aborted (6)
[pcp-k-421:02574] Signal code: (-6)
[pcp-k-421:02574] [ 0] [0x100370]
[pcp-k-421:02574] [ 1] [0xbfd3a008]
[pcp-k-421:02574] [ 2] /lib/libc.so.6(abort+0x25c)[0x48084a2c]
[pcp-k-421:02574] [ 3] /lib/libc.so.6(+0xfe6cc9c)[0x480c8c9c]
[pcp-k-421:02574] [ 4] /lib/libc.so.6(+0xfe74dd4)[0x480d0dd4]
[pcp-k-421:02574] [ 5]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc32/INST/lib/libmpi.so.1(mca_btl_sm_add_procs+0x66c)[0xfc720c0]
[pcp-k-421:02574] [ 6]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc32/INST/lib/libmpi.so.1(+0x15f7a0)[0xfc5e7a0]
[pcp-k-421:02574] [ 7]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc32/INST/lib/libmpi.so.1(mca_pml_ob1_add_procs+0x14c)[0xfdecc50]
[pcp-k-421:02574] [ 8]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc32/INST/lib/libmpi.so.1(ompi_mpi_init+0xcec)[0xfb96eb4]
[pcp-k-421:02574] [ 9]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc32/INST/lib/libmpi.so.1(MPI_Init+0x1e4)[0xfbdd878]
[pcp-k-421:02574] [10] examples/ring_c[0x10000724]
[pcp-k-421:02574] [11] /lib/libc.so.6(+0xfe0e0fc)[0x4806a0fc]
[pcp-k-421.n2001:02573] WARNING: common_sm_module_unlink failed.
[pcp-k-421:02574] [12] /lib/libc.so.6(+0xfe0e2a0)[0x4806a2a0]
[pcp-k-421:02574] *** End of error message ***
[pcp-k-421.n2001:02573] WARNING: common_sm_module_unlink failed.
[pcp-k-421.n2001:02573] WARNING: unlink failed.
[pcp-k-421.n2001:02573] WARNING: A system call failed during shared memory
initialization that should unlink failed.
[pcp-k-421:02573] *** Process received signal ***
[pcp-k-421:02573] Signal: Segmentation fault (11)
[pcp-k-421:02573] Signal code: Address not mapped (1)
[pcp-k-421:02573] Failing at address: 0x3f000008
[pcp-k-421:02573] [ 0] [0x100370]
[pcp-k-421:02573] [ 1] [0xffffffff]
[pcp-k-421:02573] [ 2] /lib/libc.so.6(__libc_malloc+0x8c)[0x480d525c]
[pcp-k-421:02573] [ 3]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc32/INST/lib/libmpi.so.1(+0x2eb7f8)[0xfdea7f8]
[pcp-k-421:02573] [ 4]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc32/INST/lib/libmpi.so.1(+0x2eb5a0)[0xfdea5a0]
[pcp-k-421:02573] [ 5]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc32/INST/lib/libmpi.so.1(mca_pml_ob1_add_comm+0x40)[0xfdec108]
[pcp-k-421:02573] [ 6]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc32/INST/lib/libmpi.so.1(ompi_mpi_init+0xd6c)[0xfb96f34]
[pcp-k-421:02573] [ 7]
/home/phargrov/OMPI/openmpi-1.7-latest-linux-ppc32/INST/lib/libmpi.so.1(MPI_Init+0x1e4)[0xfbdd878]
[pcp-k-421:02573] [ 8] examples/ring_c[0x10000724]
[pcp-k-421:02573] [ 9] /lib/libc.so.6(+0xfe0e0fc)[0x4806a0fc]
[pcp-k-421:02573] [10] /lib/libc.so.6(+0xfe0e2a0)[0x4806a2a0]
[pcp-k-421:02573] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 2574 on node pcp-k-421 exited
on signal 6 (Aborted).
--------------------------------------------------------------------------
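
Since the traces above are long, it may help to compare them mechanically. The following is a minimal sketch, not part of the original report, that pulls the symbol names out of glibc-style backtrace frames and lists those shared between traces. The helper names (`symbols`, `common_symbols`) and the regular expression are illustrative assumptions; the sample frames are copied from the output above with the install path trimmed for brevity.

```python
import re

# Matches a frame such as "libmpi.so.1(mca_btl_sm_add_procs+0x66c)[0xfc720c0]"
# and captures the symbol name. Frames that carry only an offset, such as
# "(+0x15f7a0)", have no symbol and are skipped.
FRAME = re.compile(r"\(([A-Za-z_]\w*)[+-]0x[0-9a-fA-F]+\)\[0x[0-9a-fA-F]+\]")


def symbols(trace_text):
    """Return the symbol names found in a backtrace, innermost frame first."""
    return FRAME.findall(trace_text)


def common_symbols(*traces):
    """Return symbols present in every trace, ordered as in the first trace."""
    others = [set(symbols(t)) for t in traces[1:]]
    result = []
    for name in symbols(traces[0]):
        if name not in result and all(name in o for o in others):
            result.append(name)
    return result


# Frames from the first ppc64 trace (install path trimmed).
TRACE_A = """
libmpi.so.1(mca_btl_sm_add_procs-0x2db2c8)[0xfffa29e59a8]
libmpi.so.1(+0x2311bc)[0xfffa29711bc]
libmpi.so.1(mca_pml_ob1_add_procs-0x14f514)[0xfffa2b7df3c]
libmpi.so.1(ompi_mpi_init-0x421ff0)[0xfffa28911f0]
libmpi.so.1(MPI_Init-0x3d7120)[0xfffa28df670]
"""

# Frames from the second ppc64 trace (install path trimmed).
TRACE_B = """
libmpi.so.1(mca_pml_ob1_comm_init_size-0x124550)[0x40000393078]
libmpi.so.1(mca_pml_ob1_add_comm-0x1276dc)[0x4000038fc1c]
libmpi.so.1(ompi_mpi_init-0x36fb38)[0x40000130c30]
libmpi.so.1(MPI_Init-0x3289fc)[0x4000017b34c]
"""

if __name__ == "__main__":
    print(symbols(TRACE_A))
    print(common_symbols(TRACE_A, TRACE_B))
```

Run on these two excerpts, the shared frames are the `MPI_Init` and `ompi_mpi_init` entries, which matches the observation that every failure happens during initialization while the innermost frames differ.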

-Paul

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900