Open MPI Development Mailing List Archives

Subject: [OMPI devel] 1.7.4rc2r30148 - crash in MPI_Init on Linux/x86
From: Paul Hargrove (phhargrove_at_[hidden])
Date: 2014-01-08 21:59:24


I am still testing the current 1.7.4rc tarball on my various systems. The
latest failure (shown below) is a SEGV somewhere below MPI_Init on an old,
but otherwise fairly normal, Linux/x86 (32-bit) system.

$ /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/bin/mpirun -np 1 examples/ring_c
[pcp-j-6:29031] *** Process received signal ***
[pcp-j-6:29031] Signal: Segmentation fault (11)
[pcp-j-6:29031] Signal code: Address not mapped (1)
[pcp-j-6:29031] Failing at address: 0x6c6c6f63
[pcp-j-6:29031] [ 0] [0xbe4440]
[pcp-j-6:29031] [ 1] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_var_enum_create+0x15d) [0x2b11ed]
[pcp-j-6:29031] [ 2] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_register_params+0x639) [0x440909]
[pcp-j-6:29031] [ 3] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_components_register+0x14e) [0x2b2cce]
[pcp-j-6:29031] [ 4] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_register+0x1b5) [0x2b32a5]
[pcp-j-6:29031] [ 5] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_open+0x4e) [0x2b333e]
[pcp-j-6:29031] [ 6] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libmpi.so.1(ompi_mpi_init+0x53d) [0xaf359d]
[pcp-j-6:29031] [ 7] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libmpi.so.1(MPI_Init+0x13d) [0xb10d6d]
[pcp-j-6:29031] [ 8] examples/ring_c [0x80486e9]
[pcp-j-6:29031] [ 9] /lib/libc.so.6(__libc_start_main+0xdc) [0x125ebc]
[pcp-j-6:29031] [10] examples/ring_c [0x8048631]
[pcp-j-6:29031] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 29031 on node pcp-j-6 exited on
signal 11 (Segmentation fault).
--------------------------------------------------------------------------

The failure shown is for a singleton run, but np=2 fails as well.
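
One observation about the failing address reported above: read as four little-endian bytes (as on x86), 0x6c6c6f63 consists entirely of printable ASCII. That hints, without proving anything, that string data was dereferenced as a pointer. A minimal sketch for checking this follows; the helper name address_as_ascii is hypothetical and not part of Open MPI.

```python
import struct


def address_as_ascii(address: int) -> str:
    """Interpret a 32-bit address as 4 little-endian bytes and decode them as ASCII.

    Hypothetical diagnostic helper (not an Open MPI function). Bytes that are
    not valid ASCII are replaced rather than raising an error.
    """
    raw = struct.pack("<I", address)  # "<I" = little-endian unsigned 32-bit
    return raw.decode("ascii", errors="replace")


if __name__ == "__main__":
    failing_address = 0x6C6C6F63  # "Failing at address" from the backtrace
    print(address_as_ascii(failing_address))  # prints "coll"
```

The decoded text matches the start of the "coll" framework name, which is consistent with the crash occurring while mca_coll_ml_register_params registers its parameters, but this is only a clue and not a diagnosis.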

System info:
$ uname -a
Linux pcp-j-6 2.6.18-238.1.1.el5PAE #1 SMP Tue Jan 18 19:28:42 EST 2011 i686 athlon i386 GNU/Linux
$ gcc --version
gcc (GCC) 4.0.0
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The only configure argument used was --prefix.

I was going to attach output from "ompi_info --all", but it SEGVs too!

$ ompi_info --all
[pcp-j-6:29092] *** Process received signal ***
[pcp-j-6:29092] Signal: Segmentation fault (11)
[pcp-j-6:29092] Signal code: Address not mapped (1)
[pcp-j-6:29092] Failing at address: 0x6c6c6f63
[pcp-j-6:29092] [ 0] [0xd8a440]
[pcp-j-6:29092] [ 1] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_var_enum_create+0x15d) [0x2db1ed]
[pcp-j-6:29092] [ 2] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_register_params+0x639) [0x48d909]
[pcp-j-6:29092] [ 3] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_components_register+0x14e) [0x2dccce]
[pcp-j-6:29092] [ 4] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_register+0x1b5) [0x2dd2a5]
[pcp-j-6:29092] [ 5] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(opal_info_register_project_frameworks+0x57) [0x2b83d7]
[pcp-j-6:29092] [ 6] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libmpi.so.1(ompi_info_register_framework_params+0x81) [0xa69251]
[pcp-j-6:29092] [ 7] ompi_info(main+0x2ba) [0x8049a2a]
[pcp-j-6:29092] [ 8] /lib/libc.so.6(__libc_start_main+0xdc) [0x125ebc]
[pcp-j-6:29092] [ 9] ompi_info [0x80496e1]
[pcp-j-6:29092] *** End of error message ***
Segmentation fault (core dumped)

I will try again with a newer gcc and report back.

-Paul

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900