The most likely problem is a bad backing-store site. Any chance you could give me a line number from this? There are a lot of calls to register params in that code, and I'd need some help figuring out which one isn't right.
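
One possible way to get that line number, sketched under assumptions: the frame for mca_coll_ml_register_params in the trace below carries a symbol+offset pair, and gdb can map that to a source line with "info line" if the library was built with debug info (-g), which the report does not confirm. The helper name gdb_command and the regular expression are illustrative only, not part of Open MPI. Because the offset in a caller frame is a return address, the reported line may be the call itself or the statement just after it.

```python
import re

# Matches backtrace frames such as:
#   [ 2] /path/to/lib.so(symbol+0x639) [0x440909]
# Frames without a symbol (e.g. "[ 0] [0xbe4440]") do not match.
FRAME_RE = re.compile(
    r"\[\s*\d+\]\s+(?P<obj>\S+)\((?P<sym>\w+)\+(?P<off>0x[0-9a-fA-F]+)\)"
)


def gdb_command(frame_line):
    """Build a gdb invocation that asks for the source line of a frame.

    Hypothetical helper: returns None when the frame has no symbol+offset.
    """
    m = FRAME_RE.search(frame_line)
    if m is None:
        return None
    return 'gdb -batch -ex "info line *({sym}+{off})" {obj}'.format(
        **m.groupdict()
    )


frame = (
    "[pcp-j-6:29031] [ 2] /home/pcp1/phargrov/OMPI/"
    "openmpi-1.7-latest-linux-x86/INST/lib/openmpi/mca_coll_ml.so"
    "(mca_coll_ml_register_params+0x639) [0x440909]"
)
print(gdb_command(frame))
```

Running the printed command on the machine that produced the trace should name a file and line inside mca_coll_ml_register_params, provided debug symbols are present.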


On Jan 8, 2014, at 6:59 PM, Paul Hargrove <phhargrove@lbl.gov> wrote:

I am still testing the current 1.7.4rc tarball on my various systems. The latest failure (shown below) is a SEGV somewhere below MPI_Init on an old, but otherwise fairly normal, Linux/x86 (32-bit) system.

$ /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/bin/mpirun -np 1 examples/ring_c
[pcp-j-6:29031] *** Process received signal ***
[pcp-j-6:29031] Signal: Segmentation fault (11)
[pcp-j-6:29031] Signal code: Address not mapped (1)
[pcp-j-6:29031] Failing at address: 0x6c6c6f63
[pcp-j-6:29031] [ 0] [0xbe4440]
[pcp-j-6:29031] [ 1] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_var_enum_create+0x15d) [0x2b11ed]
[pcp-j-6:29031] [ 2] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_register_params+0x639) [0x440909]
[pcp-j-6:29031] [ 3] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_components_register+0x14e) [0x2b2cce]
[pcp-j-6:29031] [ 4] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_register+0x1b5) [0x2b32a5]
[pcp-j-6:29031] [ 5] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_open+0x4e) [0x2b333e]
[pcp-j-6:29031] [ 6] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libmpi.so.1(ompi_mpi_init+0x53d) [0xaf359d]
[pcp-j-6:29031] [ 7] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libmpi.so.1(MPI_Init+0x13d) [0xb10d6d]
[pcp-j-6:29031] [ 8] examples/ring_c [0x80486e9]
[pcp-j-6:29031] [ 9] /lib/libc.so.6(__libc_start_main+0xdc) [0x125ebc]
[pcp-j-6:29031] [10] examples/ring_c [0x8048631]
[pcp-j-6:29031] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 29031 on node pcp-j-6 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

The failure shown is for a singleton run, but np=2 fails as well.
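
A small observation on the failing address, offered as a sketch rather than a diagnosis: on a little-endian x86 system, the four bytes of 0x6c6c6f63 are all printable ASCII. The snippet below (the function name address_as_ascii is illustrative) decodes them. If the result is not a coincidence, it would be consistent with string data being read where a pointer was expected, but that is an inference from the address alone.

```python
def address_as_ascii(addr, width=4):
    """Interpret a faulting address as little-endian bytes and decode as ASCII.

    Non-ASCII bytes are replaced rather than raising an error.
    """
    raw = addr.to_bytes(width, "little")
    return raw.decode("ascii", errors="replace")


# Failing address reported in both backtraces.
print(address_as_ascii(0x6C6C6F63))  # prints "coll"
```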

System info:
$ uname -a
Linux pcp-j-6 2.6.18-238.1.1.el5PAE #1 SMP Tue Jan 18 19:28:42 EST 2011 i686 athlon i386 GNU/Linux
$ gcc --version
gcc (GCC) 4.0.0
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The only configure argument used was --prefix.

I was going to attach output from "ompi_info --all", but it SEGVs too!

$ ompi_info --all 
[pcp-j-6:29092] *** Process received signal ***
[pcp-j-6:29092] Signal: Segmentation fault (11)
[pcp-j-6:29092] Signal code: Address not mapped (1)
[pcp-j-6:29092] Failing at address: 0x6c6c6f63
[pcp-j-6:29092] [ 0] [0xd8a440]
[pcp-j-6:29092] [ 1] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_var_enum_create+0x15d) [0x2db1ed]
[pcp-j-6:29092] [ 2] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_register_params+0x639) [0x48d909]
[pcp-j-6:29092] [ 3] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_components_register+0x14e) [0x2dccce]
[pcp-j-6:29092] [ 4] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_register+0x1b5) [0x2dd2a5]
[pcp-j-6:29092] [ 5] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(opal_info_register_project_frameworks+0x57) [0x2b83d7]
[pcp-j-6:29092] [ 6] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libmpi.so.1(ompi_info_register_framework_params+0x81) [0xa69251]
[pcp-j-6:29092] [ 7] ompi_info(main+0x2ba) [0x8049a2a]
[pcp-j-6:29092] [ 8] /lib/libc.so.6(__libc_start_main+0xdc) [0x125ebc]
[pcp-j-6:29092] [ 9] ompi_info [0x80496e1]
[pcp-j-6:29092] *** End of error message ***
Segmentation fault (core dumped)

I will try again with a newer gcc and report back.

-Paul

--
Paul H. Hargrove                          PHHargrove@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900