Open MPI Development Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI devel] 1.7.4rc2r30148 - crash in MPI_Init on Linux/x86
From: Paul Hargrove (phhargrove_at_[hidden])
Date: 2014-01-08 23:23:45


Ralph,

Building with gcc-4.1.2 fixed the problem for me. I also removed an old
install of ompi-1.4 that was in LD_LIBRARY_PATH at build time and might
have been a contributing factor. If I'd known earlier that it was there, I
wouldn't have reported the problem without first removing it.
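
For anyone who wants to check for the same situation, here is a minimal sketch that lists Open MPI shared libraries reachable through LD_LIBRARY_PATH. The helper name find_openmpi_libs is hypothetical (it is not part of Open MPI), and the file name patterns are an assumption based on the usual library names (libmpi, libopen-pal, libopen-rte):

```python
import glob
import os


def find_openmpi_libs(ld_library_path):
    """Return Open MPI-looking shared libraries found in the given search path.

    ld_library_path is a colon-separated string such as $LD_LIBRARY_PATH.
    Empty entries are skipped here for simplicity.
    """
    patterns = ("libmpi.so*", "libopen-pal.so*", "libopen-rte.so*")
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        if not directory:
            continue
        for pattern in patterns:
            # glob returns nothing for directories that do not exist.
            hits.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return hits


if __name__ == "__main__":
    for path in find_openmpi_libs(os.environ.get("LD_LIBRARY_PATH", "")):
        print(path)
```

Any hit outside the install prefix being tested would be worth removing from the path before rebuilding.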

I can build again with gcc-4.0.0 and --enable-debug if you are still
interested in trying to get a line number. This would also determine if
LD_LIBRARY_PATH was the true culprit.
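
One detail in the quoted traces may be useful even without a line number: the failing address 0x6c6c6f63 is the same in both runs, and its bytes are all printable ASCII. The sketch below decodes it in little-endian order, as it would sit in memory on x86. The function name is hypothetical, and reading the result as a clue is only a guess, not a diagnosis:

```python
def fault_address_as_ascii(address, width=4):
    """Render a fault address as the ASCII text its bytes would spell in memory.

    Bytes are taken in little-endian order (x86); non-printable bytes
    are shown as '.'.
    """
    raw = address.to_bytes(width, "little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


if __name__ == "__main__":
    print(fault_address_as_ascii(0x6C6C6F63))  # prints "coll"
```

An address that spells "coll" would be consistent with string data being dereferenced as a pointer inside mca_base_var_enum_create, but a build with --enable-debug would be needed to confirm where that happens.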

-Paul [Sent from my phone]
On Jan 8, 2014 8:02 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:

> Most likely problem is a bad backing store site - any chance you could
> give me a line number from this? There are a lot of calls to register
> params in that code and I'd need some help in figuring out which one wasn't
> right.
>
>
> On Jan 8, 2014, at 6:59 PM, Paul Hargrove <phhargrove_at_[hidden]> wrote:
>
> I am still testing the current 1.7.4rc tarball on my various systems. The
> latest failure (shown below) is a SEGV somewhere below MPI_Init on an old,
> but otherwise fairly normal, Linux/x86 (32-bit) system.
>
> $ /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/bin/mpirun -np 1 examples/ring_c
> [pcp-j-6:29031] *** Process received signal ***
> [pcp-j-6:29031] Signal: Segmentation fault (11)
> [pcp-j-6:29031] Signal code: Address not mapped (1)
> [pcp-j-6:29031] Failing at address: 0x6c6c6f63
> [pcp-j-6:29031] [ 0] [0xbe4440]
> [pcp-j-6:29031] [ 1]
> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_var_enum_create+0x15d)
> [0x2b11ed]
> [pcp-j-6:29031] [ 2]
> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_register_params+0x639)
> [0x440909]
> [pcp-j-6:29031] [ 3]
> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_components_register+0x14e)
> [0x2b2cce]
> [pcp-j-6:29031] [ 4]
> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_register+0x1b5)
> [0x2b32a5]
> [pcp-j-6:29031] [ 5]
> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_open+0x4e)
> [0x2b333e]
> [pcp-j-6:29031] [ 6]
> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libmpi.so.1(ompi_mpi_init+0x53d)
> [0xaf359d]
> [pcp-j-6:29031] [ 7]
> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libmpi.so.1(MPI_Init+0x13d)
> [0xb10d6d]
> [pcp-j-6:29031] [ 8] examples/ring_c [0x80486e9]
> [pcp-j-6:29031] [ 9] /lib/libc.so.6(__libc_start_main+0xdc) [0x125ebc]
> [pcp-j-6:29031] [10] examples/ring_c [0x8048631]
> [pcp-j-6:29031] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 29031 on node pcp-j-6 exited
> on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
> The failure shown is for a singleton run, but np=2 fails as well.
>
> System info:
> $ uname -a
> Linux pcp-j-6 2.6.18-238.1.1.el5PAE #1 SMP Tue Jan 18 19:28:42 EST 2011 i686 athlon i386 GNU/Linux
> $ gcc --version
> gcc (GCC) 4.0.0
> Copyright (C) 2005 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> The only configure argument used was --prefix.
>
> I was going to attach output from "ompi_info --all", but it SEGV's too!
>
> $ ompi_info --all
> [pcp-j-6:29092] *** Process received signal ***
> [pcp-j-6:29092] Signal: Segmentation fault (11)
> [pcp-j-6:29092] Signal code: Address not mapped (1)
> [pcp-j-6:29092] Failing at address: 0x6c6c6f63
> [pcp-j-6:29092] [ 0] [0xd8a440]
> [pcp-j-6:29092] [ 1]
> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_var_enum_create+0x15d)
> [0x2db1ed]
> [pcp-j-6:29092] [ 2]
> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_register_params+0x639)
> [0x48d909]
> [pcp-j-6:29092] [ 3]
> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_components_register+0x14e)
> [0x2dccce]
> [pcp-j-6:29092] [ 4]
> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_register+0x1b5)
> [0x2dd2a5]
> [pcp-j-6:29092] [ 5]
> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(opal_info_register_project_frameworks+0x57)
> [0x2b83d7]
> [pcp-j-6:29092] [ 6]
> /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libmpi.so.1(ompi_info_register_framework_params+0x81)
> [0xa69251]
> [pcp-j-6:29092] [ 7] ompi_info(main+0x2ba) [0x8049a2a]
> [pcp-j-6:29092] [ 8] /lib/libc.so.6(__libc_start_main+0xdc) [0x125ebc]
> [pcp-j-6:29092] [ 9] ompi_info [0x80496e1]
> [pcp-j-6:29092] *** End of error message ***
> Segmentation fault (core dumped)
>
> I will try again with a newer gcc and report back.
>
> -Paul
>
> --
> Paul H. Hargrove PHHargrove_at_[hidden]
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>