Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] 1.7.4rc2r30148 - crash in MPI_Init on Linux/x86
From: Paul Hargrove (phhargrove_at_[hidden])
Date: 2014-01-08 23:45:53


Only takes <30 seconds of typing to start the test and I get email when it
is done.
Typing these emails takes more of my time than the actual testing does.

-Paul

On Wed, Jan 8, 2014 at 8:35 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> If you have the time, it might be worth nailing it down. However, I'm
> mindful of all the things you need to do, so please only if you have the
> time.
>
> Thanks
> Ralph
>
> On Jan 8, 2014, at 8:23 PM, Paul Hargrove <phhargrove_at_[hidden]> wrote:
>
> Ralph,
>
> Building with gcc-4.1.2 fixed the problem for me. I also removed an old
> install of ompi-1.4 that was in LD_LIBRARY_PATH at build time and might
> have been a contributing factor. If I'd known earlier that it was there, I
> wouldn't have reported the problem without first removing it.
>
> I can build again with gcc-4.0.0 and --enable-debug if you are still
> interested in trying to get a line number. This would also determine if
> LD_LIBRARY_PATH was the true culprit.
>
> -Paul [Sent from my phone]
> On Jan 8, 2014 8:02 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>
>> Most likely problem is a bad backing store site - any chance you could
>> give me a line number from this? There are a lot of calls to register
>> params in that code and I'd need some help in figuring out which one wasn't
>> right.
>>
>>
>> On Jan 8, 2014, at 6:59 PM, Paul Hargrove <phhargrove_at_[hidden]> wrote:
>>
>> I am still testing the current 1.7.4rc tarball on my various systems.
>> The latest failure (shown below) is a SEGV somewhere below MPI_Init on an
>> old, but otherwise fairly normal, Linux/x86 (32-bit) system.
>>
>> $ /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/bin/mpirun -np 1 examples/ring_c
>> [pcp-j-6:29031] *** Process received signal ***
>> [pcp-j-6:29031] Signal: Segmentation fault (11)
>> [pcp-j-6:29031] Signal code: Address not mapped (1)
>> [pcp-j-6:29031] Failing at address: 0x6c6c6f63
>> [pcp-j-6:29031] [ 0] [0xbe4440]
>> [pcp-j-6:29031] [ 1] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_var_enum_create+0x15d) [0x2b11ed]
>> [pcp-j-6:29031] [ 2] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_register_params+0x639) [0x440909]
>> [pcp-j-6:29031] [ 3] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_components_register+0x14e) [0x2b2cce]
>> [pcp-j-6:29031] [ 4] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_register+0x1b5) [0x2b32a5]
>> [pcp-j-6:29031] [ 5] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_open+0x4e) [0x2b333e]
>> [pcp-j-6:29031] [ 6] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libmpi.so.1(ompi_mpi_init+0x53d) [0xaf359d]
>> [pcp-j-6:29031] [ 7] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libmpi.so.1(MPI_Init+0x13d) [0xb10d6d]
>> [pcp-j-6:29031] [ 8] examples/ring_c [0x80486e9]
>> [pcp-j-6:29031] [ 9] /lib/libc.so.6(__libc_start_main+0xdc) [0x125ebc]
>> [pcp-j-6:29031] [10] examples/ring_c [0x8048631]
>> [pcp-j-6:29031] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 29031 on node pcp-j-6 exited on signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>>
>> The failure shown is for a singleton run, but np=2 fails as well.
>>
>> System info:
>> $ uname -a
>> Linux pcp-j-6 2.6.18-238.1.1.el5PAE #1 SMP Tue Jan 18 19:28:42 EST 2011 i686 athlon i386 GNU/Linux
>> $ gcc --version
>> gcc (GCC) 4.0.0
>> Copyright (C) 2005 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions. There is NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>
>> The only configure argument used was --prefix.
>>
>> I was going to attach output from "ompi_info --all", but it SEGVs too!
>>
>> $ ompi_info --all
>> [pcp-j-6:29092] *** Process received signal ***
>> [pcp-j-6:29092] Signal: Segmentation fault (11)
>> [pcp-j-6:29092] Signal code: Address not mapped (1)
>> [pcp-j-6:29092] Failing at address: 0x6c6c6f63
>> [pcp-j-6:29092] [ 0] [0xd8a440]
>> [pcp-j-6:29092] [ 1] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_var_enum_create+0x15d) [0x2db1ed]
>> [pcp-j-6:29092] [ 2] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/openmpi/mca_coll_ml.so(mca_coll_ml_register_params+0x639) [0x48d909]
>> [pcp-j-6:29092] [ 3] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_components_register+0x14e) [0x2dccce]
>> [pcp-j-6:29092] [ 4] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(mca_base_framework_register+0x1b5) [0x2dd2a5]
>> [pcp-j-6:29092] [ 5] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libopen-pal.so.6(opal_info_register_project_frameworks+0x57) [0x2b83d7]
>> [pcp-j-6:29092] [ 6] /home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/INST/lib/libmpi.so.1(ompi_info_register_framework_params+0x81) [0xa69251]
>> [pcp-j-6:29092] [ 7] ompi_info(main+0x2ba) [0x8049a2a]
>> [pcp-j-6:29092] [ 8] /lib/libc.so.6(__libc_start_main+0xdc) [0x125ebc]
>> [pcp-j-6:29092] [ 9] ompi_info [0x80496e1]
>> [pcp-j-6:29092] *** End of error message ***
>> Segmentation fault (core dumped)
>>
>> I will try again with a newer gcc and report back.
>>
>> -Paul
>>
>> --
>> Paul H. Hargrove PHHargrove_at_[hidden]
>> Future Technologies Group
>> Computer and Data Sciences Department Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900