Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-03-13 20:05:56


I started playing with this configure line on my CentOS 6 machine, and I'd suggest a few things:

1. drop --with-libltdl=external ==> not a good idea

2. drop --with-esmtp ==> useless unless you really want pager messages notifying you of problems

3. drop --enable-mpi-threads for now

I'm continuing to play with it, but thought I'd pass those along.
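
Applying those three suggestions to the configure line quoted below leaves something like this (a sketch; the remaining options are taken verbatim from Joshua's original line):

./configure --prefix=$HOME/ompi-1.4.5 --enable-openib-ibcm --with-sge \
    --with-valgrind --enable-memchecker --with-psm=no \
    LDFLAGS='-Wl,-z,noexecstack'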

On Mar 13, 2012, at 5:28 PM, Gutierrez, Samuel K wrote:

> Can you rebuild without the "--enable-mpi-threads" option and try again?
>
> Thanks,
>
> Sam
>
> On Mar 13, 2012, at 5:22 PM, Joshua Baker-LePain wrote:
>
>> On Tue, 13 Mar 2012 at 10:57pm, Gutierrez, Samuel K wrote:
>>
>>> Fooey. What compiler are you using to build Open MPI, and how are you configuring your build?
>>
>> I'm using gcc as packaged by RH/CentOS 6.2:
>>
>> [jlb_at_opt200 1.4.5-2]$ gcc --version
>> gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)
>>
>> I actually tried two custom builds of Open MPI 1.4.5. For the first, I tried to stick close to the options in RH's compat-openmpi SRPM:
>>
>> ./configure --prefix=$HOME/ompi-1.4.5 --enable-mpi-threads --enable-openib-ibcm --with-sge --with-libltdl=external --with-valgrind --enable-memchecker --with-psm=no --with-esmtp LDFLAGS='-Wl,-z,noexecstack'
>>
>> That resulted in the backtrace I sent previously:
>> #0 0x00002b0099ec4c4c in mca_btl_sm_component_progress ()
>>     from /netapp/sali/jlb/ompi-1.4.5/lib/openmpi/mca_btl_sm.so
>> #1 0x00002b00967737ca in opal_progress ()
>>     from /netapp/sali/jlb/ompi-1.4.5/lib/libopen-pal.so.0
>> #2 0x00002b00975ef8d5 in barrier ()
>>     from /netapp/sali/jlb/ompi-1.4.5/lib/openmpi/mca_grpcomm_bad.so
>> #3 0x00002b009628da24 in ompi_mpi_init ()
>>     from /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0
>> #4 0x00002b00962b24f0 in PMPI_Init ()
>>     from /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0
>> #5 0x0000000000400826 in main (argc=1, argv=0x7fff9fe113f8)
>>     at mpihello-long.c:11
>>
>> For kicks, I tried a second build of 1.4.5 with a bare minimum of options:
>>
>> ./configure --prefix=$HOME/ompi-1.4.5 --with-sge
>>
>> That resulted in a slightly different backtrace whose top frames are unresolved:
>> #0 0x00002b7bbc8681d0 in ?? ()
>> #1 <signal handler called>
>> #2 0x00002b7bbd2b8f6c in mca_btl_sm_component_progress ()
>>     from /netapp/sali/jlb/ompi-1.4.5/lib/openmpi/mca_btl_sm.so
>> #3 0x00002b7bb9b2feda in opal_progress ()
>>     from /netapp/sali/jlb/ompi-1.4.5/lib/libopen-pal.so.0
>> #4 0x00002b7bba9a98d5 in barrier ()
>>     from /netapp/sali/jlb/ompi-1.4.5/lib/openmpi/mca_grpcomm_bad.so
>> #5 0x00002b7bb965d426 in ompi_mpi_init ()
>>     from /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0
>> #6 0x00002b7bb967cba0 in PMPI_Init ()
>>     from /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0
>> #7 0x0000000000400826 in main (argc=1, argv=0x7fff93634788)
>>     at mpihello-long.c:11
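>>
>> For context, the source of mpihello-long.c isn't shown in this thread. A minimal program consistent with both backtraces would look roughly like this (a sketch only, not the actual file; the crash happens inside MPI_Init, so nothing past that call is ever reached):
>>
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> int main(int argc, char **argv)
>> {
>>     int rank;
>>
>>     /* Both backtraces show the segfault during MPI_Init -> ompi_mpi_init */
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     printf("Hello from rank %d\n", rank);
>>     MPI_Finalize();
>>     return 0;
>> }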
>>
>>> Can you also run with a debug build of Open MPI so we can see the line numbers?
>>
>> I'll do that first thing tomorrow.
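>>
>> For reference, a debug build can be configured with something like this (a sketch; --enable-debug is Open MPI's standard debugging switch, and -g/-O0 keep usable symbols in the library):
>>
>> ./configure --prefix=$HOME/ompi-1.4.5-dbg --with-sge --enable-debug CFLAGS='-g -O0'
>>
>> followed by recompiling the test program with "mpicc -g" so the application frames resolve too.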
>>
>>>>> Another question. How reproducible is this on your system?
>>>>
>>>> In my testing today, it's been 100% reproducible.
>>>
>>> That's surprising.
>>
>> Heh. You're telling me.
>>
>> Thanks for taking an interest in this.
>>
>> --
>> Joshua Baker-LePain
>> QB3 Shared Cluster Sysadmin
>> UCSF