
Subject: Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE
From: Gutierrez, Samuel K (samuel_at_[hidden])
Date: 2012-03-13 19:28:09


Can you rebuild without the "--enable-mpi-threads" option and try again?
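
For example, something along these lines should work (this is just your earlier configure line with "--enable-mpi-threads" dropped; I've also added "--enable-debug" on the assumption that you'll want line numbers for the debug build mentioned below):

./configure --prefix=$HOME/ompi-1.4.5 --enable-openib-ibcm --with-sge --with-libltdl=external --with-valgrind --enable-memchecker --with-psm=no --with-esmtp --enable-debug LDFLAGS='-Wl,-z,noexecstack'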

Thanks,

Sam

On Mar 13, 2012, at 5:22 PM, Joshua Baker-LePain wrote:

> On Tue, 13 Mar 2012 at 10:57pm, Gutierrez, Samuel K wrote
>
>> Fooey. What compiler are you using to build Open MPI and how are you configuring your build?
>
> I'm using gcc as packaged by RH/CentOS 6.2:
>
> [jlb_at_opt200 1.4.5-2]$ gcc --version
> gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)
>
> I actually tried 2 custom builds of Open MPI 1.4.5. For the first I tried to stick close to the options in RH's compat-openmpi SRPM:
>
> ./configure --prefix=$HOME/ompi-1.4.5 --enable-mpi-threads --enable-openib-ibcm --with-sge --with-libltdl=external --with-valgrind --enable-memchecker --with-psm=no --with-esmtp LDFLAGS='-Wl,-z,noexecstack'
>
> That resulted in the backtrace I sent previously:
> #0 0x00002b0099ec4c4c in mca_btl_sm_component_progress ()
> from /netapp/sali/jlb/ompi-1.4.5/lib/openmpi/mca_btl_sm.so
> #1 0x00002b00967737ca in opal_progress ()
> from /netapp/sali/jlb/ompi-1.4.5/lib/libopen-pal.so.0
> #2 0x00002b00975ef8d5 in barrier ()
> from /netapp/sali/jlb/ompi-1.4.5/lib/openmpi/mca_grpcomm_bad.so
> #3 0x00002b009628da24 in ompi_mpi_init ()
> from /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0
> #4 0x00002b00962b24f0 in PMPI_Init ()
> from /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0
> #5 0x0000000000400826 in main (argc=1, argv=0x7fff9fe113f8)
> at mpihello-long.c:11
>
> For kicks, I tried a 2nd compile of 1.4.5 with a bare minimum of options:
>
> ./configure --prefix=$HOME/ompi-1.4.5 --with-sge
>
> That resulted in a slightly different backtrace that seems to be missing a bit:
> #0 0x00002b7bbc8681d0 in ?? ()
> #1 <signal handler called>
> #2 0x00002b7bbd2b8f6c in mca_btl_sm_component_progress ()
> from /netapp/sali/jlb/ompi-1.4.5/lib/openmpi/mca_btl_sm.so
> #3 0x00002b7bb9b2feda in opal_progress ()
> from /netapp/sali/jlb/ompi-1.4.5/lib/libopen-pal.so.0
> #4 0x00002b7bba9a98d5 in barrier ()
> from /netapp/sali/jlb/ompi-1.4.5/lib/openmpi/mca_grpcomm_bad.so
> #5 0x00002b7bb965d426 in ompi_mpi_init ()
> from /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0
> #6 0x00002b7bb967cba0 in PMPI_Init ()
> from /netapp/sali/jlb/ompi-1.4.5/lib/libmpi.so.0
> #7 0x0000000000400826 in main (argc=1, argv=0x7fff93634788)
> at mpihello-long.c:11
>
>> Can you also run with a debug build of Open MPI so we can see the line numbers?
>
> I'll do that first thing tomorrow.
>
>>>> Another question. How reproducible is this on your system?
>>>
>>> In my testing today, it's been 100% reproducible.
>>
>> That's surprising.
>
> Heh. You're telling me.
>
> Thanks for taking an interest in this.
>
> --
> Joshua Baker-LePain
> QB3 Shared Cluster Sysadmin
> UCSF
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users