Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] TIPC BTL Segmentation fault
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-07-04 07:40:25


Ah -- so this is in the template code. I suspect this code might have bit rotted a bit. :-\

If you run this through valgrind, does anything obvious show up? I ask because this kind of error is typically a symptom of the real error. I.e., the real error was some kind of memory corruption that occurred earlier, and this is the memory access that exposes that prior memory corruption.

On Jul 4, 2011, at 5:08 AM, Xin He wrote:

> Yes, it is an opal_object.
>
> And this error seems to be caused by this code:
>
> void mca_btl_template_proc_construct(mca_btl_template_proc_t* template_proc){
> .......
> .........
> /* add to list of all proc instance */
> OPAL_THREAD_LOCK(&mca_btl_template_component.template_lock);
> opal_list_append(&mca_btl_template_component.template_procs, &template_proc->super);
> OPAL_THREAD_UNLOCK(&mca_btl_template_component.template_lock);
> }
>
> /Xin
>
> On 07/02/2011 10:49 PM, Jeff Squyres (jsquyres) wrote:
>> Do you know which object it is that is being constructed? When you compile with debugging enabled, there are strings in the object struct that identify the file and line where the object was created.
>>
>> Sent from my phone. No type good.
>>
>> On Jun 29, 2011, at 8:48 AM, "Xin He" <xin.i.he_at_[hidden]> wrote:
>>
>>> Hi,
>>>
>>> As I advanced in my implementation of the TIPC BTL, I added the component and tried to run the hello_c program to test it.
>>>
>>> Then I got this segmentation fault. It seems to happen after the call to "mca_btl_tipc_add_procs".
>>>
>>> The error message displayed:
>>>
>>> [oak:23192] *** Process received signal ***
>>> [oak:23192] Signal: Segmentation fault (11)
>>> [oak:23192] Signal code: (128)
>>> [oak:23192] Failing at address: (nil)
>>> [oak:23192] [ 0] /lib/libpthread.so.0(+0xfb40) [0x7fec2a40fb40]
>>> [oak:23192] [ 1] /usr/lib/libmpi.so.0(+0x1e6c10) [0x7fec2b2afc10]
>>> [oak:23192] [ 2] /usr/lib/libmpi.so.0(+0x1e71f2) [0x7fec2b2b01f2]
>>> [oak:23192] [ 3] /usr/lib/openmpi/mca_pml_ob1.so(+0x59f2) [0x7fec264fc9f2]
>>> [oak:23192] [ 4] /usr/lib/openmpi/mca_pml_ob1.so(+0x5e5a) [0x7fec264fce5a]
>>> [oak:23192] [ 5] /usr/lib/openmpi/mca_pml_ob1.so(+0x2386) [0x7fec264f9386]
>>> [oak:23192] [ 6] /usr/lib/openmpi/mca_pml_ob1.so(+0x24a0) [0x7fec264f94a0]
>>> [oak:23192] [ 7] /usr/lib/openmpi/mca_pml_ob1.so(+0x22fb) [0x7fec264f92fb]
>>> [oak:23192] [ 8] /usr/lib/openmpi/mca_pml_ob1.so(+0x3a60) [0x7fec264faa60]
>>> [oak:23192] [ 9] /usr/lib/libmpi.so.0(+0x67f51) [0x7fec2b130f51]
>>> [oak:23192] [10] /usr/lib/libmpi.so.0(MPI_Init+0x173) [0x7fec2b161c33]
>>> [oak:23192] [11] hello_i(main+0x22) [0x400936]
>>> [oak:23192] [12] /lib/libc.so.6(__libc_start_main+0xfe) [0x7fec2a09bd8e]
>>> [oak:23192] [13] hello_i() [0x400859]
>>> [oak:23192] *** End of error message ***
>>>
>>> I used gdb to check the stack:
>>> (gdb) bt
>>> #0 0x00007ffff7afac10 in opal_obj_run_constructors (object=0x6ca980)
>>> at ../opal/class/opal_object.h:427
>>> #1 0x00007ffff7afb1f2 in opal_list_construct (list=0x6ca958) at class/opal_list.c:88
>>> #2 0x00007ffff2d479f2 in opal_obj_run_constructors (object=0x6ca958)
>>> at ../../../../opal/class/opal_object.h:427
>>> #3 0x00007ffff2d47e5a in mca_pml_ob1_comm_construct (comm=0x6ca8c0)
>>> at pml_ob1_comm.c:55
>>> #4 0x00007ffff2d44386 in opal_obj_run_constructors (object=0x6ca8c0)
>>> at ../../../../opal/class/opal_object.h:427
>>> #5 0x00007ffff2d444a0 in opal_obj_new (cls=0x7ffff2f6c040)
>>> at ../../../../opal/class/opal_object.h:477
>>> #6 0x00007ffff2d442fb in opal_obj_new_debug (type=0x7ffff2f6c040,
>>> file=0x7ffff2d62840 "pml_ob1.c", line=182)
>>> at ../../../../opal/class/opal_object.h:252
>>> #7 0x00007ffff2d45a60 in mca_pml_ob1_add_comm (comm=0x601060) at pml_ob1.c:182
>>> #8 0x00007ffff797bf51 in ompi_mpi_init (argc=1, argv=0x7fffffffdf58, requested=0,
>>> provided=0x7fffffffde28) at runtime/ompi_mpi_init.c:770
>>> #9 0x00007ffff79acc33 in PMPI_Init (argc=0x7fffffffde5c, argv=0x7fffffffde50)
>>> at pinit.c:84
>>> #10 0x0000000000400936 in main (argc=1, argv=0x7fffffffdf58) at hello_c.c:17
>>>
>>> It seems the error happened while an object was being constructed. Any idea why this is happening?
>>>
>>> Thanks.
>>>
>>> Best regards,
>>> Xin
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>>
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/