Open MPI User's Mailing List Archives

Subject: [OMPI users] SIGSEGV in OMPI 1.6.x
From: Yong Qin (yong.qin_at_[hidden])
Date: 2012-09-06 00:06:12


Hi,

While debugging a mysterious crash of a code, I was able to trace it
down to a SIGSEGV that occurs with both OMPI 1.6 and 1.6.1. The crash
site is in opal/mca/memory/linux/malloc.c. Please see the following
gdb log.

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
opal_memory_ptmalloc2_int_free (av=0x2fd0637, mem=0x203a746f74512000) at malloc.c:4385
4385 nextsize = chunksize(nextchunk);
(gdb) l
4380 Consolidate other non-mmapped chunks as they arrive.
4381 */
4382
4383 else if (!chunk_is_mmapped(p)) {
4384 nextchunk = chunk_at_offset(p, size);
4385 nextsize = chunksize(nextchunk);
4386 assert(nextsize > 0);
4387
4388 /* consolidate backward */
4389 if (!prev_inuse(p)) {
(gdb) bt
#0 opal_memory_ptmalloc2_int_free (av=0x2fd0637, mem=0x203a746f74512000) at malloc.c:4385
#1 0x00002ae6b18ea0c0 in opal_memory_ptmalloc2_free (mem=0x2fd0637) at malloc.c:3511
#2 0x00002ae6b18ea736 in opal_memory_linux_free_hook (__ptr=0x2fd0637, caller=0x203a746f74512000) at hooks.c:705
#3 0x0000000001412fcc in for_dealloc_allocatable ()
#4 0x00000000007767b1 in ALLOC::dealloc_d2 (array=@0x2fd0647, name=@0x6f6e6f69006f6e78, routine=Cannot access memory at address 0x0
) at alloc.F90:1357
#5 0x000000000082628c in M_LDAU::hubbard_term (scell=..., nua=@0xd5, na=@0xd5, isa=..., xa=..., indxua=..., maxnh=@0xcf4ff, maxnd=@0xcf4ff, lasto=..., iphorb=..., numd=..., listdptr=..., listd=..., numh=..., listhptr=..., listh=..., nspin=@0xcf4ff00000002, dscf=..., eldau=@0x0, deldau=@0x0, fa=..., stress=..., h=..., first=@0x0, last=@0x0) at ldau.F:752
#6 0x00000000006cd532 in M_SETUP_HAMILTONIAN::setup_hamiltonian (first=@0x0, last=@0x0, iscf=@0x2) at setup_hamiltonian.F:199
#7 0x000000000070e257 in M_SIESTA_FORCES::siesta_forces (istep=@0xf9a4d07000000000) at siesta_forces.F:90
#8 0x000000000070e475 in siesta () at siesta.F:23
#9 0x000000000045e47c in main ()
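
One detail in the log above may be worth checking: the mem value in frame #0, 0x203a746f74512000, does not look like a plausible heap address. The sketch below is a quick sanity check, not a diagnosis. It assumes a little-endian x86-64 machine (which the other addresses in the backtrace suggest but do not prove) and uses two hypothetical helper names, pointer_bytes and printable_fraction, to show how those eight bytes would sit in memory. If they come out as mostly printable ASCII, that pattern is often a sign that string data was written over a pointer or a heap chunk header somewhere earlier, rather than a fault that originates in malloc.c itself.

```python
def pointer_bytes(value: int, width: int = 8) -> bytes:
    """Return the bytes of a pointer-sized integer in memory order.

    Assumption: the host is little-endian (x86-64), so the least
    significant byte comes first in memory.
    """
    return value.to_bytes(width, "little")


def printable_fraction(raw: bytes) -> float:
    """Fraction of bytes that fall in the printable ASCII range 0x20..0x7e."""
    printable = sum(1 for b in raw if 0x20 <= b <= 0x7E)
    return printable / len(raw)


# Value of `mem` reported by gdb in frame #0 of the backtrace.
mem = 0x203A746F74512000

raw = pointer_bytes(mem)
print(raw)                      # the eight bytes as they would appear in memory
print(printable_fraction(raw))  # share of those bytes that are printable ASCII
```

Under these assumptions the bytes read as a NUL followed by " Qtot: ", which resembles program output text more than an address.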

Can anybody shed some light here on what could be wrong?

Thanks,

Yong Qin