Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Possible OMPI 1.6.5 bug? SEGV in malloc.c
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-08-29 09:18:10


I would guess, based on this line, that your memory restriction is causing a malloc failure:

> [pid 6796] mmap(NULL, 8560001024, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)

and we probably don't protect against that failure as well as we should. I doubt we would issue another 1.6 release for it, though.
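
To illustrate the kind of protection meant here, below is a minimal sketch in C. It is not taken from the Open MPI or ptmalloc2 sources, and the helper name map_anonymous_or_null is hypothetical. It only shows the general pattern: mmap(2) reports failure by returning MAP_FAILED and setting errno (ENOMEM in the trace above), so the result has to be checked before the memory is touched.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (illustration only, not Open MPI code):
 * map `len` bytes of private anonymous memory, using the same
 * protection and flags as the failing call in the strace output.
 * Returns NULL on failure, leaving errno set by mmap.
 */
static void *map_anonymous_or_null(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        /* mmap signals failure with MAP_FAILED, not NULL. */
        int saved_errno = errno;
        fprintf(stderr, "mmap of %zu bytes failed: %s\n",
                len, strerror(saved_errno));
        errno = saved_errno;   /* fprintf may have changed errno */
        return NULL;
    }
    return p;
}
```

A caller would test the return value against NULL and give up on the allocation, rather than go on to write chunk headers into memory that was never mapped.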

On Aug 28, 2013, at 9:01 PM, Christopher Samuel <samuel_at_[hidden]> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 28/08/13 19:36, Chris Samuel wrote:
>
>> With RHEL 6.4 gfortran it instead SEGV's straight away
>
> Using strace I can see a mmap(2) (called from malloc I presume)
> failing just before the SEGV.
>
> Process 6799 detached
> Process 6798 detached
> Hello, world, I am 0 of 1
> [pid 6796] mmap(NULL, 8560001024, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
> [pid 6796] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> [barcoo:06796] *** Process received signal ***
> [barcoo:06796] Signal: Segmentation fault (11)
> [barcoo:06796] Signal code: Address not mapped (1)
> [barcoo:06796] Failing at address: 0x20078d708
> [pid 6796] mmap(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f75a5fed000
> [barcoo:06796] [ 0] /lib64/libpthread.so.0() [0x3f7b60f500]
> [barcoo:06796] [ 1] /usr/local/openmpi/1.6.5/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x982) [0x7f77a68c2dd2]
> [barcoo:06796] [ 2] /usr/local/openmpi/1.6.5/lib/libmpi.so.1(opal_memory_ptmalloc2_malloc+0x52) [0x7f77a68c3f42]
> [barcoo:06796] [ 3] ./gnumyhello_f90(MAIN__+0x146) [0x400f6a]
> [barcoo:06796] [ 4] ./gnumyhello_f90(main+0x2a) [0x4011ea]
> [barcoo:06796] [ 5] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3f7b21ecdd]
> [barcoo:06796] [ 6] ./gnumyhello_f90() [0x400d69]
> [barcoo:06796] *** End of error message ***
> [pid 6796] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> [pid 6796] +++ killed by SIGSEGV (core dumped) +++
>
>
> The SEGV occurs (according to the gdb core dump I have) at the
> second set_head() call in this code:
>
> /* check that one of the above allocation paths succeeded */
> if ((unsigned long)(size) >= (unsigned long)(nb + MINSIZE)) {
>   remainder_size = size - nb;
>   remainder = chunk_at_offset(p, nb);
>   av->top = remainder;
>   set_head(p, nb | PREV_INUSE | (av != &main_arena ? NON_MAIN_ARENA : 0));
>   set_head(remainder, remainder_size | PREV_INUSE);
>   check_malloced_chunk(av, p, nb);
>   return chunk2mem(p);
> }
>
>
> The arguments to that function are:
>
> (gdb) print remainder
> $1 = (struct malloc_chunk *) 0x2008e5700
>
> (gdb) print remainder_size
> $2 = 0
>
> Any ideas?
>
> cheers,
> Chris
> - --
> Christopher Samuel Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/ http://twitter.com/vlsci
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iEYEARECAAYFAlIex30ACgkQO2KABBYQAh8HmQCgjj7tReOfdubczho7x9poprM7
> 5CwAnRBlw2LHrVHQsu2M1W6qo2H2HOzb
> =dasp
> -----END PGP SIGNATURE-----