Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Possible OMPI 1.6.5 bug? SEGV in malloc.c
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-09-03 14:47:38


Hmm. Are you building Open MPI in a special way? I ask because I'm unable to replicate the issue -- I've run your test (and a C equivalent) a few hundred times now:

----
[jsquyres_at_savbu-usnic-a mpi]$ which gfortran
/usr/bin/gfortran
[jsquyres_at_savbu-usnic-a mpi]$ gfortran --version
GNU Fortran (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)
Copyright (C) 2010 Free Software Foundation, Inc.
GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING
[jsquyres_at_savbu-usnic-a mpi]$ mpifort gnumyhello_f90.f90 -o gnumyhello_f90
[jsquyres_at_savbu-usnic-a mpi]$ mpicc gnumyhello.c -o gnumyhello
[jsquyres_at_savbu-usnic-a mpi]$ ulimit -v 1048576
[jsquyres_at_savbu-usnic-a mpi]$ ./gnumyhello
Hello, world, I am 0 of 1
Failed to allocate
[jsquyres_at_savbu-usnic-a mpi]$ ./gnumyhello_f90
 Hello, world, I am            0  of            1
 Task            0  failed to allocate    7.9721212387084961      GB
[jsquyres_at_savbu-usnic-a mpi]$
----
No segvs, no core files, etc.
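
For anyone who wants to check the allocator behaviour in isolation, here is a minimal C sketch of the same idea: cap the address space, request far more than the cap, and confirm that malloc() returns NULL instead of crashing. This is not the gnumyhello.c used above. It leaves out MPI entirely, so it exercises only the system malloc and not Open MPI's ptmalloc2 hooks. The function name try_alloc_under_limit is hypothetical, and the sketch assumes a 64-bit Linux system where RLIMIT_AS is enforced.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

/*
 * Cap the soft address-space limit at limit_bytes, then try to malloc
 * `bytes`.  Roughly what `ulimit -v` followed by a large allocation does.
 *
 * Returns  1 if the allocation succeeded,
 *          0 if malloc() returned NULL,
 *         -1 if the limit could not be read or set.
 */
static int try_alloc_under_limit(size_t limit_bytes, size_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_AS, &rl) != 0) {
        return -1;
    }

    /* Only lower the soft limit; never ask for more than the hard limit. */
    rl.rlim_cur = (rlim_t)limit_bytes;
    if (rl.rlim_max != RLIM_INFINITY && rl.rlim_cur > rl.rlim_max) {
        rl.rlim_cur = rl.rlim_max;
    }
    if (setrlimit(RLIMIT_AS, &rl) != 0) {
        return -1;
    }

    void *p = malloc(bytes);
    if (p == NULL) {
        fprintf(stderr, "Failed to allocate %zu bytes\n", bytes);
        return 0;
    }

    free(p);
    return 1;
}
```

Calling try_alloc_under_limit((size_t)1 << 30, (size_t)8 << 30) corresponds roughly to the `ulimit -v 1048576` run above (bash counts that limit in 1024-byte units, so 1 GiB) with a request of about 8 GB. If this standalone version fails cleanly but the MPI build segfaults, that would point towards the memory hooks rather than the system allocator.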

On Sep 2, 2013, at 2:51 AM, Christopher Samuel <samuel_at_[hidden]> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 02/09/13 16:32, Christopher Samuel wrote:
> 
>> I cannot duplicate this under valgrind or gdb and given that this
>> doesn't happen every time I run it and gdb indicates there are at
>> least 2 threads running then we're wondering if this is a race condition.
> 
> I have also duplicated this problem with 1.7.3a1r29103.
> 
> Hello, world, I am            0  of            1
> [barcoo:03306] *** Process received signal ***
> [barcoo:03306] Signal: Segmentation fault (11)
> [barcoo:03306] Signal code: Address not mapped (1)
> [barcoo:03306] Failing at address: 0x2009b4298
> [barcoo:03306] [ 0] /lib64/libpthread.so.0() [0x3f7b60f500]
> [barcoo:03306] [ 1] /usr/local/openmpi/1.7.3a1r29103/lib/libopen-pal.so.5(opal_memory_ptmalloc2_int_malloc+0x96a) [0x7f47de6935aa]
> [barcoo:03306] [ 2] /usr/local/openmpi/1.7.3a1r29103/lib/libopen-pal.so.5(opal_memory_ptmalloc2_malloc+0x52) [0x7f47de694612]
> [barcoo:03306] [ 3] ./1.7-gnumyhello_f90() [0x400dca]
> [barcoo:03306] [ 4] ./1.7-gnumyhello_f90() [0x40104a]
> [barcoo:03306] [ 5] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3f7b21ecdd]
> [barcoo:03306] [ 6] ./1.7-gnumyhello_f90() [0x400bc9]
> [barcoo:03306] *** End of error message ***
> 
> The backtrace I get from the core file isn't as useful though:
> 
> (gdb) bt full
> #0  0x00007fd9c4c255aa in opal_memory_ptmalloc2_int_malloc () from /usr/local/openmpi/1.7.3a1r29103/lib/libopen-pal.so.5
> No symbol table info available.
> #1  0x00007fd9c4c26612 in opal_memory_ptmalloc2_malloc () from /usr/local/openmpi/1.7.3a1r29103/lib/libopen-pal.so.5
> No symbol table info available.
> #2  0x0000000000400dca in main () at gnumyhello_f90.f90:26
>        ierr = 0
>        rank = 0
>        size = 1
>        work = <object is not allocated>
> #3  0x000000000040104a in main ()
> No symbol table info available.
> 
> OMPI 1.7 is built with exactly the same configure options as 1.6
> and the executable is built with -g -O0.
> 
> cheers,
> Chris
> - -- 
> Christopher Samuel        Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/      http://twitter.com/vlsci
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
> 
> iEYEARECAAYFAlIkNXsACgkQO2KABBYQAh9fhQCdHUrlsl3ftY8VyDNRa8E8jKBx
> BZkAnjJJIXgUzRV8T+VBmrS0MQjXS8zO
> =B7GU
> -----END PGP SIGNATURE-----
-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/