Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
From: Christopher Samuel (samuel_at_[hidden])
Date: 2011-11-14 21:08:53


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 14/11/11 21:27, Y.MATSUMOTO wrote:

> I'm a member of the MPI library development team at Fujitsu;
> Takahiro Kawashima, who sent mail before, is my colleague.
> We are starting to feed our changes back.

First of all I'd like to say congratulations on breaking
10 PFLOPS, and also a big thanks for working on contributing
changes back to Open-MPI!

Whilst I can't comment on the fix, I can confirm that your example
program also segfaults for me with Open-MPI 1.4.2 and 1.4.4.

Intel compilers 11.1:

- --------------------------------------------------------------------------
[bruce002:03973] *** Process received signal ***
[bruce002:03973] Signal: Segmentation fault (11)
[bruce002:03973] Signal code: Address not mapped (1)
[bruce002:03973] Failing at address: 0x100000009
[bruce002:03973] [ 0] /lib64/libpthread.so.0 [0x3e1320eb10]
[bruce002:03973] [ 1] /usr/local/openmpi/1.4.4-intel/lib/libmpi.so.0 [0x2aaaaab5d79d]
[bruce002:03973] [ 2] /usr/local/openmpi/1.4.4-intel/lib/libopen-pal.so.0(opal_progress+0x87) [0x2aaaab1fdc27]
[bruce002:03973] [ 3] /usr/local/openmpi/1.4.4-intel/lib/libmpi.so.0 [0x2aaaaabce252]
[bruce002:03973] [ 4] /usr/local/openmpi/1.4.4-intel/lib/libmpi.so.0(PMPI_Recv+0x213) [0x2aaaaab1e0f3]
[bruce002:03973] [ 5] ./tp_lb_ub_ng(main+0x29b) [0x4021ab]
[bruce002:03973] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3e12a1d994]
[bruce002:03973] [ 7] ./tp_lb_ub_ng [0x401e59]
[bruce002:03973] *** End of error message ***
- --------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 3973 on node bruce002 exited on signal 11 (Segmentation fault).
- --------------------------------------------------------------------------
[bruce002:03972] *** Process received signal ***
[bruce002:03972] Signal: Segmentation fault (11)
[bruce002:03972] Signal code: Address not mapped (1)
[bruce002:03972] Failing at address: 0xffffffffff84bad0
[bruce002:03972] [ 0] /lib64/libpthread.so.0 [0x3e1320eb10]
[bruce002:03972] [ 1] ./tp_lb_ub_ng(__intel_new_memcpy+0x2c) [0x403c9c]
[bruce002:03972] *** End of error message ***

GCC 4.4.4:

- --------------------------------------------------------------------------
[bruce002:04049] *** Process received signal ***
[bruce002:04049] Signal: Segmentation fault (11)
[bruce002:04049] Signal code: Address not mapped (1)
[bruce002:04049] Failing at address: 0x100000009
[bruce002:04049] [ 0] /lib64/libpthread.so.0 [0x3e1320eb10]
[bruce002:04049] [ 1] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2aaaaab51f27]
[bruce002:04049] [ 2] /usr/local/openmpi/1.4.4-gcc/lib/libopen-pal.so.0(opal_progress+0x5a) [0x2aaaab14bb3a]
[bruce002:04049] [ 3] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2aaaaabb9985]
[bruce002:04049] [ 4] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0(PMPI_Recv+0x12f) [0x2aaaaab1913f]
[bruce002:04049] [ 5] ./tp_lb_ub_ng(main+0x21c) [0x400dd0]
[bruce002:04049] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3e12a1d994]
[bruce002:04049] [ 7] ./tp_lb_ub_ng [0x400af9]
[bruce002:04049] *** End of error message ***
- --------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 4049 on node bruce002 exited on signal 11 (Segmentation fault).
- --------------------------------------------------------------------------
[bruce002:04048] *** Process received signal ***
[bruce002:04048] Signal: Segmentation fault (11)
[bruce002:04048] Signal code: Address not mapped (1)
[bruce002:04048] Failing at address: 0x2aaab0833000
[bruce002:04048] [ 0] /lib64/libpthread.so.0 [0x3e1320eb10]
[bruce002:04048] [ 1] /lib64/libc.so.6(memcpy+0x3ff) [0x3e12a7c63f]
[bruce002:04048] [ 2] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2aaaaaafef7b]
[bruce002:04048] [ 3] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2aaaaab4fcdd]
[bruce002:04048] [ 4] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2aaaaabc1563]
[bruce002:04048] [ 5] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2aaaaabbce78]
[bruce002:04048] [ 6] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2aaaaab52036]
[bruce002:04048] [ 7] /usr/local/openmpi/1.4.4-gcc/lib/libopen-pal.so.0(opal_progress+0x5a) [0x2aaaab14bb3a]
[bruce002:04048] [ 8] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2aaaaabba5f5]
[bruce002:04048] [ 9] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0(MPI_Send+0x177) [0x2aaaaab1b1d7]
[bruce002:04048] [10] ./tp_lb_ub_ng(main+0x1e4) [0x400d98]
[bruce002:04048] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3e12a1d994]
[bruce002:04048] [12] ./tp_lb_ub_ng [0x400af9]
[bruce002:04048] *** End of error message ***

- --
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545
         http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7BybUACgkQO2KABBYQAh9/mwCdEx6FrXaahHRlfIlKX+GqvScO
+tcAn0ieXCjxG5JrOvkgSy0YQ9EgA7S8
=nUtx
-----END PGP SIGNATURE-----