Open MPI User's Mailing List Archives

Subject: [OMPI users] OpenMPI 1.2.x segfault as regular user
From: Youri LACAN-BARTLEY (youri.lacan-bartley_at_[hidden])
Date: 2011-03-04 07:31:04


Hi,

This is my first post to this mailing list, so I apologize if it's a
little rough around the edges.

I've been digging into OpenMPI for a little while now and have come
across one issue that I just can't explain and I'm sincerely hoping
someone can put me on the right track here.

 

I'm using a fresh install of openmpi-1.2.7 and I systematically get a
segmentation fault at the end of my mpirun calls if I'm logged in as a
regular user.

However, as soon as I switch to the root account, the segfault does not
appear.

The jobs actually run to completion, but I can't find a good reason
for this to be happening, and I haven't been able to reproduce the
problem on another machine.

 

Any help or tips would be greatly appreciated.

 

Thanks,

 

Youri LACAN-BARTLEY

 

Here's an example running osu_latency locally (I've "blacklisted" openib
to make sure it's not to blame):

 

[user@server ~]$ mpirun --mca btl ^openib -np 2 /opt/scripts/osu_latency-openmpi-1.2.7

# OSU MPI Latency Test v3.3
# Size       Latency (us)
0                    0.76
1                    0.89
2                    0.89
4                    0.89
8                    0.89
16                   0.91
32                   0.91
64                   0.92
128                  0.96
256                  1.13
512                  1.31
1024                 1.69
2048                 2.51
4096                 5.34
8192                 9.16
16384               17.47
32768               31.79
65536               51.10
131072              92.41
262144             181.74
524288             512.26
1048576           1238.21
2097152           2280.28
4194304           4616.67

[server:15586] *** Process received signal ***
[server:15586] Signal: Segmentation fault (11)
[server:15586] Signal code: Address not mapped (1)
[server:15586] Failing at address: (nil)
[server:15586] [ 0] /lib64/libpthread.so.0 [0x3cd1e0eb10]
[server:15586] [ 1] /lib64/libc.so.6 [0x3cd166fdc9]
[server:15586] [ 2] /lib64/libc.so.6(__libc_malloc+0x167) [0x3cd1674dd7]
[server:15586] [ 3] /lib64/ld-linux-x86-64.so.2(__tls_get_addr+0xb1) [0x3cd120fe61]
[server:15586] [ 4] /lib64/libselinux.so.1 [0x3cd320f5cc]
[server:15586] [ 5] /lib64/libselinux.so.1 [0x3cd32045df]
[server:15586] *** End of error message ***

[server:15587] *** Process received signal ***
[server:15587] Signal: Segmentation fault (11)
[server:15587] Signal code: Address not mapped (1)
[server:15587] Failing at address: (nil)
[server:15587] [ 0] /lib64/libpthread.so.0 [0x3cd1e0eb10]
[server:15587] [ 1] /lib64/libc.so.6 [0x3cd166fdc9]
[server:15587] [ 2] /lib64/libc.so.6(__libc_malloc+0x167) [0x3cd1674dd7]
[server:15587] [ 3] /lib64/ld-linux-x86-64.so.2(__tls_get_addr+0xb1) [0x3cd120fe61]
[server:15587] [ 4] /lib64/libselinux.so.1 [0x3cd320f5cc]
[server:15587] [ 5] /lib64/libselinux.so.1 [0x3cd32045df]
[server:15587] *** End of error message ***

mpirun noticed that job rank 0 with PID 15586 on node server exited on signal 11 (Segmentation fault).
1 additional process aborted (not shown)

[server:15583] *** Process received signal ***
[server:15583] Signal: Segmentation fault (11)
[server:15583] Signal code: Address not mapped (1)
[server:15583] Failing at address: (nil)
[server:15583] [ 0] /lib64/libpthread.so.0 [0x3cd1e0eb10]
[server:15583] [ 1] /lib64/libc.so.6 [0x3cd166fdc9]
[server:15583] [ 2] /lib64/libc.so.6(__libc_malloc+0x167) [0x3cd1674dd7]
[server:15583] [ 3] /lib64/ld-linux-x86-64.so.2(__tls_get_addr+0xb1) [0x3cd120fe61]
[server:15583] [ 4] /lib64/libselinux.so.1 [0x3cd320f5cc]
[server:15583] [ 5] /lib64/libselinux.so.1 [0x3cd32045df]
[server:15583] *** End of error message ***
Segmentation fault
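
In the output above, the frames reported for PIDs 15583, 15586 and 15587 appear to be the same. For longer logs, a check like this can be scripted. The following is only a rough sketch: the function names `frames_by_pid` and `all_identical` are hypothetical, and the regular expression assumes the frame format shown above (`[host:pid] [ n] library(symbol) [address]`), including the case where the address was wrapped onto the next line.

```python
import re
from collections import defaultdict

# Assumed frame format, taken from the output above, e.g.:
#   [server:15586] [ 2] /lib64/libc.so.6(__libc_malloc+0x167) [0x3cd1674dd7]
FRAME_RE = re.compile(
    r"^\[(?P<host>[^:\]]+):(?P<pid>\d+)\]\s+\[\s*(?P<idx>\d+)\]\s+"
    r"(?P<lib>[^\s(\[]+)(?:\((?P<sym>[^)]*)\))?\s*\[(?P<addr>0x[0-9a-fA-F]+)\]"
)

# An address that was wrapped onto its own line is pulled back onto the
# frame line it belongs to.
WRAPPED_ADDR_RE = re.compile(r"\s+(\[0x[0-9a-fA-F]+\])")


def frames_by_pid(text):
    """Return {pid: [(library, symbol_or_None, address), ...]} in frame order."""
    text = WRAPPED_ADDR_RE.sub(r" \1", text)
    traces = defaultdict(list)
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            traces[int(m["pid"])].append((m["lib"], m["sym"], m["addr"]))
    return dict(traces)


def all_identical(traces):
    """True if every process reported exactly the same sequence of frames."""
    stacks = list(traces.values())
    return all(stack == stacks[0] for stack in stacks[1:])
```

With the mpirun output saved to a file, `all_identical(frames_by_pid(open(path).read()))` would report whether every process crashed with the same stack. Lines that are not stack frames (the signal description, the latency table) are simply ignored.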