Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI 1.2.x segfault as regular user
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-03-17 11:01:25


Sorry for the delayed reply.

I'm afraid I haven't done much with SELinux -- I don't know whether there are any "gotchas" that would show up there. SELinux support is not something we've gotten many requests for, and I doubt that anyone in the community has done much testing in this area. :-\

I suspect that Open MPI is trying to access something that your user (under SELinux) doesn't have permission to access.
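
One quick way to check where the crash actually happens is to count which shared libraries show up in the backtrace frames. The snippet below is only a rough sketch: the helper name `frames_by_library` and the regular expression are my own assumptions, not part of Open MPI, and the sample text is one process's backtrace copied from the output quoted below.

```python
import re
from collections import Counter

# Matches backtrace frame lines of the form printed in the quoted output, e.g.
#   [server:15586] [ 4] /lib64/libselinux.so.1 [0x3cd320f5cc]
#   [server:15586] [ 2] /lib64/libc.so.6(__libc_malloc+0x167) [0x3cd1674dd7]
FRAME_RE = re.compile(
    r"\[[^\]:]+:\d+\]\s+\[\s*\d+\]\s+(?P<lib>\S+?)(?:\(\S*\))?\s+\[0x[0-9a-fA-F]+\]"
)

def frames_by_library(log_text):
    """Count backtrace frames per shared library (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            counts[match.group("lib")] += 1
    return counts

# Backtrace of PID 15586, copied from the quoted mpirun output.
sample = """\
[server:15586] [ 0] /lib64/libpthread.so.0 [0x3cd1e0eb10]
[server:15586] [ 1] /lib64/libc.so.6 [0x3cd166fdc9]
[server:15586] [ 2] /lib64/libc.so.6(__libc_malloc+0x167) [0x3cd1674dd7]
[server:15586] [ 3] /lib64/ld-linux-x86-64.so.2(__tls_get_addr+0xb1) [0x3cd120fe61]
[server:15586] [ 4] /lib64/libselinux.so.1 [0x3cd320f5cc]
[server:15586] [ 5] /lib64/libselinux.so.1 [0x3cd32045df]
"""

counts = frames_by_library(sample)
for lib, n in counts.most_common():
    print(n, lib)
```

In the backtrace you posted, the outermost frames are in libselinux and none of the frames are in an Open MPI library, which at least fits the idea that SELinux is involved.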

So I'm afraid I don't have much of an answer for you -- sorry! If you do figure it out, though, and the fix is not too intrusive, we can probably incorporate it upstream.

On Mar 4, 2011, at 7:31 AM, Youri LACAN-BARTLEY wrote:

> Hi,
>
> This is my first post to this mailing list, so I apologize if I’m a little rough around the edges.
> I’ve been digging into OpenMPI for a little while now and have come across one issue that I just can’t explain and I’m sincerely hoping someone can put me on the right track here.
>
> I’m using a fresh install of openmpi-1.2.7 and I systematically get a segmentation fault at the end of my mpirun calls if I’m logged in as a regular user.
> However, as soon as I switch to the root account, the segfault does not appear.
> The jobs actually run to completion, but I just can’t find a good reason for this to be happening, and I haven’t been able to reproduce the problem on another machine.
>
> Any help or tips would be greatly appreciated.
>
> Thanks,
>
> Youri LACAN-BARTLEY
>
> Here’s an example running osu_latency locally (I’ve “blacklisted” openib to make sure it’s not to blame):
>
> [user_at_server ~]$ mpirun --mca btl ^openib -np 2 /opt/scripts/osu_latency-openmpi-1.2.7
> # OSU MPI Latency Test v3.3
> # Size Latency (us)
> 0 0.76
> 1 0.89
> 2 0.89
> 4 0.89
> 8 0.89
> 16 0.91
> 32 0.91
> 64 0.92
> 128 0.96
> 256 1.13
> 512 1.31
> 1024 1.69
> 2048 2.51
> 4096 5.34
> 8192 9.16
> 16384 17.47
> 32768 31.79
> 65536 51.10
> 131072 92.41
> 262144 181.74
> 524288 512.26
> 1048576 1238.21
> 2097152 2280.28
> 4194304 4616.67
> [server:15586] *** Process received signal ***
> [server:15586] Signal: Segmentation fault (11)
> [server:15586] Signal code: Address not mapped (1)
> [server:15586] Failing at address: (nil)
> [server:15586] [ 0] /lib64/libpthread.so.0 [0x3cd1e0eb10]
> [server:15586] [ 1] /lib64/libc.so.6 [0x3cd166fdc9]
> [server:15586] [ 2] /lib64/libc.so.6(__libc_malloc+0x167) [0x3cd1674dd7]
> [server:15586] [ 3] /lib64/ld-linux-x86-64.so.2(__tls_get_addr+0xb1) [0x3cd120fe61]
> [server:15586] [ 4] /lib64/libselinux.so.1 [0x3cd320f5cc]
> [server:15586] [ 5] /lib64/libselinux.so.1 [0x3cd32045df]
> [server:15586] *** End of error message ***
> [server:15587] *** Process received signal ***
> [server:15587] Signal: Segmentation fault (11)
> [server:15587] Signal code: Address not mapped (1)
> [server:15587] Failing at address: (nil)
> [server:15587] [ 0] /lib64/libpthread.so.0 [0x3cd1e0eb10]
> [server:15587] [ 1] /lib64/libc.so.6 [0x3cd166fdc9]
> [server:15587] [ 2] /lib64/libc.so.6(__libc_malloc+0x167) [0x3cd1674dd7]
> [server:15587] [ 3] /lib64/ld-linux-x86-64.so.2(__tls_get_addr+0xb1) [0x3cd120fe61]
> [server:15587] [ 4] /lib64/libselinux.so.1 [0x3cd320f5cc]
> [server:15587] [ 5] /lib64/libselinux.so.1 [0x3cd32045df]
> [server:15587] *** End of error message ***
> mpirun noticed that job rank 0 with PID 15586 on node server exited on signal 11 (Segmentation fault).
> 1 additional process aborted (not shown)
> [server:15583] *** Process received signal ***
> [server:15583] Signal: Segmentation fault (11)
> [server:15583] Signal code: Address not mapped (1)
> [server:15583] Failing at address: (nil)
> [server:15583] [ 0] /lib64/libpthread.so.0 [0x3cd1e0eb10]
> [server:15583] [ 1] /lib64/libc.so.6 [0x3cd166fdc9]
> [server:15583] [ 2] /lib64/libc.so.6(__libc_malloc+0x167) [0x3cd1674dd7]
> [server:15583] [ 3] /lib64/ld-linux-x86-64.so.2(__tls_get_addr+0xb1) [0x3cd120fe61]
> [server:15583] [ 4] /lib64/libselinux.so.1 [0x3cd320f5cc]
> [server:15583] [ 5] /lib64/libselinux.so.1 [0x3cd32045df]
> [server:15583] *** End of error message ***
> Segmentation fault

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/