Thanks for your feedback and advice.
SELinux is currently disabled at runtime on all nodes as well as on the head node.
So I don't believe this is the issue here.
I did indeed compile Open MPI myself, and I didn't pass any unusual configure options other than --prefix and --enable-mpirun-prefix-by-default.
Have I overlooked something?
The problem doesn't occur with Open MPI 1.4.
I've tried running simple jobs directly on the head node to eliminate any networking or InfiniBand (IB) wizardry, and mpirun systematically segfaults when run as a non-root user.
Here is one line from an strace of mpirun that may be significant:
mmap(NULL, 4294967296, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
For further information you can refer to the strace files attached to this email.
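One possible cause worth ruling out: mmap is documented to return ENOMEM when the process's RLIMIT_AS (address space) limit would be exceeded, so a per-user limit below 4 GiB would be consistent with the line above, although other causes are possible. The following is only a minimal sketch, to be run as the affected non-root user; the helper names are hypothetical and the request size is simply copied from the strace output.

```python
import resource

# Size of the failing anonymous mapping, copied from the strace line (4 GiB).
REQUEST_BYTES = 4294967296


def describe_limit(value):
    """Render an rlimit value, which may be RLIM_INFINITY."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes"


def request_exceeds_limit(request_bytes, soft_limit):
    """Return True if a single mapping of request_bytes is larger than soft_limit."""
    if soft_limit == resource.RLIM_INFINITY:
        return False
    return request_bytes > soft_limit


def check_address_space(request_bytes=REQUEST_BYTES):
    """Return (soft, hard, too_big) for the current process's RLIMIT_AS."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return soft, hard, request_exceeds_limit(request_bytes, soft)


if __name__ == "__main__":
    soft, hard, too_big = check_address_space()
    print("RLIMIT_AS soft:", describe_limit(soft))
    print("RLIMIT_AS hard:", describe_limit(hard))
    if too_big:
        print("The 4 GiB request is larger than the soft limit.")
    else:
        print("The soft limit alone does not rule out the 4 GiB request.")
```

Comparing the output for root and for the regular user would show whether the two accounts run under different limits. A limit that looks sufficient does not prove the mapping would succeed, since memory already in use by the process also counts against it.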
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On behalf of Prentice Bisbal
Sent: Monday, March 21, 2011 14:56
To: Open MPI Users
Subject: Re: [OMPI users] OpenMPI 1.2.x segfault as regular user
On 03/20/2011 06:22 PM, Kevin.Buckley_at_[hidden] wrote:
>> It's not hard to test whether or not SELinux is the problem. You can
>> turn SELinux off on the command-line with this command:
>> setenforce 0
>> Of course, you need to be root in order to do this.
>> After turning SELinux off, you can try reproducing the error. If it
>> still occurs, the problem is elsewhere; if it doesn't, it's SELinux. When
>> you're done, you can re-enable SELinux with
>> setenforce 1
>> If you're running your job across multiple nodes, you should disable
>> SELinux on all of them for testing.
> You are not actually disabling SELinux with setenforce 0, just
> putting it into "permissive" mode: SELinux is still active.
That's correct. Thanks for catching my inaccurate choice of words.
> Running SELinux in its permissive mode, as opposed to disabling it
> at boot time, sees SELinux making a log of things that would cause
> it to dive in, were it running in "enforcing" mode.
I forgot about that. Checking those logs will make debugging even easier
for the original poster.
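The difference between "permissive" and "disabled" discussed above can also be checked directly on each node. The sketch below is an illustration only: it assumes the SELinux pseudo-filesystem is mounted at /sys/fs/selinux or /selinux, that its enforce file contains 1 or 0, and that a missing file means SELinux is disabled. The function name is hypothetical.

```python
from pathlib import Path

# Assumed selinuxfs mount points: /sys/fs/selinux on recent kernels,
# /selinux on older distributions.
ENFORCE_FILES = ("/sys/fs/selinux/enforce", "/selinux/enforce")


def selinux_mode(candidates=ENFORCE_FILES):
    """Return 'enforcing', 'permissive' or 'disabled' based on the enforce file."""
    for name in candidates:
        try:
            text = Path(name).read_text().strip()
        except OSError:
            # File missing or unreadable here: try the next candidate path.
            continue
        return "enforcing" if text == "1" else "permissive"
    # No enforce file found: assume selinuxfs is not mounted.
    return "disabled"


if __name__ == "__main__":
    print(selinux_mode())
```

Running this on the head node and on every compute node would show whether any of them is still in permissive or enforcing mode rather than fully disabled.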
> There's then a tool you can run over that log that will suggest
> the ACL changes you need to make to fix the issue from an SELinux