Open MPI User's Mailing List Archives

From: Mike Houston (mhouston_at_[hidden])
Date: 2007-03-15 13:36:29


I've been having similar issues with brand-new FC5/6 and RHEL5 machines,
but our FC4/RHEL4 machines are just fine. On the FC5/6 and RHEL5 machines,
I can get things to run as root, so there must be some ACL or security
setting that's enabled by default on the newer distros. If I figure it out
this weekend, I'll let you know. If anyone else knows the solution, please
post to the list.
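
One thing worth checking, purely as a guess at what differs between the
older and newer distros, is whether SELinux is getting in the way for
non-root users; the newer releases ship noticeably stricter default
policies. A rough sketch of the comparison I'd try, assuming the standard
Fedora/RHEL audit tools are installed and that Open MPI is using its usual
per-user session directories under /tmp:

    # See whether SELinux is enforcing on the failing machines
    getenforce

    # Look for recent AVC denials mentioning the Open MPI daemon or mpirun
    ausearch -m avc -ts recent | grep -Ei 'orted|mpirun'

    # Compare limits and /tmp permissions between root and a normal user,
    # since Open MPI creates its session directories under /tmp
    ulimit -a
    ls -ld /tmp /tmp/openmpi-sessions-* 2>/dev/null

If getenforce reports "Enforcing", temporarily running "setenforce 0" and
retrying mpirun as a normal user should at least confirm or rule out
SELinux.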

-Mike

David Bronke wrote:
> I've been trying to get Open MPI working on two of the computers at a
> lab I help administer, and I'm running into a rather large issue. When
> running anything using mpirun as a normal user, I get the following
> output:
>
>
> $ mpirun --no-daemonize --host
> localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost
> /workspace/bronke/mpi/hello
> mpirun noticed that job rank 0 with PID 0 on node "localhost" exited
> on signal 13.
> [trixie:18104] ERROR: A daemon on node localhost failed to start as expected.
> [trixie:18104] ERROR: There may be more information available from
> [trixie:18104] ERROR: the remote shell (see above).
> [trixie:18104] The daemon received a signal 13.
> 8 additional processes aborted (not shown)
>
>
> However, running the same exact command line as root works fine:
>
>
> $ sudo mpirun --no-daemonize --host
> localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost
> /workspace/bronke/mpi/hello
> Password:
> p is 8, my_rank is 0
> p is 8, my_rank is 1
> p is 8, my_rank is 2
> p is 8, my_rank is 3
> p is 8, my_rank is 6
> p is 8, my_rank is 7
> Greetings from process 1!
>
> Greetings from process 2!
>
> Greetings from process 3!
>
> p is 8, my_rank is 5
> p is 8, my_rank is 4
> Greetings from process 4!
>
> Greetings from process 5!
>
> Greetings from process 6!
>
> Greetings from process 7!
>
>
> I've looked up signal 13 and found that it is apparently SIGPIPE; I
> also found a thread on the LAM-MPI site:
> http://www.lam-mpi.org/MailArchives/lam/2004/08/8486.php
> However, that thread seems to indicate that the problem would be in
> the application (/workspace/bronke/mpi/hello in this case), but there
> are no pipes in use in this app, and the fact that it works as
> expected as root doesn't seem to fit that explanation either. I have
> tried running mpirun with --verbose, and it doesn't show any more
> output than without it, so I've run into a dead end on this issue.
> Does anyone know how I can figure out what's going wrong or how I
> can fix it?
>
> Thanks!
>
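
Regarding the signal 13 in the quoted output: since --verbose adds nothing,
one way to get more detail is to trace the launcher and everything it forks
and see which write() fails with EPIPE. This is just a sketch, assuming
strace is available; the binary path is taken from the quoted message, the
host list is shortened for brevity, and /tmp/mpirun.trace is an arbitrary
output file:

    # Trace mpirun and its children, logging all syscalls to a file
    strace -f -o /tmp/mpirun.trace mpirun --no-daemonize \
        --host localhost,localhost /workspace/bronke/mpi/hello

    # Find the failed write and the SIGPIPE delivery
    grep -nE 'EPIPE|SIGPIPE' /tmp/mpirun.trace

Open MPI's own --debug-daemons option, which keeps the daemons' output
attached to the terminal, may also show why they fail to start for a
normal user.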