
Open MPI User's Mailing List Archives


From: David Bronke (whitelynx_at_[hidden])
Date: 2007-03-15 19:33:21


I'm using Open MPI version 1.1.2. I installed it through Gentoo Portage,
so I think the permissions are right... I tried running 'equery f
openmpi | xargs ls -dl' and inspecting the permissions of each file, and
I don't see much out of the ordinary: everything is owned by root:root,
but every file has read permission for user, group, and other (and
execute as well where appropriate). From the debug output, I can tell
that mpirun is creating the session tree in /tmp, and that does seem to
be working fine... Here's the output when using --debug-daemons:

$ mpirun -aborted 8 -v -d --debug-daemons -np 8 /workspace/bronke/mpi/hello
[trixie:25228] [0,0,0] setting up session dir with
[trixie:25228] universe default-universe
[trixie:25228] user bronke
[trixie:25228] host trixie
[trixie:25228] jobid 0
[trixie:25228] procid 0
[trixie:25228] procdir:
/tmp/openmpi-sessions-bronke_at_trixie_0/default-universe/0/0
[trixie:25228] jobdir: /tmp/openmpi-sessions-bronke_at_trixie_0/default-universe/0
[trixie:25228] unidir: /tmp/openmpi-sessions-bronke_at_trixie_0/default-universe
[trixie:25228] top: openmpi-sessions-bronke_at_trixie_0
[trixie:25228] tmp: /tmp
[trixie:25228] [0,0,0] contact_file
/tmp/openmpi-sessions-bronke_at_trixie_0/default-universe/universe-setup.txt
[trixie:25228] [0,0,0] wrote setup file
[trixie:25228] pls:rsh: local csh: 0, local bash: 1
[trixie:25228] pls:rsh: assuming same remote shell as local shell
[trixie:25228] pls:rsh: remote csh: 0, remote bash: 1
[trixie:25228] pls:rsh: final template argv:
[trixie:25228] pls:rsh: /usr/bin/ssh <template> orted --debug
--debug-daemons --bootproxy 1 --name <template> --num_procs 2
--vpid_start 0 --nodename <template> --universe
bronke_at_trixie:default-universe --nsreplica
"0.0.0;tcp://141.238.31.33:43838" --gprreplica
"0.0.0;tcp://141.238.31.33:43838" --mpi-call-yield 0
[trixie:25228] sess_dir_finalize: proc session dir not empty - leaving
[trixie:25228] spawn: in job_state_callback(jobid = 1, state = 0x100)
mpirun noticed that job rank 0 with PID 0 on node "localhost" exited
on signal 13.
[trixie:25228] sess_dir_finalize: proc session dir not empty - leaving
[trixie:25228] sess_dir_finalize: proc session dir not empty - leaving
[trixie:25228] sess_dir_finalize: proc session dir not empty - leaving
[trixie:25228] sess_dir_finalize: proc session dir not empty - leaving
[trixie:25228] sess_dir_finalize: proc session dir not empty - leaving
[trixie:25228] sess_dir_finalize: proc session dir not empty - leaving
[trixie:25228] sess_dir_finalize: proc session dir not empty - leaving
[trixie:25228] spawn: in job_state_callback(jobid = 1, state = 0x80)
mpirun noticed that job rank 0 with PID 0 on node "localhost" exited
on signal 13.
mpirun noticed that job rank 1 with PID 0 on node "localhost" exited
on signal 13.
mpirun noticed that job rank 2 with PID 0 on node "localhost" exited
on signal 13.
mpirun noticed that job rank 3 with PID 0 on node "localhost" exited
on signal 13.
mpirun noticed that job rank 4 with PID 0 on node "localhost" exited
on signal 13.
mpirun noticed that job rank 5 with PID 0 on node "localhost" exited
on signal 13.
mpirun noticed that job rank 6 with PID 0 on node "localhost" exited
on signal 13.
[trixie:25228] ERROR: A daemon on node localhost failed to start as expected.
[trixie:25228] ERROR: There may be more information available from
[trixie:25228] ERROR: the remote shell (see above).
[trixie:25228] The daemon received a signal 13.
1 additional process aborted (not shown)
[trixie:25228] sess_dir_finalize: found proc session dir empty - deleting
[trixie:25228] sess_dir_finalize: found job session dir empty - deleting
[trixie:25228] sess_dir_finalize: found univ session dir empty - deleting
[trixie:25228] sess_dir_finalize: found top session dir empty - deleting

On 3/15/07, Ralph H Castain <rhc_at_[hidden]> wrote:
> It isn't a /dev issue. The problem is likely that the system lacks
> sufficient permissions to either:
>
> 1. create the Open MPI session directory tree. We create a hierarchy of
> subdirectories for temporary storage used for things like your shared memory
> file - the location of the head of that tree can be specified at run time,
> but has a series of built-in defaults it can search if you don't specify it
> (we look at your environment variables - e.g., TMP or TMPDIR - as well as
> the typical Linux/Unix places). You might check what your tmp directory is,
> and that you have write permission to it. Alternatively, you can specify
> your own location (where you know you have permissions!) by setting
> --tmpdir your-dir on the mpirun command line.
>
> 2. execute or access the various binaries and/or libraries. This usually
> happens when someone installs Open MPI as root and then tries to run it as
> a non-root user. The best fix is either to go through the installation
> directory and add the correct permissions (assuming it is a system-level
> install), or to reinstall as the non-root user (if the install is solely
> for you anyway). A quick command-line sketch of both checks follows below.
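
Both checks boil down to a few shell commands. A rough sketch - the scratch
directory and the install prefix below are only examples, not paths from
this thread, so substitute your own:

# 1. where will the session tree go, and can this user write there?
$ echo "${TMPDIR:-${TMP:-/tmp}}"
$ ls -ld /tmp
$ touch /tmp/ompi-write-test && rm /tmp/ompi-write-test

# ...or point mpirun at a directory you know you can write to:
$ mpirun --tmpdir /workspace/bronke/tmp -np 8 /workspace/bronke/mpi/hello

# 2. are the installed binaries and libraries readable/executable by all?
$ ls -l $(which mpirun) $(which orted)
$ sudo chmod -R a+rX /usr/lib/openmpi   # example install path only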
>
> You can also set --debug-daemons on the mpirun command line to get more
> diagnostic output from the daemons and then send that along.
>
> BTW: if possible, let us know which version of Open MPI you are using -
> it helps us advise you. ;-)
>
> Hope that helps.
> Ralph
>
>
>
>
> On 3/15/07 1:51 PM, "David Bronke" <whitelynx_at_[hidden]> wrote:
>
> > Ok, now that I've figured out what the signal means, I'm wondering
> > exactly what is running into permission problems... the program I'm
> > running doesn't use any functions except printf, sprintf, and MPI_*...
> > I was thinking that possibly changes to permissions on certain /dev
> > entries in newer distros might cause this, but I'm not even sure what
> > /dev entries would be used by MPI.
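
If it does turn out to be a /dev permissions problem, the usual suspects are
quick to inspect by hand. A rough sketch - which devices actually matter
depends on the components in use, so treat this only as a starting point:

$ ls -l /dev/null /dev/zero      # normally world-readable and -writable
$ ls -ld /dev/shm "$(tty)"       # shared-memory dir and the current terminal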
> >
> > On 3/15/07, McCalla, Mac <macmccalla_at_[hidden]> wrote:
> >> Hi,
> >> If the perror command is available on your system, it will tell
> >> you what message is associated with the signal value. On my system
> >> (RHEL4U3), it is permission denied.
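
For what it's worth, perror reports error (errno) codes rather than signals -
13 as an errno is "Permission denied" - whereas signal 13 is SIGPIPE. A quick
way to check the signal name, using the shell's built-in kill:

$ kill -l 13
PIPE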
> >>
> >> HTH,
> >>
> >> mac mccalla
> >>
> >> -----Original Message-----
> >> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
> >> Behalf Of David Bronke
> >> Sent: Thursday, March 15, 2007 12:25 PM
> >> To: users_at_[hidden]
> >> Subject: [OMPI users] Signal 13
> >>
> >> I've been trying to get Open MPI working on two of the computers at a lab
> >> I help administer, and I'm running into a rather large issue. When
> >> running anything using mpirun as a normal user, I get the following
> >> output:
> >>
> >>
> >> $ mpirun --no-daemonize --host
> >> localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost
> >> /workspace/bronke/mpi/hello
> >> mpirun noticed that job rank 0 with PID 0 on node "localhost" exited on
> >> signal 13.
> >> [trixie:18104] ERROR: A daemon on node localhost failed to start as
> >> expected.
> >> [trixie:18104] ERROR: There may be more information available from
> >> [trixie:18104] ERROR: the remote shell (see above).
> >> [trixie:18104] The daemon received a signal 13.
> >> 8 additional processes aborted (not shown)
> >>
> >>
> >> However, running the same exact command line as root works fine:
> >>
> >>
> >> $ sudo mpirun --no-daemonize --host
> >> localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost
> >> /workspace/bronke/mpi/hello
> >> Password:
> >> p is 8, my_rank is 0
> >> p is 8, my_rank is 1
> >> p is 8, my_rank is 2
> >> p is 8, my_rank is 3
> >> p is 8, my_rank is 6
> >> p is 8, my_rank is 7
> >> Greetings from process 1!
> >>
> >> Greetings from process 2!
> >>
> >> Greetings from process 3!
> >>
> >> p is 8, my_rank is 5
> >> p is 8, my_rank is 4
> >> Greetings from process 4!
> >>
> >> Greetings from process 5!
> >>
> >> Greetings from process 6!
> >>
> >> Greetings from process 7!
> >>
> >>
> >> I've looked up signal 13, and have found that it is apparently SIGPIPE;
> >> I also found a thread on the LAM-MPI site:
> >> http://www.lam-mpi.org/MailArchives/lam/2004/08/8486.php
> >> However, that thread seems to indicate that the problem would be in the
> >> application (/workspace/bronke/mpi/hello in this case), but this app uses
> >> no pipes, and the fact that it works as expected as root doesn't seem to
> >> fit either. I have tried running mpirun with --verbose
> >> and it doesn't show any more output than without it, so I've run into a
> >> sort of dead-end on this issue. Does anyone know of any way I can figure
> >> out what's going wrong or how I can fix it?
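
One way to narrow this down, if strace is available, is to trace the whole
run and look for the failing system call or the write that hits a closed
pipe. A rough sketch - the trace file name is just an example:

$ strace -f -o /tmp/mpirun.trace mpirun -np 8 /workspace/bronke/mpi/hello
$ grep -E 'EACCES|EPIPE|SIGPIPE' /tmp/mpirun.trace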
> >>
> >> Thanks!
> >> --
> >> David H. Bronke
> >> Lead Programmer
> >> G33X Nexus Entertainment
> >> http://games.g33xnexus.com/precursors/
> >>
> >> v3sw5/7Hhw5/6ln4pr6Ock3ma7u7+8Lw3/7Tm3l6+7Gi2e4t4Mb7Hen5g8+9ORPa22s6MSr7p6
> >> hackerkey.com
> >> Support Web Standards! http://www.webstandards.org/
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
David H. Bronke
Lead Programmer
G33X Nexus Entertainment
http://games.g33xnexus.com/precursors/
v3sw5/7Hhw5/6ln4pr6Ock3ma7u7+8Lw3/7Tm3l6+7Gi2e4t4Mb7Hen5g8+9ORPa22s6MSr7p6
hackerkey.com
Support Web Standards! http://www.webstandards.org/