Open MPI User's Mailing List Archives

From: Eric Thibodeau (kyron_at_[hidden])
Date: 2006-06-16 09:55:25


Hello everyone,

        I hope the problem I am reporting is a valid one. I searched Google and the mailing list for similar situations but found only one reference [1], which does not seem to apply to my case since all executions are local (this rests on my naïve assumption that running locally makes a difference to the call stack). I am trying to run a simple HelloWorld using OpenMPI 1.0.2 on an AMD64 machine and on a Sun Enterprise (12 procs) machine. In both cases I get the following error:

pls:rsh: execv failed with errno=2

Here is the mpirun -d trace when running my HelloWorld (on AMD64):

kyron_at_headless ~ $ mpirun -d --prefix /usr/lib64/openmpi/1.0.2-gcc-4.1/lib64/ -np 4 ./hello
[headless:10461] procdir: (null)
[headless:10461] jobdir: (null)
[headless:10461] unidir: /tmp/openmpi-sessions-kyron_at_headless_0/default-universe
[headless:10461] top: openmpi-sessions-kyron_at_headless_0
[headless:10461] tmp: /tmp
[headless:10461] [0,0,0] setting up session dir with
[headless:10461] tmpdir /tmp
[headless:10461] universe default-universe-10461
[headless:10461] user kyron
[headless:10461] host headless
[headless:10461] jobid 0
[headless:10461] procid 0
[headless:10461] procdir: /tmp/openmpi-sessions-kyron_at_headless_0/default-universe-10461/0/0
[headless:10461] jobdir: /tmp/openmpi-sessions-kyron_at_headless_0/default-universe-10461/0
[headless:10461] unidir: /tmp/openmpi-sessions-kyron_at_headless_0/default-universe-10461
[headless:10461] top: openmpi-sessions-kyron_at_headless_0
[headless:10461] tmp: /tmp
[headless:10461] [0,0,0] contact_file /tmp/openmpi-sessions-kyron_at_headless_0/default-universe-10461/universe-setup.txt
[headless:10461] [0,0,0] wrote setup file
[headless:10461] spawn: in job_state_callback(jobid = 1, state = 0x1)
[headless:10461] pls:rsh: local csh: 0, local bash: 1
[headless:10461] pls:rsh: assuming same remote shell as local shell
[headless:10461] pls:rsh: remote csh: 0, remote bash: 1
[headless:10461] pls:rsh: final template argv:
[headless:10461] pls:rsh: /usr/bin/ssh <template> orted --debug --bootproxy 1 --name <template> --num_procs 2 --vpid_start 0 --nodename <template> --universe kyron_at_headless:default-universe-10461 --nsreplica "0.0.0;tcp://142.137.135.124:37657;tcp://192.168.1.1:37657" --gprreplica "0.0.0;tcp://142.137.135.124:37657;tcp://192.168.1.1:37657" --mpi-call-yield 0
[headless:10461] pls:rsh: launching on node localhost
[headless:10461] pls:rsh: oversubscribed -- setting mpi_yield_when_idle to 1 (1 4)
[headless:10461] pls:rsh: localhost is a LOCAL node
[headless:10461] pls:rsh: reset PATH: /usr/lib64/openmpi/1.0.2-gcc-4.1/lib64/bin:/usr/local/bin:/usr/bin:/bin:/usr/x86_64-pc-linux-gnu/gcc-bin/4.1.1:/opt/c3-4:/usr/qt/3/bin:/usr/lib64/openmpi/1.0.2-gcc-4.1/bin
[headless:10461] pls:rsh: reset LD_LIBRARY_PATH: /usr/lib64/openmpi/1.0.2-gcc-4.1/lib64/lib
[headless:10461] pls:rsh: changing to directory /home/kyron
[headless:10461] pls:rsh: executing: orted --debug --bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0 --nodename localhost --universe kyron_at_headless:default-universe-10461 --nsreplica "0.0.0;tcp://142.137.135.124:37657;tcp://192.168.1.1:37657" --gprreplica "0.0.0;tcp://142.137.135.124:37657;tcp://192.168.1.1:37657" --mpi-call-yield 1
[headless:10461] pls:rsh: execv failed with errno=2
[headless:10461] sess_dir_finalize: proc session dir not empty - leaving
[headless:10461] spawn: in job_state_callback(jobid = 1, state = 0xa)
[headless:10461] sess_dir_finalize: proc session dir not empty - leaving
[headless:10461] sess_dir_finalize: proc session dir not empty - leaving
[headless:10461] sess_dir_finalize: proc session dir not empty - leaving
[headless:10461] spawn: in job_state_callback(jobid = 1, state = 0x9)
[headless:10461] ERROR: A daemon on node localhost failed to start as expected.
[headless:10461] ERROR: There may be more information available from
[headless:10461] ERROR: the remote shell (see above).
[headless:10461] ERROR: The daemon exited unexpectedly with status 255.
[headless:10461] sess_dir_finalize: found proc session dir empty - deleting
[headless:10461] sess_dir_finalize: found job session dir empty - deleting
[headless:10461] sess_dir_finalize: found univ session dir empty - deleting
[headless:10461] sess_dir_finalize: top session dir not empty - leaving
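
For anyone reproducing this, here is a minimal Python sketch of how the failure could be checked by hand. errno=2 is ENOENT ("No such file or directory"), so the sketch looks for the daemon in each directory of the PATH that the trace reports after "reset PATH". The candidate name orted-1.0.2-gcc-4.1 is only a guess based on the --program-suffix configure option listed further down; the helper names are hypothetical and not part of Open MPI.

```python
import errno
import os
import shutil

def explain_errno(code):
    """Return (symbolic name, message) for an errno value."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)

def find_executable(name, path_value):
    """Search a PATH-style string for an executable; None if not found."""
    return shutil.which(name, path=path_value)

# PATH copied from the "pls:rsh: reset PATH:" line of the trace above.
RESET_PATH = (
    "/usr/lib64/openmpi/1.0.2-gcc-4.1/lib64/bin:/usr/local/bin:/usr/bin:/bin:"
    "/usr/x86_64-pc-linux-gnu/gcc-bin/4.1.1:/opt/c3-4:/usr/qt/3/bin:"
    "/usr/lib64/openmpi/1.0.2-gcc-4.1/bin"
)

# "orted" is what the trace tries to execute; the suffixed name is an
# assumption derived from '--program-suffix=-1.0.2-gcc-4.1'.
CANDIDATES = ["orted", "orted-1.0.2-gcc-4.1"]

name, message = explain_errno(2)
print(f"errno=2 is {name}: {message}")

for candidate in CANDIDATES:
    location = find_executable(candidate, RESET_PATH)
    print(f"{candidate}: {location if location else 'not found on reset PATH'}")
```

If neither name is found, that would be consistent with execv returning ENOENT, but it does not by itself show why the daemon is missing from that PATH.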

The two platforms are very different: one is AMD64 (dual Opteron) with GCC 4.1.1 (Gentoo), the other is SunOS 5.8 with GCC 3.4.2. OpenMPI was compiled with the following options (extracted from config.status):

AMD64:
Open MPI config.status 1.0.2
configured by ./configure, generated by GNU Autoconf 2.59,
  with options \"'--prefix=/usr' '--host=x86_64-pc-linux-gnu' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc' '--localstatedir=/var/lib' '--prefix=/usr/lib64/openmpi/1.0.2-gcc-4.1' '--datadir=/usr/share/openmpi/1.0.2-gcc-4.1' '--program-suffix=-1.0.2-gcc-4.1' '--sysconfdir=/etc/openmpi/1.0.2-gcc-4.1' '--enable-pretty-print-stacktrace' '--libdir=/usr/lib64/openmpi/1.0.2-gcc-4.1/lib64' '--build=x86_64-pc-linux-gnu' '--cache-file' 'config.cache' 'CFLAGS=-march=opteron -O2 -pipe -ftracer -fprefetch-loop-arrays -mfpmath=sse -ffast-math -ftree-vectorize -floop-optimize2' 'CXXFLAGS=-march=opteron -O2 -pipe -ftracer -fprefetch-loop-arrays -mfpmath=sse -ffast-math -ftree-vectorize -floop-optimize2' 'LDFLAGS= -Wl,-z,-noexecstack' 'build_alias=x86_64-pc-linux-gnu' 'host_alias=x86_64-pc-linux-gnu' --enable-ltdl-convenience\"

SunOS 5.8:
Open MPI config.status 1.0.2
configured by ./configure, generated by GNU Autoconf 2.59,
  with options \"'--prefix=/export/lca/home/lca0/etudiants/ac38820/openmpi' '--enable-pretty-print-stacktrace' 'CFLAGS=-mv8plus' 'CXXFLAGS=-mv8plus' --enable-ltdl-convenience\"

x86 (a working reference; its configure options should be nearly identical to the AMD64 ones):
Open MPI config.status 1.0.2
configured by ./configure, generated by GNU Autoconf 2.59,
  with options \"'--prefix=/usr' '--host=i686-pc-linux-gnu' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc' '--localstatedir=/var/lib' '--prefix=/usr/lib/openmpi/1.0.2-gcc-4.1' '--datadir=/usr/share/openmpi/1.0.2-gcc-4.1' '--program-suffix=-1.0.2-gcc-4.1' '--sysconfdir=/etc/openmpi/1.0.2-gcc-4.1' '--enable-pretty-print-stacktrace' '--build=i686-pc-linux-gnu' '--cache-file' 'config.cache' 'CFLAGS=-march=nocona -O2 -pipe -fomit-frame-pointer' 'CXXFLAGS=-march=nocona -O2 -pipe -fomit-frame-pointer' 'LDFLAGS= -Wl,-z,-noexecstack' 'build_alias=i686-pc-linux-gnu' 'host_alias=i686-pc-linux-gnu' --enable-ltdl-convenience\"
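
Since the Gentoo option strings pass --prefix twice, a small sketch like the following may help when comparing builds. It parses a quoted option string and keeps the last value seen for each option, on the assumption that configure lets a later option override an earlier one. The string below is a shortened excerpt of the AMD64 options above, and the function name is hypothetical.

```python
import shlex

def effective_options(option_string):
    """Map each --key=value option to the last value given for it."""
    options = {}
    for token in shlex.split(option_string):
        if token.startswith("--") and "=" in token:
            key, value = token.split("=", 1)
            options[key] = value  # a later occurrence replaces an earlier one
    return options

# Shortened excerpt of the AMD64 config.status options listed above.
AMD64_OPTIONS = (
    "'--prefix=/usr' '--host=x86_64-pc-linux-gnu' "
    "'--prefix=/usr/lib64/openmpi/1.0.2-gcc-4.1' "
    "'--program-suffix=-1.0.2-gcc-4.1' "
    "'--libdir=/usr/lib64/openmpi/1.0.2-gcc-4.1/lib64'"
)

opts = effective_options(AMD64_OPTIONS)
for key in ("--prefix", "--libdir", "--program-suffix"):
    print(f"{key} = {opts[key]}")
```

The values printed can then be compared against the --prefix given on the mpirun command line in the trace.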

Any help would be greatly appreciated.

Thanks.

[1] http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=15775

-- 
Eric Thibodeau
Neural Bucket Solutions Inc.
T. (514) 736-1436
C. (514) 710-0517