Open MPI User's Mailing List Archives

From: Clement Chu (clement.chu_at_[hidden])
Date: 2005-11-10 10:01:29

Here is the output with the "-d" switch:

[clement_at_kfc TestMPI]$ mpirun -d -np 2 test
[kfc:29199] procdir: (null)
[kfc:29199] jobdir: (null)
[kfc:29199] unidir: /tmp/openmpi-sessions-clement_at_kfc_0/default-universe
[kfc:29199] top: openmpi-sessions-clement_at_kfc_0
[kfc:29199] tmp: /tmp
[kfc:29199] [0,0,0] setting up session dir with
[kfc:29199] tmpdir /tmp
[kfc:29199] universe default-universe-29199
[kfc:29199] user clement
[kfc:29199] host kfc
[kfc:29199] jobid 0
[kfc:29199] procid 0
[kfc:29199] procdir:
/tmp/openmpi-sessions-clement_at_kfc_0/default-universe-29199/0/0
[kfc:29199] jobdir:
/tmp/openmpi-sessions-clement_at_kfc_0/default-universe-29199/0
[kfc:29199] unidir:
/tmp/openmpi-sessions-clement_at_kfc_0/default-universe-29199
[kfc:29199] top: openmpi-sessions-clement_at_kfc_0
[kfc:29199] tmp: /tmp
[kfc:29199] [0,0,0] contact_file
/tmp/openmpi-sessions-clement_at_kfc_0/default-universe-29199/universe-setup.txt
[kfc:29199] [0,0,0] wrote setup file
[kfc:29199] pls:rsh: local csh: 0, local bash: 1
[kfc:29199] pls:rsh: assuming same remote shell as local shell
[kfc:29199] pls:rsh: remote csh: 0, remote bash: 1
[kfc:29199] pls:rsh: final template argv:
[kfc:29199] pls:rsh: ssh <template> orted --debug --bootproxy 1
--name <template> --num_procs 2 --vpid_start 0 --nodename <template>
--universe clement_at_kfc:default-universe-29199 --nsreplica
"0.0.0;tcp://192.168.11.101:32784" --gprreplica
"0.0.0;tcp://192.168.11.101:32784" --mpi-call-yield 0
[kfc:29199] pls:rsh: launching on node localhost
[kfc:29199] pls:rsh: oversubscribed -- setting mpi_yield_when_idle to 1
(1 2)
[kfc:29199] sess_dir_finalize: proc session dir not empty - leaving
[kfc:29199] spawn: in job_state_callback(jobid = 1, state = 0xa)
mpirun noticed that job rank 1 with PID 0 on node "localhost" exited on
signal 11.
[kfc:29199] sess_dir_finalize: proc session dir not empty - leaving
[kfc:29199] spawn: in job_state_callback(jobid = 1, state = 0x9)
[kfc:29199] ERROR: A daemon on node localhost failed to start as expected.
[kfc:29199] ERROR: There may be more information available from
[kfc:29199] ERROR: the remote shell (see above).
[kfc:29199] The daemon received a signal 11.
1 additional process aborted (not shown)
[kfc:29199] sess_dir_finalize: found proc session dir empty - deleting
[kfc:29199] sess_dir_finalize: found job session dir empty - deleting
[kfc:29199] sess_dir_finalize: found univ session dir empty - deleting
[kfc:29199] sess_dir_finalize: top session dir not empty - leaving

ompi_info output:

[clement_at_kfc TestMPI]$ ompi_info
                Open MPI: 1.0rc5r8053
   Open MPI SVN revision: r8053
                Open RTE: 1.0rc5r8053
   Open RTE SVN revision: r8053
                    OPAL: 1.0rc5r8053
       OPAL SVN revision: r8053
                  Prefix: /home/clement/openmpi
 Configured architecture: i686-pc-linux-gnu
           Configured by: clement
           Configured on: Fri Nov 11 00:37:23 EST 2005
          Configure host: kfc
                Built by: clement
                Built on: Fri Nov 11 00:59:26 EST 2005
              Built host: kfc
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
            C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
      Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/bin/gfortran
      Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/bin/gfortran
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: 1
              MCA memory: malloc_hooks (MCA v1.0, API v1.0, Component v1.0)
           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.0)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.0)
               MCA timer: linux (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.0)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.0)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.0)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.0)
                 MCA pml: teg (MCA v1.0, API v1.0, Component v1.0)
                 MCA pml: uniq (MCA v1.0, API v1.0, Component v1.0)
                 MCA ptl: self (MCA v1.0, API v1.0, Component v1.0)
                 MCA ptl: sm (MCA v1.0, API v1.0, Component v1.0)
                 MCA ptl: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA btl: self (MCA v1.0, API v1.0, Component v1.0)
                 MCA btl: sm (MCA v1.0, API v1.0, Component v1.0)
                 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.0)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.0)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.0)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.0)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.0)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.0)
                  MCA ns: proxy (MCA v1.0, API v1.0, Component v1.0)
                  MCA ns: replica (MCA v1.0, API v1.0, Component v1.0)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: localhost (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: slurm (MCA v1.0, API v1.0, Component v1.0)
                 MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.0)
                 MCA rds: resfile (MCA v1.0, API v1.0, Component v1.0)
               MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.0)
                MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.0)
                MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.0)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.0)
                 MCA pls: fork (MCA v1.0, API v1.0, Component v1.0)
                 MCA pls: proxy (MCA v1.0, API v1.0, Component v1.0)
                 MCA pls: rsh (MCA v1.0, API v1.0, Component v1.0)
                 MCA pls: slurm (MCA v1.0, API v1.0, Component v1.0)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.0)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.0)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.0)
                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.0)
                 MCA sds: slurm (MCA v1.0, API v1.0, Component v1.0)
[clement_at_kfc TestMPI]$
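
Regarding the corefile and backtrace question: if a core file from orted turns up, I gather the backtrace could be pulled out of it roughly like this (a sketch only; the "core.<pid>" name is an assumption that depends on the system's core settings, and the orted path simply matches the prefix reported by ompi_info above):

  # allow core dumps in this shell, then reproduce the crash
  ulimit -c unlimited
  mpirun -d -np 2 ./test
  # load the daemon binary together with the resulting core file
  gdb /home/clement/openmpi/bin/orted core.<pid>
  # at the gdb prompt, print the backtrace
  (gdb) bt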

Jeff Squyres wrote:

>I'm sorry -- I wasn't entirely clear:
>
>1. Are you using a 1.0 nightly tarball or a 1.1 nightly tarball? We
>have made a bunch of fixes to the 1.1 tree (i.e., the Subversion
>trunk), but have not fully vetted them yet, so they have not yet been
>taken to the 1.0 release branch. If you have not done so already,
>could you try a tarball from the trunk?
>http://www.open-mpi.org/nightly/trunk/
>
>2. The error you are seeing looks like a proxy process is failing to
>start because it seg faults. Are you getting corefiles? If so, can
>you send the backtrace? The corefile should be from the
>$prefix/bin/orted executable.
>
>3. Failing that, can you run with the "-d" switch? It should give a
>bunch of debugging output that might be helpful. "mpirun -d -np 2
>./test", for example.
>
>4. Also please send the output of the "ompi_info" command.
>
>
>On Nov 10, 2005, at 9:05 AM, Clement Chu wrote:
>
>>I have tried the latest version (rc5, r8053), but the error is still
>>there.
>>
>>Jeff Squyres wrote:
>>
>>>We've actually made quite a few bug fixes since RC4 (RC5 is not
>>>available yet). Would you mind trying with a nightly snapshot
>>>tarball?
>>>
>>>(there were some SVN commits last night after the nightly snapshot was
>>>made; I've just initiated another snapshot build -- r8085 should be on
>>>the web site within an hour or so)
>>>
>>>
>>>On Nov 10, 2005, at 4:38 AM, Clement Chu wrote:
>>>
>>>>Hi,
>>>>
>>>> I got an error when I ran mpirun on an MPI program. The following
>>>>is the error message:
>>>>
>>>>[clement_at_kfc TestMPI]$ mpicc -g -o test main.c
>>>>[clement_at_kfc TestMPI]$ mpirun -np 2 test
>>>>mpirun noticed that job rank 1 with PID 0 on node "localhost" exited
>>>>on
>>>>signal 11.
>>>>[kfc:28466] ERROR: A daemon on node localhost failed to start as
>>>>expected.
>>>>[kfc:28466] ERROR: There may be more information available from
>>>>[kfc:28466] ERROR: the remote shell (see above).
>>>>[kfc:28466] The daemon received a signal 11.
>>>>1 additional process aborted (not shown)
>>>>[clement_at_kfc TestMPI]$
>>>>
>>>>I am using openmpi-1.0rc4 and running on Linux Red Hat Fedora Core 4.
>>>>The kernel is 2.6.12-1.1456_FC4. My build procedure is as below:
>>>>1. ./configure --prefix=/home/clement/openmpi --with-devel-headers
>>>>2. make all install
>>>>3. log in as root and add Open MPI's bin and lib directories to /etc/bashrc
>>>>4. check $PATH and $LD_LIBRARY_PATH, as shown below:
>>>>[clement_at_kfc TestMPI]$ echo $PATH
>>>>/usr/java/jdk1.5.0_05/bin:/home/clement/openmpi/bin:/usr/java/jdk1.5.0_05/bin:/home/clement/mpich-1.2.7/bin:/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/clement/bin
>>>>[clement_at_kfc TestMPI]$ echo $LD_LIBRARY_PATH
>>>>/home/clement/openmpi/lib
>>>>[clement_at_kfc TestMPI]$
>>>>
>>>>5. go to the MPI program's directory
>>>>6. mpicc -g -o test main.c
>>>>7. mpirun -np 2 test
>>>>
>>>>Any idea what is causing this problem? Many thanks.
>>>>
>
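
For reference, step 3 of the build procedure quoted above (adding Open MPI's bin and lib directories to /etc/bashrc) would presumably look something like the following exports; the exact lines are an assumption, but the directories match the --prefix from step 1 and the echo output in step 4:

  # appended to /etc/bashrc as root (a sketch, not the literal file contents)
  export PATH=/home/clement/openmpi/bin:$PATH
  export LD_LIBRARY_PATH=/home/clement/openmpi/lib:$LD_LIBRARY_PATH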

-- 
Clement Kam Man Chu
Research Assistant
School of Computer Science & Software Engineering
Monash University, Caulfield Campus
Ph: 61 3 9903 1964