Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-11-10 13:23:07


The name of the launcher is "rsh", but it actually defaults to trying
to fork/exec ssh.
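
If you want to force a particular agent, you can set it with an MCA
parameter. A sketch (I'm assuming the 1.0-series parameter name
pls_rsh_agent here; "ompi_info --param pls rsh" should show the exact
name and its current value):

        mpirun --mca pls_rsh_agent ssh -np 2 ./test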

Unfortunately, your backtrace doesn't tell us much because there are
no debugging symbols in it. Can you recompile OMPI with debugging
enabled and send a new backtrace? Use:

        ./configure CFLAGS=-g ....
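
For example, the whole cycle might look something like this (just a
sketch, reusing the --prefix from your earlier mail):

        ./configure CFLAGS=-g --prefix=/home/clement/openmpi
        make all install
        mpirun -np 2 test          # reproduce the segfault
        gdb mpirun core.<pid>      # then "where" at the (gdb) prompt

With debugging symbols, "where" should show file and line numbers
instead of bare addresses.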

On Nov 10, 2005, at 10:53 AM, Clement Chu wrote:

> Here is the backtrace result (now I am using 8085).
> Does mpirun start rsh? I think I need ssh instead of rsh.
>
> [clement_at_kfc tmp]$ gdb mpirun core.17766
> GNU gdb Red Hat Linux (6.3.0.0-1.21rh)
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and
> you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for
> details.
> This GDB was configured as "i386-redhat-linux-gnu"...Using host
> libthread_db library "/lib/libthread_db.so.1".
>
> Reading symbols from shared object read from target memory...done.
> Loaded system supplied DSO at 0xa2d000
> Core was generated by `mpirun -np 2 test'.
> Program terminated with signal 11, Segmentation fault.
>
> warning: svr4_current_sos: Can't read pathname for load map:
> Input/output error
>
> Reading symbols from /home/clement/openmpi/lib/liborte.so.0...done.
> Loaded symbols for /home/clement/openmpi/lib/liborte.so.0
> Reading symbols from /home/clement/openmpi/lib/libopal.so.0...done.
> Loaded symbols for /home/clement/openmpi/lib/libopal.so.0
> Reading symbols from /lib/libdl.so.2...done.
> Loaded symbols for /lib/libdl.so.2
> Reading symbols from /lib/libm.so.6...done.
> Loaded symbols for /lib/libm.so.6
> Reading symbols from /lib/libutil.so.1...done.
> Loaded symbols for /lib/libutil.so.1
> Reading symbols from /lib/libnsl.so.1...done.
> Loaded symbols for /lib/libnsl.so.1
> Reading symbols from /lib/libpthread.so.0...done.
> Loaded symbols for /lib/libpthread.so.0
> Reading symbols from /lib/libc.so.6...done.
> Loaded symbols for /lib/libc.so.6
> Reading symbols from /lib/ld-linux.so.2...done.
> Loaded symbols for /lib/ld-linux.so.2
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_paffinity_linux.so...done.
> Loaded symbols for
> /home/clement/openmpi/lib/openmpi/mca_paffinity_linux.so
> Reading symbols from /lib/libnss_files.so.2...done.
> Loaded symbols for /lib/libnss_files.so.2
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_ns_proxy.so...done.
> Loaded symbols for /home/clement/openmpi/lib/openmpi/mca_ns_proxy.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_ns_replica.so...done.
> Loaded symbols for /home/clement/openmpi/lib/openmpi/mca_ns_replica.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_rml_oob.so...done.
> Loaded symbols for /home/clement/openmpi/lib/openmpi/mca_rml_oob.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_oob_tcp.so...done.
> Loaded symbols for /home/clement/openmpi/lib/openmpi/mca_oob_tcp.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_gpr_null.so...done.
> Loaded symbols for /home/clement/openmpi/lib/openmpi/mca_gpr_null.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_gpr_proxy.so...done.
> Loaded symbols for /home/clement/openmpi/lib/openmpi/mca_gpr_proxy.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_gpr_replica.so...done.
> Loaded symbols for /home/clement/openmpi/lib/openmpi/mca_gpr_replica.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_rmgr_proxy.so...done.
> Loaded symbols for /home/clement/openmpi/lib/openmpi/mca_rmgr_proxy.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_rmgr_urm.so...done.
> Loaded symbols for /home/clement/openmpi/lib/openmpi/mca_rmgr_urm.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_rds_hostfile.so...done.
> Loaded symbols for
> /home/clement/openmpi/lib/openmpi/mca_rds_hostfile.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_rds_resfile.so...done.
> Loaded symbols for /home/clement/openmpi/lib/openmpi/mca_rds_resfile.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_ras_dash_host.so...done.
> Loaded symbols for
> /home/clement/openmpi/lib/openmpi/mca_ras_dash_host.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_ras_hostfile.so...done.
> Loaded symbols for
> /home/clement/openmpi/lib/openmpi/mca_ras_hostfile.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_ras_localhost.so...done.
> Loaded symbols for
> /home/clement/openmpi/lib/openmpi/mca_ras_localhost.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_ras_slurm.so...done.
> Loaded symbols for /home/clement/openmpi/lib/openmpi/mca_ras_slurm.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_rmaps_round_robin.so...done.
> Loaded symbols for
> /home/clement/openmpi/lib/openmpi/mca_rmaps_round_robin.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_pls_fork.so...done.
> Loaded symbols for /home/clement/openmpi/lib/openmpi/mca_pls_fork.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_pls_proxy.so...done.
> Loaded symbols for /home/clement/openmpi/lib/openmpi/mca_pls_proxy.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_pls_rsh.so...done.
> Loaded symbols for /home/clement/openmpi/lib/openmpi/mca_pls_rsh.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_pls_slurm.so...done.
> Loaded symbols for /home/clement/openmpi/lib/openmpi/mca_pls_slurm.so
> Reading symbols from
> /home/clement/openmpi/lib/openmpi/mca_iof_svc.so...done.
> Loaded symbols for /home/clement/openmpi/lib/openmpi/mca_iof_svc.so
> #0 0x00e2a075 in orte_pls_rsh_launch ()
> from /home/clement/openmpi/lib/openmpi/mca_pls_rsh.so
> (gdb) where
> #0 0x00e2a075 in orte_pls_rsh_launch ()
> from /home/clement/openmpi/lib/openmpi/mca_pls_rsh.so
> #1 0x0042b656 in orte_rmgr_urm_spawn ()
> from /home/clement/openmpi/lib/openmpi/mca_rmgr_urm.so
> #2 0x0804a10c in orterun (argc=4, argv=0xbf983d54) at orterun.c:373
> #3 0x08049b4e in main (argc=4, argv=0xbf983d54) at main.c:13
> (gdb)
>
> Jeff Squyres wrote:
>
>> I'm sorry -- I wasn't entirely clear:
>>
>> 1. Are you using a 1.0 nightly tarball or a 1.1 nightly tarball? We
>> have made a bunch of fixes to the 1.1 tree (i.e., the Subversion
>> trunk), but have not fully vetted them yet, so they have not yet
>> been taken to the 1.0 release branch. If you have not done so
>> already, could you try a tarball from the trunk?
>> http://www.open-mpi.org/nightly/trunk/
>>
>> 2. The error you are seeing looks like a proxy process is failing to
>> start because it seg faults. Are you getting corefiles? If so, can
>> you send the backtrace? The corefile should be from the
>> $prefix/bin/orted executable.
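>>
>> For example (a sketch; assumes bash, and that your shell permits
>> core dumps):
>>
>>         ulimit -c unlimited
>>         mpirun -np 2 ./test
>>         gdb $prefix/bin/orted core.<pid>
>>
>> Then "where" at the (gdb) prompt prints the backtrace.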
>>
>> 3. Failing that, can you run with the "-d" switch? It should give a
>> bunch of debugging output that might be helpful. "mpirun -d -np 2
>> ./test", for example.
>>
>> 4. Also please send the output of the "ompi_info" command.
>>
>>
>> On Nov 10, 2005, at 9:05 AM, Clement Chu wrote:
>>
>>> I have tried the latest version (rc5 8053), but the error is still
>>> there.
>>>
>>> Jeff Squyres wrote:
>>>
>>>> We've actually made quite a few bug fixes since RC4 (RC5 is not
>>>> available yet). Would you mind trying with a nightly snapshot
>>>> tarball?
>>>>
>>>> (there were some SVN commits last night after the nightly snapshot
>>>> was
>>>> made; I've just initiated another snapshot build -- r8085 should be
>>>> on
>>>> the web site within an hour or so)
>>>>
>>>>
>>>> On Nov 10, 2005, at 4:38 AM, Clement Chu wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I got an error when I tried mpirun on an MPI program. Here is
>>>>> the error message:
>>>>>
>>>>> [clement_at_kfc TestMPI]$ mpicc -g -o test main.c
>>>>> [clement_at_kfc TestMPI]$ mpirun -np 2 test
>>>>> mpirun noticed that job rank 1 with PID 0 on node "localhost"
>>>>> exited on signal 11.
>>>>> [kfc:28466] ERROR: A daemon on node localhost failed to start as
>>>>> expected.
>>>>> [kfc:28466] ERROR: There may be more information available from
>>>>> [kfc:28466] ERROR: the remote shell (see above).
>>>>> [kfc:28466] The daemon received a signal 11.
>>>>> 1 additional process aborted (not shown)
>>>>> [clement_at_kfc TestMPI]$
>>>>>
>>>>> I am using openmpi-1.0rc4 and running on Linux Red Hat Fedora
>>>>> Core 4. The kernel is 2.6.12-1.1456_FC4. My build procedure is
>>>>> as follows:
>>>>> 1. ./configure --prefix=/home/clement/openmpi --with-devel-headers
>>>>> 2. make all install
>>>>> 3. log in as root and add Open MPI's bin and lib paths to /etc/bashrc
>>>>> 4. check $PATH and $LD_LIBRARY_PATH, as below:
>>>>> [clement_at_kfc TestMPI]$ echo $PATH
>>>>> /usr/java/jdk1.5.0_05/bin:/home/clement/openmpi/bin:/usr/java/jdk1.5.0_05/bin:/home/clement/mpich-1.2.7/bin:/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/clement/bin
>>>>> [clement_at_kfc TestMPI]$ echo $LD_LIBRARY_PATH
>>>>> /home/clement/openmpi/lib
>>>>> [clement_at_kfc TestMPI]$
>>>>>
>>>>> 5. go to mpi program's directory
>>>>> 6. mpicc -g -o test main.c
>>>>> 7. mpirun -np 2 test
>>>>>
>>>>> Any idea about this problem? Many thanks.
>>>>>
>>
>
> --
> Clement Kam Man Chu
> Research Assistant
> School of Computer Science & Software Engineering
> Monash University, Caulfield Campus
> Ph: 61 3 9903 1964
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/