
Open MPI Development Mailing List Archives


From: Greg Watson (gwatson_at_[hidden])
Date: 2005-12-18 12:11:37


Sure seems like it -- argv holds only argc = 2 entries plus the NULL
terminator, but local_exec_index is 3, one past the terminator:

(gdb) p *mca_pls_rsh_component.argv@4
$12 = {0x90e0428 "ssh", 0x90e0438 "-x", 0x0, 0x11 <Address 0x11 out
of bounds>}
(gdb) p mca_pls_rsh_component.argc
$13 = 2
(gdb) p local_exec_index
$14 = 3

Greg

On Dec 18, 2005, at 4:56 AM, Rainer Keller wrote:

> Hello Greg,
> I don't know whether it's segfaulting at that particular line, but could
> you please print the argv? I suspect the local_exec_index into the argv
> might be wrong.
>
> Thanks,
> Rainer
>
> On Saturday 17 December 2005 19:16, Greg Watson wrote:
>> Here's the stacktrace:
>>
>> #0 0x00ae1fe8 in orte_pls_rsh_launch (jobid=1) at pls_rsh_module.c:714
>> 714 if (mca_pls_rsh_component.debug) {
>> (gdb) where
>> #0 0x00ae1fe8 in orte_pls_rsh_launch (jobid=1) at pls_rsh_module.c:714
>> #1 0x00a29642 in orte_rmgr_urm_spawn ()
>>    from /usr/local/ompi/lib/openmpi/mca_rmgr_urm.so
>> #2 0x0804a0d4 in orterun (argc=4, argv=0xbff88594) at orterun.c:373
>> #3 0x08049b16 in main (argc=4, argv=0xbff88594) at main.c:13
>>
>> And the contents of mca_pls_rsh_component:
>>
>> (gdb) p mca_pls_rsh_component
>> $2 = {super = {pls_version = {mca_major_version = 1, mca_minor_version = 0,
>>       mca_release_version = 0, mca_type_name = "pls", '\0' <repeats 28 times>,
>>       mca_type_major_version = 1, mca_type_minor_version = 0,
>>       mca_type_release_version = 0,
>>       mca_component_name = "rsh", '\0' <repeats 60 times>,
>>       mca_component_major_version = 1, mca_component_minor_version = 0,
>>       mca_component_release_version = 1,
>>       mca_open_component = 0xae0a80 <orte_pls_rsh_component_open>,
>>       mca_close_component = 0xae09a0 <orte_pls_rsh_component_close>},
>>     pls_data = {mca_is_checkpointable = true},
>>     pls_init = 0xae093c <orte_pls_rsh_component_init>}, debug = false,
>>   reap = true, assume_same_shell = true, delay = 1, priority = 10,
>>   argv = 0x90e0418, argc = 2, orted = 0x90de438 "orted",
>>   path = 0x90e0960 "/usr/bin/ssh", num_children = 0, num_concurrent = 128,
>>   lock = {super = {obj_class = 0x804ec38, obj_reference_count = 1},
>>     m_lock_pthread = {__data = {__lock = 0, __count = 0, __owner = 0,
>>         __kind = 0, __nusers = 0, __spins = 0},
>>       __size = '\0' <repeats 23 times>, __align = 0},
>>     m_lock_atomic = {u = {lock = 0, sparc_lock = 0 '\0',
>>         padding = "\000\000\000"}}},
>>   cond = {super = {obj_class = 0x804ec18, obj_reference_count = 1},
>>     c_waiting = 0, c_signaled = 0, c_cond = {__data = {__lock = 0,
>>         __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0,
>>         __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
>>       __size = '\0' <repeats 47 times>, __align = 0}}}
>>
>> I can't see why it is segfaulting at this particular line.
>>
>> Greg
>>
>> On Dec 16, 2005, at 5:55 PM, Jeff Squyres wrote:
>>> On Dec 16, 2005, at 10:47 AM, Greg Watson wrote:
>>>> I finally worked out why I couldn't reproduce the problem.
>>>> You're not
>>>> going to like it though.
>>>
>>> You're right -- this kind of buglet is among the most un-fun. :-(
>>>
>>>> Here's the stack trace from the core file:
>>>>
>>>> #0 0x00e93fe8 in orte_pls_rsh_launch ()
>>>>    from /usr/local/ompi/lib/openmpi/mca_pls_rsh.so
>>>> #1 0x0023c642 in orte_rmgr_urm_spawn ()
>>>>    from /usr/local/ompi/lib/openmpi/mca_rmgr_urm.so
>>>> #2 0x0804a0d4 in orterun (argc=5, argv=0xbfab2e84) at orterun.c:373
>>>> #3 0x08049b16 in main (argc=5, argv=0xbfab2e84) at main.c:13
>>>
>>> Can you recompile this one file with -g? Specifically, cd into the
>>> orte/mca/pls/rsh dir and "make clean". Then "make". Then cut-n-paste
>>> the compile line for that one file to a shell prompt, and add a -g.
>>>
>>> Then either re-install that component (it looks like you're doing a
>>> dynamic build with separate components, so you can do "make install"
>>> right from the rsh dir) or re-link liborte, re-install that, and
>>> re-run. The corefile might give something a little more meaningful in
>>> this case...?
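
[The recompile steps above can be sketched roughly as follows; the
directory comes from the thread, but the exact compile line and flags
depend on your build, so treat this as an outline, not a recipe:]

```shell
# Rebuild one component's object file with debug symbols.
cd orte/mca/pls/rsh        # the pls/rsh component source dir
make clean
make                       # note the compile line printed for pls_rsh_module.c
# Re-run that exact compile line by hand with -g appended, e.g.:
#   <compiler and flags printed by make> -g -c pls_rsh_module.c
make install               # re-install just this dynamically loaded component
```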
>>>
>>> --
>>> {+} Jeff Squyres
>>> {+} The Open MPI Project
>>> {+} http://www.open-mpi.org/
>>>
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>
> --
> ---------------------------------------------------------------------
> Dipl.-Inf. Rainer Keller          email: keller_at_[hidden]
> High Performance Computing        Tel: ++49 (0)711-685 5858
> Center Stuttgart (HLRS)           Fax: ++49 (0)711-678 7626
> POSTAL: Nobelstrasse 19           http://www.hlrs.de/people/keller
> ACTUAL: Allmandring 30, R. O.030
> 70550 Stuttgart