Open MPI Development Mailing List Archives

From: Galen M. Shipman (gshipman_at_[hidden])
Date: 2006-06-29 16:47:19


More info:

Two core files are generated:

mpirun:
(gdb) bt
#0 0x00000400002555d8 in .__pthread_alt_unlock () from /lib64/libpthread.so.0
#1 0x0000040000251b10 in .__GI___pthread_mutex_unlock ()
    from /lib64/libpthread.so.0
#2 0x00000400001108e0 in .poll_dispatch ()
    from /home/ompi/local/lib/libopal.so.0
#3 0x000004000010ea48 in opal_event_loop (flags=1) at event.c:485
#4 0x0000040000104078 in opal_progress () at runtime/opal_progress.c:259
#5 0x0000000010003a08 in opal_condition_wait (c=0x1001acf0, m=0x1001aca0)
     at ../../../opal/threads/condition.h:81
#6 0x0000000010003474 in orterun (argc=7, argv=0xfffffea5948) at orterun.c:415
#7 0x0000000010002c50 in main (argc=7, argv=0xfffffea5948) at main.c:13
#8 0x0000040000336dc8 in .__libc_start_main () from /lib64/libc.so.6
#9 0x0000000000000000 in ?? ()

orted:
#0 0x00000400002555d8 in .__pthread_alt_unlock () from /lib64/libpthread.so.0
#1 0x0000040000251b10 in .__GI___pthread_mutex_unlock ()
    from /lib64/libpthread.so.0
#2 0x00000400001108e0 in .poll_dispatch ()
    from /home/ompi/local/lib/libopal.so.0
#3 0x000004000010ea48 in opal_event_loop (flags=1) at event.c:485
#4 0x0000040000104078 in opal_progress () at runtime/opal_progress.c:259
#5 0x000004000051a35c in mca_oob_tcp_msg_wait (msg=0x10022d68,
     rc=0xfffffe24d60) at oob_tcp_msg.c:106
#6 0x000004000052497c in mca_oob_tcp_send (name=0x1004a230,
     iov=0xfffffe24e50, count=1, tag=2, flags=0) at oob_tcp_send.c:158
#7 0x0000040000095e40 in mca_oob_send_packed (peer=0x1004a230,
     buffer=0x1003cf50, tag=2, flags=0) at base/oob_base_send.c:78
#8 0x0000040000560b50 in orte_gpr_proxy_subscribe (num_subs=1,
     subscriptions=0xfffffe25030, num_trigs=1, trigs=0xfffffe25090)
     at gpr_proxy_subscribe.c:121
#9 0x000004000007a6ec in orte_gpr_base_subscribe_1 (id=0xfffffe251a0,
     trig_name=0x1003ce80 "orte-stage1-0",
     sub_name=0x1003cd20 "ompi-oob-sub-0", action=39 '\'', addr_mode=514,
     segment=0x1003cea0 "orte-job-0", tokens=0x0, key=0x400005260a8 "oob-tcp",
     cbfunc=0x400005392d0 <mca_oob_tcp_registry_callback>, user_tag=0x0)
     at base/gpr_base_simplified_subscribe.c:92
#10 0x0000040000517a7c in mca_oob_tcp_init () at oob_tcp.c:816
#11 0x0000040000095110 in mca_oob_base_module_init ()
     at base/oob_base_init.c:263
#12 0x000004000005ef18 in orte_init_stage2 () at runtime/orte_init_stage2.c:48
#13 0x0000040000062fe8 in orte_system_init (infrastructure=true)
     at runtime/orte_system_init.c:46
#14 0x000004000005ce50 in orte_init (infrastructure=true)
     at runtime/orte_init.c:48
#15 0x0000000010001ebc in main (argc=19, argv=0xfffffe267d8) at orted.c:282
#16 0x0000040000336dc8 in .__libc_start_main () from /lib64/libc.so.6
#17 0x0000000000000000 in ?? ()

On Jun 29, 2006, at 2:33 PM, Galen M. Shipman wrote:

> Hey Owen,
>
> Taking this on list.
>
> If I run on n249, ORTE just hangs waiting for the send to complete.
>
> If I run on n248, I get:
>
> [ompi@node-192-168-111-248 ~]$ mpirun -np 1 -mca btl self,openib ./ring
> Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
> Failing at addr:0x10
> [0] func:/home/ompi/local/lib/libopal.so.0 [0x4000012d6c0]
> [1] func:/lib64/libpthread.so.0 [0x40000257270]
> [2] func:[0x100428]
> [3] func:/home/ompi/local/lib/libopal.so.0 [0x40000158310]
> [4] func:/lib64/libpthread.so.0 [0x40000251b10]
> [5] func:/home/ompi/local/lib/libopal.so.0 [0x400001108e0]
> [6] func:/home/ompi/local/lib/libopal.so.0 [0x4000010ea48]
> [7] func:/home/ompi/local/lib/libopal.so.0 [0x40000104078]
> [8] func:mpirun [0x10003a08]
> [9] func:mpirun [0x10003474]
> [10] func:mpirun [0x10002c50]
> [11] func:/lib64/libc.so.6 [0x40000336dc8]
> *** End of error message ***
> Segmentation fault
>
>
>
> To debug this, can I get an xterm with proper X forwarding on this
> machine?
>
> - Galen
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel