
Open MPI Development Mailing List Archives


From: Galen M. Shipman (gshipman_at_[hidden])
Date: 2006-06-29 17:23:29


Okay,

This was a build system issue with changing CFLAGS. I worked that out
and then found the real problem: using a size_t with MCA params
instead of an int. Fix coming shortly.
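
To illustrate the kind of mismatch described above, here is a minimal C
sketch. It does not use the real MCA parameter API: lookup_int_param is a
hypothetical stand-in for any routine that stores an int through a pointer,
and the value 4096 and the 0xFF fill pattern are made up for the
demonstration. The assumption is a platform where size_t is wider than int
(for example 8 bytes versus 4), so only part of the size_t gets written and
the caller reads back garbage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a parameter lookup that stores an int.
 * This is NOT the Open MPI MCA API; the value 4096 is made up. */
void lookup_int_param(int *out)
{
    *out = 4096;
}

/* Simulates the bug: the lookup writes sizeof(int) bytes into storage
 * that the caller later reads back as a size_t. memcpy is used so the
 * demonstration itself stays well-defined C. */
size_t buggy_read(void)
{
    size_t value;
    int tmp = 0;

    memset(&value, 0xFF, sizeof(value)); /* stand-in for uninitialized storage */
    lookup_int_param(&tmp);
    memcpy(&value, &tmp, sizeof(tmp));   /* only the first sizeof(int) bytes change */
    return value;
}

/* The fix: keep the parameter variable an int and convert afterwards. */
size_t fixed_read(void)
{
    int tmp = 0;

    lookup_int_param(&tmp);
    return (size_t) tmp;
}

/* Prints both results so the difference is visible. */
void show_param_mismatch(void)
{
    printf("sizeof(int)=%zu sizeof(size_t)=%zu\n", sizeof(int), sizeof(size_t));
    printf("buggy: %zu\n", buggy_read());
    printf("fixed: %zu\n", fixed_read());
    assert(fixed_read() == 4096);
}
```

Calling show_param_mismatch() from main on a typical 64-bit machine should
print a buggy value that differs from 4096, while the fixed value is 4096.
Which half of the size_t gets overwritten depends on the machine's byte
order, so the exact garbage value varies between platforms.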

On Jun 29, 2006, at 2:47 PM, Galen M. Shipman wrote:

> More info:
>
> Two cores are generated
>
>
> mpirun:
> (gdb) bt
> #0 0x00000400002555d8 in .__pthread_alt_unlock () from /lib64/libpthread.so.0
> #1 0x0000040000251b10 in .__GI___pthread_mutex_unlock ()
> from /lib64/libpthread.so.0
> #2 0x00000400001108e0 in .poll_dispatch ()
> from /home/ompi/local/lib/libopal.so.0
> #3 0x000004000010ea48 in opal_event_loop (flags=1) at event.c:485
> #4 0x0000040000104078 in opal_progress () at runtime/opal_progress.c:259
> #5 0x0000000010003a08 in opal_condition_wait (c=0x1001acf0,
> m=0x1001aca0)
> at ../../../opal/threads/condition.h:81
> #6 0x0000000010003474 in orterun (argc=7, argv=0xfffffea5948) at orterun.c:415
> #7 0x0000000010002c50 in main (argc=7, argv=0xfffffea5948) at main.c:13
> #8 0x0000040000336dc8 in .__libc_start_main () from /lib64/libc.so.6
> #9 0x0000000000000000 in ?? ()
>
>
>
> orted:
> #0 0x00000400002555d8 in .__pthread_alt_unlock () from /lib64/libpthread.so.0
> #1 0x0000040000251b10 in .__GI___pthread_mutex_unlock ()
> from /lib64/libpthread.so.0
> #2 0x00000400001108e0 in .poll_dispatch ()
> from /home/ompi/local/lib/libopal.so.0
> #3 0x000004000010ea48 in opal_event_loop (flags=1) at event.c:485
> #4 0x0000040000104078 in opal_progress () at runtime/opal_progress.c:259
> #5 0x000004000051a35c in mca_oob_tcp_msg_wait (msg=0x10022d68,
> rc=0xfffffe24d60) at oob_tcp_msg.c:106
> #6 0x000004000052497c in mca_oob_tcp_send (name=0x1004a230,
> iov=0xfffffe24e50, count=1, tag=2, flags=0) at oob_tcp_send.c:158
> #7 0x0000040000095e40 in mca_oob_send_packed (peer=0x1004a230,
> buffer=0x1003cf50, tag=2, flags=0) at base/oob_base_send.c:78
> #8 0x0000040000560b50 in orte_gpr_proxy_subscribe (num_subs=1,
> subscriptions=0xfffffe25030, num_trigs=1, trigs=0xfffffe25090)
> at gpr_proxy_subscribe.c:121
> #9 0x000004000007a6ec in orte_gpr_base_subscribe_1 (id=0xfffffe251a0,
> trig_name=0x1003ce80 "orte-stage1-0",
> sub_name=0x1003cd20 "ompi-oob-sub-0", action=39 '\'',
> addr_mode=514,
> segment=0x1003cea0 "orte-job-0", tokens=0x0, key=0x400005260a8
> "oob-tcp",
> cbfunc=0x400005392d0 <mca_oob_tcp_registry_callback>,
> user_tag=0x0)
> at base/gpr_base_simplified_subscribe.c:92
> #10 0x0000040000517a7c in mca_oob_tcp_init () at oob_tcp.c:816
> #11 0x0000040000095110 in mca_oob_base_module_init ()
> at base/oob_base_init.c:263
> #12 0x000004000005ef18 in orte_init_stage2 () at runtime/orte_init_stage2.c:48
> #13 0x0000040000062fe8 in orte_system_init (infrastructure=true)
> at runtime/orte_system_init.c:46
> #14 0x000004000005ce50 in orte_init (infrastructure=true)
> at runtime/orte_init.c:48
> #15 0x0000000010001ebc in main (argc=19, argv=0xfffffe267d8) at orted.c:282
> #16 0x0000040000336dc8 in .__libc_start_main () from /lib64/libc.so.6
> #17 0x0000000000000000 in ?? ()
>
>
>
>
>
>
> On Jun 29, 2006, at 2:33 PM, Galen M. Shipman wrote:
>
>> Hey Owen,
>>
>> Taking this on list..
>>
>> If I run on n249 orte just hangs waiting for completion of the send.
>>
>> If I run on n248 I get:
>>
>> [ompi_at_node-192-168-111-248 ~]$ mpirun -np 1 -mca btl self,openib ./ring
>> Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
>> Failing at addr:0x10
>> [0] func:/home/ompi/local/lib/libopal.so.0 [0x4000012d6c0]
>> [1] func:/lib64/libpthread.so.0 [0x40000257270]
>> [2] func:[0x100428]
>> [3] func:/home/ompi/local/lib/libopal.so.0 [0x40000158310]
>> [4] func:/lib64/libpthread.so.0 [0x40000251b10]
>> [5] func:/home/ompi/local/lib/libopal.so.0 [0x400001108e0]
>> [6] func:/home/ompi/local/lib/libopal.so.0 [0x4000010ea48]
>> [7] func:/home/ompi/local/lib/libopal.so.0 [0x40000104078]
>> [8] func:mpirun [0x10003a08]
>> [9] func:mpirun [0x10003474]
>> [10] func:mpirun [0x10002c50]
>> [11] func:/lib64/libc.so.6 [0x40000336dc8]
>> *** End of error message ***
>> Segmentation fault
>>
>>
>>
>> In order to debug, can I get an xterm with proper X forwarding on this
>> machine?
>>
>> - Galen
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel