Open MPI Development Mailing List Archives

From: Karl Dockendorf (karld_at_[hidden])
Date: 2006-10-07 22:34:43


I just (yesterday) made the move from LAM/MPI to Open MPI. The
configure / compile / install went smoothly (version 1.1.1).
However, after recompiling my source, the program usually crashes
in MPI_Init when I run it. The crash seems to come from the same place
most of the time, and it usually prints a message like this:

Signal:10 info.si_errno:0(Unknown error: 0) si_code:1(BUS_ADRALN)
Failing at addr:0xfdff8018
*** End of error message ***
Signal:10 info.si_errno:0(Unknown error: 0) si_code:1(BUS_ADRALN)
Failing at addr:0x2807000
*** End of error message ***

The test system (before I move back to the cluster) is a G4 PowerBook
running Mac OS X 10.4.8 (not using Xgrid at the moment). I'm
oversubscribing it (2 processes, and Open MPI knows there is only one
processor). Attached is the config info from the install. Listed below
are the crash reports; the crash point seems to be reached from the
mca_bml_r2_progress function. Any help is much appreciated.

Karl

CRASH 1:
Command: nm
Path: /Users/karl/programs/nm/build/Release/nm
Parent: orted [830]

Version: ??? (???)

PID: 834
Thread: 0

Exception: EXC_BAD_ACCESS (0x0001)
Codes: KERN_INVALID_ADDRESS (0x0001) at 0xfdff8018

Thread 0 Crashed:
0 mca_btl_sm.so 0x003abbec mca_btl_sm_component_progress + 3164
1 mca_bml_r2.so 0x003a0d38 mca_bml_r2_progress + 88
2 libopal.0.dylib 0x0032309c opal_progress + 236
3 mca_oob_tcp.so 0x00024f14 mca_oob_tcp_msg_wait + 52
4 mca_oob_tcp.so 0x0002a0a8 mca_oob_tcp_recv + 1128
5 liborte.0.dylib 0x002f07b0 mca_oob_recv_packed + 80
6 mca_gpr_proxy.so 0x00059bd4 orte_gpr_proxy_put + 804
7 liborte.0.dylib 0x00304318 orte_soh_base_set_proc_soh + 968
8 libmpi.0.dylib 0x00222d88 ompi_mpi_init + 1816
9 libmpi.0.dylib 0x00248b50 MPI_Init + 240
10 nm 0x00002e60 init_model + 48
11 nm 0x00002c70 main + 48
12 nm 0x00002494 _start + 340 (crt.c:272)
13 nm 0x0000233c start + 60

Thread 0 crashed with PPC Thread State 64:
  srr0: 0x00000000003abbec srr1: 0x000000000200f930 vrsave: 0x0000000000000000
    cr: 0x28004222 xer: 0x0000000000000004 lr: 0x00000000003aafa0 ctr: 0x00000000003aaf90
    r0: 0x0000000000000000 r1: 0x00000000bfffe8d0 r2: 0x00000000fdff8000 r3: 0x0000000000000001
    r4: 0x0000000000049814 r5: 0x00000000bfffe888 r6: 0x0000000000000000 r7: 0x00000000fdff8000
    r8: 0x0000000000000004 r9: 0x00000000004177e0 r10: 0x0000000000000004 r11: 0x0000000000000000
   r12: 0x00000000003aaf90 r13: 0x00000000fffffffe r14: 0x00000000003ad004 r15: 0x00000000003441e8
   r16: 0x00000000003ad8c4 r17: 0x0000000000000004 r18: 0x0000000000000000 r19: 0x0000000000000000
   r20: 0x0000000000000014 r21: 0x0000000000000000 r22: 0x00000000003ae0c4 r23: 0x0000000000000001
   r24: 0x0000000000000000 r25: 0x0000000000000004 r26: 0x0000000000029c50 r27: 0x0000000000000000
   r28: 0x0000000000000000 r29: 0x0000000000000001 r30: 0x0000000000000000 r31: 0x00000000003aafa0

CRASH 2:
Command: nm
Path: /Users/karl/programs/nm/build/Release/nm
Parent: orted [830]

Version: ??? (???)

PID: 832
Thread: 0

Exception: EXC_BAD_ACCESS (0x0001)
Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x00000000

Thread 0 Crashed:
0 <<00000000>> 0x00000000 0 + 0
1 mca_bml_r2.so 0x003a0d38 mca_bml_r2_progress + 88
2 libopal.0.dylib 0x0032309c opal_progress + 236
3 mca_oob_tcp.so 0x00024f14 mca_oob_tcp_msg_wait + 52
4 mca_oob_tcp.so 0x0002a0a8 mca_oob_tcp_recv + 1128
5 liborte.0.dylib 0x002f07b0 mca_oob_recv_packed + 80
6 mca_gpr_proxy.so 0x00059bd4 orte_gpr_proxy_put + 804
7 liborte.0.dylib 0x00304318 orte_soh_base_set_proc_soh + 968
8 libmpi.0.dylib 0x00222d88 ompi_mpi_init + 1816
9 libmpi.0.dylib 0x00248b50 MPI_Init + 240
10 nm 0x00002e60 init_model + 48
11 nm 0x00002c70 main + 48
12 nm 0x00002494 _start + 340 (crt.c:272)
13 nm 0x0000233c start + 60

Thread 0 crashed with PPC Thread State 64:
  srr0: 0x0000000000000000 srr1: 0x000000004000d930 vrsave: 0x0000000000000000
    cr: 0x28004222 xer: 0x0000000000000004 lr: 0x00000000003abe5c ctr: 0x0000000000000000
    r0: 0x0000000000000000 r1: 0x00000000bfffe8d0 r2: 0x0000000002008000 r3: 0x00000000003ad864
    r4: 0x0000000000000000 r5: 0x0000000002008000 r6: 0x0000000000000000 r7: 0x0000000002008000
    r8: 0x00000000003ad8c4 r9: 0x00000000004177e0 r10: 0x0000000000000000 r11: 0x0000000000000000
   r12: 0x0000000000000000 r13: 0x00000000fffffffe r14: 0x00000000003ad004 r15: 0x00000000003441e8
   r16: 0x00000000003ad8c4 r17: 0x0000000000000000 r18: 0x0000000000000000 r19: 0x0000000000000000
   r20: 0x0000000000000000 r21: 0x0000000000000000 r22: 0x00000000003ae0c4 r23: 0x00000000003441e8
   r24: 0x0000000000000000 r25: 0x0000000002008000 r26: 0x00000000003ae0c4 r27: 0x0000000000000001
   r28: 0x0000000000000004 r29: 0x0000000000000001 r30: 0x0000000000000000 r31: 0x00000000003aafa0

CRASH 3:
Command: nm
Path: /Users/karl/programs/nm/build/Debug/nm
Parent: orted [1790]

Version: ??? (???)

PID: 1794
Thread: 0

Exception: EXC_BAD_ACCESS (0x0001)
Codes: KERN_INVALID_ADDRESS (0x0001) at 0xfdff8018

Thread 0 Crashed:
0 mca_btl_sm.so 0x003bcbec mca_btl_sm_component_progress + 3164
1 mca_bml_r2.so 0x003b1d38 mca_bml_r2_progress + 88
2 libopal.0.dylib 0x0032309c opal_progress + 236
3 mca_oob_tcp.so 0x00055f14 mca_oob_tcp_msg_wait + 52
4 mca_oob_tcp.so 0x0005b0a8 mca_oob_tcp_recv + 1128
5 liborte.0.dylib 0x002f07b0 mca_oob_recv_packed + 80
6 mca_gpr_proxy.so 0x00068bd4 orte_gpr_proxy_put + 804
7 liborte.0.dylib 0x00304318 orte_soh_base_set_proc_soh + 968
8 libmpi.0.dylib 0x00222d88 ompi_mpi_init + 1816
9 libmpi.0.dylib 0x00248b50 MPI_Init + 240
10 nm 0x000028fc init_model + 80 (model.c:16)
11 nm 0x00002644 main + 72 (main.c:16)
12 nm 0x00001e54 _start + 340 (crt.c:272)
13 nm 0x00001cfc start + 60

Thread 0 crashed with PPC Thread State 64:
  srr0: 0x00000000003bcbec srr1: 0x000000000200f930 vrsave: 0x0000000000000000
    cr: 0x28004222 xer: 0x0000000000000004 lr: 0x00000000003bbfa0 ctr: 0x00000000003bbf90
    r0: 0x0000000000000000 r1: 0x00000000bfffe8f0 r2: 0x00000000fdff8000 r3: 0x0000000000000001
    r4: 0x0000000000049814 r5: 0x00000000bfffe8a8 r6: 0x0000000000000000 r7: 0x00000000fdff8000
    r8: 0x0000000000000004 r9: 0x00000000004177d0 r10: 0x0000000000000004 r11: 0x0000000000000000
   r12: 0x00000000003bbf90 r13: 0x00000000fffffffe r14: 0x00000000003be004 r15: 0x00000000003441e8
   r16: 0x00000000003be8c4 r17: 0x0000000000000004 r18: 0x0000000000000000 r19: 0x0000000000000000
   r20: 0x0000000000000014 r21: 0x0000000000000000 r22: 0x00000000003bf0c4 r23: 0x0000000000000001
   r24: 0x0000000000000000 r25: 0x0000000000000004 r26: 0x000000000005ac50 r27: 0x0000000000000000
   r28: 0x0000000000000000 r29: 0x0000000000000001 r30: 0x0000000000000000 r31: 0x00000000003bbfa0

CRASH 4:
Command: nm
Path: /Users/karl/programs/nm/build/Debug/nm
Parent: orted [1790]

Version: ??? (???)

PID: 1792
Thread: 0

Exception: EXC_BAD_ACCESS (0x0001)
Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x00000000

Thread 0 Crashed:
0 <<00000000>> 0x00000000 0 + 0
1 mca_bml_r2.so 0x003b1d38 mca_bml_r2_progress + 88
2 libopal.0.dylib 0x0032309c opal_progress + 236
3 mca_oob_tcp.so 0x00055f14 mca_oob_tcp_msg_wait + 52
4 mca_oob_tcp.so 0x0005b0a8 mca_oob_tcp_recv + 1128
5 liborte.0.dylib 0x002f07b0 mca_oob_recv_packed + 80
6 mca_gpr_proxy.so 0x00068bd4 orte_gpr_proxy_put + 804
7 liborte.0.dylib 0x00304318 orte_soh_base_set_proc_soh + 968
8 libmpi.0.dylib 0x00222d88 ompi_mpi_init + 1816
9 libmpi.0.dylib 0x00248b50 MPI_Init + 240
10 nm 0x000028fc init_model + 80 (model.c:16)
11 nm 0x00002644 main + 72 (main.c:16)
12 nm 0x00001e54 _start + 340 (crt.c:272)
13 nm 0x00001cfc start + 60

Thread 0 crashed with PPC Thread State 64:
  srr0: 0x0000000000000000 srr1: 0x000000004000d930 vrsave: 0x0000000000000000
    cr: 0x28004222 xer: 0x0000000000000004 lr: 0x00000000003bce5c ctr: 0x0000000000000000
    r0: 0x0000000000000000 r1: 0x00000000bfffe8f0 r2: 0x0000000002008000 r3: 0x00000000003be864
    r4: 0x0000000000000000 r5: 0x0000000002008000 r6: 0x0000000000000000 r7: 0x0000000002008000
    r8: 0x00000000003be8c4 r9: 0x00000000004177d0 r10: 0x0000000000000000 r11: 0x0000000000000000
   r12: 0x0000000000000000 r13: 0x00000000fffffffe r14: 0x00000000003be004 r15: 0x00000000003441e8
   r16: 0x00000000003be8c4 r17: 0x0000000000000000 r18: 0x0000000000000000 r19: 0x0000000000000000
   r20: 0x0000000000000000 r21: 0x0000000000000000 r22: 0x00000000003bf0c4 r23: 0x00000000003441e8
   r24: 0x0000000000000000 r25: 0x0000000002008000 r26: 0x00000000003bf0c4 r27: 0x0000000000000001
   r28: 0x0000000000000004 r29: 0x0000000000000001 r30: 0x0000000000000000 r31: 0x00000000003bbfa0