Open MPI Development Mailing List Archives

From: Aurelien Bouteiller (bouteill_at_[hidden])
Date: 2007-04-19 10:38:57

I am experiencing several fancy bugs with ORTE.

All of them occur on 32-bit Intel hardware under Mac OS X using gcc
4.2. The version tested is today's trunk (the bugs have also been
occurring for at least three weeks).

The first occurs when compiling in "optimized" mode (i.e. configure
--disable-debug --with-platform=optimized) and does not occur in debug
mode:

~/ompi$ mpirun -np 1 echo foo
[laptop20:22960] *** Process received signal ***
[laptop20:22960] Signal: Bus error (10)
[laptop20:22960] Signal code: (0)
[laptop20:22960] Failing at address: 0x0
[ 1] [0xbffff698, 0x00000000] (-P-)
[ 2] (mca_oob_base_init + 0x26) [0xbffff6e8, 0x000878a6]
[ 3] (orte_rml_oob_init + 0x11) [0xbffff6f8, 0x00032f21]
[ 4] (orte_rml_base_select + 0xc5) [0xbffff778, 0x0009f415]
[ 5] (orte_init_stage1 + 0x20c) [0xbffff848, 0x000678cc]
[ 6] (orte_system_init + 0x1d) [0xbffff868, 0x0006b03d]
[ 7] (orte_init + 0x7d) [0xbffff888, 0x000674ad]
[ 8] (orterun:F(0,1)=r(0,1);-2147483648;2147483647; + 0x220) [0xbffff938, 0x00002008]
[ 9] (main:F(0,1)=r(0,1);-2147483648;2147483647; + 0x18) [0xbffff948,
[10] (_start + 0xd8) [0xbffff988, 0x00001db2]
[11] (start + 0x29) [0xbffff9a0, 0x00001cd9]
[12] [0x00000000, 0x00000005] (FP-)
[laptop20:22960] *** End of error message ***
Bus error

The other one occurs when running an MPI program without mpirun (I know
this is pretty useless, but still ;) ). This bug does not require any
specific compilation options. Running mpirun -np 1 mympiprogram is fine,
but running mympiprogram directly fails with a segfault in MPI_Finalize:

~/ompi$ mpirun -np 1 mpiself
~/ompi$ gdb mpiself
(gdb) r
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x77767578
0x90002e46 in szone_malloc ()
(gdb) bt
#0 0x90002e46 in szone_malloc ()
#1 0x0042b6da in opal_memory_darwin_malloc (zone=0x2000000, size=48) at
#2 0x90002a2f in malloc ()
#3 0x00421548 in opal_malloc (size=48, file=0x274fd4
"../../../opal/class/opal_object.h", line=468) at malloc.c:96
#4 0x002218e4 in opal_obj_new (cls=0x27d840) at
#5 0x00221851 in opal_obj_new_debug (type=0x27d840, file=0x275424
"base/gpr_base_create_value_keyval.c", line=43) at
#6 0x0022147e in orte_gpr_base_create_value (value=0xbffff8fc,
addr_mode=32769, segment=0x510150 "orte-job-0", cnt=2, num_tokens=0) at
#7 0x00269b79 in orte_smr_base_set_proc_state (proc=0x507d00, state=32,
exit_status=0) at base/smr_base_set_proc_state.c:54
#8 0x01035f21 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:145
#9 0x0106ea09 in MPI_Finalize () at finalize.c:44
#10 0x00001e5e in main (argc=1, argv=0xbffffb70) at mpiself.c:44