Open MPI User's Mailing List Archives

Subject: [OMPI users] Bus error with openmpi-1.7.4a1r29784 and openmpi-1.9a1r29790
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2013-12-04 10:16:16


Hi,

Today I installed openmpi-1.7.4a1r29784 on "Solaris 10, Sparc"
with "Sun C 5.12", using the following configure command.

../openmpi-1.7.4a1r29784/configure \
  --prefix=/usr/local/openmpi-1.7.4_64_cc \
  --libdir=/usr/local/openmpi-1.7.4_64_cc/lib64 \
  --with-jdk-bindir=/usr/local/jdk1.7.0_07/bin/sparcv9 \
  --with-jdk-headers=/usr/local/jdk1.7.0_07/include \
  JAVA_HOME=/usr/local/jdk1.7.0_07 \
  LDFLAGS="-m64" \
  CC="cc" CXX="CC" FC="f95" \
  CFLAGS="-m64" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
  CPP="cpp" CXXCPP="cpp" \
  CPPFLAGS="" CXXCPPFLAGS="" \
  --enable-cxx-exceptions \
  --enable-mpi-java \
  --enable-heterogeneous \
  --enable-opal-multi-threads \
  --enable-mpi-thread-multiple \
  --with-threads=posix \
  --with-hwloc=internal \
  --without-verbs \
  --with-wrapper-cflags=-m64 \
  --enable-debug \
  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc

1) Bus error with "ompi_info -a"

tyr fd1026 108 ompi_info | grep MPI:
                Open MPI: 1.7.4a1r29784

I get a bus error if I use the "-a" option.

tyr fd1026 109 ompi_info -a | grep MPI:
[tyr:17668] *** Process received signal ***
[tyr:17668] Signal: Bus Error (10)
[tyr:17668] Signal code: Invalid address alignment (1)
[tyr:17668] Failing at address: ffffffff7d3ca461
/export2/prog/SunOS_sparc/openmpi-1.7.4_64_cc/lib64/libopen-pal.so.6.0.0:
  opal_backtrace_print+0x14
/export2/prog/SunOS_sparc/openmpi-1.7.4_64_cc/lib64/libopen-pal.so.6.0.0:
  0x1843d8
/lib/sparcv9/libc.so.1:0xd8c28
/lib/sparcv9/libc.so.1:0xcc79c
/lib/sparcv9/libc.so.1:0xcc9a8
/export2/prog/SunOS_sparc/openmpi-1.7.4_64_cc/lib64/libopen-pal.so.6.0.0:
  0x13a3dc [ Signal 2099942168 (?)]
/export2/prog/SunOS_sparc/openmpi-1.7.4_64_cc/lib64/libopen-pal.so.6.0.0:
  mca_base_var_dump+0x190
/export2/prog/SunOS_sparc/openmpi-1.7.4_64_cc/lib64/libopen-pal.so.6.0.0:
  0x899a8
/export2/prog/SunOS_sparc/openmpi-1.7.4_64_cc/lib64/libopen-pal.so.6.0.0:
  opal_info_show_mca_params+0xb4
/export2/prog/SunOS_sparc/openmpi-1.7.4_64_cc/lib64/libopen-pal.so.6.0.0:
  opal_info_do_params+0x364
/export2/prog/SunOS_sparc/openmpi-1.7.4_64_cc/bin/ompi_info:main+0x6e4
/export2/prog/SunOS_sparc/openmpi-1.7.4_64_cc/bin/ompi_info:_start+0x12c
[tyr:17668] *** End of error message ***
Bus error
tyr fd1026 110

tyr fd1026 112 cd /usr/local/openmpi-1.7.4_64_cc/bin/
tyr bin 113 /opt/solstudio12.3/bin/sparcv9/dbx ompi_info
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.9' in your .dbxrc
Reading ompi_info
Reading ld.so.1
Reading libmpi.so.1.2.0
Reading libopen-rte.so.6.0.0
Reading libopen-pal.so.6.0.0
Reading libsendfile.so.1
Reading libpicl.so.1
Reading libkstat.so.1
Reading liblgrp.so.1
Reading libsocket.so.1
Reading libnsl.so.1
Reading librt.so.1
Reading libm.so.2
Reading libthread.so.1
Reading libc.so.1
Reading libdoor.so.1
Reading libaio.so.1
Reading libmd.so.1
(dbx) run -a
Running: ompi_info -a
(process id 17678)
Reading libc_psr.so.1
...
Reading mca_topo_basic.so
Reading mca_vprotocol_pessimist.so
                  Prefix: /usr/local/openmpi-1.7.4_64_cc
             Exec_prefix: /usr/local/openmpi-1.7.4_64_cc
                  Bindir: /usr/local/openmpi-1.7.4_64_cc/bin
...
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128
                 MCA mca: parameter "mca_param_files" (current value:
                          "/home/fd1026/.openmpi/mca-params.conf:
                           /usr/local/openmpi-1.7.4_64_cc/etc/openmpi-mca-params.conf",
                          data source: default, level: 2 user/detail, type:
                          string, deprecated, synonym of:
                          mca_base_param_files)
                          Path for MCA configuration files containing
                          variable values
                 MCA mca: parameter "mca_component_path" (current value:
                          "/usr/local/openmpi-1.7.4_64_cc/lib64/openmpi:
                           /home/fd1026/.openmpi/components",
                          data source: default, level: 9 dev/all, type:
                          string, deprecated, synonym of:
                          mca_base_component_path)
                          Path where to look for Open MPI and ORTE components
                 MCA mca: parameter "mca_component_show_load_errors" (current
                          value: "true", data source: default, level: 9
                          dev/all, type: bool, deprecated, synonym of:
                          mca_base_component_show_load_errors)
                          Whether to show errors for components that failed
                          to load or not
                          Valid values: 0: f|false|disabled, 1:
                          t|true|enabled
t@1 (l@1) signal BUS (invalid address alignment) in var_value_string at
  line 1685 in file "mca_base_var.c"
 1685 ret = var->mbv_enumerator->string_from_value(var->mbv_enumerator,
   value->intval, &tmp);
(dbx)
(dbx)
(dbx)
(dbx) check -all
dbx: warning: check -all will be turned on in the next run of the process
access checking - OFF
memuse checking - OFF
(dbx) run -a
Running: ompi_info -a
(process id 17680)
Reading rtcapihook.so
Reading libdl.so.1
Reading rtcaudit.so
Reading libmapmalloc.so.1
Reading rtcboot.so
Reading librtc.so
Reading libmd_psr.so.1
RTC: Enabling Error Checking...
RTC: Using UltraSparc trap mechanism
RTC: See `help rtc showmap' and `help rtc limitations' for details.
RTC: Running program...
Read from uninitialized (rui) on thread 1:
Attempting to read 4 bytes at address 0xffffffff7fffd548
    which is 184 bytes above the current stack pointer
Variable is 'index'
t@1 (l@1) stopped in var_find at line 802 in file "mca_base_var.c"
  802 return (OPAL_SUCCESS != ret) ? ret : index;
(dbx)
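
The "Read from uninitialized (rui)" report concerns the local variable 'index' in var_find. It is not clear from this trace alone whether that report is related to the bus error. As a general illustration of the pattern such a report points at, here is a minimal sketch in which a lookup helper writes its output parameter only on success, so the caller gives the variable a defined value first. All names (sketch_lookup, sketch_find, SKETCH_SUCCESS, SKETCH_NOT_FOUND) are hypothetical and do not come from mca_base_var.c.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_SUCCESS     0
#define SKETCH_NOT_FOUND (-1)

/* Hypothetical table of known names, for illustration only. */
static const char *const sketch_names[] = { "alpha", "beta", "gamma" };

/* Writes *index only when the name is found; leaves it untouched otherwise. */
static int sketch_lookup(const char *name, int *index)
{
    size_t count = sizeof sketch_names / sizeof sketch_names[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(sketch_names[i], name) == 0) {
            *index = (int)i;
            return SKETCH_SUCCESS;
        }
    }
    return SKETCH_NOT_FOUND;
}

/* Returns the index on success or a negative error code on failure. */
static int sketch_find(const char *name)
{
    int index = -1;  /* initialized so every path reads a defined value */
    int ret = sketch_lookup(name, &index);
    return (SKETCH_SUCCESS != ret) ? ret : index;
}
```

With this shape, sketch_find("beta") yields the table position and an unknown name yields SKETCH_NOT_FOUND, and a memory checker has no uninitialized stack slot to complain about.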

2) Bus error with "make check"

tail -15 log.make-check.SunOS.sparc.64_cc
>>--------------------------------------------<<
PASS: ddt_test
/bin/bash: line 5: 4466 Bus Error ${dir}$tst
FAIL: ddt_raw
========================================================
1 of 6 tests failed
Please report to http://www.open-mpi.org/community/help/
========================================================
make[3]: *** [check-TESTS] Error 1
make[3]: Leaving directory `.../test/datatype'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `.../test/datatype'
make[1]: *** [check-recursive] Error 1
make[1]: Leaving directory `.../test'
make: *** [check-recursive] Error 1

tyr openmpi-1.7.4a1r29784-SunOS.sparc.64_cc 116 cd test/datatype/.libs/
tyr .libs 117 /opt/solstudio12.3/bin/sparcv9/dbx ddt_raw
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.9' in your .dbxrc
Reading ddt_raw
Reading ld.so.1
Reading libmpi.so.1.2.0
Reading libopen-rte.so.6.0.0
Reading libopen-pal.so.6.0.0
Reading libsendfile.so.1
Reading libpicl.so.1
Reading libkstat.so.1
Reading liblgrp.so.1
Reading libsocket.so.1
Reading libnsl.so.1
Reading librt.so.1
Reading libm.so.2
Reading libthread.so.1
Reading libc.so.1
Reading libdoor.so.1
Reading libaio.so.1
Reading libmd.so.1
(dbx) run
Running: ddt_raw
(process id 17689)
Reading libc_psr.so.1

#
 * TEST INVERSED VECTOR
 #

t@1 (l@1) signal BUS (invalid address alignment) in opal_convertor_raw
   at line 64 in file "opal_convertor_raw.c"
   64 DO_DEBUG( opal_output( 0, "opal_convertor_raw( %p, {%p,
   %u}, %lu )\n", (void*)pConvertor,
(dbx)
(dbx)
(dbx)
(dbx) check -all
dbx: warning: check -all will be turned on in the next run of the process
access checking - OFF
memuse checking - OFF
(dbx) run
Running: ddt_raw
(process id 17691)
Reading rtcapihook.so
Reading libdl.so.1
Reading rtcaudit.so
Reading libmapmalloc.so.1
Reading libgen.so.1
Reading rtcboot.so
Reading librtc.so
Reading libmd_psr.so.1
RTC: Enabling Error Checking...
RTC: Using UltraSparc trap mechanism
RTC: See `help rtc showmap' and `help rtc limitations' for details.
RTC: Running program...

#
 * TEST INVERSED VECTOR
 #

Misaligned read (mar) on thread 1:
Attempting to read 4 bytes at address 0xffffffff60cca179
t@1 (l@1) stopped in opal_convertor_raw at line 64 in file
  "opal_convertor_raw.c"
   64 DO_DEBUG( opal_output( 0, "opal_convertor_raw( %p,
  {%p, %u}, %lu )\n", (void*)pConvertor,
(dbx)
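
The "Misaligned read (mar)" report says that 4 bytes are read at an address ending in ...179, which is not a multiple of 4. On strict-alignment CPUs such as SPARC, a common portable way to read a 32-bit value from an arbitrary byte address is to copy it into a properly aligned local variable with memcpy instead of dereferencing a cast pointer. The following is only a sketch of that general technique; the helper name load_u32_unaligned is hypothetical, and this is not necessarily how the problem in opal_convertor_raw.c should be fixed.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: reads a 32-bit value in host byte order from an
 * address with no alignment guarantee. memcpy makes no alignment
 * assumption about its arguments, so this is safe where
 * *(const uint32_t *)p would trap on a misaligned p.
 */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}
```

For example, a value stored at offset 1 of a byte buffer can be read back with load_u32_unaligned(buf + 1) without relying on the alignment of buf.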

3) Bus error with my programs

tyr small_prog 122 mpicc init_finalize.c
tyr small_prog 123 /opt/solstudio12.3/bin/sparcv9/dbx /usr/local/openmpi-1.7.4_64_cc/bin/mpiexec
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.9' in your .dbxrc
Reading mpiexec
Reading ld.so.1
Reading libopen-rte.so.6.0.0
Reading libopen-pal.so.6.0.0
Reading libsendfile.so.1
Reading libpicl.so.1
Reading libkstat.so.1
Reading liblgrp.so.1
Reading libsocket.so.1
Reading libnsl.so.1
Reading librt.so.1
Reading libm.so.2
Reading libthread.so.1
Reading libc.so.1
Reading libdoor.so.1
Reading libaio.so.1
Reading libmd.so.1
(dbx) run -np 1 a.out
Running: mpiexec -np 1 a.out
(process id 17791)
Reading libc_psr.so.1
Reading mca_shmem_mmap.so
Reading libmp.so.2
...

Reading mca_dfs_orted.so
Reading mca_dfs_test.so
t@1 (l@1) signal BUS (invalid address alignment) in opal_net_samenetwork
  at line 272 in file "net.c"
  272 (inaddr2->sin_addr.s_addr & netmask)) {
(dbx)
(dbx)
(dbx)
(dbx) check -all
dbx: warning: check -all will be turned on in the next run of the process
access checking - OFF
memuse checking - OFF
(dbx) run -np 1 a.out
Running: mpiexec -np 1 a.out
(process id 17794)
Reading rtcapihook.so
Reading libdl.so.1
Reading rtcaudit.so
Reading libmapmalloc.so.1
Reading rtcboot.so
Reading librtc.so
Reading libmd_psr.so.1
RTC: Enabling Error Checking...
RTC: Using UltraSparc trap mechanism
RTC: See `help rtc showmap' and `help rtc limitations' for details.
RTC: Running program...
Read from uninitialized (rui) on thread 1:
Attempting to read 4 bytes at address 0xffffffff7fffd368
    which is 184 bytes above the current stack pointer
Variable is 'index'
t@1 (l@1) stopped in var_find at line 802 in file "mca_base_var.c"
  802 return (OPAL_SUCCESS != ret) ? ret : index;
(dbx)
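
The first dbx run in this section stops at line 272 of net.c, where sin_addr.s_addr is masked with a netmask. One possible explanation, which is only an assumption and not confirmed by the trace, is that the sockaddr pointer passed in does not point to suitably aligned storage. The sketch below shows how a same-network comparison can be written so that it does not depend on the alignment of its inputs, by copying each address into a local struct sockaddr_in first. The function name sketch_same_network and its parameters are hypothetical; this is not the Open MPI implementation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: returns true if two IPv4 socket addresses share
 * the same network prefix. addr1 and addr2 must each point to at least
 * sizeof(struct sockaddr_in) readable bytes but may have any alignment.
 */
static bool sketch_same_network(const void *addr1, const void *addr2,
                                uint32_t prefixlen)
{
    struct sockaddr_in a, b;
    uint32_t netmask;

    /* Copy into aligned locals before touching any multi-byte field. */
    memcpy(&a, addr1, sizeof a);
    memcpy(&b, addr2, sizeof b);

    if (prefixlen > 32) {
        prefixlen = 32;
    }
    /* Shifting a 32-bit value by 32 is undefined, so handle 0 apart. */
    netmask = (prefixlen == 0)
                  ? 0
                  : htonl(0xffffffffu << (32 - prefixlen));

    return (a.sin_addr.s_addr & netmask) == (b.sin_addr.s_addr & netmask);
}
```

With a prefix length of 24, 192.168.1.10 and 192.168.1.254 compare as being on the same network, while 192.168.1.10 and 192.168.2.1 do not, regardless of where the two structures sit in memory.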

I have the same problems with openmpi-1.9a1r29790 (same files).

tyr fd1026 107 ompi_info |grep MPI:
                Open MPI: 1.9a1r29790
tyr fd1026 108 ompi_info -a | grep MPI:
[tyr:17867] *** Process received signal ***
[tyr:17867] Signal: Bus Error (10)
[tyr:17867] Signal code: Invalid address alignment (1)
[tyr:17867] Failing at address: ffffffff7d3c5ac1
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:
  opal_backtrace_print+0x14
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:
  0x17f268
/lib/sparcv9/libc.so.1:0xd8c28
/lib/sparcv9/libc.so.1:0xcc79c
/lib/sparcv9/libc.so.1:0xcc9a8
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:
  0x134b9c [ Signal 2099923552 (?)]
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:
  mca_base_var_dump+0x1b0
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:
  0x89828
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:
  opal_info_show_mca_params+0xb4
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:
  opal_info_do_params+0x364
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/bin/ompi_info:main+0x628
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/bin/ompi_info:_start+0x12c
[tyr:17867] *** End of error message ***
Bus error
tyr fd1026 109

I would be grateful if somebody could solve the problems. Do you need
any further information?

Kind regards

Siegmar