Open MPI User's Mailing List Archives

Subject: [OMPI users] divide-by-zero in mca_btl_openib_add_procs
From: Alain Miniussi (alain.miniussi_at_[hidden])
Date: 2014-05-26 09:59:38


Hi,

The following minimal test case crashes for me:
$: more ./test.c
#include "mpi.h"

int main(int argc, char* argv[]) {
     MPI_Init(&argc,&argv);
     MPI_Finalize();
     return 0;
}
$: mpicc -v
icc version 13.1.1 (gcc version 4.4.7 compatibility)
$: mpicc ./test.c
$: ./a.out
[tagir:02855] *** Process received signal ***
[tagir:02855] Signal: Floating point exception (8)
[tagir:02855] Signal code: Integer divide-by-zero (1)
[tagir:02855] Failing at address: 0x2aef6e5b2d9f
[tagir:02855] [ 0] /lib64/libpthread.so.0[0x30f920f710]
[tagir:02855] [ 1] /softs/openmpi-1.8.1-intel13/lib/openmpi/mca_btl_openib.so(mca_btl_openib_add_procs+0xe9f)[0x2aef6e5b2d9f]
[tagir:02855] [ 2] /softs/openmpi-1.8.1-intel13/lib/openmpi/mca_bml_r2.so(+0x1481)[0x2aef6e3a6481]
[tagir:02855] [ 3] /softs/openmpi-1.8.1-intel13/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xa8)[0x2aef6f8ef2f8]
[tagir:02855] [ 4] /softs/openmpi-1.8.1-intel13/lib/libmpi.so.1(ompi_mpi_init+0x9f6)[0x2aef69572236]
[tagir:02855] [ 5] /softs/openmpi-1.8.1-intel13/lib/libmpi.so.1(MPI_Init+0xef)[0x2aef6959474f]
[tagir:02855] [ 6] ./a.out[0x400dd1]
[tagir:02855] [ 7] /lib64/libc.so.6(__libc_start_main+0xfd)[0x30f8a1ed1d]
[tagir:02855] [ 8] ./a.out[0x400cc9]
[tagir:02855] *** End of error message ***
$:

Versions info:
$: mpicc -v
icc version 13.1.1 (gcc version 4.4.7 compatibility)
$: ldd ./a.out
     linux-vdso.so.1 => (0x00007fffbb197000)
     libmpi.so.1 => /softs/openmpi-1.8.1-intel13/lib/libmpi.so.1 (0x00002b20262ee000)
     libm.so.6 => /lib64/libm.so.6 (0x00000030f8e00000)
     libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000030ff200000)
     libpthread.so.0 => /lib64/libpthread.so.0 (0x00000030f9200000)
     libc.so.6 => /lib64/libc.so.6 (0x00000030f8a00000)
     libdl.so.2 => /lib64/libdl.so.2 (0x00000030f9600000)
     libopen-rte.so.7 => /softs/openmpi-1.8.1-intel13/lib/libopen-rte.so.7 (0x00002b202660d000)
     libopen-pal.so.6 => /softs/openmpi-1.8.1-intel13/lib/libopen-pal.so.6 (0x00002b20268a1000)
     libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00002b2026ba6000)
     librt.so.1 => /lib64/librt.so.1 (0x00000030f9e00000)
     libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003109800000)
     libutil.so.1 => /lib64/libutil.so.1 (0x000000310aa00000)
     libimf.so => /softs/intel/composer_xe_2013.3.163/compiler/lib/intel64/libimf.so (0x00002b2026db0000)
     libsvml.so => /softs/intel/composer_xe_2013.3.163/compiler/lib/intel64/libsvml.so (0x00002b202726d000)
     libirng.so => /softs/intel/composer_xe_2013.3.163/compiler/lib/intel64/libirng.so (0x00002b2027c37000)
     libintlc.so.5 => /softs/intel/composer_xe_2013.3.163/compiler/lib/intel64/libintlc.so.5 (0x00002b2027e3e000)
     /lib64/ld-linux-x86-64.so.2 (0x00000030f8600000)
$:

I tried to google the issue and found reports of an old vectorization
bug with the Intel compiler, but that was a long time ago and it seemed
to be fixed for 1.6.x.
Also, "make check" passed without any errors.

Any ideas?

Cheers

-- 
---
Alain