Open MPI Development Mailing List Archives

Subject: [OMPI devel] OpenIB not functioning on 1.5.x (works on 1.4.3)
From: David Fiala (davidfiala_at_[hidden])
Date: 2010-12-13 18:08:34


Hi,

I can get the openib transport to work under version 1.4.3 when it is
configured with: --with-openib --enable-openib-ibcm

I configured 1.5 and 1.5.1 with only: --with-openib (note the absence
of the ibcm flag).
However, when I try to use openib with a basic MPI program on these
builds, I get a segfault such as the one copied below.

Our IB hardware is:
InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s
- IB QDR / 10GigE] (rev b0)

[dfiala@compute-0-2 ~]$ mpirun -mca btl openib,self ./mpitest/mpitest
[compute-0-2:07582] *** Process received signal ***
[compute-0-2:07582] Signal: Segmentation fault (11)
[compute-0-2:07582] Signal code: Address not mapped (1)
[compute-0-2:07582] Failing at address: 0x2
[compute-0-2:07582] [ 0] /lib64/libpthread.so.0 [0x3ed2e0eb10]
[compute-0-2:07582] [ 1] /usr/lib64/libmlx4-rdmav2.so [0x2aaaab0de5d1]
[compute-0-2:07582] [ 2] /home/dfiala/openmpi/install-1.5.1/lib/openmpi/mca_btl_openib.so [0x2b1637155f15]
[compute-0-2:07582] [ 3] /home/dfiala/openmpi/install-1.5.1/lib/openmpi/mca_bml_r2.so [0x2b163691b4b2]
[compute-0-2:07582] [ 4] /home/dfiala/openmpi/install-1.5.1/lib/openmpi/mca_pml_ob1.so [0x2b1636d3844f]
[compute-0-2:07582] [ 5] /home/dfiala/openmpi/install/lib/libmpi.so.1 [0x2b16347afe37]
[compute-0-2:07582] [ 6] /home/dfiala/openmpi/install/lib/libmpi.so.1(MPI_Init+0xf0) [0x2b16347c46d0]
[compute-0-2:07582] [ 7] ./mpitest/mpitest(main+0x2b) [0x4008d3]
[compute-0-2:07582] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3ed261d994]
[compute-0-2:07582] [ 9] ./mpitest/mpitest [0x4007f9]
[compute-0-2:07582] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 7582 on node
compute-0-2.local exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Any ideas?

Thanks for your help,
David Fiala

North Carolina State University