
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] OpenIB not functioning on 1.5.x (works on 1.4.3)
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-12-15 21:10:50


On Dec 13, 2010, at 6:08 PM, David Fiala wrote:

> I noticed that I can get the openib transport to work successfully
> under version 1.4.3 when configured with: --with-openib
> --enable-openib-ibcm

I'm surprised; the IBCM connection manager support in OMPI is not complete. You probably shouldn't be using it.

> When I configured 1.5 or 1.5.1, I used: --with-openib (noting the
> absence of the ibcm flag)

Good.

> However, when I actually try to use openib on a basic MPI program I
> get a segfault such as the one copied below.
>
> Our IB hardware is:
> InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)

Ick. Mellanox, can you reply?

> [dfiala_at_compute-0-2 ~]$ mpirun -mca btl openib,self ./mpitest/mpitest
> [compute-0-2:07582] *** Process received signal ***
> [compute-0-2:07582] Signal: Segmentation fault (11)
> [compute-0-2:07582] Signal code: Address not mapped (1)
> [compute-0-2:07582] Failing at address: 0x2
> [compute-0-2:07582] [ 0] /lib64/libpthread.so.0 [0x3ed2e0eb10]
> [compute-0-2:07582] [ 1] /usr/lib64/libmlx4-rdmav2.so [0x2aaaab0de5d1]
> [compute-0-2:07582] [ 2] /home/dfiala/openmpi/install-1.5.1/lib/openmpi/mca_btl_openib.so [0x2b1637155f15]
> [compute-0-2:07582] [ 3] /home/dfiala/openmpi/install-1.5.1/lib/openmpi/mca_bml_r2.so [0x2b163691b4b2]
> [compute-0-2:07582] [ 4] /home/dfiala/openmpi/install-1.5.1/lib/openmpi/mca_pml_ob1.so [0x2b1636d3844f]
> [compute-0-2:07582] [ 5] /home/dfiala/openmpi/install/lib/libmpi.so.1 [0x2b16347afe37]
> [compute-0-2:07582] [ 6] /home/dfiala/openmpi/install/lib/libmpi.so.1(MPI_Init+0xf0) [0x2b16347c46d0]
> [compute-0-2:07582] [ 7] ./mpitest/mpitest(main+0x2b) [0x4008d3]
> [compute-0-2:07582] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3ed261d994]
> [compute-0-2:07582] [ 9] ./mpitest/mpitest [0x4007f9]
> [compute-0-2:07582] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 7582 on node compute-0-2.local exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
> Any ideas?
>
> Thanks for your help,
> David Fiala
>
> North Carolina State University
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/