Open MPI User's Mailing List Archives

Subject: [OMPI users] OpenIB BTL broken on ompi-trunk?
From: Jon Mason (jon_at_[hidden])
Date: 2007-12-03 15:44:37


I'm seeing a crash in the openib BTL on ompi-trunk when running any
test (my own programs as well as generic ones). For example, running
IMB pingpong gives the following:

$ mpirun --n 2 --host vic12,vic20 -mca btl openib,self \
      /usr/mpi/gcc/openmpi-trunk/tests/IMB-2.3/IMB-MPI1 pingpong
--------------------------------------------------------------------------
WARNING: No HCA parameters were found for the HCA that Open MPI
detected:

    Hostname: vic20
    HCA vendor ID: 0x1425
    HCA vendor part ID: 48

Default HCA parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_hca_param_files MCA parameter to set values for your HCA.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_hca_params_found to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: No HCA parameters were found for the HCA that Open MPI
detected:

    Hostname: vic12
    HCA vendor ID: 0x1425
    HCA vendor part ID: 48

Default HCA parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_hca_param_files MCA parameter to set values for your HCA.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_hca_params_found to 0.
--------------------------------------------------------------------------
[vic20:04339] *** Process received signal ***
[vic12:04539] *** Process received signal ***
[vic12:04539] Signal: Segmentation fault (11)
[vic12:04539] Signal code: Address not mapped (1)
[vic12:04539] Failing at address: 0xffffffffffffffea
[vic20:04339] Signal: Segmentation fault (11)
[vic20:04339] Signal code: Address not mapped (1)
[vic20:04339] Failing at address: 0xffffffffffffffea
[vic20:04339] [ 0] /lib64/libpthread.so.0 [0x35db80dd40]
[vic20:04339] [ 1] /usr/lib64/libibverbs.so.1(ibv_create_srq+0x3e)
[0x32b7e083be]
[vic20:04339] [ 2]
/usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_btl_openib.so
[0x2aaaaf0bdc27]
[vic20:04339] [ 3]
/usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_btl_openib.so
[0x2aaaaf0be07e]
[vic20:04339] [ 4]
/usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_btl_openib.so(mca_btl_openib_add_procs+0x857)
[0x2aaaaf0bd97c]
[vic20:04339] [ 5]
/usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x37d)
[0x2aaaaeeb399e]
[vic20:04339] [ 6]
/usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0x15c)
[0x2aaaaec9036b]
[vic20:04339] [ 7]
/usr/mpi/gcc/openmpi-trunk/lib64/libmpi.so.0(ompi_mpi_init+0xb2b)
[0x2aaaaab03817]
[vic20:04339] [ 8]
/usr/mpi/gcc/openmpi-trunk/lib64/libmpi.so.0(MPI_Init+0x15d)
[0x2aaaaab44dc9]
[vic20:04339] [ 9]
/usr/mpi/gcc/openmpi-trunk/tests/IMB-2.3/IMB-MPI1(main+0x29) [0x402df9]
[vic20:04339] [10] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x35dac1d8a4]
[vic20:04339] [11] /usr/mpi/gcc/openmpi-trunk/tests/IMB-2.3/IMB-MPI1
[0x402d39]
[vic20:04339] *** End of error message ***
[vic12:04539] [ 0] /lib64/libpthread.so.0 [0x3a7dc0dd40]
[vic12:04539] [ 1] /usr/lib64/libibverbs.so.1(ibv_create_srq+0x3e)
[0x3e82e083be]
[vic12:04539] [ 2]
/usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_btl_openib.so
[0x2aaaaf0bdc27]
[vic12:04539] [ 3]
/usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_btl_openib.so
[0x2aaaaf0be07e]
[vic12:04539] [ 4]
/usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_btl_openib.so(mca_btl_openib_add_procs+0x857)
[0x2aaaaf0bd97c]
[vic12:04539] [ 5]
/usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x37d)
[0x2aaaaeeb399e]
[vic12:04539] [ 6]
/usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0x15c)
[0x2aaaaec9036b]
[vic12:04539] [ 7]
/usr/mpi/gcc/openmpi-trunk/lib64/libmpi.so.0(ompi_mpi_init+0xb2b)
[0x2aaaaab03817]
[vic12:04539] [ 8]
/usr/mpi/gcc/openmpi-trunk/lib64/libmpi.so.0(MPI_Init+0x15d)
[0x2aaaaab44dc9]
[vic12:04539] [ 9]
/usr/mpi/gcc/openmpi-trunk/tests/IMB-2.3/IMB-MPI1(main+0x29) [0x402df9]
[vic12:04539] [10] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x3a7d01d8a4]
[vic12:04539] [11] /usr/mpi/gcc/openmpi-trunk/tests/IMB-2.3/IMB-MPI1
[0x402d39]
[vic12:04539] *** End of error message ***
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 4339 on
node vic20 calling "abort". This will have caused other processes
in the application to be terminated by signals sent by mpirun
(as reported here).
--------------------------------------------------------------------------
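
As an aside, the HCA-parameter warnings can be turned off the way the
message itself suggests; something like the following should do it
(the same command line as above, just adding the MCA parameter the
warning names -- untested on my end):

$ mpirun --n 2 --host vic12,vic20 -mca btl openib,self \
      -mca btl_openib_warn_no_hca_params_found 0 \
      /usr/mpi/gcc/openmpi-trunk/tests/IMB-2.3/IMB-MPI1 pingpong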

I am not having any problems running this test with the openib btl on
the ompi-1.2 branch, and I can run this test successfully with the udapl
and tcp btls on ompi-trunk. Is anyone else seeing this problem?
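
FWIW, the runs that do succeed were the same invocation with only the
BTL selection changed, roughly:

$ mpirun --n 2 --host vic12,vic20 -mca btl tcp,self \
      /usr/mpi/gcc/openmpi-trunk/tests/IMB-2.3/IMB-MPI1 pingpong
$ mpirun --n 2 --host vic12,vic20 -mca btl udapl,self \
      /usr/mpi/gcc/openmpi-trunk/tests/IMB-2.3/IMB-MPI1 pingpong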

Thanks,
Jon