
Open MPI User's Mailing List Archives


Subject: [OMPI users] oMPI hang with IB question
From: Dylan Nelson (dnelson_at_[hidden])
Date: 2012-03-20 15:35:12


Hello,

I've been having trouble for a while now running some Open MPI + InfiniBand jobs
across multiple tasks. The problems are all "hangs" and are not reproducible: the
same execution, started again, will generally proceed fine past the point where it
previously got stuck, only to hang somewhere else later. The stuck processes are
pegged at 100% CPU usage and remain there for days if not killed.

The same kind of problem exists in Open MPI 1.2.5, 1.4.2, and 1.5.3 (for the
code I am running). This could well be some problem in the
configuration/cluster; I am not claiming that it is a bug in Open MPI, but was
just hopeful that someone might have a guess as to what is going on.

In the ancient 1.2.5 the problem manifests as follows (I attached gdb to the
stalled process on one of the child nodes):

--------------------------------------------------------------------

(gdb) bt
#0 0x00002b8135b3f699 in ibv_cmd_create_qp () from
/usr/lib64/libmlx4-rdmav2.so
#1 0x00002b8135b3faa6 in ibv_cmd_create_qp () from
/usr/lib64/libmlx4-rdmav2.so
#2 0x00002b813407bff1 in btl_openib_component_progress ()
   from /n/sw/openmpi-1.2.5-gcc-4.1.2/lib/openmpi/mca_btl_openib.so
#3 0x00002b8133e6f04a in mca_bml_r2_progress () from
/n/sw/openmpi-1.2.5-gcc-4.1.2/lib/openmpi/mca_bml_r2.so
#4 0x00002b812f52c9ba in opal_progress () from
/n/sw/openmpi-1.2.5-gcc-4.1.2/lib64/libopen-pal.so.0
#5 0x00002b812f067b05 in ompi_request_wait_all () from
/n/sw/openmpi-1.2.5-gcc-4.1.2/lib64/libmpi.so.0
#6 0x0000000000000000 in ?? ()
(gdb) next
Single stepping until exit from function ibv_cmd_create_qp, which has no
line number information.
0x00002b8135b3f358 in pthread_spin_unlock@plt () from
/usr/lib64/libmlx4-rdmav2.so
(gdb) next
Single stepping until exit from function pthread_spin_unlock@plt, which has
no line number information.
0x00000038c860b760 in pthread_spin_unlock () from /lib64/libpthread.so.0
(gdb) next
Single stepping until exit from function pthread_spin_unlock, which has no
line number information.
0x00002b8135b3fc21 in ibv_cmd_create_qp () from /usr/lib64/libmlx4-rdmav2.so
(gdb) next
Single stepping until exit from function ibv_cmd_create_qp, which has no
line number information.
0x00002b813407bff1 in btl_openib_component_progress ()
   from /n/sw/openmpi-1.2.5-gcc-4.1.2/lib/openmpi/mca_btl_openib.so
(gdb) next
Single stepping until exit from function btl_openib_component_progress,
which has no line number information.
0x00002b8133e6f04a in mca_bml_r2_progress () from
/n/sw/openmpi-1.2.5-gcc-4.1.2/lib/openmpi/mca_bml_r2.so
(gdb) next
Single stepping until exit from function mca_bml_r2_progress, which has no
line number information.
0x00002b812f52c9ba in opal_progress () from
/n/sw/openmpi-1.2.5-gcc-4.1.2/lib64/libopen-pal.so.0
(gdb) next
Single stepping until exit from function opal_progress, which has no line
number information.
0x00002b812f067b05 in ompi_request_wait_all () from
/n/sw/openmpi-1.2.5-gcc-4.1.2/lib64/libmpi.so.0
(gdb) next
Single stepping until exit from function ompi_request_wait_all, which has no
line number information.

---hang--- (infinite loop?)

On a different task:

0x00002ba2383b4982 in opal_progress () from
/n/sw/openmpi-1.2.5-gcc-4.1.2/lib64/libopen-pal.so.0
(gdb) bt
#0 0x00002ba2383b4982 in opal_progress () from
/n/sw/openmpi-1.2.5-gcc-4.1.2/lib64/libopen-pal.so.0
#1 0x00002ba237eefb05 in ompi_request_wait_all () from
/n/sw/openmpi-1.2.5-gcc-4.1.2/lib64/libmpi.so.0
#2 0x0000000000000000 in ?? ()
(gdb) next
Single stepping until exit from function opal_progress, which has no line
number information.
0x00002ba237eefb05 in ompi_request_wait_all () from
/n/sw/openmpi-1.2.5-gcc-4.1.2/lib64/libmpi.so.0
(gdb) next
Single stepping until exit from function ompi_request_wait_all, which has no
line number information.

---hang---

--------------------------------------------------------------------

On 1.5.3 a similar "hang" problem happens, but the backtrace (with debug
symbols) goes all the way back to the originating call in the application
code, an MPI_Sendrecv():

--------------------------------------------------------------------

3510 OPAL_THREAD_UNLOCK(&endpoint->eager_rdma_local.lock);
(gdb) bt
#0 progress_one_device () at btl_openib_component.c:3510
#1 btl_openib_component_progress () at btl_openib_component.c:3541
#2 0x00002b722f348b35 in opal_progress () at runtime/opal_progress.c:207
#3 0x00002b722f287025 in opal_condition_wait (buf=0x2aaaab636298,
count=251328, datatype=0x6ef240, dst=12, tag=35,
    sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x6ee430) at
../../../../opal/threads/condition.h:99
#4 ompi_request_wait_completion (buf=0x2aaaab636298, count=251328,
datatype=0x6ef240, dst=12, tag=35,
    sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x6ee430) at
../../../../ompi/request/request.h:377
#5 mca_pml_ob1_send (buf=0x2aaaab636298, count=251328, datatype=0x6ef240,
dst=12, tag=35,
    sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x6ee430) at
pml_ob1_isend.c:125
#6 0x00002b722f1cb568 in PMPI_Sendrecv (sendbuf=0x2aaba9587398,
sendcount=251328, sendtype=0x6ef240, dest=12,
    sendtag=35, recvbuf=0x2aaba7a555f8, recvcount=259008, recvtype=0x6ef240,
source=12, recvtag=35, comm=0x6ee430,
    status=0x6f2160) at psendrecv.c:84
#7 0x0000000000472fd5 in voronoi_ghost_search (T=0xf70b40) at
voronoi_ghost_search.c:190
#8 0x00000000004485c6 in create_mesh () at voronoi.c:107
#9 0x0000000000411b1c in run () at run.c:215
#10 0x0000000000410d8a in main (argc=3, argv=0x7fff3fc25948) at main.c:190
(gdb) next
3466 for(i = 0; i < c; i++) {
(gdb) next
3467 endpoint = device->eager_rdma_buffers[i];
(gdb) next
3469 if(!endpoint)
(gdb) next
3472 OPAL_THREAD_LOCK(&endpoint->eager_rdma_local.lock);
(gdb) next
3473 frag = MCA_BTL_OPENIB_GET_LOCAL_RDMA_FRAG(endpoint,
(gdb) next
3476 if(MCA_BTL_OPENIB_RDMA_FRAG_LOCAL(frag)) {
(gdb) next
3510 OPAL_THREAD_UNLOCK(&endpoint->eager_rdma_local.lock);

--------------------------------------------------------------------

The OS is: Linux version 2.6.18-194.32.1.el5
(mockbuild_at_[hidden]) (gcc version 4.1.2 20080704 (Red Hat
4.1.2-48))

The output from ibv_devinfo:

--------------------------------------------------------------------

hca_id: mlx4_0
        transport: InfiniBand (0)
        fw_ver: 2.5.000
        node_guid: 0018:8b90:97fe:2149
        sys_image_guid: 0018:8b90:97fe:214c
        vendor_id: 0x02c9
        vendor_part_id: 25418
        hw_ver: 0xA0
        board_id: DEL08C0000001
        phys_port_cnt: 2
                port: 1
                        state: PORT_ACTIVE (4)
                        max_mtu: 2048 (4)
                        active_mtu: 2048 (4)
                        sm_lid: 2
                        port_lid: 166
                        port_lmc: 0x00

                port: 2
                        state: PORT_DOWN (1)
                        max_mtu: 2048 (4)
                        active_mtu: 2048 (4)
                        sm_lid: 0
                        port_lid: 0
                        port_lmc: 0x00

--------------------------------------------------------------------

I am no MPI expert, but I would be grateful for any suggestions. Thanks!

Dylan Nelson