What does the stack look like for the non-SM BTL run?  I assume it is probably the same.  Also, can you dump the message queues for rank 1?  What's interesting is that you have a bunch of pending receives on communicator 7; do you expect that to be the case at the time of the MPI_Gatherv?
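For anyone reproducing the non-SM run: one way to keep Open MPI from using the shared-memory BTL is to exclude it through the `btl` MCA parameter. You can set it as the `OMPI_MCA_btl` environment variable or pass `--mca btl ^sm` on the mpirun command line. This is a sketch and not necessarily how the run described below was configured. `./your_app` is a placeholder for the real executable.

```shell
# Exclude the shared-memory (sm) BTL for every mpirun launched from this shell;
# the leading ^ means "use every available BTL except the ones listed".
export OMPI_MCA_btl="^sm"

# Equivalent one-off form on the mpirun command line
# (./your_app is a placeholder, not the original application):
#   mpirun --mca btl ^sm -np 2 ./your_app
```

The environment-variable form applies to every launch from that shell. The command-line form affects only the single run it is given on.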

--td

Teng Lin wrote:
Hi,

We recently ran into a deadlock when calling MPI_Gatherv with Open MPI 1.3.4. At first it seemed to have something to do with the sm BTL. However, it still hangs even after turning off the sm BTL.

Any idea how to track down the problem?

Thanks,
Teng

#################################################
Stack trace for master node
#################################################
mca_btl_sm_component_progress
opal_progress
opal_condition_wait
ompi_request_default_wait_all
ompi_coll_tuned_sendrecv_actual
ompi_coll_tuned_barrier_intra_two_procs
ompi_coll_tuned_barrier_intra_dec_fixed
mca_coll_sync_gatherv
PMPI_Gatherv


#################################################
Stack trace for slave node
#################################################
mca_btl_sm_component_progress
opal_progress
opal_condition_wait
ompi_request_wait_completion
mca_pml_ob1_recv
mca_coll_basic_gatherv_intra
mca_coll_sync_gatherv


#################################################
Message queue from totalview
#################################################
MPI_COMM_WORLD
Comm_size                2
Comm_rank                0
Pending receives    : none
Unexpected messages : no information available
Pending sends       : none

MPI_COMM_SELF
Comm_size                1
Comm_rank                0
Pending receives    : none
Unexpected messages : no information available
Pending sends       : none

MPI_COMM_NULL
Comm_size                0
Comm_rank               -2
Pending receives    : none
Unexpected messages : no information available
Pending sends       : none

MPI COMMUNICATOR 3 DUP FROM 0
Comm_size                2
Comm_rank                0
Pending receives    : none
Unexpected messages : no information available
Pending sends       : none

MPI COMMUNICATOR 4 SPLIT FROM 3
Comm_size                2
Comm_rank                0
Pending receives    : none
Unexpected messages : no information available
Pending sends       : none

MPI COMMUNICATOR 5 SPLIT FROM 4
Comm_size                2
Comm_rank                0
Pending receives    : none
Unexpected messages : no information available
Pending sends       : none

MPI COMMUNICATOR 6 SPLIT FROM 4
Comm_size                1
Comm_rank                0
Pending receives    : none
Unexpected messages : no information available
Pending sends       : none

MPI COMMUNICATOR 7 DUP FROM 4
Comm_size                2
Comm_rank                0
Pending receives   
[0]
   Receive: 0x80b9000
   Data: 1 * MPI_CHAR
   Status           Pending
   Source           0 (orterun<xxxx>.0)
   Tag              7 (0x00000007)
   User Buffer      0xb06fa010 -> 0x00000000 (0)
   Buffer Length    1359312 (0x0014bdd0)
[1]
   Receive: 0x80b9200
   Data: 1 * MPI_CHAR
   Status           Pending
   Source           0 (orterun<xxxx>.0)
   Tag              5 (0x00000005)
   User Buffer      0xb0c2a010 -> 0x00000000 (0)
   Buffer Length    1359312 (0x0014bdd0)
[2]
   Receive: 0x80b9400
   Data: 1 * MPI_CHAR
   Status           Pending
   Source           1 (orterun<xxxx>.1)
   Tag              3 (0x00000003)
   User Buffer      0xb115a010 -> 0xc0ef9e79 (-1058038151)
   Buffer Length    1359312 (0x0014bdd0)
[3]
   Receive: 0x80b9600
   Data: 1 * MPI_CHAR
   Status           Pending
   Source           1 (orterun<xxxx>.1)
   Tag              1 (0x00000001)
   User Buffer      0xb168a010 -> 0xc0c662aa (-1060740438)
   Buffer Length    1359312 (0x0014bdd0)
[4]
   Receive: 0x82a2500
   Data: 1 * MPI_CHAR
   Status           Pending
   Source           0 (orterun<xxxx>.0)
   Tag              11 (0x0000000b)
   User Buffer      0xafc9a010 -> 0x00000000 (0)
   Buffer Length    1359312 (0x0014bdd0)
[5]
   Receive: 0x82a2700
   Data: 1 * MPI_CHAR
   Status           Pending
   Source           0 (orterun<xxxx>.0)
   Tag              9 (0x00000009)
   User Buffer      0xb01ca010 -> 0x00000000 (0)
   Buffer Length    1359312 (0x0014bdd0)

Unexpected messages : no information available
Pending sends
[0]
   Send: 0x80b8500
   Data transfer completed
   Status           Complete
   Target           0 (orterun<xxxx>.0)
   Tag              4 (0x00000004)
   Buffer           0xb0846010 -> 0x40544279 (1079263865)
   Buffer Length    2548 (0x000009f4)
[1]
   Send: 0x80b8780
   Data transfer completed
   Status           Complete
   Target           0 (orterun<xxxx>.0)
   Tag              6 (0x00000006)
   Buffer           0xb0d76010 -> 0x41a756bf (1101485759)
   Buffer Length    2992 (0x00000bb0)
[2]
   Send: 0x80b8a00
   Data transfer completed
   Status           Complete
   Target           1 (orterun<xxxx>.1)
   Tag              0 (0x00000000)
   Buffer           0xb12a6010 -> 0xbf94cfc4 (-1080766524)
   Buffer Length    3856 (0x00000f10)
[3]
   Send: 0x80b8c80
   Data transfer completed
   Status           Complete
   Target           1 (orterun<xxxx>.1)
   Tag              2 (0x00000002)
   Buffer           0xb17d6010 -> 0x400a1a6c (1074403948)
   Buffer Length    3952 (0x00000f70)
[4]
   Send: 0x831f080
   Data transfer completed
   Status           Complete
   Target           0 (orterun<xxxx>.0)
   Tag              8 (0x00000008)
   Buffer           0xafde6010 -> 0xc0de2c50 (-1059181488)
   Buffer Length    3292 (0x00000cdc)
[5]
   Send: 0x831f300
   Data transfer completed
   Status           Complete
   Target           0 (orterun<xxxx>.0)
   Tag              10 (0x0000000a)
   Buffer           0xb0316010 -> 0x41169ca7 (1092000935)
   Buffer Length    3232 (0x00000ca0)

MPI COMMUNICATOR 8 SPLIT FROM 5
Comm_size                2
Comm_rank                0
Pending receives    : none
Unexpected messages : no information available
Pending sends       : none

MPI COMMUNICATOR 9 SPLIT FROM 5
Comm_size                2
Comm_rank                0
Pending receives    : none
Unexpected messages : no information available
Pending sends       : none


  

_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje@oracle.com