Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] deadlock when calling MPI_gatherv
From: Trent Creekmore (mtcreekmore_at_[hidden])
Date: 2010-04-26 21:07:39


You will have to debug and trace the program to find out where it is
stopping.
You may want to try KDbg, a graphical front end for the command-line
debugger gdb, which makes it a LOT easier, or use Eclipse.
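
One common way to get a debugger onto a hung MPI job is to make each rank
print its host and PID and then spin until you attach. The following is only
a rough sketch of that idea, not something taken from the original program:
the helper name wait_for_debugger_if_requested and the environment variable
MPI_DEBUG_WAIT are made up for illustration, and the function uses plain
POSIX calls so it does not depend on any particular MPI version.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (name invented for this sketch).
 * If the environment variable MPI_DEBUG_WAIT (also invented here) is set,
 * print the host name and PID, then spin until a debugger changes
 * `attached` to a nonzero value.
 * Returns 1 if it waited for a debugger, 0 if the variable was not set. */
static int wait_for_debugger_if_requested(void)
{
    volatile int attached = 0;   /* volatile so the loop re-reads it */
    char host[256];

    if (getenv("MPI_DEBUG_WAIT") == NULL)
        return 0;                /* normal run: do not block */

    if (gethostname(host, sizeof host) != 0)
        snprintf(host, sizeof host, "unknown-host");
    host[sizeof host - 1] = '\0';

    fprintf(stderr, "PID %ld on %s waiting for debugger attach\n",
            (long)getpid(), host);
    fflush(stderr);

    /* Attach to this PID, set `attached` to 1, then continue. */
    while (!attached)
        sleep(5);

    return 1;
}
```

You would call this once in every rank, for example right after MPI_Init,
and set the variable only for debugging runs. With gdb, attaching is
typically "gdb -p <PID>", then selecting the frame of this function,
"set var attached = 1", and "continue"; the next hang then stops inside the
debugger, where you can get a backtrace from each rank.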

-----Original Message-----
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
Behalf Of Teng Lin
Sent: Monday, April 26, 2010 6:49 PM
To: Open MPI Users
Subject: [OMPI users] deadlock when calling MPI_gatherv

Hi,

We recently ran into a deadlock when calling MPI_Gatherv with Open MPI 1.3.4.
At first it seemed to have something to do with the sm BTL. However, it still
hangs even after turning off the sm BTL.

Any idea how to track down the problem?

Thanks,
Teng

#################################################
Stack trace for master node
#################################################
mca_btl_sm_component_progress
opal_progress
opal_condition_wait
ompi_request_default_wait_all
ompi_coll_tuned_sendrecv_actual
ompi_coll_tuned_barrier_intra_two_procs
ompi_coll_tuned_barrier_intra_dec_fixed
mca_coll_sync_gatherv
PMPI_Gatherv

#################################################
Stack trace for slave node
#################################################
mca_btl_sm_component_progress
opal_progress
opal_condition_wait
ompi_request_wait_completion
mca_pml_ob1_recv
mca_coll_basic_gatherv_intra
mca_coll_sync_gatherv

#################################################
Message queue from totalview
#################################################
MPI_COMM_WORLD
Comm_size 2
Comm_rank 0
Pending receives : none
Unexpected messages : no information available
Pending sends : none

MPI_COMM_SELF
Comm_size 1
Comm_rank 0
Pending receives : none
Unexpected messages : no information available
Pending sends : none

MPI_COMM_NULL
Comm_size 0
Comm_rank -2
Pending receives : none
Unexpected messages : no information available
Pending sends : none

MPI COMMUNICATOR 3 DUP FROM 0
Comm_size 2
Comm_rank 0
Pending receives : none
Unexpected messages : no information available
Pending sends : none

MPI COMMUNICATOR 4 SPLIT FROM 3
Comm_size 2
Comm_rank 0
Pending receives : none
Unexpected messages : no information available
Pending sends : none

MPI COMMUNICATOR 5 SPLIT FROM 4
Comm_size 2
Comm_rank 0
Pending receives : none
Unexpected messages : no information available
Pending sends : none

MPI COMMUNICATOR 6 SPLIT FROM 4
Comm_size 1
Comm_rank 0
Pending receives : none
Unexpected messages : no information available
Pending sends : none

MPI COMMUNICATOR 7 DUP FROM 4
Comm_size 2
Comm_rank 0
Pending receives
[0]
   Receive: 0x80b9000
   Data: 1 * MPI_CHAR
   Status Pending
   Source 0 (orterun<xxxx>.0)
   Tag 7 (0x00000007)
   User Buffer 0xb06fa010 -> 0x00000000 (0)
   Buffer Length 1359312 (0x0014bdd0)
[1]
   Receive: 0x80b9200
   Data: 1 * MPI_CHAR
   Status Pending
   Source 0 (orterun<xxxx>.0)
   Tag 5 (0x00000005)
   User Buffer 0xb0c2a010 -> 0x00000000 (0)
   Buffer Length 1359312 (0x0014bdd0)
[2]
   Receive: 0x80b9400
   Data: 1 * MPI_CHAR
   Status Pending
   Source 1 (orterun<xxxx>.1)
   Tag 3 (0x00000003)
   User Buffer 0xb115a010 -> 0xc0ef9e79 (-1058038151)
   Buffer Length 1359312 (0x0014bdd0)
[3]
   Receive: 0x80b9600
   Data: 1 * MPI_CHAR
   Status Pending
   Source 1 (orterun<xxxx>.1)
   Tag 1 (0x00000001)
   User Buffer 0xb168a010 -> 0xc0c662aa (-1060740438)
   Buffer Length 1359312 (0x0014bdd0)
[4]
   Receive: 0x82a2500
   Data: 1 * MPI_CHAR
   Status Pending
   Source 0 (orterun<xxxx>.0)
   Tag 11 (0x0000000b)
   User Buffer 0xafc9a010 -> 0x00000000 (0)
   Buffer Length 1359312 (0x0014bdd0)
[5]
   Receive: 0x82a2700
   Data: 1 * MPI_CHAR
   Status Pending
   Source 0 (orterun<xxxx>.0)
   Tag 9 (0x00000009)
   User Buffer 0xb01ca010 -> 0x00000000 (0)
   Buffer Length 1359312 (0x0014bdd0)

Unexpected messages : no information available
Pending sends
[0]
   Send: 0x80b8500
   Data transfer completed
   Status Complete
   Target 0 (orterun<xxxx>.0)
   Tag 4 (0x00000004)
   Buffer 0xb0846010 -> 0x40544279 (1079263865)
   Buffer Length 2548 (0x000009f4)
[1]
   Send: 0x80b8780
   Data transfer completed
   Status Complete
   Target 0 (orterun<xxxx>.0)
   Tag 6 (0x00000006)
   Buffer 0xb0d76010 -> 0x41a756bf (1101485759)
   Buffer Length 2992 (0x00000bb0)
[2]
   Send: 0x80b8a00
   Data transfer completed
   Status Complete
   Target 1 (orterun<xxxx>.1)
   Tag 0 (0x00000000)
   Buffer 0xb12a6010 -> 0xbf94cfc4 (-1080766524)
   Buffer Length 3856 (0x00000f10)
[3]
   Send: 0x80b8c80
   Data transfer completed
   Status Complete
   Target 1 (orterun<xxxx>.1)
   Tag 2 (0x00000002)
   Buffer 0xb17d6010 -> 0x400a1a6c (1074403948)
   Buffer Length 3952 (0x00000f70)
[4]
   Send: 0x831f080
   Data transfer completed
   Status Complete
   Target 0 (orterun<xxxx>.0)
   Tag 8 (0x00000008)
   Buffer 0xafde6010 -> 0xc0de2c50 (-1059181488)
   Buffer Length 3292 (0x00000cdc)
[5]
   Send: 0x831f300
   Data transfer completed
   Status Complete
   Target 0 (orterun<xxxx>.0)
   Tag 10 (0x0000000a)
   Buffer 0xb0316010 -> 0x41169ca7 (1092000935)
   Buffer Length 3232 (0x00000ca0)

MPI COMMUNICATOR 8 SPLIT FROM 5
Comm_size 2
Comm_rank 0
Pending receives : none
Unexpected messages : no information available
Pending sends : none

MPI COMMUNICATOR 9 SPLIT FROM 5
Comm_size 2
Comm_rank 0
Pending receives : none
Unexpected messages : no information available
Pending sends : none