Open MPI User's Mailing List Archives

Subject: [OMPI users] vtrun/otf question
From: Jaroslaw Slawinski (jaross_at_[hidden])
Date: 2012-12-01 02:56:52


Hello everybody, this is my first post.

I needed to analyze the communication among nodes in a CFD code, so I
ran it with vtrun under mpiexec.
Next, I dumped the trace (otfdump) and summed the message volumes
for the Send and Receive lines.
The result astonished me: the total sent does not equal the total received.
Below I present a very small, 4-process problem, but it occurs in
every run for any number of processes.
These are the sums for SendMessage: first column is the sender, second
the receiver, third the volume in bytes.

0 0 0
0 1 33575534
0 2 17178610
0 3 17881624
1 0 75900050
1 1 0
1 2 9510508
1 3 20961830
2 0 39807134
2 1 9937288
2 2 0
2 3 30328578
3 0 32415748
3 1 33226154
3 2 55062442
3 3 0

The sums for ReceiveMessage: first column is the receiver, second the
sender, third the volume in bytes:

0 0 0
0 1 57682570
0 2 30912474
0 3 28154684
1 0 43260014
1 1 0
1 2 9937288
1 3 37073342
2 0 21455674
2 1 9510508
2 2 0
2 3 62425238
3 0 20559492
3 1 19374170
3 2 27494694
3 3 0

Comparing the two tables, you can see that the reported volumes agree
only between ranks 1 and 2 (in both directions). For every other pair
the sent and received volumes differ.
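The comparison described above can be reproduced with a short script. This is only a sketch: the numbers are the per-pair sums from this post (the all-zero self-to-self rows are left out), and the names `SEND_ROWS`, `RECV_ROWS` and `parse` are made up for illustration. It does not read otfdump output directly.

```python
# Per-pair sums copied from the post (diagonal rows with volume 0 omitted).
# SendMessage rows: sender, receiver, bytes.
SEND_ROWS = """
0 1 33575534
0 2 17178610
0 3 17881624
1 0 75900050
1 2 9510508
1 3 20961830
2 0 39807134
2 1 9937288
2 3 30328578
3 0 32415748
3 1 33226154
3 2 55062442
"""

# ReceiveMessage rows: receiver, sender, bytes.
RECV_ROWS = """
0 1 57682570
0 2 30912474
0 3 28154684
1 0 43260014
1 2 9937288
1 3 37073342
2 0 21455674
2 1 9510508
2 3 62425238
3 0 20559492
3 1 19374170
3 2 27494694
"""


def parse(text):
    """Return {(first_col, second_col): bytes} from whitespace-separated rows."""
    table = {}
    for line in text.strip().splitlines():
        a, b, volume = (int(x) for x in line.split())
        table[(a, b)] = volume
    return table


send = parse(SEND_ROWS)  # keyed by (sender, receiver)
# The receive table is keyed by (receiver, sender); swap the key so that
# both tables are keyed by (sender, receiver).
recv = {(s, r): v for (r, s), v in parse(RECV_ROWS).items()}

matching = sorted(p for p in send if send[p] == recv[p])
mismatched = sorted(p for p in send if send[p] != recv[p])

for s, r in mismatched:
    diff = send[(s, r)] - recv[(s, r)]
    print(f"{s} -> {r}: sent {send[(s, r)]}, received {recv[(s, r)]}, diff {diff}")
print("matching pairs (sender, receiver):", matching)
print("totals equal:", sum(send.values()) == sum(recv.values()))
```

Keying both tables by (sender, receiver) makes the comparison a plain dictionary lookup, so a swapped column cannot hide or fake a mismatch.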

I compared the data with Vampir for this 4-process case. The aggregated
message volume it shows comes partly from the SendMessage sums and
partly from the ReceiveMessage sums. The table below gives the Vampir
values in MiB; the tag in parentheses says whether the value matches the
Send (S) or the Receive (R) sum I got from OTF, or both (R/S).

     p0         p1          p2          p3
p0   -          32.02(S)    16.383(S)   17.053(S)
p1   55.01(R)   -           9.07(R/S)   18.477(R)
p2   29.48(R)   9.477(R/S)  -           26.221(R)
p3   26.85(R)   31.687(S)   52.512(S)   -
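The S/R tags in the table can be rechecked by converting the byte sums to MiB (1 MiB = 1048576 bytes) and comparing them with the Vampir values. This is a sketch under two assumptions: the table rows are senders and the columns are receivers (this is consistent with the numbers, but Vampir's layout is not confirmed here), and a tolerance of 0.005 MiB is enough to absorb the rounding in the displayed values. The names `SEND`, `RECV`, `VAMPIR` and `TOL` are illustrative only.

```python
MIB = 1024 * 1024
TOL = 0.005  # assumed tolerance for Vampir's rounded display values

# Byte sums from the post, both keyed by (sender, receiver).
SEND = {
    (0, 1): 33575534, (0, 2): 17178610, (0, 3): 17881624,
    (1, 0): 75900050, (1, 2): 9510508,  (1, 3): 20961830,
    (2, 0): 39807134, (2, 1): 9937288,  (2, 3): 30328578,
    (3, 0): 32415748, (3, 1): 33226154, (3, 2): 55062442,
}
RECV = {
    (1, 0): 57682570, (2, 0): 30912474, (3, 0): 28154684,
    (0, 1): 43260014, (2, 1): 9937288,  (3, 1): 37073342,
    (0, 2): 21455674, (1, 2): 9510508,  (3, 2): 62425238,
    (0, 3): 20559492, (1, 3): 19374170, (2, 3): 27494694,
}
# Vampir values in MiB, assuming row = sender and column = receiver.
VAMPIR = {
    (0, 1): 32.02,  (0, 2): 16.383, (0, 3): 17.053,
    (1, 0): 55.01,  (1, 2): 9.07,   (1, 3): 18.477,
    (2, 0): 29.48,  (2, 1): 9.477,  (2, 3): 26.221,
    (3, 0): 26.85,  (3, 1): 31.687, (3, 2): 52.512,
}

tags = {}
for pair, shown in VAMPIR.items():
    matches_send = abs(SEND[pair] / MIB - shown) < TOL
    matches_recv = abs(RECV[pair] / MIB - shown) < TOL
    if matches_send and matches_recv:
        tags[pair] = "R/S"
    elif matches_send:
        tags[pair] = "S"
    elif matches_recv:
        tags[pair] = "R"
    else:
        tags[pair] = "?"

# Check whether Vampir's value is always the smaller of the two sums.
shows_smaller = all(
    abs(min(SEND[p], RECV[p]) / MIB - VAMPIR[p]) < TOL for p in VAMPIR
)

for (s, r), tag in sorted(tags.items()):
    print(f"p{s} -> p{r}: Vampir {VAMPIR[(s, r)]} MiB matches {tag}")
print("Vampir value equals the smaller sum for every pair:", shows_smaller)
```

With the numbers from this post, the script reproduces the tags in the table, and the value Vampir shows is the smaller of the Send and Receive sums for every pair. That is only an observation about this data set, not an explanation of the discrepancy.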

Can anybody explain this, please? I am probably doing something wrong,
or I do not understand how to interpret the data in the OTF trace. Could
otfdump be reporting wrong values? Or Vampir?

Best regards
jaross