Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] kernel 2.6.23 vs 2.6.24 - communication/wait times
From: Oliver Geisler (openmpi_at_[hidden])
Date: 2010-04-09 16:59:34


Sorry for replying late. Unfortunately I am not a full-time
administrator, and I am going to be at a conference next week, so please
be patient with my replies.

On 4/7/2010 6:56 PM, Eugene Loh wrote:
> Oliver Geisler wrote:
>
>> Using netpipe and comparing tcp and mpi communication I get the
>> following results:
>>
>> TCP is much faster than MPI, approx. by factor 12
>>
>>
> Faster? 12x? I don't understand the following:
>
>> e.g. a packet size of 4096 bytes is delivered in
>> 97.11 usec with NPtcp and
>> 15338.98 usec with NPmpi
>>
>>
> This implies NPtcp is 160x faster than NPmpi.
>

The speed ratio NPtcp/NPmpi (how many times faster TCP is) has a mean
value of about 60 for small packet sizes <4kB and a maximum of 160 at
4kB (it was a bad value to pick out in the first place). It then drops
to about 40 for packet sizes around 16kB and falls below 20 for packets
larger than 100kB.

>> or
>> packet size 262kb
>> 0.05268801 sec NPtcp
>> 0.00254560 sec NPmpi
>>
>>
> This implies NPtcp is 20x slower than NPmpi.
>

Sorry, my fault: I swapped the two values. It should read:
packet size 262kb
0.00254560 sec NPtcp
0.05268801 sec NPmpi
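
To make the two ratios discussed above easy to check, here is a minimal Python sketch that recomputes them from the timings quoted in this message. The names `samples` and `slowdown` are made up for illustration; only the four timing values come from the NetPIPE runs.

```python
# Timings copied from this message (NetPIPE runs).
# Each entry: (label, NPtcp time in seconds, NPmpi time in seconds).
samples = [
    ("4096 bytes", 97.11e-6, 15338.98e-6),
    ("262 kB", 0.00254560, 0.05268801),
]


def slowdown(tcp_seconds, mpi_seconds):
    """Return how many times longer the MPI transfer took than the TCP one."""
    return mpi_seconds / tcp_seconds


# Map each packet size label to its NPmpi/NPtcp time ratio.
ratios = {label: slowdown(tcp, mpi) for label, tcp, mpi in samples}

for label, ratio in ratios.items():
    print(f"{label}: NPmpi takes {ratio:.1f}x as long as NPtcp")
```

The first ratio comes out just under 160 and the second just above 20, which matches the factors mentioned in the quoted replies.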

>> Further our benchmark started with "--mca btl tcp,self" runs with short
>> communication times, even using kernel 2.6.33.1
>>
>> Is there a way to see what type of communication is actually selected?
>>
>> Can anybody imagine why shared memory leads to these problems?
>>
>>
> Okay, so it's a shared-memory performance problem since:
>
> 1) You get better performance when you exclude sm explicitly with "--mca
> btl tcp,self".
> 2) You get better performance when you exclude sm by distributing one
> process per node (an observation you made relatively early in this thread).
> 3) TCP is faster than MPI (which is presumably using sm).
>
> Can you run a pingpong test as a function of message length for two
> processes in a way that demonstrates the problem? For example, if
> you're comfortable with SKaMPI, just look at Pingpong_Send_Recv and
> let's see what performance looks like as a function of message length.
> Maybe this is a short-message-latency problem.

These are the results of the SKaMPI pt2pt test: first with shared memory
allowed, second with shared memory excluded, and third between two hosts
over Ethernet.
It does not look to me as if the long communication times are specific
to short messages.
Including hosts over Ethernet results in higher communication times,
which are equal to those when I ping the host (a hundred+ milliseconds).

mpirun --mca btl self,sm,tcp -np 2 ./skampi -i ski/skampi_pt2pt.ski

# begin result "Pingpong_Send_Recv"
count= 1 4 12756.0 307.4 16 11555.3 11011.2
count= 2 8 9902.8 629.0 16 9615.4 8601.0
count= 3 12 12547.5 881.0 16 12233.1 11229.2
count= 4 16 12087.2 829.6 16 11610.6 10478.6
count= 6 24 13634.4 352.1 16 11247.8 12621.9
count= 8 32 13835.8 282.2 16 11091.7 12944.6
count= 11 44 13328.9 864.6 16 12095.6 11977.0
count= 16 64 13195.2 432.3 16 11460.4 10051.9
count= 23 92 13849.3 532.5 16 12476.9 12998.1
count= 32 128 14202.2 436.4 16 11923.8 12977.4
count= 45 180 14026.3 637.7 16 13042.5 12767.8
count= 64 256 13475.8 466.7 16 11720.4 12521.3
count= 91 364 14015.0 406.1 16 13300.4 12881.6
count= 128 512 13481.3 870.6 16 11187.7 12070.6
count= 181 724 10697.1 98.4 16 10697.1 9520.1
count= 256 1024 14120.8 602.1 16 13988.2 11349.9
count= 362 1448 15718.2 582.3 16 14468.2 12535.2
count= 512 2048 11214.9 749.1 16 11155.0 9928.5
count= 724 2896 15127.3 186.1 16 15127.3 10974.9
count= 1024 4096 34045.0 692.2 16 32963.6 31728.1
count= 1448 5792 29965.9 788.1 16 27997.8 27404.4
count= 2048 8192 30082.1 785.3 16 28023.9 29538.5
count= 2896 11584 32556.0 219.4 16 29312.2 32290.4
count= 4096 16384 24999.8 839.6 16 23422.0 23644.6
# end result "Pingpong_Send_Recv"
# duration = 10.15 sec

mpirun --mca btl tcp,self -np 2 ./skampi -i ski/skampi_pt2pt.ski

# begin result "Pingpong_Send_Recv"
count= 1 4 14.5 0.3 16 13.5 13.2
count= 2 8 13.5 0.2 8 12.9 12.4
count= 3 12 13.1 0.4 16 12.7 11.3
count= 4 16 13.9 0.4 16 12.7 13.0
count= 6 24 13.8 0.4 16 12.5 12.8
count= 8 32 13.8 0.4 16 12.7 13.0
count= 11 44 14.0 0.3 16 12.8 13.0
count= 16 64 13.5 0.5 16 12.3 12.4
count= 23 92 13.9 0.4 16 13.1 12.7
count= 32 128 14.8 0.1 16 13.1 14.5
count= 45 180 14.2 0.4 8 13.1 12.9
count= 64 256 15.1 0.2 16 13.3 14.8
count= 91 364 16.5 0.3 16 14.1 16.1
count= 128 512 12.8 0.2 8 11.5 12.5
count= 181 724 13.4 0.3 16 11.5 13.3
count= 256 1024 14.0 0.3 16 11.7 14.0
count= 362 1448 13.2 0.3 16 12.2 12.5
count= 512 2048 15.4 0.2 16 12.5 15.4
count= 724 2896 15.7 0.2 16 13.1 15.7
count= 1024 4096 17.0 0.1 8 13.5 17.0
count= 1448 5792 18.5 0.2 16 15.5 18.5
count= 2048 8192 20.4 0.2 16 17.1 20.4
count= 2896 11584 24.1 0.1 16 21.0 24.0
count= 4096 16384 32.0 0.0 16 27.2 32.0
# end result "Pingpong_Send_Recv"
# duration = 0.01 sec

mpirun --mca btl tcp,self -np 2 -host cluster-13,cluster-16 ./skampi -i ski/skampi_pt2pt.ski

# begin result "Pingpong_Send_Recv"
count= 1 4 133.1 0.4 16 133.1 84.8
count= 2 8 132.7 0.1 16 132.7 85.0
count= 3 12 133.2 0.3 8 133.2 85.2
count= 4 16 133.8 0.2 8 133.8 85.5
count= 6 24 134.0 0.0 8 134.0 85.5
count= 8 32 134.2 0.2 16 134.2 86.8
count= 11 44 134.0 0.0 8 134.0 86.2
count= 16 64 135.2 0.2 8 135.2 87.0
count= 23 92 136.3 0.1 16 136.3 88.5
count= 32 128 137.6 0.2 16 137.6 90.3
count= 45 180 139.0 0.0 8 139.0 91.2
count= 64 256 138.8 2.2 8 130.0 104.6
count= 91 364 143.9 0.1 16 143.9 96.2
count= 128 512 148.5 0.3 8 148.5 101.8
count= 181 724 157.3 0.2 16 157.3 111.0
count= 256 1024 169.8 0.2 8 169.8 123.8
count= 362 1448 163.4 0.3 8 161.0 163.4
count= 512 2048 207.2 0.2 8 207.2 163.5
count= 724 2896 235.5 1.7 8 235.5 190.0
count= 1024 4096 233.0 0.6 8 230.7 233.0
count= 1448 5792 314.2 3.3 16 314.2 264.9
count= 2048 8192 343.0 3.9 8 343.0 295.0
count= 2896 11584 540.0 11.2 16 539.9 456.8
count= 4096 16384 636.3 13.2 16 636.3 473.1
# end result "Pingpong_Send_Recv"
# duration = 0.07 sec
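
For comparing the runs above side by side, a small Python sketch along these lines may help. It assumes that in each `count=` line the second number is the message size in bytes and the third is the measured time (SKaMPI normally reports microseconds); that column interpretation is my assumption, not something stated in the output. The function name `parse_skampi` is made up, and only a few lines from the first two runs are pasted in to keep it short.

```python
def parse_skampi(text):
    """Map message size in bytes to measured time for each 'count=' line."""
    times = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "count=":
            # Assumed layout: 'count=', count, bytes, time, std error, ...
            times[int(fields[2])] = float(fields[3])
    return times


# A few lines copied from the run with "--mca btl self,sm,tcp".
sm_run = """
# begin result "Pingpong_Send_Recv"
count= 1 4 12756.0 307.4 16 11555.3 11011.2
count= 1024 4096 34045.0 692.2 16 32963.6 31728.1
count= 4096 16384 24999.8 839.6 16 23422.0 23644.6
# end result "Pingpong_Send_Recv"
"""

# The same message sizes from the run with "--mca btl tcp,self".
tcp_run = """
# begin result "Pingpong_Send_Recv"
count= 1 4 14.5 0.3 16 13.5 13.2
count= 1024 4096 17.0 0.1 8 13.5 17.0
count= 4096 16384 32.0 0.0 16 27.2 32.0
# end result "Pingpong_Send_Recv"
"""

sm = parse_skampi(sm_run)
tcp = parse_skampi(tcp_run)

# Ratio of the time with sm allowed to the time with sm excluded, per size.
slowdowns = {size: sm[size] / tcp[size] for size in sorted(sm) if size in tcp}

for size, factor in slowdowns.items():
    print(f"{size:6d} bytes: sm run takes {factor:.0f}x as long as tcp,self")
```

With these lines the run that allows sm is slower by a factor of several hundred to about two thousand at every message size shown, so the gap is not confined to short messages.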
