Is your interconnect Gigabit Ethernet? I'm very surprised to see that the TCP BTL only reached a 33 MBytes/s peak bandwidth on your cluster. I did a similar test on an AMD cluster with Gigabit Ethernet. As the results below show, the TCP BTL's bandwidth there is similar to that of your TIPC BTL (112 MBytes/s). Could you redo the test with 2 processes spawned, 2 nodes in your machinefile, and --bynode enabled?
It looks like your TIPC BTL does quite well at message sizes between 8K and 512K. Can you tell us more about the differences between the TIPC and TCP protocol stacks? Is any special configuration needed to enable your TIPC BTL? Maybe you could write a module for NetPIPE (similar to NPTcp) to test the raw performance of both TCP and TIPC without MPI.
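A raw test of the kind suggested above boils down to a ping-pong loop over a connected socket with no MPI in between. The following is a minimal C sketch, not NetPIPE code: the function names are hypothetical, connection setup is left out, and it assumes the caller already holds a connected stream socket. For TCP that descriptor would come from socket(AF_INET, SOCK_STREAM, 0); for TIPC it would presumably come from an AF_TIPC stream socket (addressing declared in linux/tipc.h), which is an assumption not checked here. The timing side reports half the average round-trip time, which is the quantity IMB prints as t[usec].

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Write exactly len bytes to fd; returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes from fd; returns 0 on success, -1 on error or EOF. */
static int read_all(int fd, char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = read(fd, buf, len);
        if (n == 0)
            return -1;             /* peer closed the connection */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical echo side: receive a message and send it straight back, reps times. */
int pingpong_echo(int fd, char *buf, size_t len, int reps)
{
    assert(reps > 0);
    for (int i = 0; i < reps; i++) {
        if (read_all(fd, buf, len) < 0 || write_all(fd, buf, len) < 0)
            return -1;
    }
    return 0;
}

/* Hypothetical timing side: send, wait for the echo, repeat.
   Returns half the average round-trip time in microseconds, or -1.0 on error. */
double pingpong_time_usec(int fd, char *buf, size_t len, int reps)
{
    struct timespec t0, t1;
    assert(reps > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; i++) {
        if (write_all(fd, buf, len) < 0 || read_all(fd, buf, len) < 0)
            return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double elapsed_usec = (double)(t1.tv_sec - t0.tv_sec) * 1e6
                        + (double)(t1.tv_nsec - t0.tv_nsec) / 1e3;
    return elapsed_usec / reps / 2.0;
}
```

To exercise the loop without any network, one could connect the two ends with socketpair(AF_UNIX, SOCK_STREAM, 0, sv), fork, and run pingpong_echo() in one process and pingpong_time_usec() in the other. Comparing TCP and TIPC would then only change how the descriptor is created, which keeps the measured loop identical for both transports.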
TCP BTL on Gigabit Ethernet
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000        23.27         0.00
            1         1000        23.78         0.04
            2         1000        23.77         0.08
            4         1000        25.47         0.15
            8         1000        23.94         0.32
           16         1000        24.36         0.63
           32         1000        24.83         1.23
           64         1000        25.76         2.37
          128         1000        27.25         4.48
          256         1000        30.66         7.96
          512         1000        36.86        13.25
         1024         1000        49.00        19.93
         2048         1000        77.83        25.10
         4096         1000        82.42        47.39
         8192         1000       165.28        47.27
        16384         1000       325.01        48.08
        32768         1000       440.75        70.90
        65536          640      1060.00        58.96
       131072          320      1674.71        74.64
       262144          160      2814.13        88.84
       524288           80      4975.11       100.50
      1048576           40      9526.94       104.97
      2097152           20     18419.33       108.58
      4194304           10     36150.05       110.65
      8388608            5     71880.79       111.30
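To see how the Mbytes/sec column relates to t[usec], here is a small C sketch. It assumes that IMB's t[usec] is the one-way time (half the round trip) and that one Mbyte is 2^20 bytes; the rows above are consistent with that, e.g. 8388608 bytes in 71880.79 usec works out to about 111.30. The function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: IMB-style PingPong bandwidth in Mbytes/sec, assuming
   t_usec is the one-way time and 1 Mbyte = 2^20 bytes. */
double imb_mbytes_per_sec(double bytes, double t_usec)
{
    assert(t_usec > 0.0);
    /* bytes per microsecond, scaled to bytes per second, then to Mbytes */
    return bytes / t_usec * 1e6 / (1024.0 * 1024.0);
}
```

With the same units, a 1 Gb/s link (125,000,000 bytes/s) tops out at about 119.2 Mbytes/sec, so a peak near 111 Mbytes/sec is already close to what Gigabit Ethernet can carry.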
Teng
It's "letter113"
On 08/25/2011 03:14 PM, Jeff Squyres wrote:
On Aug 25, 2011, at 8:25 AM, Xin He wrote:
Can you edit your configure.m4 directly and test it and whatnot? I provided the configure.m4 as a starting point for you. :-) It shouldn't be hard to make it check linux/tipc.h instead of tipc.h. I'm happy to give you direct write access to the bitbucket, if you want it.
I think my having write access would be convenient for both of us :)
Sure -- what's your bitbucket account ID?
As we've discussed off-list, we can't take the code upstream until the contributor agreement is signed, unfortunately.
The agreement thing is ongoing right now, but it may take some time.
No worries. Lawyers tend to take time when reviewing this stuff; we've seen this pattern in most organizations who sign the OMPI agreement.
But to save time, can you guys do some tests on the TIPC BTL, so that when the agreement is ready, the code can be used?
I don't know if any of us has the TIPC support libraries installed.
It is easy to get TIPC support; it is actually part of the kernel. To get TIPC working, you only have to configure it using "tipc-config". Maybe you can check this doc for information: http://tipc.sourceforge.net/doc/Users_Guide.txt
So... what *is* TIPC? Is there a writeup anywhere that we can read about what it is / how it works? For example, what makes TIPC perform better than TCP?
Sure. Search for "TIPC: Providing Communication for Linux Clusters". It is a paper written by the author of TIPC that explains the basics of TIPC, and it should be very useful. You can also visit the TIPC homepage: http://tipc.sourceforge.net/ .
I have done some tests using tools like NetPIPE, OSU and IMB, and the results show that the TIPC BTL performs better than the TCP BTL.
Great! Can you share any results?
Yes, please check the appendix for the results using IMB 3.2.
I have done the tests on 2 computers: Dell SC1435
Dual-Core AMD Opteron(tm) Processor 2212 HE x 2
4 GB Mem
Ubuntu Server 10.04 LTS 32-bit Linux 2.6.32-24
I'm not familiar with the Dell or Opteron lines -- how recent are those models?
Hi, I think these models are reasonably new :)
I ask because your TCP latency is a bit high (about 85us in 2-process IMB PingPong); it might suggest older hardware. Or you may have built a debugging version of Open MPI (if you have a .svn or .hg checkout, that's the default). See the HACKING top-level file for how to get an optimized build.
For example, with my debug build of Open MPI on fairly old Xeons with 1Gb Ethernet, I'm getting the following PingPong results (note: this is a debug build; it's not even an optimized build):
-----
$ mpirun --mca btl tcp,self --bynode -np 2 --mca btl_tcp_if_include eth0 hostname
svbu-mpi008
svbu-mpi009
$ mpirun --mca btl tcp,self --bynode -np 2 --mca btl_tcp_if_include eth0 IMB-MPI1 PingPong
#---------------------------------------------------
# Intel (R) MPI Benchmark Suite V3.2, MPI-1 part
#---------------------------------------------------
...
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000        57.31         0.00
            1         1000        57.71         0.02
            2         1000        57.73         0.03
            4         1000        57.81         0.07
            8         1000        57.78         0.13
-----
With an optimized build, it shaves off only a few us (which isn't too important in this case, but it does matter in the low-latency transport cases):
-----
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000        54.62         0.00
            1         1000        54.92         0.02
            2         1000        55.15         0.03
            4         1000        55.16         0.07
            8         1000        55.15         0.14
-----
The results I gave you were measured with 2 processes, but on 2 different servers. I take it the results you showed are for 2 processes on one machine?
But I did build with debug enabled; I will try an optimized build then :)
BTW, I forgot to tell you about SM & TIPC. Unfortunately, TIPC does not beat SM...
/Xin
_______________________________________________
devel mailing list
devel@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel