Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-11-13 15:51:30


On Nov 3, 2005, at 8:35 PM, Allan Menezes wrote:

> 1. No, I have 4 NICs on the head node and two on each of the 15 other
> compute nodes. On the compute nodes I use Realtek 8169 gigabit
> Ethernet cards as eth1 or eth0 (one only), connected to a gigabit
> Ethernet switch with a bisection bandwidth of 32 Gbps; the head node
> uses a built-in 3Com gigabit Ethernet NIC with the sk98lin driver
> (eth3). The other Ethernet cards on the head node are 10/100M: eth0
> drives a network laser printer and eth2 handles Internet access. Eth1
> is a spare 10/100M card which I can remove. The compute nodes each
> have two Ethernet cards: one 10/100 Mbps port built into the
> motherboard and not connected to anything, and a PCI Realtek 8169
> gigabit card connected to the gigabit TCP LAN. When I tried it
> without the switches "-mca pml teg", the maximum performance I would
> get was 9 GFlops for P=4, Q=4, N of roughly 12,000-16,000, and NB
> ridiculously low at a block size of 10. If I tried bigger block sizes
> it would run for a long time for large N ~ 16,000 unless I killed
> xhpl. I use ATLAS BLAS 3.7.11 libraries compiled for each node and
> linked to HPL when creating xhpl. I also use the Open MPI mpicc in
> the HPL makefile for both compiling and linking. Maybe I should,
> according to the new FAQ, use the TCP switch to use eth3 on the head
> node?

So if I'm reading that right, there's only one network that connects
the head node and the compute nodes, right?

> 2. I have 512 MB of memory per node, which is 8 GB total, so I can
> safely go up to N = 22,000-24,000. I used sizes of 22,000 for TCP teg
> and did not run into problems. But if I do not specify the switches
> suggested by Tim, I get bad performance for N = 12,000.
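
(As a sanity check on that sizing: the HPL matrix occupies N^2 * 8
bytes, so with 8 GB of total memory and the usual ~80%-of-RAM rule of
thumb, N_max ~= sqrt(0.80 * 8e9 / 8) ~= 28,000; N = 22,000-24,000
therefore leaves comfortable headroom for the OS and MPI buffers.)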

I must admit that I'm still befuddled by this -- we are absolutely
unable to duplicate this behavior. It *sounds* like there is some
network mismatching going on here -- that the tcp btl is somehow
routing information differently than the tcp ptl (and therefore taking
longer -- timing out and the like).

We did make some improvements to the tcp subnet mask matching code for
rc5; could you try again with the latest nightly snapshot tarball?

        http://www.open-mpi.org/nightly/v1.0/
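
Also, along the lines of the FAQ item you mentioned, the tcp btl can be
told explicitly which interfaces it may use via the btl_tcp_if_include
MCA parameter. A sketch only -- eth3 is the head node's gigabit NIC
from your description, the hostfile aa is from your earlier command
line, and because the compute nodes name their gigabit NIC eth0 or
eth1, the parameter would really need to be set per node (e.g. in each
node's local MCA parameter file) rather than once on the command line:

        mpirun -mca btl tcp,self -mca btl_tcp_if_include eth3 \
               -hostfile aa -np 16 ./xhpl

That at least rules out the tcp btl wandering onto the 10/100 or
printer networks on the head node.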

> 4. My cluster is an experimental basement cluster [BSquared = Brampton
> Beowulf] built out of x86 machines: 6 Durons, 2 Athlons, 2 Semprons,
> 2 P4s, 2 x86_64 AMD Athlon 64s, and 2 x86_64 AMD Semprons (754-pin),
> for a total of 16 machines running FC3 and the OSCAR beta cluster
> software. I have not tried it with the latest Open MPI snapshot yet,
> but I will tonight. I think I should reinstall FC3 on the head node
> (P4 2.8 GHz), reinstall all the compute nodes with the Nov 3, 2005
> OSCAR beta and today's (Nov 3, 2005) Open MPI 1.0 snapshot, and try
> again. I could have made an error somewhere before. It should not
> take me long. But I doubt it, as MPICH2 and Open MPI with the
> switches "pml teg" give good, comparable performance. I was not using
> jumbo MTU frames either, just 1500 bytes. It is not homogeneous
> (BSquared) but it is a good test setup.
> If you have any advice, please tell me and I can try it out.
> Thank you and good luck!
> Allan
>
> On Oct 27, 2005, at 10:19 AM, Jeff Squyres wrote:
>
>
>> On Oct 19, 2005, at 12:04 AM, Allan Menezes wrote:
>>
>>>> We've done linpack runs recently w/ Infiniband, which result in
>>>> performance comparable to mvapich, but not w/ the tcp port. Can you
>>>> try running w/ an earlier version, specify on the command line:
>>>>
>>>> -mca pml teg
>>> Hi Tim,
>>> I tried the same cluster (16 node x86) with the switches "-mca pml
>>> teg" and I get good performance of 24.52 GFlops at N=22500 and
>>> block size NB=120.
>>> My command line now looks like:
>>>
>>>    a1> mpirun -mca pls_rsh_orted /home/allan/openmpi/bin/orted \
>>>        -mca pml teg -hostfile aa -np 16 ./xhpl
>>>
>>> hostfile = aa, containing the addresses of the 16 machines.
>>> I am using a GS116 16-port Netgear gigabit Ethernet switch with Gnet
>>> Realtek gigabit Ethernet cards.
>>> Why, PLEASE, do these switches "pml teg" make such a difference? It's
>>> 2.6 times more performance in GFlops than what I was getting without
>>> them.
>>> I tried version rc3 and not an earlier version.
>>> Thank you very much for your assistance!
>>>
>>
>> Sorry for the delay in replying to this...
>>
>> The "pml teg" switch tells Open MPI to use the 2nd generation TCP
>> implementation rather than the 3rd generation TCP. More specifically,
>> the "PML" is the point-to-point management layer. There are 2
>> different components for this -- teg (2nd generation) and ob1 (3rd
>> generation). "ob1" is the default; specifying "--mca pml teg" tells
>> Open MPI to use the "teg" component instead of ob1.
>>
>> Note, however, that teg and ob1 know nothing about TCP -- it's the
>> 2nd order implications that make the difference here. teg and ob1 use
>> different back-end components to talk across networks:
>>
>> - teg uses PTL components (point-to-point transport layer -- 2nd gen)
>> - ob1 uses BTL components (byte transfer layer -- 3rd gen)
>>
>> We obviously have TCP implementations for both the PTL and BTL.
>> Considerable time was spent optimizing the TCP PTL (i.e., 2nd gen).
>> Unfortunately, as yet, little time has been spent optimizing the TCP
>> BTL (i.e., 3rd gen) -- it was a simple port, nothing more.
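
A minimal sketch of how the two stacks are selected at run time (the
hostfile aa is taken from Allan's command line above, and the
ompi_info check assumes both pml components were built):

        # 3rd-generation stack (the default): ob1 PML over the tcp BTL
        mpirun -mca pml ob1 -mca btl tcp,self -hostfile aa -np 16 ./xhpl

        # 2nd-generation stack: teg PML over the tcp PTLs
        mpirun -mca pml teg -hostfile aa -np 16 ./xhpl

        # list the pml/ptl/btl components that were actually built
        ompi_info | grep -i -E "pml|ptl|btl"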
>>
>> We have spent the majority of our time, so far, optimizing the Myrinet
>> and Infiniband BTLs (therefore showing that excellent performance is
>> achievable in the BTLs). However, I'm quite disappointed by the TCP
>> BTL performance -- it sounds like we have a protocol mismatch that is
>> arbitrarily slowing everything down, and something that needs to be
>> fixed before 1.0 (it's not a problem with the BTL design, since IB and
>> Myrinet performance is quite good -- just a problem/bug in the TCP
>> implementation of the BTL). That much performance degradation is
>> clearly unacceptable.

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/