Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-10-27 10:19:35

On Oct 19, 2005, at 12:04 AM, Allan Menezes wrote:

> We've done linpack runs recently w/ Infiniband, which result in
> performance comparable to mvapich, but not w/ the tcp port. Can you
> try running w/ an earlier version, specify on the command line:
> -mca pml teg
> Hi Tim,
> I tried the same cluster (16 node x86) with the switches -mca pml
> teg and I get good performance of 24.52Gflops at N=22500
> and Block size NB=120.
> My command line now looks like :
> a1> mpirun -mca pls_rsh_orted /home/allan/openmpi/bin/orted -mca pml
> teg -hostfile aa -np 16 ./xhpl
> hostfile = aa, containing the addresses of the 16 machines.
> I am using a GS116 16 port netgear Gigabit ethernet switch with Gnet
> realtek gig ethernet cards
> Why, PLEASE, do these switches pml teg make such a difference? It's
> 2.6 times more performance in GFlops than what I was getting without
> them.
> I tried version rc3 and not an earlier version.
> Thank you very much for your assistance!

Sorry for the delay in replying to this...

In effect, the "pml teg" switch tells Open MPI to use the 2nd
generation TCP implementation rather than the 3rd generation one. More
specifically, the "PML" is the point-to-point management layer. There
are 2 different components for this -- teg (2nd generation) and ob1
(3rd generation). "ob1" is the default; specifying "--mca pml teg"
tells Open MPI to use the "teg" component instead of ob1.
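
If you want to compare the two PMLs side by side, something like the
following may help. It is only a sketch: it assumes the hostfile "aa",
the 16 processes, and the ./xhpl binary from the command line quoted
above, and pml_cmd is a hypothetical helper that prints the command
lines rather than running them.

```shell
# Hypothetical helper: prints (does not run) an mpirun command line
# that selects the given PML component.
pml_cmd() {
    # $1 = PML component name: "ob1" (the default) or "teg"
    echo "mpirun --mca pml $1 -hostfile aa -np 16 ./xhpl"
}

pml_cmd ob1   # 3rd generation PML (same as giving no pml switch)
pml_cmd teg   # 2nd generation PML
```

Running each printed command with the same N and NB should give a
like-for-like comparison.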

Note, however, that teg and ob1 themselves know nothing about TCP --
it's the second-order effects that make the difference here. teg and
ob1 use different back-end components to talk across networks:

- teg uses PTL components (point-to-point transport layer -- 2nd gen)
- ob1 uses BTL components (byte transfer layer -- 3rd gen)
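
The pairing above can be written down as a small lookup. This is a
rough sketch, not anything shipped with Open MPI: transport_framework
is a hypothetical helper, and the grep pattern assumes that ompi_info
prints one "MCA <framework>: <component>" line per installed component
in your build.

```shell
# Hypothetical helper: map a PML component to the transport framework
# it drives, as described above.
transport_framework() {
    case "$1" in
        teg) echo "ptl" ;;      # point-to-point transport layer (2nd gen)
        ob1) echo "btl" ;;      # byte transfer layer (3rd gen)
        *)   echo "unknown" ;;
    esac
}

# Print the mapping; if ompi_info is on the PATH, also list the
# installed components of each transport framework.
for pml in teg ob1; do
    fw=$(transport_framework "$pml")
    echo "pml $pml -> $fw"
    if command -v ompi_info >/dev/null 2>&1; then
        ompi_info | grep "MCA $fw:" || true
    fi
done
```

On an installation with both generations built, you would expect a tcp
entry to show up under each framework.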

We obviously have TCP implementations for both the PTL and BTL.
Considerable time was spent optimizing the TCP PTL (i.e., 2nd gen).
Unfortunately, as yet, little time has been spent optimizing the TCP
BTL (i.e., 3rd gen) -- it was a simple port, nothing more.

So far, we have spent the majority of our time optimizing the Myrinet
and Infiniband BTLs, which shows that excellent performance is
achievable in the BTLs. However, I'm quite disappointed by the TCP
BTL performance. It sounds like we have a protocol mismatch that is
arbitrarily slowing everything down, and that needs to be fixed before
1.0. It's not a problem with the BTL design, since IB and Myrinet
performance is quite good -- just a problem/bug in the TCP
implementation of the BTL. That much performance degradation is
clearly unacceptable.

{+} Jeff Squyres
{+} The Open MPI Project