Open MPI User's Mailing List Archives

From: Tony Ladd (ladd_at_[hidden])
Date: 2006-10-24 13:58:14


Durga

I guess we have strayed a bit from the original post. My personal opinion is
that a number of codes can run in HPC-like mode over Gigabit ethernet, not
just the trivially parallelizable ones. The hardware components are one key:
PCI-X, a low hardware latency NIC (Intel PRO 1000 is 6.6 microsecs vs about 14
for the Broadcom 5721), and a non-blocking (that's the key word) switch. Then
you need a good driver and a good MPI software layer. At present MPICH is
ahead of LAM/OpenMPI/MVAPICH in its implementation of optimized collectives.
At least that's how it seems to me (let me say that quickly, before I get
flamed). MPICH got a bad rap performance-wise because its TCP driver was
mediocre (compared with LAM and OpenMPI). But MPICH + GAMMA is very fast.
MPIGAMMA even beats out our Infiniband cluster running OpenMPI on
MPI_Allreduce; the test used 64 CPUs: 32 nodes on the GAMMA cluster (dual
core P4) and 16 nodes on the Infiniband cluster (dual dual-core Opterons).
The IB cluster worked out at 24 MBytes/sec (vector size/time) and the GigE +
MPIGAMMA at 39 MBytes/sec. On the other hand, if I use my own optimized
AllReduce (a simplified version of the one in MPICH) on the IB cluster it
gets 108 MBytes/sec. So the tricky thing is that all the components need to
be in place to get good application performance.
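
For readers who have not looked inside an allreduce, here is a minimal
pure-Python sketch of recursive doubling, one of the algorithm families used
for allreduce in MPICH-style libraries. It is not the optimized AllReduce
mentioned above, whose details are not given in this post. The function names
are hypothetical, the "ranks" are just list entries (no MPI is involved), and
the rate helper assumes 1 MByte = 10^6 bytes, which the post does not specify.

```python
def allreduce_recursive_doubling(vectors):
    """Simulate a sum-allreduce across len(vectors) ranks.

    Assumption for this sketch: the number of ranks is a power of two.
    Returns (per-rank result buffers, number of exchange steps).
    """
    p = len(vectors)
    if p == 0 or p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two number of ranks")

    data = [list(v) for v in vectors]  # each rank's private buffer
    steps = 0
    dist = 1
    while dist < p:
        # In each step, every rank swaps its whole buffer with the rank
        # whose number differs in exactly one bit, then both add.
        updated = []
        for rank in range(p):
            partner = rank ^ dist
            updated.append([a + b for a, b in zip(data[rank], data[partner])])
        data = updated
        dist *= 2
        steps += 1
    return data, steps


def effective_rate_mbytes(vector_bytes, seconds):
    """The 'vector size/time' figure of merit, in MBytes/sec (10^6 bytes)."""
    return vector_bytes / seconds / 1e6


if __name__ == "__main__":
    ranks = [[r, 2 * r] for r in range(8)]
    result, steps = allreduce_recursive_doubling(ranks)
    print(steps, result[0])
```

With p ranks the loop runs log2(p) times, and every step moves the full
vector, which is why libraries typically switch to a reduce-scatter based
scheme for long vectors.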

GAMMA is not so easy to set up; I had considerable help from Giuseppe. It has
libraries to compile and the kernel needs to be recompiled. Now that I have
that automated I can build and install a new version of GAMMA in about 5 mins.
The MPIGAMMA build is just like MPICH and MPIGAMMA works almost exactly the
same. So any application that will compile under MPICH should compile under
MPIGAMMA, just by changing the path. I have run half a dozen apps with
GAMMA. Netpipe, Netbench (my network tester, a simplified version of IMB),
Susp3D (my own code, a CFD-like application), and DLPOLY all compile out of
the box. Gromacs compiles but has a couple of "bugs" that crash on execution.
One is an archaic test for MPICH that prevents a clean exit; it must have
been a bugfix for an earlier version of MPICH. The other seems to be an
fclose of an unassigned file pointer. It works OK in LAM but my guess is it's
illegal, strictly speaking. A student was supposed to check on that. VASP
also compiles out of the box if you can compile it with MPICH. But there is a
problem with MPIGAMMA and the MPI_Alltoall function right now. It works
but it suffers from hangups and long delays. So GAMMA is not good for VASP
at this moment. Sometimes you see substantial performance improvements,
but other times it's dreadfully slow. I can reproduce the problem with an
AlltoAll test code and Giuseppe is going to try to debug the problem.
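
For anyone wanting to build a similar reproducer, the sketch below shows two
pieces such a test might use. It is not the AlltoAll test code mentioned
above. Both function names are hypothetical, the exchange is modelled in
plain Python with no MPI calls, and the 5x-median threshold for "slow" is an
arbitrary assumption.

```python
from statistics import median


def alltoall_reference(send):
    """Reference result of an all-to-all exchange.

    send[src][dst] is the block rank src sends to rank dst; the returned
    recv[dst][src] is what rank dst should hold afterwards (a transpose).
    """
    p = len(send)
    if any(len(row) != p for row in send):
        raise ValueError("each rank must supply exactly one block per rank")
    return [[send[src][dst] for src in range(p)] for dst in range(p)]


def flag_slow_iterations(timings, factor=5.0):
    """Return indices of iterations slower than factor * median time.

    Intended to pick out intermittent stalls from a list of per-iteration
    wall-clock times; the default factor is an assumption, not a standard.
    """
    if not timings:
        return []
    typical = median(timings)
    return [i for i, t in enumerate(timings) if t > factor * typical]


if __name__ == "__main__":
    blocks = [[f"{src}->{dst}" for dst in range(3)] for src in range(3)]
    print(alltoall_reference(blocks)[2])
    print(flag_slow_iterations([1.0, 1.1, 0.9, 25.0, 1.0]))
```

Comparing against the median rather than the mean keeps a handful of very
long stalls from hiding themselves by inflating the baseline.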

So GAMMA is not a panacea. In most circumstances it is stable and
predictable; much more reproducible than MPI over TCP. But there may still
be one or two bugs, and there are several issues.
1) Since GAMMA is tightly entwined in the kernel a crash frequently brings
the whole system down, which is a bit annoying; also it can crash other
nodes in the same GAMMA Virtual Machine.
2) NICs are very buggy hardware; if you look at a TCP driver there are a
large number of hardware bugfixes in it. A number of GAMMA problems can be
traced to this. It's a lot of work to reprogram all the workarounds.
3) GAMMA nodes have to be preconfigured at boot. You can run more than one
job on a GAMMA virtual machine, but it's a little iffy; there can be
interactions between nodes on the same VM even if they are running different
jobs. Different GAMMA VMs need different VLANs. So a multiuser environment
is still problematic.
4) Giuseppe said MPIGAMMA was a very difficult code to write, so I would
guess a port to OpenMPI would not be trivial. Also I would want to see
optimized collectives in OpenMPI before I switched from MPICH.

As far as I know GAMMA is the most advanced non-TCP protocol. At core it
really works well, but it still needs a lot more testing and development.
Giuseppe is great to work with if anyone out there is interested. Go to the
MPIGAMMA website for more info:
http://www.disi.unige.it/project/gamma/mpigamma/index.html

Tony