Open MPI User's Mailing List Archives


From: Mark Kosmowski (mark.kosmowski_at_[hidden])
Date: 2007-02-19 13:53:10


Dear OMPI Community:

I have a modest personal cluster that I put together to run
computational chemistry projects for my doctoral studies: 3 nodes, 6
Opteron processors (all single-core; two 242s and four 844s), each
machine with 4 GB of RAM, connected over gigabit Ethernet through an
unmanaged switch. I'm using the 844s as dual processors because I got
a good deal on the lot of four 844 chips.

The 844-based systems use an Arima / Rioworks HDAMA motherboard, with
the RAM configured as two 2 GB sticks in the CPU 0 DIMM 0 and DIMM 1
locations (to use a consistent numbering scheme: the motherboard
manual calls the CPUs 0 and 1, but numbers the DIMMs 1-4 for each CPU,
so by the manual's scheme the DIMMs are in slots 1 and 2 of the CPU 0
bank). The 242-based system uses a Tyan 2875 motherboard with one
1 GB stick in each of the four slots of its single bank of DIMM slots.

I am running OpenSUSE 10.2 on each system.

I did some benchmarking, running the same executable on the same job
on just the 242 system (using both processors) versus the entire
cluster. The program (CPMD, www.cpmd.org) reports CPU time and
elapsed time.

I'm reporting the times below in hours:minutes, rounded to the nearest
minute. I trust everyone will agree that it makes no significant
difference if I inadvertently truncated rather than rounded some of
the minutes.

For just the one system with two processors:

CPU time: 32:43
Elapsed time: 36:52
Peak memory: 373 MB

For the entire cluster:

CPU time: 12:23
Elapsed time: 20:30
Peak memory: 131 MB

Is this typical scaling, or should I be thinking about tweaking the
network / OMPI setup at some point? The CPU time scales about right,
but the elapsed time is getting hammered. Given the low memory
footprint, it has to be a communications issue rather than a swap
issue, right?
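To put numbers on what I mean by "getting hammered", here's a quick
calculation of speedup and parallel efficiency from the elapsed times
above (just illustrative arithmetic, taking the 2-CPU 242 box as the
baseline and the full cluster as 6 CPUs):

```python
# Quick speedup/efficiency check from the elapsed times reported above.
# Assumes 2 CPUs for the single 242 box and 6 CPUs for the whole cluster.

def to_minutes(hm: str) -> int:
    """Convert an 'H:MM' string to total minutes."""
    h, m = hm.split(":")
    return int(h) * 60 + int(m)

elapsed_2cpu = to_minutes("36:52")  # elapsed on the 2-CPU 242 box
elapsed_6cpu = to_minutes("20:30")  # elapsed on the 6-CPU cluster

speedup = elapsed_2cpu / elapsed_6cpu  # ~1.80x instead of the ideal 3x
efficiency = speedup / (6 / 2)         # ~60% parallel efficiency

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
```

So going from 2 to 6 processors only buys about a 1.8x reduction in
wall time, i.e. roughly 60% parallel efficiency.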

Would it be helpful to see a serial timing with the same executable?
(If so, I'd probably repeat all the runs with a smaller job - I don't
want to spend half a week just on benchmarking.)

I have included the appropriate btl_tcp_if_include configuration so
that OMPI only uses the gigabit ports (and not the internet
connections that some of the machines have).
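For anyone curious, the setting looks like this (the interface name
eth0 is just an example - substitute whatever your gigabit port is
called on your system):

```
# ~/.openmpi/mca-params.conf
# (equivalently: mpirun --mca btl_tcp_if_include eth0 ...)
btl_tcp_if_include = eth0
```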

I am already planning to do some benchmark comparisons to determine
the effect of compiler / math library choice on speed.

Thank you,

Mark Kosmowski