Are these Intel-based machines? I have seen similar effects mentioned
earlier in this thread, where having all 8 cores banging on memory
pretty much kills performance on the UMA-style Intel 8-core machines.
I'm not a hardware expert, but I've stayed away from buying 8-core
servers for exactly this reason. AMD's been NUMA all along, and
Intel's newer chips are NUMA to alleviate some of this bus pressure.
~2x performance loss (between 8 and 4 cores on a single node) seems a
bit excessive, but I guess it could happen...? (I don't have any hard
numbers either way)
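
For what it's worth, here is a quick sanity check of the "~2x" figure,
using only the wall-clock times quoted below. This is just arithmetic on
the reported numbers, not a measurement of mine; I am assuming the "220"
is in seconds like the other two timings, and the labels are mine.

```python
# Wall-clock times in seconds, taken from the quoted post below.
# Assumption: the "220" figure is in seconds, like the other two.
times = {
    "case 1 (1 node, 8 cores)": 430.0,
    "case 2 (2 nodes, 4 cores each)": 220.0,
    "case 3 (4 nodes)": 200.0,
}


def slowdown(baseline_label, times):
    """Return how many times slower the baseline is than each case."""
    base = times[baseline_label]
    return {label: base / t for label, t in times.items()}


ratios = slowdown("case 1 (1 node, 8 cores)", times)
for label, ratio in ratios.items():
    print(f"{label}: baseline is {ratio:.2f}x this run time")
```

So the single-node run takes roughly twice as long as either multi-node
run, which is the gap I am calling "a bit excessive" above.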
On Sep 29, 2008, at 2:30 PM, Leonardo Fialho wrote:
> Hi All,
> I'm running some tests on a multi-core machine (8 cores per node)
> with the NAS benchmarks. Something that I consider strange is
> occurring... I'm using only one NIC and paffinity:
> -n 8
> --hostfile ./hostfile
> --mca mpi_paffinity_alone 1
> --mca btl_tcp_if_include eth1
> I have sufficient memory to run this application on only one node.
> 1) If I use one node (8 cores) the "user" % is around 100% per core.
> The execution time is around 430 seconds.
> 2) If I use 2 nodes (4 cores in each node) the "user" % is around
> 95% per core and the "sys" % is 5%. The execution time is around 220
> seconds.
> 3) If I use 4 nodes (1 core in each node) the "user" % is around
> 85% per core and the "sys" % is 15%. The execution time is around
> 200 seconds.
> Well... the questions are:
> A) The execution time in case "1" should be smaller (only sm
> communication, no?) than case "2" and "3", no? Cache problems?
> B) Why is there "sys" time when communicating between nodes? NIC
> driver? Why does this time increase when I balance the load across
> the nodes?
> Leonardo Fialho
> Computer Architecture and Operating Systems Department - CAOS
> Universidad Autonoma de Barcelona - UAB
> ETSE, Edifcio Q, QC/3088
> Phone: +34-93-581-2888
> Fax: +34-93-581-2478