Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] Execution in multicore machines
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-09-30 12:55:26

Are these Intel-based machines? I have seen similar effects mentioned
earlier in this thread, where having all 8 cores banging on memory
pretty much kills performance on the UMA-style Intel 8-core machines.
I'm not a hardware expert, but I've stayed away from buying 8-core
servers for exactly this reason. AMD's been NUMA all along, and
Intel's newer chips are NUMA to alleviate some of this bus pressure.
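
One rough way to check whether memory contention is in play on a given node is to run the same streaming copy in 1, 2, 4 and 8 concurrent processes and watch the per-process rate. The sketch below is only an illustration in plain Python, not a calibrated benchmark like STREAM; the buffer size, repeat count and function names are assumptions made up for this example, and the workers are not strictly synchronized.

```python
import time
from multiprocessing import Pool

BUF_BYTES = 64 * 1024 * 1024  # assumed working set, chosen to exceed typical caches
REPEATS = 5                   # assumed number of copies per worker


def copy_bandwidth(_worker_id):
    """Copy a large buffer REPEATS times; return this worker's copy rate in MB/s."""
    src = bytearray(BUF_BYTES)
    start = time.perf_counter()
    for _ in range(REPEATS):
        dst = bytes(src)  # streams the whole buffer through memory
    elapsed = time.perf_counter() - start
    return (BUF_BYTES * REPEATS) / elapsed / 1e6


def per_worker_bandwidth(workers):
    """Run `workers` concurrent copy loops; return the mean per-worker rate in MB/s."""
    with Pool(workers) as pool:
        rates = pool.map(copy_bandwidth, range(workers))
    return sum(rates) / len(rates)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} workers: {per_worker_bandwidth(n):.0f} MB/s per worker")
```

If the per-worker rate drops sharply between 4 and 8 workers, that would be consistent with the shared-bus explanation; if it stays flat, something else is probably going on.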

~2x performance loss (between 8 and 4 cores on a single node) seems a
bit excessive, but I guess it could happen...? (I don't have any hard
numbers either way)
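
For reference, the "~2x" figure follows directly from the wall-clock times reported in the quoted message below (430 s, 220 s and 200 s). A minimal sketch of that arithmetic, with dictionary keys that are just labels chosen for this example:

```python
# Execution times (seconds) as reported in the original message.
times_s = {
    "1 node": 430.0,
    "2 nodes": 220.0,
    "4 nodes": 200.0,
}

baseline = times_s["1 node"]

# Ratio of the single-node time to each configuration's time.
slowdown = {name: baseline / t for name, t in times_s.items()}

for name, ratio in slowdown.items():
    print(f"{name}: single-node run takes {ratio:.2f}x as long")
```

So the single-node run takes roughly 1.95x as long as the 2-node run and 2.15x as long as the 4-node run.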

On Sep 29, 2008, at 2:30 PM, Leonardo Fialho wrote:

> Hi All,
> I'm running some tests on a multicore machine (8 cores per node)
> with the NAS benchmarks. Something that I consider strange is occurring...
> I'm using only one NIC and paffinity:
> ./bin/mpirun
> -n 8
> --hostfile ./hostfile
> --mca mpi_paffinity_alone 1
> --mca btl_tcp_if_include eth1
> --loadbalance
> ./codes/nas/NPB3.3/NPB3.3-MPI/bin/lu.C.8
> I have sufficient memory to run this application in only one node,
> but:
> 1) If I use one node (8 cores) the "user" % is around 100% per core.
> The execution time is around 430 seconds.
> 2) If I use 2 nodes (4 cores in each node) the "user" % is around
> 95% per core and the "sys" % is 5%. The execution time is around 220
> seconds.
> 3) If I use 4 nodes (2 cores in each node) the "user" % is around
> 85% per core and the "sys" % is 15%. The execution time is around
> 200 seconds.
> Well... the questions are:
> A) The execution time in case "1" should be smaller (only sm
> communication, no?) than in cases "2" and "3", no? Cache problems?
> B) Why is there "sys" time when using inter-node communication? NIC
> driver? Why does this time increase when I balance the load across
> the nodes?
> Thanks,
> --
> Leonardo Fialho
> Computer Architecture and Operating Systems Department - CAOS
> Universidad Autonoma de Barcelona - UAB
> ETSE, Edifcio Q, QC/3088
> Phone: +34-93-581-2888
> Fax: +34-93-581-2478

Jeff Squyres
Cisco Systems