Open MPI User's Mailing List Archives

From: Troy Telford (ttelford_at_[hidden])
Date: 2006-06-02 11:15:06

On Thu, 01 Jun 2006 18:07:07 -0600, Jeff Squyres (jsquyres)
<jsquyres_at_[hidden]> wrote:

> This *sounds* like the classic oversubscription problem: Open MPI's
> aggressive vs. degraded operating modes:

Good link; bookmarked for (internal) documentation...

> Specifically, "slots" is *not* meant to be the number of processes to
> run. It's meant to be how many processors are available to run. Hence,
> if you lie and tell OMPI that you have more slots than CPUs, OMPI will
> think that it can run in aggressive mode. But you'll have less
> processors than processes, and all of them will be running in aggressive
> mode -- hence, massive slowdown.
> However, you say that you've got 2 dual core opterons in a single box,
> so there should be 4 processors. Hence "slots=4" should not be a lie.

It's good to hear that my concept of slots wasn't off. (Although my
message didn't give that impression...) It certainly seems to me that
with two dual cores I should use slots=4.

> I can't think of why this would happen.

> Can you confirm that your Linux installation thinks that it has 4
> processors and will schedule 4 processes simultaneously?

Fun story: At first, *I* thought it was a simple case of two single-core
processors (slots=2, and I used two nodes to get 4 CPUs). I believed it
had only two processors because `cat /proc/cpuinfo` lists two
processors: CPU0 and CPU1 (i.e., the Linux installation doesn't see four
processors, but two dual-core processors).

Then somebody pointed out to me they were dual core, and that cpuinfo
listed it:
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 33
model name : unknown
stepping : 2
cpu MHz : 2613.419
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2 <----- Two cores -------
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm
3dnowext 3dnow pni lahf_lm
bogomips : 5227.16
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
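
A minimal sketch of how output like the above could be summarized. Each
`processor` stanza in /proc/cpuinfo is one logical CPU the kernel can
schedule on, whereas `cpu cores` only describes the package that CPU
belongs to. The function name `summarize_cpuinfo` and the returned keys
are made up for illustration; the field names are the ones shown in the
listing above.

```python
def summarize_cpuinfo(text):
    """Count logical CPUs, physical packages, and cores per package.

    `text` is the contents of /proc/cpuinfo. Lines without a colon
    (for example, wrapped `flags` continuation lines) are skipped.
    """
    logical_cpus = 0
    packages = set()
    cores_per_package = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical_cpus += 1          # one stanza per schedulable CPU
        elif key == "physical id":
            packages.add(value)        # distinct sockets seen by the kernel
        elif key == "cpu cores" and value:
            # Take the first token so trailing annotations are ignored.
            cores_per_package = int(value.split()[0])
    return {
        "logical_cpus": logical_cpus,
        "packages": len(packages),
        "cores_per_package": cores_per_package,
    }


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(summarize_cpuinfo(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

If `logical_cpus` comes out smaller than `packages * cores_per_package`,
the kernel has not brought every core online, and a slots value based
on the hardware core count would overstate what can actually run.
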
To verify that it acted like it had four cores, I tried the following
(using two nodes in the machinefile, each with slots=2):

1.) Start a 4 CPU linpack job. (Supposedly using half of the CPU power
in each machine)
      * With just 4 processes in total, the problem size took approximately
        0.08 s to finish (repeatably; the HPL.dat is set to run the same
        problem size several times.)
      * 'top' listed *two* CPUs, both pegged at 100%. Each hpl process
        was taking 100% of the CPU.
2.) Start a second 4 CPU linpack job. (Using the other half of the CPU
power in each machine)
      * When I started the second job (8 total processes, 4 in each job),
        the same problem size started to take 0.19 s to complete (on both
        jobs)
      * 'top' listed *two* CPUs, both pegged at 100%. Each hpl process
        was taking 50% of the CPU.

Then, I tried the same 4 process linpack job on a single node (one node in
the machinefile, slots=2).
The results were essentially identical to #2 above (where the node was
still running 4 processes).

So it seems that although the system has dual-core CPUs, only one core is
being used per CPU; so four simultaneous processes are not being scheduled.

So the oversubscription hypothesis appears to be 100% correct; slots=4 is
oversubscribing the job.
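
The check behind that conclusion can be sketched as a comparison between
the number of processes placed on a node and the number of CPUs the OS
will actually schedule on. This is only an illustration, not Open MPI's
own logic; `schedulable_cpus` and `is_oversubscribed` are hypothetical
helper names.

```python
import os


def schedulable_cpus():
    """Return how many CPUs this process is allowed to run on.

    Uses the affinity mask where the platform provides it (Linux),
    otherwise falls back to the total logical CPU count.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def is_oversubscribed(nprocs_on_node, cpus=None):
    """True when more processes are placed on a node than it has CPUs."""
    if cpus is None:
        cpus = schedulable_cpus()
    return nprocs_on_node > cpus


if __name__ == "__main__":
    cpus = schedulable_cpus()
    for slots in (2, 4):
        print(f"slots={slots}, CPUs={cpus}, "
              f"oversubscribed={is_oversubscribed(slots, cpus)}")
```

With the two CPUs that 'top' reported here, 4 processes on one node
count as oversubscribed and 2 do not.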

Now I get to go find out *why* the job is oversubscribed, since there are
4 cores able to handle the processes... I'll have to see if the system
behaves similarly with non-MPI processes (i.e., it doesn't use all of the
available cores). It may very well be a problem with the hardware or OS;
it's the pre-release distro I wrote about in another posting yesterday...
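
One possible way to run that non-MPI check, as a rough sketch: start N
independent CPU-bound processes and compare the wall-clock time against
a single process. The helper name `time_busy_processes` and the
iteration count are arbitrary choices for illustration, and the timing
includes interpreter startup, so only large differences mean anything.

```python
import subprocess
import sys
import time

# A small CPU-bound loop, run in a separate interpreter per process.
BUSY_LOOP = "x = 0\nfor i in range({iters}):\n    x += i * i\n"


def time_busy_processes(nprocs, iters=2_000_000):
    """Run `nprocs` busy loops concurrently; return elapsed seconds."""
    code = BUSY_LOOP.format(iters=iters)
    start = time.perf_counter()
    procs = [subprocess.Popen([sys.executable, "-c", code])
             for _ in range(nprocs)]
    for proc in procs:
        proc.wait()
    return time.perf_counter() - start


if __name__ == "__main__":
    base = time_busy_processes(1)
    print(f"1 process: {base:.2f} s")
    for n in (2, 4):
        elapsed = time_busy_processes(n)
        print(f"{n} processes: {elapsed:.2f} s "
              f"({elapsed / base:.2f}x the single-process time)")
```

If all four cores are being used, four processes should take about as
long as one. A jump to roughly twice the single-process time at four
processes would match the 'top' output above, where each hpl process
dropped to 50% of a CPU.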

I'm wondering if there is something happening behind the scenes... I'll
have to check...

Troy Telford