This *sounds* like the classic oversubscription problem, which comes
down to Open MPI's aggressive vs. degraded operating modes.
Specifically, "slots" is *not* meant to be the number of processes to
run. It's meant to be how many processors are available to run them.
Hence, if you lie and tell OMPI that you have more slots than CPUs, OMPI
will think that it can run in aggressive mode. But you'll have fewer
processors than processes, and all of the processes will be running in
aggressive mode -- hence, massive slowdown.
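
To make that rule concrete, here's a rough sketch in C of the decision
as I described it above. This is *not* Open MPI source code: the names
run_mode and choose_mode() are made up for illustration, and the rule
(aggressive when the local process count fits within the slot count,
degraded otherwise) is my simplified reading of the behavior.

```c
/* Hypothetical helper for illustration only; not an Open MPI API. */
typedef enum { MODE_AGGRESSIVE, MODE_DEGRADED } run_mode;

/*
 * Assumed rule: if the processes placed on this node fit within the
 * advertised slots, spin aggressively; otherwise treat the node as
 * oversubscribed and run degraded.
 */
static run_mode choose_mode(int local_procs, int slots)
{
    return (local_procs <= slots) ? MODE_AGGRESSIVE : MODE_DEGRADED;
}
```

With this sketch, choose_mode(4, 4) picks aggressive mode and
choose_mode(4, 2) picks degraded mode, which matches the two machinefile
variants in the quoted message.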
However, you say that you've got two dual-core Opterons in a single box,
so there should be 4 processors. Hence, "slots=4" should not be a lie.
I can't think of why this would happen. The only difference between
aggressive and degraded mode is that, in degraded mode, we call
sched_yield() in the middle of tight progression loops in Open MPI,
forcing the process to yield to other processes that are waiting (which
will likely be the case in oversubscribed situations).
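
For illustration, the kind of loop I mean looks roughly like the sketch
below. It is a simplified stand-in, not Open MPI's actual progress
engine: wait_for_flag(), its "degraded" argument, and the max_spins
bound are all hypothetical names I'm using here. The only real call is
sched_yield() from <sched.h>.

```c
#include <sched.h>
#include <stdatomic.h>

/*
 * Hypothetical polling loop; not Open MPI code.
 * Spins until *flag becomes nonzero or max_spins iterations pass.
 * In degraded mode each unsuccessful poll gives up the CPU so that
 * other runnable processes can make progress.
 * Returns 1 if the flag was seen set, 0 if the spin limit was reached.
 */
static int wait_for_flag(atomic_int *flag, int degraded, long max_spins)
{
    for (long i = 0; i < max_spins; ++i) {
        if (atomic_load(flag) != 0) {
            return 1;
        }
        if (degraded) {
            sched_yield();   /* let a waiting process run */
        }
        /* aggressive mode: loop again immediately */
    }
    return 0;
}
```

In aggressive mode the loop never gives up the CPU voluntarily, which is
fine when every process has its own processor and painful when it
doesn't.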
Can you confirm that your Linux installation thinks that it has 4
processors and will schedule 4 processes simultaneously?
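
One quick way to check is to ask the C library how many processors are
currently online. The helper below is just a sketch; online_processors()
is a name I made up, and it assumes a glibc/Linux system where
sysconf(_SC_NPROCESSORS_ONLN) is available.

```c
#include <unistd.h>

/*
 * Hypothetical helper: number of processors the OS currently reports
 * as online, or -1 if the value cannot be determined.
 * Assumes _SC_NPROCESSORS_ONLN is supported (true on glibc/Linux).
 */
static long online_processors(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

On your box I'd expect this to return 4. If it returns less, the kernel
isn't seeing all of the cores, and "slots=4" really would be
oversubscribing the node. Counting the "processor" entries in
/proc/cpuinfo should give the same number.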
> -----Original Message-----
> From: users-bounces_at_[hidden]
> [mailto:users-bounces_at_[hidden]] On Behalf Of Troy Telford
> Sent: Thursday, June 01, 2006 7:24 PM
> To: users_at_[hidden]
> Subject: [OMPI users] Open MPI and Dual Core (machinefile)
> I'm hoping this is just user error...
> I'm running a single-node job with a node that has two
> dual-core opterons
> (Open MPI 1.0.2).
> compiler=gcc 4.1.0
> arch=x86_64 (64-bit)
> OS=linux 2.6.16
> My machine file looked like this:
> node1 slots=4
> I have an HPL configuration for 4 processors (PxQ=2x2)
> I started with 'mpirun -np 4 -machinefile foo ./xhpl'
> And the problem takes 15 seconds to complete.
> I change the machinefile to read:
> node1 slots=2
> -or, simply-
> It doesn't matter which machinefile I use; I still execute it with:
> 'mpirun -np 4 -machinefile foo ./xhpl'
> Except now the problem takes 0.1 sec to complete.
> It's perfectly repeatable...
> Is there something about the machine file format I'm not
> aware of (with
> respect to dual-core CPUs)? IIRC, slots=(num of processes to
> run per
> node); so two dual-cores should be slots=4. Except 'slots=4'
> makes it run
> a few orders of magnitude slower.
> Troy Telford