
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-12-10 17:57:27


On Dec 10, 2009, at 5:53 PM, Gus Correa wrote:

> How does the efficiency of loopback
> (let's say, over TCP and over IB) compare with "sm"?

Definitely not as good; that's why we have sm. :-) I don't have any numbers to back up that assertion, though.

> FYI, I do NOT see the problem reported by Matthew et al.
> on our AMD Opteron Shanghai dual-socket quad-core.
> They run a rather outdated
> CentOS kernel 2.6.18-92.1.22.el5, with gcc 4.1.2
> and OpenMPI 1.3.2.
> (I've been too lazy to upgrade; it is a production machine.)
>
> I could run all three OpenMPI test programs (hello_c, ring_c, and
> connectivity_c) on all 8 cores on a single node WITH "sm" turned ON
> with no problem whatsoever.

Good.

> Moreover, everything works fine if I oversubscribe up to 256 processes on
> one node.
> Beyond that I sometimes, but not always, get a segmentation fault
> (not a hang).
> I understand that extreme oversubscription is a no-no.

It's quite possible that extreme oversubscription, and/or running that many procs over sm, has not been well tested.

> Moreover, on the screenshots that Matthew posted, the cores
> were at 100% CPU utilization on the simple connectivity_c
> (although this was when he had "sm" turned on, on Nehalem).
> On my platform I don't get anything more than 3% or so.

100% CPU utilization usually means that an expected completion hasn't occurred, so everything is spinning while waiting for it. The "hasn't occurred" part is probably the bug here: most likely there should have been a completion that somehow got missed. But this is speculative; we're still investigating...
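As a rough illustration of why a missed completion shows up as 100% CPU rather than an idle hang, here is a toy model in C. It is a sketch under stated assumptions, not Open MPI's progress engine: `request_t`, `progress()`, and `spin_wait()` are hypothetical names, and a completion is simulated as arriving at a given poll iteration (or never, for the "missed" case). The waiter busy-polls instead of blocking, so every iteration burns CPU whether or not anything arrives. The `max_polls` bound exists only so the example terminates; a real spin loop with a lost completion would have no such bound.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request whose completion is delivered by a progress engine.
 * Not Open MPI's real data structures; a toy for illustration only. */
typedef struct {
    long arrives_at_poll; /* poll iteration at which completion is delivered; -1 = never (missed) */
    bool complete;        /* set by progress() once the completion is delivered */
} request_t;

/* One pass of a hypothetical progress engine: deliver the completion if it is due. */
static void progress(request_t *req, long poll)
{
    if (req->arrives_at_poll >= 0 && poll >= req->arrives_at_poll) {
        req->complete = true;
    }
}

/* Busy-poll (never sleep or block) until the request completes.
 * Returns the number of polls performed, or -1 if max_polls was exhausted.
 * Without the max_polls bound, a missed completion would spin forever. */
static long spin_wait(request_t *req, long max_polls)
{
    assert(max_polls > 0);
    for (long poll = 0; poll < max_polls; ++poll) {
        progress(req, poll);
        if (req->complete) {
            return poll + 1;
        }
    }
    return -1; /* completion never arrived: the "spinning at 100%" case */
}
```

With a completion due at poll 5, `spin_wait` returns 6; with `arrives_at_poll = -1` it keeps polling until the bound and returns -1, which is the CPU-burning situation described above.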

-- 
Jeff Squyres
jsquyres_at_[hidden]