Hakon Bugge presented a paper on this at ISC09. He found that SMT helped several SPEC MPI benchmarks. (He used Platform MPI, not Open MPI.)
He did not oversubscribe, though. He just enabled SMT, which allowed the OS to use spare CPU cycles during I/O wait, etc.
My conclusions, based on his paper and on results gathered in our lab:
1) SMT on Nehalem is improved over previous implementations.
2) For best performance, do not oversubscribe physical cores.
3) Test SMT on a per-application basis. Across various HPC workloads, I saw gains of up to 14% and penalties of up to 26%.
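A minimal sketch of how points 2 and 3 might be applied in practice, in Python. The function names are hypothetical, the default of 2 hardware threads per core is an assumption that matches Nehalem SMT, and the timings in the usage part are made-up placeholders, not measurements from the paper or from our lab. It also assumes gain/penalty is defined as the change in run time relative to the SMT-off run, which may not be how any particular benchmark reports it.

```python
def ranks_without_oversubscription(logical_cpus: int, threads_per_core: int = 2) -> int:
    """Return a rank count that places at most one MPI rank per physical core.

    logical_cpus is the number of CPUs the OS reports with SMT enabled;
    threads_per_core is an assumption (2 for Nehalem SMT).
    """
    if logical_cpus <= 0 or threads_per_core <= 0:
        raise ValueError("logical_cpus and threads_per_core must be positive")
    # Integer division drops the SMT siblings, leaving one rank per core.
    return max(1, logical_cpus // threads_per_core)


def smt_change_percent(time_smt_off: float, time_smt_on: float) -> float:
    """Percent change in run time from enabling SMT, relative to the SMT-off run.

    Positive means SMT made the run faster (a gain); negative means slower
    (a penalty). This definition is an assumption for illustration.
    """
    if time_smt_off <= 0:
        raise ValueError("time_smt_off must be positive")
    return 100.0 * (time_smt_off - time_smt_on) / time_smt_off


if __name__ == "__main__":
    # Hypothetical node: 16 logical CPUs reported with SMT on.
    print(ranks_without_oversubscription(16))

    # Hypothetical wall-clock times in seconds for two different applications.
    print(f"{smt_change_percent(120.0, 105.0):+.1f}%")  # app that benefits
    print(f"{smt_change_percent(100.0, 125.0):+.1f}%")  # app that is penalized
```

The first value is what you would pass as the rank count (for example, to mpirun -np) so that SMT stays enabled but each physical core runs only one rank. The second function is meant to be run once per application, since the sign of the result can differ from one workload to the next.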
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
> Behalf Of Robert Kubrick
> Sent: Saturday, July 11, 2009 4:16 PM
> To: Open MPI Users
> Subject: [OMPI users] 2 to 1 oversubscription
> The Open MPI FAQ recommends not oversubscribing the available cores
> for best performance, but is this still true? The new Nehalem
> processors are built to run 2 threads on each core. On an 8-socket
> system, that adds up to 128 threads that Intel claims can be run
> without significant performance degradation. I guess the last word
> belongs to those who have tried running benchmarks and applications
> on the new Intel processors. Any experience to share?