> I'm benchmarking our (well-tested) parallel code on an AMD-based system, featuring 2x AMD Opteron(TM) Processor 6276, with 16 cores each for a total of 32 cores. The system is running Scientific Linux 6.1 and OpenMPI 1.4.5.
> When I run a single-core job, the performance is as expected. However, when I run with 32 processes the performance drops to about 60%.
Be aware that on AMD CPUs based on Bulldozer/Interlagos technology, the
two cores of each module share that module's FPU, so floating-point-heavy
code may not scale linearly across all 32 cores. There is also a problem
with cross-cache invalidations in earlier kernel versions, so be sure to
use an up-to-date kernel (2.6.32-220.7.1).
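
If it helps, here is a minimal sketch for checking the running kernel against
the version mentioned above. It assumes that 2.6.32-220.7.1 can be treated as
a minimum and that release strings look like "2.6.32-220.7.1.el6.x86_64"; the
helper names (kernel_tuple, kernel_at_least) are made up for illustration and
are not part of any tool.

```python
import platform
import re

# Assumption: the version quoted above is treated as the minimum acceptable kernel.
MIN_KERNEL = "2.6.32-220.7.1"


def kernel_tuple(release):
    """Convert a release string such as '2.6.32-220.7.1.el6.x86_64'
    into a tuple of its leading numeric fields, e.g. (2, 6, 32, 220, 7, 1)."""
    parts = []
    for token in re.split(r"[.-]", release):
        if not token.isdigit():
            break  # stop at the first non-numeric field, e.g. 'el6'
        parts.append(int(token))
    return tuple(parts)


def kernel_at_least(release, minimum=MIN_KERNEL):
    """Return True if 'release' is the same as or newer than 'minimum'."""
    return kernel_tuple(release) >= kernel_tuple(minimum)


if __name__ == "__main__":
    running = platform.release()  # same string that 'uname -r' prints
    if kernel_at_least(running):
        print("kernel %s is at least %s" % (running, MIN_KERNEL))
    else:
        print("kernel %s is older than %s, consider updating" % (running, MIN_KERNEL))
```

Running this on each compute node before benchmarking would show whether an
old kernel might be contributing to the slowdown. It only compares version
numbers and does not test for the invalidation problem itself.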