Gus Correa wrote:
> Dear Open MPI experts
> I need your help to get Open MPI right on a standalone
> machine with Nehalem processors.
> How to tweak the mca parameters to avoid problems
> with Nehalem (and perhaps AMD processors also),
> where MPI programs hang, was discussed here before.
> However, I lost track of the details, how to work around the problem,
> and if it was fully fixed already perhaps.
Yes, perhaps the problem you're seeing is not what you remember being
discussed. Perhaps you're thinking of
https://svn.open-mpi.org/trac/ompi/ticket/2043 . It's presumably fixed.
> I am now facing the problem directly on a single Nehalem box.
> I installed OpenMPI 1.4.1 from source,
> and compiled the test hello_c.c with mpicc.
> Then I tried to run it with:
> 1) mpirun -np 4 a.out
> It ran OK (but seemed to be slow).
> 2) mpirun -np 16 a.out
> It hung, and brought the machine to a halt.
> Any words of wisdom are appreciated.
> More info:
> * OpenMPI 1.4.1 installed from source (tarball from your site).
> * Compilers are gcc/g++/gfortran 4.4.3-4.
> * OS is Fedora Core 12.
> * The machine is a Dell box with Intel Xeon 5540 (quad core)
> processors on a two-way motherboard and 48GB of RAM.
> * /proc/cpuinfo indicates that hyperthreading is turned on.
> (I can see 16 "processors".)
> What should I do?
> Use -mca btl ^sm ?
> Use -mca btl -mca btl_sm_num_fifos=some_number ? (Which number?)
> Use Both?
> Do something else?
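
For reference, the two options asked about above could be spelled roughly
as follows. This is only a sketch, not a confirmed fix for the hang: MCA
parameters are passed as "-mca name value" (space-separated, no "="), the
variable names are made up for illustration, and the FIFO count of NP-1 is
an assumption to be checked against the ticket. The script only prints the
command lines, it does not launch anything.

```shell
#!/bin/sh
# Dry run: build and print candidate mpirun command lines without executing them.
NP=16            # number of MPI processes, as in the failing run
PROG=./a.out     # the compiled hello_c test program

# Option 1: exclude the shared-memory BTL; a leading "^" negates the list.
CMD_NO_SM="mpirun -np $NP -mca btl ^sm $PROG"

# Option 2: keep the sm BTL but set the number of FIFOs explicitly.
# NP-1 is an assumed value here, not a verified recommendation.
NUM_FIFOS=$((NP - 1))
CMD_FIFOS="mpirun -np $NP -mca btl_sm_num_fifos $NUM_FIFOS $PROG"

echo "$CMD_NO_SM"
echo "$CMD_FIFOS"
```

If you paste the first command directly into a shell, quoting '^sm' is the
safer habit, since some shells treat an unquoted caret specially.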