I would certainly try -mca btl ^sm and see if that solves the problem.
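
Something along these lines, as a rough sketch only. It assumes the same a.out and the 16-process run from your message; the NP variable, the quoting of '^sm', and the guard around the actual launch are my own additions for illustration:

```shell
# Retry the case that hung, this time with the shared-memory (sm) BTL excluded.
NP=16

# Build the command as an argument list. '^sm' is quoted so that no shell
# gives the caret a special meaning.
set -- mpirun -mca btl '^sm' -np "$NP" a.out
echo "Would run: $*"

# Only launch if mpirun is on the PATH and the test binary is present.
if command -v mpirun >/dev/null 2>&1 && [ -x ./a.out ]; then
    "$@"
fi
```

If the 16-process run completes this way, that points at the sm BTL. You can also list the sm parameters your build knows about with "ompi_info --param btl sm".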
On May 4, 2010, at 2:38 PM, Eugene Loh wrote:
> Gus Correa wrote:
>> Dear Open MPI experts
>> I need your help to get Open MPI right on a standalone
>> machine with Nehalem processors.
>> How to tweak the mca parameters to avoid problems
>> with Nehalem (and perhaps AMD processors also),
>> where MPI programs hang, was discussed here before.
>> However, I lost track of the details, how to work around the problem,
>> and if it was fully fixed already perhaps.
> Yes, perhaps the problem you're seeing is not what you remember being discussed.
> Perhaps you're thinking of https://svn.open-mpi.org/trac/ompi/ticket/2043 . It's presumably fixed.
>> I am now facing the problem directly on a single Nehalem box.
>> I installed OpenMPI 1.4.1 from source,
>> and compiled the test hello_c.c with mpicc.
>> Then I tried to run it with:
>> 1) mpirun -np 4 a.out
>> It ran OK (but seemed to be slow).
>> 2) mpirun -np 16 a.out
>> It hung, and brought the machine to a halt.
>> Any words of wisdom are appreciated.
>> More info:
>> * OpenMPI 1.4.1 installed from source (tarball from your site).
>> * Compilers are gcc/g++/gfortran 4.4.3-4.
>> * OS is Fedora Core 12.
>> * The machine is a Dell box with Intel Xeon 5540 (quad core)
>> processors on a two-way motherboard and 48GB of RAM.
>> * /proc/cpuinfo indicates that hyperthreading is turned on.
>> (I can see 16 "processors".)
>> What should I do?
>> Use -mca btl ^sm ?
>> Use -mca btl -mca btl_sm_num_fifos=some_number ? (Which number?)
>> Use Both?
>> Do something else?