Dear Open MPI experts
I need your help to get Open MPI right on a standalone
machine with Nehalem processors.
This list has previously discussed how to tweak the MCA parameters
to avoid hangs of MPI programs on Nehalem
(and perhaps also AMD) processors.
However, I lost track of the details: how to work around the problem,
and whether it has already been fully fixed.
I am now facing the problem directly on a single Nehalem box.
I installed Open MPI 1.4.1 from source,
and compiled the test program hello_c.c with mpicc.
Then I tried to run it with:
1) mpirun -np 4 a.out
It ran OK (but seemed to be slow).
2) mpirun -np 16 a.out
It hung, and brought the machine to a halt.
Any words of wisdom are appreciated.
More info:
* Open MPI 1.4.1 installed from source (tarball from your site).
* Compilers are gcc/g++/gfortran 4.4.3-4.
* OS is Fedora 12.
* The machine is a Dell box with Intel Xeon 5540 (quad core)
processors on a two-way motherboard and 48GB of RAM.
* /proc/cpuinfo indicates that hyperthreading is turned on.
(I can see 16 "processors".)
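
One way to double-check the hyperthreading observation is sketched below.
It assumes a Linux /proc/cpuinfo with the usual x86 fields "processor",
"siblings" and "cpu cores"; the function names are only illustrative,
and the comparison is a common heuristic rather than a definitive test.

```shell
#!/bin/sh
# Sketch: count logical CPUs and guess whether hyperthreading is on.
# Each function takes an optional file argument (default: /proc/cpuinfo),
# so a saved copy of cpuinfo can be inspected as well.

# Number of "processor" entries, i.e. logical CPUs seen by the kernel.
count_logical_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Value of the first line whose key is exactly $1 (e.g. "siblings").
first_field() {
    awk -F'[ \t]*:[ \t]*' -v key="$1" '$1 == key { print $2; exit }' \
        "${2:-/proc/cpuinfo}"
}

# Heuristic: "siblings" is logical CPUs per socket, "cpu cores" is
# physical cores per socket; more siblings than cores suggests HT is on.
hyperthreading_status() {
    file="${1:-/proc/cpuinfo}"
    siblings=$(first_field "siblings" "$file")
    cores=$(first_field "cpu cores" "$file")
    if [ -z "$siblings" ] || [ -z "$cores" ]; then
        echo "unknown"
    elif [ "$siblings" -gt "$cores" ]; then
        echo "on"
    else
        echo "off"
    fi
}

echo "logical CPUs: $(count_logical_cpus)"
echo "hyperthreading: $(hyperthreading_status)"
```

On a two-socket quad-core box with hyperthreading enabled, I would expect
this to show 16 logical CPUs, with "siblings" larger than "cpu cores".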
What should I do?
Use -mca btl ^sm ?
Use -mca btl_sm_num_fifos some_number ? (Which number?)
Use both?
Do something else?
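
To make the first two options concrete, here is a sketch of the command
lines I have in mind. It only prints them (a dry run) and does not start
any MPI job. The value 8 for btl_sm_num_fifos is an arbitrary placeholder,
not a recommendation, and I have not verified that either variant
avoids the hang.

```shell
#!/bin/sh
# Dry run: build the candidate mpirun command lines and print them.
NP=16          # number of processes, as in the run that hung
APP=./a.out    # the compiled hello_c test program
NUM_FIFOS=8    # placeholder value only; the right number is my question

# Option 1: exclude the shared-memory (sm) BTL altogether.
cmd_no_sm="mpirun -np $NP -mca btl ^sm $APP"

# Option 2: keep the sm BTL, but set the number of FIFOs it uses.
cmd_fifos="mpirun -np $NP -mca btl_sm_num_fifos $NUM_FIFOS $APP"

printf '%s\n' "$cmd_no_sm" "$cmd_fifos"
```

Running "ompi_info --param btl sm" should list the sm parameters and
their current values for the installed version.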
Many thanks,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------