Steffen Brinkmann wrote:
> Hi!
>
> I have installed OpenMPI on a cluster consisting of ~30 nodes with 16 Xeon cores each. NFS is set up and working. For testing I have installed locally with
>
> ./configure --prefix=/home_dir/openmpi-1.4.3_installation/; make all install
>
> Everything went smoothly so far.
>
> When I run a parallel program with
>
> /home_dir/openmpi-1.4.3_installation/bin/mpirun -n 2 ./my_parprog
>
> everything scales perfectly up to -n 16. When I go to -n 32, the execution time is the same as with -n 16.
>
> /home_dir/openmpi-1.4.3_installation/bin/mpirun -n 32 hostname
>
> returns 32 times the same node.
>
> The program is fine (it has been running for years on several machines) and another MPI installation scales well, so the cluster should be OK as well.
>
> What did I do wrong???
>
> Thanks for any hint!
>
> Steffen
>
>
> --
> Dr. Steffen Brinkmann
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstraße 19
> D - 70569 Stuttgart
> Germany
>
> Phone: ++49(0)711 / 685-64548
> Fax: ++49(0)711 / 685-65832
>
Hi Steffen
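Without a hostfile, a --host list, or a resource manager,
mpirun starts all processes on the local node.
See this FAQ on how to tell it which hosts to use: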
http://www.open-mpi.org/faq/?category=running#mpirun-host
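As a rough sketch of what that FAQ entry describes: the node names
node01 and node02 and the file name my_hosts are made up here, so
substitute your own. It also assumes passwordless ssh between the nodes
and that the installation directory is visible on all of them (e.g. via NFS).

```shell
# Hypothetical hostfile: node names are placeholders,
# slots=16 reflects the 16 cores per node mentioned above.
cat > my_hosts <<'EOF'
node01 slots=16
node02 slots=16
EOF

MPIRUN=/home_dir/openmpi-1.4.3_installation/bin/mpirun

# Run hostname on 32 slots and count how many processes land on each node.
if [ -x "$MPIRUN" ]; then
    "$MPIRUN" --hostfile my_hosts -n 32 hostname | sort | uniq -c
else
    echo "mpirun not found at $MPIRUN"
fi
```

If the hostfile is picked up, the count should show two different
node names instead of 32 copies of the same one.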
If you have a resource manager, such as Torque or SGE,
you can build OpenMPI with support for it.
This will obviate the need to specify the nodes,
as the resource manager will take care of that for you:
http://www.open-mpi.org/faq/?category=building#build-rte-tm
http://www.open-mpi.org/faq/?category=building#build-rte-sge
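For illustration only, a Torque build might look roughly like the
sketch below. The Torque location /opt/torque is an assumption, not
something I know about your cluster; for SGE the corresponding
configure option is --with-sge.

```shell
# Hypothetical Torque install location; adjust to the local system.
TORQUE_DIR=/opt/torque
PREFIX=/home_dir/openmpi-1.4.3_installation

# Assemble the configure line and print it for review before running it.
CONFIGURE_CMD="./configure --prefix=$PREFIX --with-tm=$TORQUE_DIR"
echo "$CONFIGURE_CMD"

# After 'make all install', ompi_info lists the components that were built.
# Look for "tm" entries to confirm that Torque support is present.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep -w tm || echo "no tm components found"
fi
```

With that support built in, mpirun started inside a Torque job should
take the node list from the job itself.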
BTW, the OpenMPI FAQ is the de facto (and good)
OpenMPI documentation:
http://www.open-mpi.org/faq/
Other sources are the README file and the mpiexec man page.
I hope this helps,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------