Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] automatically creating a machinefile
From: Reuti (reuti_at_[hidden])
Date: 2012-07-05 05:28:09


On 05.07.2012 at 00:38, Dominik Goeddeke wrote:

> no idea about Rocks, but with PBS and SLURM I always do this directly in the job submission script. Below is an example of an admittedly spaghetti-code script that does this -- assuming proper (un)commenting -- for PBS and SLURM

For SLURM, Torque and GridEngine I would suggest using the built-in support in Open MPI and MPICH2 directly, so there is no need to build a machinefile from the list of selected nodes. In addition, this gives you tight integration of the parallel job into the queuing system, and jobs can be removed cleanly.
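
For illustration, a minimal SLURM batch script relying on Open MPI's built-in launcher integration could look like the sketch below. The binary, data file, module names and working directory are taken from Dominik's script further down; the #SBATCH resource values are assumptions matching his "total_tasks = 50" directive, and your site may require different ones:

#!/bin/bash
#SBATCH --job-name=feast
#SBATCH --ntasks=50
#SBATCH --time=02:00:00

module purge
module load gcc/4.6.2
module load openmpi/1.5.4

cd $HOME/feast/feast/feast/applications/poisson_coproc

# Open MPI detects the SLURM allocation on its own: no machinefile or
# --hostfile is needed, and by default one process is started per allocated slot.
mpirun ./feastgpu-ompi master.dat.strongscaling.m6.L8.np049.dat

Under Torque/PBS or GridEngine the same mpirun line works unchanged, since Open MPI picks up the node allocation from the queuing system itself (Torque's TM interface, or GridEngine's PE hostfile in a tightly integrated parallel environment).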

Is there any reason to bypass this mechanism?

-- Reuti

> and OpenMPI and MPICH2, for one particular machine that I have been toying around with lately ...
>
> Dominik
>
> #!/bin/bash
>
> #################### PBS
> #PBS -N feast
> #PBS -l nodes=25:ppn=2
> #PBS -q batch
> #PBS -l walltime=2:00:00
> #job should not rerun if it fails
> #PBS -r n
>
> ####### SLURM
> # @ job_name = feaststrong1
> # @ initialdir = .
> # @ output = feaststrong1_%j.out
> # @ error = feaststrong1_%j.err
> # @ total_tasks = 50
> # @ cpus_per_task = 1
> # @ wall_clock_limit = 2:00:00
>
> # modules
> module purge
> module load gcc/4.6.2
> module load openmpi/1.5.4
> #module load mpich2/1.4.1
>
> # cd into wdir
> cd $HOME/feast/feast/feast/applications/poisson_coproc
>
>
> ##### PBS with MPICH2
> # create machine files to isolate the master process
> #cat $PBS_NODEFILE > nodes.txt
> ## extract slaves
> #sort -u nodes.txt > temp.txt
> #lines=`wc -l temp.txt | awk '{print $1}'`
> #((lines=$lines - 1))
> #tail -n $lines temp.txt > slavetemp.txt
> #cat slavetemp.txt | awk '{print $0 ":2"}' > slaves.txt
> ## extract master
> #head -n 1 temp.txt > mastertemp.txt
> #cat mastertemp.txt | awk '{print $0 ":1"}' > master.txt
> ## merge into one dual nodefile
> #cat master.txt > dual.hostfile
> #cat slaves.txt >> dual.hostfile
> ## same for single hostfile
> #tail -n $lines temp.txt > slavetemp.txt
> #cat slavetemp.txt | awk '{print $0 ":1"}' > slaves.txt
> ## extract master
> #head -n 1 temp.txt > mastertemp.txt
> #cat mastertemp.txt | awk '{print $0 ":1"}' > master.txt
> ## merge into one single nodefile
> #cat master.txt > single.hostfile
> #cat slaves.txt >> single.hostfile
> ## and clean up
> #rm -f slavetemp.txt mastertemp.txt master.txt slaves.txt temp.txt nodes.txt
>
> # 4 nodes
> #mpiexec -n 7 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np007.dat
> #mkdir arm-strongscaling-series1-L8-nodes04
> #mv feastlog.* arm-strongscaling-series1-L8-nodes04
>
> # 7 nodes
> #mpiexec -n 13 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np013.dat
> #mkdir arm-strongscaling-series1-L8-nodes07
> #mv feastlog.* arm-strongscaling-series1-L8-nodes07
>
> # 13 nodes
> #mpiexec -n 25 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np025.dat
> #mkdir arm-strongscaling-series1-L8-nodes13
> #mv feastlog.* arm-strongscaling-series1-L8-nodes13
>
> # 25 nodes
> #mpiexec -n 49 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np049.dat
> #mkdir arm-strongscaling-series1-L8-nodes25
> #mv feastlog.* arm-strongscaling-series1-L8-nodes25
>
>
> ############## SLURM
>
> # figure out which nodes we got
> srun /bin/hostname | sort > availhosts3.txt
>
> lines=`wc -l availhosts3.txt | awk '{print $1}'`
> ((lines=$lines - 2))
> tail -n $lines availhosts3.txt > slaves3.txt
> head -n 1 availhosts3.txt > master3.txt
> cat master3.txt > hostfile3.txt
> cat slaves3.txt >> hostfile3.txt
> # DGDG: SLURM -m arbitrary not supported by OpenMPI
> #export SLURM_HOSTFILE=./hostfile3.txt
>
>
> # 4 nodes
> #mpirun -np 7 --hostfile hostfile3.txt ./trace.sh ./feastgpu-ompi master.dat.strongscaling.m6.L8.np007.dat
> mpirun -np 7 --hostfile hostfile3.txt ./feastgpu-ompi master.dat.strongscaling.m6.L8.np007.dat
> #mpiexec -n 7 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np007.dat
> #srun -n 7 -m arbitrary ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np007.dat
> mkdir arm-strongscaling-series1-L8-nodes04
> mv feastlog.* arm-strongscaling-series1-L8-nodes04
>
> # 7 nodes
> #mpirun -np 13 --hostfile hostfile3.txt ./trace.sh ./feastgpu-ompi master.dat.strongscaling.m6.L8.np013.dat
> mpirun -np 13 --hostfile hostfile3.txt ./feastgpu-ompi master.dat.strongscaling.m6.L8.np013.dat
> #mpiexec -n 13 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np013.dat
> #srun -n 13 -m arbitrary ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np013.dat
> mkdir arm-strongscaling-series1-L8-nodes07
> mv feastlog.* arm-strongscaling-series1-L8-nodes07
>
> # 13 nodes
> #mpirun -np 25 --hostfile hostfile3.txt ./trace.sh ./feastgpu-ompi master.dat.strongscaling.m6.L8.np025.dat
> mpirun -np 25 --hostfile hostfile3.txt ./feastgpu-ompi master.dat.strongscaling.m6.L8.np025.dat
> #mpiexec -n 25 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np025.dat
> #srun -n 25 -m arbitrary ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np025.dat
> mkdir arm-strongscaling-series1-L8-nodes13
> mv feastlog.* arm-strongscaling-series1-L8-nodes13
>
> # 25 nodes
> #mpirun -np 49 --hostfile hostfile3.txt ./trace.sh ./feastgpu-ompi master.dat.strongscaling.m6.L8.np049.dat
> mpirun -np 49 --hostfile hostfile3.txt ./feastgpu-ompi master.dat.strongscaling.m6.L8.np049.dat
> #mpiexec -n 49 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np049.dat
> #srun -n 49 -m arbitrary ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np049.dat
> mkdir arm-strongscaling-series1-L8-nodes25
> mv feastlog.* arm-strongscaling-series1-L8-nodes25
>
>
> On 07/05/2012 12:10 AM, Hodgess, Erin wrote:
>> Dear MPI people:
>>
>> Is there a way (a script) available to automatically generate a machinefile, please?
>>
>> This would be on Rocks.
>>
>> ompi_info -v ompi full --parsable
>> package:Open MPI root_at_[hidden] Distribution
>> ompi:version:full:1.3.2
>> ompi:version:svn:r21054
>> ompi:version:release_date:Apr 21, 2009
>> orte:version:full:1.3.2
>> orte:version:svn:r21054
>> orte:version:release_date:Apr 21, 2009
>> opal:version:full:1.3.2
>> opal:version:svn:r21054
>> opal:version:release_date:Apr 21, 2009
>> ident:1.3.2
>>
>> Thanks,
>> Erin
>>
>>
>>
>> Erin M. Hodgess, PhD
>> Associate Professor
>> Department of Computer and Mathematical Sciences
>> University of Houston - Downtown
>> mailto: hodgesse_at_[hidden]
>>
>>
>>
>>
>
>
> --
> Jun.-Prof. Dr. Dominik Göddeke
> Hardware-orientierte Numerik für große Systeme
> Institut für Angewandte Mathematik (LS III)
> Fakultät für Mathematik, Technische Universität Dortmund
>
> http://www.mathematik.tu-dortmund.de/~goeddeke
>
> Tel. +49-(0)231-755-7218 Fax +49-(0)231-755-5933
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users