Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] two jobs in a single mpirun command and MPI_COMM_WORLD issue ...
From: Iliev, Hristo (iliev_at_[hidden])
Date: 2012-06-21 04:43:34


Hi,

I think you misunderstood what a MIMD launch with mpirun/mpiexec actually
does.

mpirun -np X prog1 : -np Y prog2

starts a *single* MPI job of X+Y processes in total: X processes execute
prog1 and Y processes execute prog2, but they all belong to the same MPI
job and hence share the same rank space and MPI_COMM_WORLD. Ranks 0 to X-1
execute prog1 and ranks X to X+Y-1 execute prog2.
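
To make the rank layout concrete, here is a minimal shell sketch of the
arithmetic only. It does not call mpirun; the helper name mimd_rank_ranges
and the values X=4, Y=3 are made up for illustration.

```shell
#!/bin/sh
# Print the MPI_COMM_WORLD rank range each program would receive in
#   mpirun -np X prog1 : -np Y prog2
# mimd_rank_ranges is a hypothetical helper, not an Open MPI command.
mimd_rank_ranges() {
    x=$1    # number of processes running prog1
    y=$2    # number of processes running prog2
    echo "prog1: ranks 0..$((x - 1))"
    echo "prog2: ranks $x..$((x + y - 1))"
    echo "MPI_COMM_WORLD size: $((x + y))"
}

# Example values, chosen only for illustration.
mimd_rank_ranges 4 3
```

With X=4 and Y=3 this reports ranks 0..3 for prog1, ranks 4..6 for prog2,
and a single MPI_COMM_WORLD of size 7.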

Cheers,

Hristo

From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
Behalf Of Ufuk Utku Turuncoglu (BE)
Sent: Thursday, June 21, 2012 9:29 AM
To: users_at_[hidden]
Subject: [OMPI users] two jobs in a single mpirun command and MPI_COMM_WORLD
issue ...

Hi,

I am trying to submit two MPI jobs with a single Open MPI mpirun command (the
command is shown in the job submission script). To test this configuration, I
compiled a simple mpihello application and ran it. The problem is that the
two distinct mpihello jobs (run1 and run2) use the same MPI_COMM_WORLD, and
the process ranks look like the following:
 
--- out1 (comes from first mpihello.x) ---
 node          17 : Hello world
 node          28 : Hello world
...
...

--- out2 (comes from second mpihello.x) ---
 node         115 : Hello world
 node         113 : Hello world
 node          74 : Hello world
...
...

If MPI_COMM_WORLD were created separately for each job, the node number
(i.e. the rank) would run from 0 to 63 in each log file, but this is not
the case: in the second one the node numbers run from 64 to 131. If a
Fortran application calls MPI_COMM_SIZE and MPI_COMM_RANK to get the total
number of processes (in this case 132) and its own rank, both values will
be wrong for that job. I think mpirun is not smart enough in this case.
What do you think? Any suggestions would help.
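
The per-job numbering expected above can be recovered from the shared rank
by subtracting the size of the first program. The following is only a
sketch of that arithmetic, assuming the first program is started with 64
processes as in the script below; local_rank is a hypothetical helper, and
inside the application itself the usual route would be to build a
per-program communicator with MPI_Comm_split.

```shell
#!/bin/sh
# Map a global MPI_COMM_WORLD rank to "<program> <local rank>", assuming
# the first program was launched with -np X.
# local_rank is a hypothetical helper used only for illustration.
local_rank() {
    rank=$1   # rank in the shared MPI_COMM_WORLD
    x=$2      # number of processes given to the first program
    if [ "$rank" -lt "$x" ]; then
        echo "prog1 $rank"
    else
        echo "prog2 $((rank - x))"
    fi
}

# Ranks taken from the out1/out2 excerpts above.
for r in 17 28 74 113 115; do
    echo "global $r -> $(local_rank "$r" 64)"
done
```

For example, global rank 74 maps to local rank 10 of the second program.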

PS: I am using OpenMPI version 1.5.3 compiled with Intel 12.0.4 compilers.

Regards,

--ufuk

--- job submission script (in OpenPBS) ---

#!/bin/bash
#PBS -l walltime=01:00:00
#PBS -l nodes=11:ppn=12
#PBS -N both
#PBS -q esp

# load modules
. /etc/profile.d/modules.sh
module load openmpi/1.5.3/intel/2011
module load netcdf/4.1.1/intel/11.1

# parameters
WRKDIR1=/home/netapp/clima-users/users/uturunco/CAS/run.lake/BOTH
WRKDIR2=/home/netapp/clima-users/users/uturunco/CAS/run.lake/BOTH

# create node files
head -n 64 $PBS_NODEFILE >& $WRKDIR1/nodes1.txt
tail -n 64 $PBS_NODEFILE >& $WRKDIR2/nodes2.txt

# submit jobs
mpirun -np `cat $WRKDIR1/nodes1.txt | wc -l` -machinefile $WRKDIR1/nodes1.txt \
       -wd $WRKDIR1 ./run1.sh : \
       -np `cat $WRKDIR2/nodes2.txt | wc -l` -machinefile $WRKDIR2/nodes2.txt \
       -wd $WRKDIR2 ./run2.sh

--- end of job submission script ---

--- script run1.sh ---

#!/bin/sh
./mpihello.x >> out1.txt

--- end of script run1.sh ---

--- script run2.sh ---

#!/bin/sh
./mpihello.x >> out2.txt

--- end of script run2.sh ---

--
Hristo Iliev, Ph.D. -- High Performance Computing
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241 80 24367 -- Fax/UMS: +49 241 80 624367

