
Open MPI User's Mailing List Archives


Subject: [OMPI users] two jobs in a single mpirun command and MPI_COMM_WORLD issue ...
From: Ufuk Utku Turuncoglu (BE) (u.utku.turuncoglu_at_[hidden])
Date: 2012-06-21 03:28:42


I am trying to submit two MPI jobs with a single Open MPI mpirun command
(the command is shown in the job submission script below). To test this
configuration, I compiled a simple mpihello application and ran it. The
problem is that the two distinct mpihello jobs (run1 and run2) share the
same MPI_COMM_WORLD, and the process ranks come out as follows:

--- out1 (comes from first mpihello.x) ---
  node 17 : Hello world
  node 28 : Hello world

--- out2 (comes from second mpihello.x) ---
  node 115 : Hello world
  node 113 : Hello world
  node 74 : Hello world

If MPI_COMM_WORLD were created separately for each job, the node
number (i.e. the rank) would run from 0 to 63 in each log file,
but this is not the case. In the second log, the node numbers run
from 64 to 131. If a Fortran application calls MPI_COMM_SIZE and
MPI_COMM_RANK to get the total number of processors (132 in this case)
and its own rank, both values will be wrong for the individual job. I
think mpirun is not smart enough in this case. What do you think? Any
suggestions would help.
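
To make the expected numbering concrete, here is a minimal sketch in plain shell arithmetic, assuming two jobs of 64 processes each that share one MPI_COMM_WORLD. The helper name `split_rank` and the variable `PER_JOB` are hypothetical and not part of Open MPI. The mapping is the one an application would presumably get by splitting the shared communicator itself (for example with MPI_Comm_split, using rank / 64 as the color), which is an assumption about a possible workaround rather than something mpirun does.

```shell
# Hypothetical helper (not part of Open MPI): map a rank in the shared
# MPI_COMM_WORLD to "<job> <local rank>", assuming 64 processes per job.
PER_JOB=64

split_rank() {
    echo "$(( $1 / PER_JOB )) $(( $1 % PER_JOB ))"
}

# Ranks taken from the out1/out2 listings above
for r in 17 28 74 113 115; do
    echo "global rank $r -> job and local rank: $(split_rank "$r")"
done
```

Under these assumptions, global rank 17 stays at local rank 17 in job 0, while global rank 115 becomes local rank 51 in job 1.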

PS: I am using Open MPI version 1.5.3 compiled with the Intel 12.0.4 compilers.



*--- job submission script (in OpenPBS) ---*

#PBS -l walltime=01:00:00
#PBS -l nodes=11:ppn=12
#PBS -N both
#PBS -q esp

# load modules
. /etc/profile.d/
module load openmpi/1.5.3/intel/2011
module load netcdf/4.1.1/intel/11.1

# parameters

# create node files
head -n 64 $PBS_NODEFILE >& $WRKDIR1/nodes1.txt
tail -n 64 $PBS_NODEFILE >& $WRKDIR2/nodes2.txt

# submit jobs
mpirun -np `cat $WRKDIR1/nodes1.txt | wc -l` -machinefile
$WRKDIR1/nodes1.txt -wd $WRKDIR1 ./ : -np `cat
$WRKDIR2/nodes2.txt | wc -l` -machinefile $WRKDIR2/nodes2.txt -wd

*--- end of job submission script ---*
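
As a quick check of which node-file lines each job receives, the following sketch repeats the head/tail split on a synthetic stand-in for $PBS_NODEFILE. It assumes the file has 132 lines (11 nodes x 12 ppn, as requested above) and stores line numbers instead of host names; the temporary directory and variable names are illustrative only.

```shell
# Synthetic stand-in for $PBS_NODEFILE: 132 lines (11 nodes x 12 ppn),
# each line holding its own line number instead of a host name.
tmpdir=$(mktemp -d)
seq 1 132 > "$tmpdir/nodefile"

# Same split as in the job script
head -n 64 "$tmpdir/nodefile" > "$tmpdir/nodes1.txt"
tail -n 64 "$tmpdir/nodefile" > "$tmpdir/nodes2.txt"

# First and last node-file line used by each job
first1=$(head -n 1 "$tmpdir/nodes1.txt"); last1=$(tail -n 1 "$tmpdir/nodes1.txt")
first2=$(head -n 1 "$tmpdir/nodes2.txt"); last2=$(tail -n 1 "$tmpdir/nodes2.txt")
echo "job 1: lines $first1-$last1"
echo "job 2: lines $first2-$last2"

# Total number of processes the two -np arguments would add up to
total=$(( $(wc -l < "$tmpdir/nodes1.txt") + $(wc -l < "$tmpdir/nodes2.txt") ))
echo "processes requested: $total"
```

With a 132-line file, the first job gets lines 1-64 and the second gets lines 69-132, so the two -np values add up to 128 and four allocated slots stay unused.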

*--- script ---*

./mpihello.x >> out1.txt

*--- end of script ---*

*--- script ---*

./mpihello.x >> out2.txt

*--- end of script ---*