Hello,


I have run into an issue that appears to be triggered by sending messages to multiple processes on a single remote host before those remote processes have sent any message to the origin. I have boiled the issue down to the following:


Test Environment of 3 Identical Hosts:

* Intel i7-2600K, 12 GB RAM, Intel Gigabit Ethernet, D-Link switch
* Windows Server 2008 R2 x64 with all current updates
* OMPI (all three hosts report the same ompi_info and were installed from the same binary):
  http://www.open-mpi.org/software/ompi/v1.5/downloads/OpenMPI_v1.5.4-1_win64.exe

C:\GDX>ompi_info -v ompi full --parsable

package:Open MPI hpcfan@VISCLUSTER26 Distribution

ompi:version:full:1.5.4

ompi:version:svn:r25060

ompi:version:release_date:Aug 18, 2011

orte:version:full:1.5.4

orte:version:svn:r25060

orte:version:release_date:Aug 18, 2011

opal:version:full:1.5.4

opal:version:svn:r25060

opal:version:release_date:Aug 18, 2011

ident:1.5.4

 

Test Program:

#include <stdio.h>
#define OMPI_IMPORTS
#include "C:\Program Files (x86)\OpenMPI_v1.5.4-x64\include\mpi.h"

int main(int argc, char *argv[])
{
   int rank, size, i, msg;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   printf("Process %i of %i initialized\n", rank, size);

   if (0 == rank) {
      /* Rank 0 first sends one int to every other rank... */
      for (i = 1; i < size; i++) {
         /* Note: the payload is rank (0); the message prints the destination i twice. */
         printf("Process %i sending %i to %i\n", rank, i, i);
         MPI_Send(&rank, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
      }
      /* ...and only then collects one reply from each of them. */
      for (i = 1; i < size; i++) {
         MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
         printf("Process %i received %i\n", rank, msg);
      }
   }
   else {
      /* Every other rank waits for rank 0's message, then replies with its own rank. */
      MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      printf("Process %i received %i\n", rank, msg);
      MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
      printf("Process %i sent %i to %i\n", rank, rank, 0);
   }

   printf("Process %i exiting\n", rank);
   MPI_Finalize();
   return 0;
}

 

Test Cases:

* X procs on the originating node: Working
* X procs on the originating node and one proc on one or more remote nodes: Working
* X procs on the originating node and more than one proc on any remote node: Fails

A test with two procs on the origin and one on each of two remote nodes runs to completion; however, the same test with the two remote procs on the same machine hangs on the second remote send. Here are some test runs (the ^C indicates a hang).

C:\GDX>mpirun -v -display-map -hostfile mpihosts -np 2 c:\gdx\distmsg.exe

 

 ========================   JOB MAP   ========================

 

 Data for node: Yap     Num procs: 2

        Process OMPI jobid: [42094,1] Process rank: 0

        Process OMPI jobid: [42094,1] Process rank: 1

 

 =============================================================

Process 0 of 2 initialized

Process 1 of 2 initialized

Process 0 sending 1 to 1

Process 1 received 0

Process 1 sent 1 to 0

Process 1 exiting

Process 0 received 1

Process 0 exiting

 

C:\GDX>mpirun -v -display-map -hostfile mpihosts -np 3 c:\gdx\distmsg.exe

 

 ========================   JOB MAP   ========================

 

 Data for node: Yap     Num procs: 2

        Process OMPI jobid: [42014,1] Process rank: 0

        Process OMPI jobid: [42014,1] Process rank: 1

 

 Data for node: chuuk   Num procs: 1

        Process OMPI jobid: [42014,1] Process rank: 2

 

 =============================================================

connecting to chuuk

username:administrator

password:********

Save Credential?(Y/N) n

Process 0 of 3 initialized

Process 1 of 3 initialized

Process 0 sending 1 to 1

Process 0 sending 2 to 2

Process 1 received 0

Process 1 sent 1 to 0

Process 1 exiting

Process 0 received 1

Process 0 received 2

Process 0 exiting

 

C:\GDX>mpirun -v -display-map -hostfile mpihosts -np 4 c:\gdx\distmsg.exe

 

 ========================   JOB MAP   ========================

 

 Data for node: Yap     Num procs: 2

        Process OMPI jobid: [43894,1] Process rank: 0

        Process OMPI jobid: [43894,1] Process rank: 1

 

 Data for node: chuuk   Num procs: 2

        Process OMPI jobid: [43894,1] Process rank: 2

        Process OMPI jobid: [43894,1] Process rank: 3

 

 =============================================================

connecting to chuuk

username:administrator

password:********

Save Credential?(Y/N) n

Process 0 of 4 initialized

Process 1 of 4 initialized

Process 0 sending 1 to 1

Process 0 sending 2 to 2

Process 1 received 0

Process 1 sent 1 to 0

Process 1 exiting

Process 0 sending 3 to 3

^C

C:\GDX>mpirun -v -display-map -hostfile mpihosts -np 4 c:\gdx\distmsg.exe

 

 ========================   JOB MAP   ========================

 

 Data for node: Yap     Num procs: 2

        Process OMPI jobid: [43310,1] Process rank: 0

        Process OMPI jobid: [43310,1] Process rank: 1

 

 Data for node: chuuk   Num procs: 1

        Process OMPI jobid: [43310,1] Process rank: 2

 

 Data for node: kosrae  Num procs: 1

        Process OMPI jobid: [43310,1] Process rank: 3

 

 =============================================================

connecting to chuuk

username:administrator

password:********

Save Credential?(Y/N) n

connecting to kosrae

username:administrator

password:********

Save Credential?(Y/N) n

Process 0 of 4 initialized

Process 1 of 4 initialized

Process 0 sending 1 to 1

Process 0 sending 2 to 2

Process 1 received 0

Process 1 sent 1 to 0

Process 1 exiting

Process 0 sending 3 to 3

Process 0 received 1

Process 0 received 2

Process 0 received 3

Process 0 exiting

 

C:\GDX>mpirun -v -display-map -hostfile mpihosts -np 5 c:\gdx\distmsg.exe

 

 ========================   JOB MAP   ========================

 

 Data for node: Yap     Num procs: 2

        Process OMPI jobid: [43590,1] Process rank: 0

        Process OMPI jobid: [43590,1] Process rank: 1

 

 Data for node: chuuk   Num procs: 2

        Process OMPI jobid: [43590,1] Process rank: 2

        Process OMPI jobid: [43590,1] Process rank: 3

 

 Data for node: kosrae  Num procs: 1

        Process OMPI jobid: [43590,1] Process rank: 4

 

 =============================================================

connecting to chuuk

username:administrator

password:********

Save Credential?(Y/N) n

connecting to kosrae

username:administrator

password:********

Save Credential?(Y/N) n

Process 0 of 5 initialized

Process 1 of 5 initialized

Process 0 sending 1 to 1

Process 0 sending 2 to 2

Process 1 received 0

Process 1 sent 1 to 0

Process 1 exiting

Process 0 sending 3 to 3

^C

 

While the send is hung, the remote process that is its target seems to generate significant ongoing CPU activity and "Other" I/O.
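
As a compact summary of the runs above, here is a small sketch of the observed pattern written as a predicate. The function name layout_matches_hang_pattern and its parameters are made up for illustration; it is not part of Open MPI, and it only encodes what these five runs showed, not a diagnosis of the cause.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an Open MPI API): returns 1 when a process
 * layout matches the pattern that hung in the runs above, i.e. some node
 * other than the origin hosts more than one process; returns 0 otherwise.
 *
 * procs_per_node[k] is the number of ranks mapped to node k, and
 * origin is the index of the node that hosts rank 0.
 */
int layout_matches_hang_pattern(const int *procs_per_node, size_t n_nodes,
                                size_t origin)
{
    size_t k;

    assert(procs_per_node != NULL);
    assert(origin < n_nodes);

    for (k = 0; k < n_nodes; k++) {
        if (k != origin && procs_per_node[k] > 1) {
            return 1;  /* a remote node hosts two or more ranks */
        }
    }
    return 0;  /* every remote node hosts at most one rank */
}
```

With the origin (Yap) at index 0, the five runs correspond to the layouts {2}, {2,1}, {2,2}, {2,1,1} and {2,2,1}. The predicate returns 1 only for {2,2} and {2,2,1}, which are the two runs that hung.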


Workaround:

Curiously, swapping the send/receive order, so that every other rank sends to rank 0 before rank 0 sends anything, avoids the hang.

#include <stdio.h>
#define OMPI_IMPORTS
#include "C:\Program Files (x86)\OpenMPI_v1.5.4-x64\include\mpi.h"

int main(int argc, char *argv[])
{
   int rank, size, i, msg;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   printf("Process %i of %i initialized\n", rank, size);

   if (0 == rank) {
      /* Rank 0 first collects one message from every other rank... */
      for (i = 1; i < size; i++) {
         MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
         printf("Process %i received %i\n", rank, msg);
      }
      /* ...and only then sends one int to each of them. */
      for (i = 1; i < size; i++) {
         /* Note: the payload is rank (0); the message prints the destination i twice. */
         printf("Process %i sending %i to %i\n", rank, i, i);
         MPI_Send(&rank, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
      }
   }
   else {
      /* Every other rank sends its own rank to rank 0 first, then waits for the reply. */
      MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
      printf("Process %i sent %i to %i\n", rank, rank, 0);
      MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      printf("Process %i received %i\n", rank, msg);
   }

   printf("Process %i exiting\n", rank);
   MPI_Finalize();
   return 0;
}

 

C:\GDX>mpirun -v -display-map -hostfile mpihosts -np 5 c:\gdx\distmsgb.exe

 

 ========================   JOB MAP   ========================

 

 Data for node: Yap     Num procs: 2

        Process OMPI jobid: [43126,1] Process rank: 0

        Process OMPI jobid: [43126,1] Process rank: 1

 

 Data for node: chuuk   Num procs: 2

        Process OMPI jobid: [43126,1] Process rank: 2

        Process OMPI jobid: [43126,1] Process rank: 3

 

 Data for node: kosrae  Num procs: 1

        Process OMPI jobid: [43126,1] Process rank: 4

 

 =============================================================

connecting to chuuk

username:administrator

password:********

Save Credential?(Y/N) n

connecting to kosrae

username:administrator

password:********

Save Credential?(Y/N) n

Process 0 of 5 initialized

Process 1 of 5 initialized

Process 1 sent 1 to 0

Process 0 received 4

Process 0 received 1

Process 0 received 2

Process 0 received 3

Process 0 sending 1 to 1

Process 0 sending 2 to 2

Process 0 sending 3 to 3

Process 0 sending 4 to 4

Process 0 exiting

Process 1 received 0

Process 1 exiting