Open MPI Development Mailing List Archives

Subject: [OMPI devel] Deadlock when creating too many communicators
From: Wolfgang Bangerth (bangerth_at_[hidden])
Date: 2009-09-05 16:52:14

Here's a creative way to deadlock a program: create and destroy some
65,500 communicators and send a message on each of them:
#include <mpi.h>
#include <iostream>

// Report (but do not abort on) any MPI call that returns an error code.
#define CHECK(a) \
  { \
    int err = (a); \
    if (err != 0) std::cout << "Error in line " << __LINE__ << std::endl; \
  }

int main (int argc, char *argv[])
{
  int a=0, b;

  MPI_Init (&argc, &argv);
  for (int i=0; i<1000000; ++i)
    {
      if (i % 100 == 0) std::cout<< "Duplication event " << i << std::endl;

      // Duplicate, use once, and free again: no communicator outlives
      // its loop iteration.
      MPI_Comm dup;
      CHECK(MPI_Comm_dup (MPI_COMM_WORLD, &dup));
      CHECK(MPI_Allreduce(&a, &b, 1, MPI_INT, MPI_MIN, dup));
      CHECK(MPI_Comm_free (&dup));
    }
  MPI_Finalize ();
}

If you run this, for example, on two processors with Open MPI 1.2.6 or
1.3.2, you'll see that the program runs until it has printed 65500 and
then just hangs. On my system it sits somewhere in the operating system's
poll(), running full steam.

Since I take care of destroying the communicators again, I would have
expected this to work. I create many communicators basically as a
debugging tool: every object gets its own communicator to work on, to
ensure that different objects don't communicate with each other by
accident just because they all use MPI_COMM_WORLD. It would be nice if
this mode of using MPI could be made to work.

Best & thanks in advance!

Wolfgang Bangerth                email:            bangerth_at_[hidden]