Open MPI User's Mailing List Archives

Subject: [OMPI users] MPI_Bcast hangs on with multiple nodes
From: Paul Wolfgang (wolfgang_at_[hidden])
Date: 2010-01-29 14:04:42


I have just created a small cluster consisting of three nodes:
    bellhuey AMD 64 with 4 cores
    wolf1 AMD 64 with 2 cores
    wolf2 AMD 64 with 2 cores

The host file is:

bellhuey slots=4
wolf1 slots=2
wolf2 slots=2

bellhuey is the master, and wolf1 and wolf2 share the /usr and /home file
systems via NFS.
I am running Open MPI 1.4.1.
I have the following simple program:

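#include <mpi.h>
#include <stdio.h>
#include <unistd.h>
int main (int argc, char* argv[]) {
  int myid, numprocs;
  char me[255];
  int n;  /* set only on rank 0; other ranks hold garbage until MPI_Bcast */
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  gethostname(me, 254);
  me[254] = '\0';  /* gethostname may not terminate a truncated name */
  printf("Hello from %s I am process %d of %d\n", me, myid, numprocs);
  if (myid == 0) {
    n = 12345;  /* value the root broadcasts */
  }
  /* On ranks other than 0 this prints the uninitialized value of n. */
  printf("Call to MPI_Bcast n==%d on %s myid=%d\n", n, me, myid);
  /* Collective call: every rank in MPI_COMM_WORLD must reach it. */
  MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
  printf("Return from MPI_Bcast n==%d on %s myid=%d\n", n, me, myid);
  MPI_Finalize();
  return 0;
}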

If I run this with
    mpirun -np 8 hello
it works fine, but all processes run on bellhuey.

If I run this with
    mpirun -np 8 --hostfile host hello
I get the following:

Hello from bellhuey I am process 0 of 8
Call to MPI_Bcast n==12345 on bellhuey myid=0
Hello from bellhuey I am process 1 of 8
Call to MPI_Bcast n==32767 on bellhuey myid=1
Hello from bellhuey I am process 2 of 8
Call to MPI_Bcast n==32767 on bellhuey myid=2
Hello from wolf1 I am process 5 of 8
Call to MPI_Bcast n==32767 on wolf1 myid=5
Hello from bellhuey I am process 3 of 8
Call to MPI_Bcast n==32767 on bellhuey myid=3
Hello from wolf2 I am process 7 of 8
Call to MPI_Bcast n==32767 on wolf2 myid=7
Hello from wolf2 I am process 6 of 8
Call to MPI_Bcast n==32767 on wolf2 myid=6
Hello from wolf1 I am process 4 of 8
Call to MPI_Bcast n==32767 on wolf1 myid=4

As expected, four processes are started on bellhuey and two each on
wolf1 and wolf2.
However, none of the calls to MPI_Bcast return!
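
For reference, the placement in the output (ranks 0-3 on bellhuey, 4-5 on
wolf1, 6-7 on wolf2) is what you get if each host's slots are filled in
hostfile order. Here is a minimal sketch of that mapping. It assumes a
by-slot fill policy, and host_for_rank is a made-up helper for
illustration, not an Open MPI function:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Open MPI): returns the index of the
 * hostfile entry a rank lands on when each host's slots are filled in
 * hostfile order (assumed by-slot policy). Returns -1 if the rank is
 * negative or exceeds the total number of slots. */
int host_for_rank(const int *slots, size_t nhosts, int rank) {
    assert(slots != NULL);
    if (rank < 0) {
        return -1;
    }
    int first = 0;  /* lowest rank assigned to the current host */
    for (size_t h = 0; h < nhosts; ++h) {
        if (rank < first + slots[h]) {
            return (int)h;
        }
        first += slots[h];
    }
    return -1;
}
```

With slots {4, 2, 2} (bellhuey, wolf1, wolf2), ranks 0 through 7 map to
host indices 0, 0, 0, 0, 1, 1, 2, 2, which matches the hostnames printed
above. So the launch and placement look right, and the problem seems to be
limited to the broadcast itself.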

Any help would be appreciated.

Paul Wolfgang