
Open MPI User's Mailing List Archives

Subject: [OMPI users] problems with establishing an intercommunicator
From: Waclaw Kusnierczyk (waku_at_[hidden])
Date: 2011-03-08 21:40:53


Hello,

I'm trying to connect two independent MPI process groups with an
intercommunicator, using ports, as described in sec. 10.4 of the MPI
standard. One group runs a server, the other a client. The server
opens a port, passes the port's name to the client, and waits for a
connection. The client obtains the port's name and connects to it. The
problem is that the code works only when the server and the client
each run as a one-process MPI group. If either group has more than one
process, the program hangs.

Below are two fragments of a minimal example that reproduces the
problem on my machine. The server:

     if (rank == 0) {
         MPI_Open_port(MPI_INFO_NULL, port);
         int fifo = open(argv[1], O_WRONLY);
         write(fifo, port, MPI_MAX_PORT_NAME);
         close(fifo);
         printf("[server] listening on port '%s'\n", port);
         MPI_Comm_accept(port, MPI_INFO_NULL, 0, this, &that);
         printf("[server] connected\n");
         MPI_Close_port(port); }
     MPI_Barrier(this);

and the client:

     if (rank == 0) {
         int fifo = open(buffer, O_RDONLY);
         read(fifo, port, MPI_MAX_PORT_NAME);
         close(fifo);
         printf("[client] connecting to port '%s'\n", port);
         MPI_Comm_connect(port, MPI_INFO_NULL, 0, this, &that);
         printf("[client] connected\n"); }
     MPI_Barrier(this);

where 'this' is the local MPI_COMM_WORLD, 'that' receives the new
intercommunicator, and the port name is transmitted via a named pipe.
(The complete code, together with a makefile, is attached for reference.)
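
The port name crosses the FIFO as one fixed-size record of MPI_MAX_PORT_NAME bytes. Because read() on a pipe may return fewer bytes than requested, and a write() can be cut short by a signal, a slightly more defensive version of this handoff loops until the whole record has been transferred. The following is a minimal sketch of that idea; the helper names and PORT_NAME_LEN (a stand-in for MPI_MAX_PORT_NAME, so the helpers compile without mpi.h) are hypothetical. It only hardens the port-name exchange and does not by itself change the connect/accept behaviour.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in; real code would use MPI_MAX_PORT_NAME from mpi.h. */
#define PORT_NAME_LEN 1024

/* Write exactly n bytes, retrying on short writes and EINTR.
   Returns 0 on success, -1 on error. */
static int write_full(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t k = write(fd, buf, n);
        if (k < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += k;
        n -= (size_t)k;
    }
    return 0;
}

/* Read exactly n bytes; returns -1 on error or if the writer closes early. */
static int read_full(int fd, char *buf, size_t n)
{
    while (n > 0) {
        ssize_t k = read(fd, buf, n);
        if (k < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (k == 0)
            return -1; /* EOF before the full record arrived */
        buf += k;
        n -= (size_t)k;
    }
    return 0;
}

/* Send a port name as a fixed-size, zero-padded record. */
static int send_port_name(int fd, const char *port)
{
    char record[PORT_NAME_LEN];
    memset(record, 0, sizeof record);
    strncpy(record, port, sizeof record - 1);
    return write_full(fd, record, sizeof record);
}

/* Receive a fixed-size record into port (at least PORT_NAME_LEN bytes)
   and guarantee NUL termination. */
static int recv_port_name(int fd, char *port)
{
    if (read_full(fd, port, PORT_NAME_LEN) != 0)
        return -1;
    port[PORT_NAME_LEN - 1] = '\0';
    return 0;
}
```

On the server, rank 0 could call send_port_name(fifo, port) in place of the bare write(), and on the client recv_port_name(fifo, port) in place of read(), with fifo opened as in the fragments above.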

When the compiled programs are run with one MPI process each:

     mkfifo port
     mpirun -np 1 ./server port &
     mpirun -np 1 ./client port

the connection is established as expected. With more than one process
on either side, however, the execution blocks at the connect-accept step
(i.e., after the 'listening' and 'connecting' messages are printed, but
before the 'connected' messages are); using the attached code,

     make NS=2 run

or

     make NC=2 run

should reproduce the problem.

I'm using Open MPI on two different machines, 1.4 on a 2-core laptop
and 1.3.3 on a large supercomputer, and have the same problem on both.
Where am I going wrong?

One more related question: once I manage to establish an
intercommunicator between two multi-process MPI groups, can any process
in one group send a message directly to any process in the other, or
does the communication have to go through the root processes?

Regards,
Wacek