Subject: [OMPI devel] MPI_Comm_connect/Accept
From: Aurélien Bouteiller (bouteill_at_[hidden])
Date: 2008-04-03 16:29:49

Hi everyone,

I'm trying to figure out how complete the implementation of
Comm_connect/Accept is. I found two problematic cases.

1) Two different programs are started by two different mpirun commands.
One calls accept, the other calls connect. I would not expect
MPI_Publish_name/Lookup_name to work because the two jobs do not share
an HNP. Still, I would expect to be able to connect by copying (with
printf-scanf) the port_name string generated by Open_port, especially
considering that in Open MPI the port_name is a string containing the
TCP address and port of rank 0 in the server communicator.
However, doing so results in "no route to host" and the connecting
application aborts. Is the problem related to an explicit check of the
universes on the accept HNP? Do I expect too much from the MPI
standard? Is it because my two applications do not share the same
universe? Should we (re-)add the ability to use the same universe for
several mpiruns?
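
A minimal sketch of the printf-scanf handoff described in 1), written without MPI so that it builds on its own. The names write_port_name, read_port_name and PORT_NAME_MAX are hypothetical; PORT_NAME_MAX only stands in for MPI_MAX_PORT_NAME. It reads a whole line rather than using scanf("%s"), which stops at the first whitespace character, so the string would survive the copy unchanged even if it contained spaces. It covers only the copying step and says nothing about the "no route to host" error itself.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for MPI_MAX_PORT_NAME (assumption, not the real value). */
#define PORT_NAME_MAX 1024

/* Server side: print the port name on a line of its own and flush,
 * so it can be copied to the client by hand or through a pipe. */
void write_port_name(FILE *out, const char *port)
{
    fprintf(out, "%s\n", port);
    fflush(out);
}

/* Client side: read one line into port and strip the line terminator.
 * Returns 0 on success, -1 on EOF, read error, or an empty line. */
int read_port_name(FILE *in, char *port, size_t len)
{
    if (len == 0 || fgets(port, (int) len, in) == NULL)
        return -1;
    port[strcspn(port, "\r\n")] = '\0';
    return port[0] != '\0' ? 0 : -1;
}
```

In the two-mpirun scenario, the server would call write_port_name(stdout, port) after MPI_Open_port, and the client would call read_port_name(stdin, port, MPI_MAX_PORT_NAME) before MPI_Comm_connect.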

2) The second issue arises when the program sets up a port and then
accepts multiple clients on this port. Everything works fine for the
first client, and then accept stalls forever while waiting for the
second one. My understanding of the standard is that it should work:
5.4.2 states "it must call MPI_Open_port to establish a port [...] it
must call MPI_Comm_accept to accept connections from clients". I
understand this to mean that a single MPI_Open_port should let me
handle several MPI clients. Am I understanding the standard correctly
here, and should we fix this?

Here is a copy of the non-working code for reference.

/*
  * Copyright (c) 2004-2007 The Trustees of the University of Tennessee.
  * All rights reserved.
  * Additional copyrights may follow
  * $HEADER$
  */
#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
     char port[MPI_MAX_PORT_NAME];
     int rank;
     int np;

     MPI_Init(&argc, &argv);
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
     MPI_Comm_size(MPI_COMM_WORLD, &np);

     if(rank)
     {
         MPI_Comm comm;
         /* client */
         MPI_Recv(port, MPI_MAX_PORT_NAME, MPI_CHAR, 0, 0,
                  MPI_COMM_WORLD, MPI_STATUS_IGNORE);
         printf("Read port: %s\n", port);
         MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &comm);

         MPI_Send(&rank, 1, MPI_INT, 0, 1, comm);
         MPI_Comm_disconnect(&comm);
     }
     else
     {
         /* server */
         int nc = np - 1;
         MPI_Comm *comm_nodes = (MPI_Comm *) calloc(nc, sizeof(MPI_Comm));
         MPI_Request *reqs = (MPI_Request *) calloc(nc, sizeof(MPI_Request));
         int *event = (int *) calloc(nc, sizeof(int));
         int i;

         MPI_Open_port(MPI_INFO_NULL, port);
/* MPI_Publish_name("test_service_el", MPI_INFO_NULL, port);*/
         printf("Port name: %s\n", port);
         for(i = 1; i < np; i++)
             MPI_Send(port, MPI_MAX_PORT_NAME, MPI_CHAR, i, 0,
                      MPI_COMM_WORLD);

         /* one accept per client, all on the same port */
         for(i = 0; i < nc; i++)
         {
             MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF,
                             &comm_nodes[i]);
             printf("Accept %d\n", i);
             MPI_Irecv(&event[i], 1, MPI_INT, 0, 1, comm_nodes[i],
                       &reqs[i]);
             printf("IRecv %d\n", i);
         }
         MPI_Waitall(nc, reqs, MPI_STATUSES_IGNORE);
         for(i = 0; i < nc; i++)
         {
             printf("event[%d] = %d\n", i, event[i]);
             MPI_Comm_disconnect(&comm_nodes[i]);
             printf("Disconnect %d\n", i);
         }
     }

     MPI_Finalize();
     return EXIT_SUCCESS;
}

