I guess I'll have to ask the basic question: what version are you using?
If you are talking about the trunk, there no longer is a "universe" concept
anywhere in the code. Two mpiruns can connect/accept to each other as long
as they can make contact. To facilitate that, we created an "ompi-server"
tool that is meant to be run on the head node by the sys-admin (or a user,
it doesn't matter which). There are various ways to tell mpirun how to
contact the server, or it can self-discover it.
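For reference, the workflow might look something like the following. This is a sketch from memory; the exact option names (--report-uri, --ompi-server) and the file path are illustrative and should be checked against the --help output of your build:

```shell
# On the head node: start the rendezvous server and have it write its
# contact URI to a file (hypothetical shared path; any location both
# jobs can read works).
ompi-server --report-uri /shared/ompi-server.uri

# First job: tell mpirun where the server is via the URI file.
mpirun --ompi-server file:/shared/ompi-server.uri -np 2 ./server_app

# Second job, possibly launched elsewhere: point it at the same server
# so publish/lookup and connect/accept can rendezvous across mpiruns.
mpirun --ompi-server file:/shared/ompi-server.uri -np 2 ./client_app
```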
I have tested publish/lookup pretty thoroughly and it seems to work. I
haven't spent much time testing connect/accept except via comm_spawn, which
seems to be working. Since that uses the same mechanism, I would have
expected connect/accept to work as well.
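For anyone wanting to exercise that path, a minimal publish/lookup rendezvous would look roughly like this. These are plain MPI-2 dynamic-process calls; the service name "my_service" and the server/client split via a command-line argument are just illustrative choices:

```c
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;

    MPI_Init(&argc, &argv);

    if (argc > 1 && 0 == strcmp(argv[1], "server")) {
        /* Server: open a port, publish it under a name, accept one client. */
        MPI_Open_port(MPI_INFO_NULL, port);
        MPI_Publish_name("my_service", MPI_INFO_NULL, port);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        MPI_Unpublish_name("my_service", MPI_INFO_NULL, port);
        MPI_Close_port(port);
    } else {
        /* Client: resolve the port string by name, then connect. */
        MPI_Lookup_name("my_service", MPI_INFO_NULL, port);
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    }

    MPI_Comm_disconnect(&inter);
    MPI_Finalize();
    return 0;
}
```

Each side would run under its own mpirun pointed at a common ompi-server; MPI_Lookup_name can only succeed if both jobs reach the same name service.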
If you are talking about 1.2.x, then the story is totally different.
On 4/3/08 2:29 PM, "Aurélien Bouteiller" <bouteill_at_[hidden]> wrote:
> Hi everyone,
> I'm trying to figure out how complete the implementation of
> Comm_connect/Accept is. I found two problematic cases.
> 1) Two different programs are started by two different mpiruns. One
> calls accept, the other calls connect. I would not expect
> MPI_Publish_name/Lookup_name to work, because they do not share the
> HNP. Still, I would expect to be able to connect by copying (with
> printf-scanf) the port_name string generated by Open_port, especially
> considering that in Open MPI the port_name is a string containing the
> TCP address and port of rank 0 in the server communicator.
> However, doing so results in "no route to host" and the connecting
> application aborts. Is the problem related to an explicit check of the
> universes on the accept HNP? Do I expect too much from the MPI
> standard? Is it because my two applications do not share the same
> universe? Should we (re)add the ability to use the same universe for
> several mpiruns?
> 2) The second issue is when the program sets up a port and then
> accepts multiple clients on that port. Everything works fine for the
> first client, and then accept stalls forever while waiting for the
> second one. My understanding of the standard is that it should work:
> 5.4.2 states "it must call MPI_Open_port to establish a port [...] it
> must call MPI_Comm_accept to accept connections from clients". I
> understand that with one MPI_Open_port I should be able to serve
> several MPI clients. Am I understanding the standard correctly here,
> and should we fix this?
> Here is a copy of the non-working code for reference.
> * Copyright (c) 2004-2007 The Trustees of the University of Tennessee.
> * All rights reserved.
> * $COPYRIGHT$
> * Additional copyrights may follow
> * $HEADER$
> #include <stdlib.h>
> #include <stdio.h>
> #include <mpi.h>
>
> int main(int argc, char *argv[])
> {
>     char port[MPI_MAX_PORT_NAME];
>     int rank;
>     int np;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &np);
>
>     if (rank != 0) {
>         MPI_Comm comm;
>         /* client: receive the port string from rank 0, then connect */
>         MPI_Recv(port, MPI_MAX_PORT_NAME, MPI_CHAR, 0, 0,
>                  MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>         printf("Read port: %s\n", port);
>         MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &comm);
>         MPI_Send(&rank, 1, MPI_INT, 0, 1, comm);
>         MPI_Comm_disconnect(&comm);
>     } else {
>         /* server: open one port and accept every other rank on it */
>         int nc = np - 1;
>         MPI_Comm *comm_nodes = (MPI_Comm *) calloc(nc, sizeof(MPI_Comm));
>         MPI_Request *reqs = (MPI_Request *) calloc(nc, sizeof(MPI_Request));
>         int *event = (int *) calloc(nc, sizeof(int));
>         int i;
>
>         MPI_Open_port(MPI_INFO_NULL, port);
>         /* MPI_Publish_name("test_service_el", MPI_INFO_NULL, port); */
>         printf("Port name: %s\n", port);
>         for (i = 1; i < np; i++)
>             MPI_Send(port, MPI_MAX_PORT_NAME, MPI_CHAR, i, 0,
>                      MPI_COMM_WORLD);
>         for (i = 0; i < nc; i++) {
>             MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF,
>                             &comm_nodes[i]);  /* stalls on the second client */
>             printf("Accept %d\n", i);
>             MPI_Irecv(&event[i], 1, MPI_INT, 0, 1, comm_nodes[i], &reqs[i]);
>             printf("IRecv %d\n", i);
>         }
>         MPI_Waitall(nc, reqs, MPI_STATUSES_IGNORE);
>         for (i = 0; i < nc; i++) {
>             printf("event[%d] = %d\n", i, event[i]);
>             MPI_Comm_disconnect(&comm_nodes[i]);
>             printf("Disconnect %d\n", i);
>         }
>     }
>     MPI_Finalize();
>     return EXIT_SUCCESS;
> }
> * Dr. Aurélien Bouteiller
> * Sr. Research Associate at Innovative Computing Laboratory
> * University of Tennessee
> * 1122 Volunteer Boulevard, suite 350
> * Knoxville, TN 37996
> * 865 974 6321
> devel mailing list