Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] MPI_Comm_connect/Accept
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-04-03 17:06:54


I guess I'll have to ask the basic question: what version are you using?

If you are talking about the trunk, there is no longer a "universe" concept
anywhere in the code. Two mpiruns can connect/accept to each other as long
as they can make contact. To facilitate that, we created an "ompi-server"
tool that is meant to be run by the sys-admin (or a user, it doesn't matter
which) on the head node. There are various ways to tell mpirun how to
contact the server, or mpirun can discover it on its own.
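For example, the usual wiring looks roughly like this (a sketch only; the URI file path is arbitrary, and the exact option spellings should be double-checked against your revision):

```shell
# On the head node: start the rendezvous server and have it
# write its contact URI to a file.
ompi-server --report-uri /tmp/ompi-server.uri

# Each mpirun is then pointed at that URI file, so that jobs
# started by different mpiruns can find each other for
# publish/lookup and connect/accept.
mpirun --ompi-server file:/tmp/ompi-server.uri -np 1 ./server &
mpirun --ompi-server file:/tmp/ompi-server.uri -np 1 ./client
```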

I have tested publish/lookup pretty thoroughly and it seems to work. I
haven't spent much time testing connect/accept except via comm_spawn, which
seems to be working. Since that uses the same mechanism, I would have
expected connect/accept to work as well.
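To make the scenario concrete, a minimal connect/accept pair of the kind at issue might look like the sketch below (the "server"/"client" argv handling and the hand-copied port string are my own illustration, not anything in our tree). Run it as "server" under one mpirun, copy the printed port string, and pass it as an argument to a "client" run under a second mpirun:

```c
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;

    MPI_Init(&argc, &argv);
    if (argc > 1 && 0 == strcmp(argv[1], "server")) {
        /* Server side: open a port and wait for one client. */
        MPI_Open_port(MPI_INFO_NULL, port);
        printf("port: %s\n", port);   /* copy this string by hand */
        fflush(stdout);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        MPI_Close_port(port);
    } else {
        /* Client side: argv[2] is the pasted port string. */
        strncpy(port, argv[2], MPI_MAX_PORT_NAME);
        port[MPI_MAX_PORT_NAME - 1] = '\0';
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    }
    MPI_Comm_disconnect(&inter);
    MPI_Finalize();
    return 0;
}
```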

If you are talking about 1.2.x, then the story is totally different.

Ralph

On 4/3/08 2:29 PM, "Aurélien Bouteiller" <bouteill_at_[hidden]> wrote:

> Hi everyone,
>
> I'm trying to figure out how complete the implementation of
> Comm_connect/Accept is. I have found two problematic cases.
>
> 1) Two different programs are started under two different mpiruns. One
> calls accept, the other calls connect. I would not expect
> MPI_Publish_name/Lookup_name to work, because the two jobs do not share
> an HNP. Still, I would expect to be able to connect by copying (with
> printf/scanf) the port_name string generated by Open_port, especially
> since in Open MPI the port_name is a string containing the TCP address
> and port of rank 0 in the server communicator. However, doing so results
> in "no route to host" and the connecting application aborts. Is the
> problem related to an explicit check of the universes on the accepting
> HNP? Am I expecting too much from the MPI standard? Is it because my two
> applications do not share the same universe? Should we (re)add the
> ability to use the same universe for several mpiruns?
>
> 2) The second issue arises when a program sets up a port and then
> accepts multiple clients on that port. Everything works fine for the
> first client, but accept then stalls forever while waiting for the
> second one. My understanding of the standard is that this should work:
> section 5.4.2 states "it must call MPI_Open_port to establish a port
> [...] it must call MPI_Comm_accept to accept connections from clients".
> I read this as meaning that with one MPI_Open_port I should be able to
> serve several MPI clients. Am I understanding the standard correctly,
> and should we fix this?
>
> Here is a copy of the non-working code for reference.
>
> /*
> * Copyright (c) 2004-2007 The Trustees of the University of Tennessee.
> * All rights reserved.
> * $COPYRIGHT$
> *
> * Additional copyrights may follow
> *
> * $HEADER$
> */
> #include <stdlib.h>
> #include <stdio.h>
> #include <mpi.h>
>
> int main(int argc, char *argv[])
> {
>     char port[MPI_MAX_PORT_NAME];
>     int rank;
>     int np;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &np);
>
>     if(rank)
>     {
>         MPI_Comm comm;
>         /* client: receive the port name from rank 0, then connect */
>         MPI_Recv(port, MPI_MAX_PORT_NAME, MPI_CHAR, 0, 0,
>                  MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>         printf("Read port: %s\n", port);
>         MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &comm);
>
>         MPI_Send(&rank, 1, MPI_INT, 0, 1, comm);
>         MPI_Comm_disconnect(&comm);
>     }
>     else
>     {
>         int nc = np - 1;
>         MPI_Comm *comm_nodes = (MPI_Comm *) calloc(nc, sizeof(MPI_Comm));
>         MPI_Request *reqs = (MPI_Request *) calloc(nc, sizeof(MPI_Request));
>         int *event = (int *) calloc(nc, sizeof(int));
>         int i;
>
>         /* server: open one port, then try to accept every client on it;
>          * the accept of the second client is where the stall occurs */
>         MPI_Open_port(MPI_INFO_NULL, port);
>         /* MPI_Publish_name("test_service_el", MPI_INFO_NULL, port); */
>         printf("Port name: %s\n", port);
>         for(i = 1; i < np; i++)
>             MPI_Send(port, MPI_MAX_PORT_NAME, MPI_CHAR, i, 0,
>                      MPI_COMM_WORLD);
>
>         for(i = 0; i < nc; i++)
>         {
>             MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF,
>                             &comm_nodes[i]);
>             printf("Accept %d\n", i);
>             MPI_Irecv(&event[i], 1, MPI_INT, 0, 1, comm_nodes[i],
>                       &reqs[i]);
>             printf("IRecv %d\n", i);
>         }
>         MPI_Close_port(port);
>         MPI_Waitall(nc, reqs, MPI_STATUSES_IGNORE);
>         for(i = 0; i < nc; i++)
>         {
>             printf("event[%d] = %d\n", i, event[i]);
>             MPI_Comm_disconnect(&comm_nodes[i]);
>             printf("Disconnect %d\n", i);
>         }
>     }
>
>     MPI_Finalize();
>     return EXIT_SUCCESS;
> }
>
> --
> * Dr. Aurélien Bouteiller
> * Sr. Research Associate at Innovative Computing Laboratory
> * University of Tennessee
> * 1122 Volunteer Boulevard, suite 350
> * Knoxville, TN 37996
> * 865 974 6321
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel