
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] MPI_Comm_connect/Accept
From: Aurélien Bouteiller (bouteill_at_[hidden])
Date: 2008-04-03 17:10:05


Ralph,

I am using the trunk. Is there any documentation for ompi-server? It
sounds exactly like what I need to fix point 1.

Aurelien

On Apr 3, 2008, at 17:06, Ralph Castain wrote:
> I guess I'll have to ask the basic question: what version are you
> using?
>
> If you are talking about the trunk, there no longer is a "universe"
> concept anywhere in the code. Two mpiruns can connect/accept to each
> other as long as they can make contact. To facilitate that, we created
> an "ompi-server" tool that is supposed to be run by the sys-admin (or a
> user, it doesn't matter which) on the head node - there are various
> ways to tell mpirun how to contact the server, or it can self-discover
> it.
>
> I have tested publish/lookup pretty thoroughly and it seems to work. I
> haven't spent much time testing connect/accept except via comm_spawn,
> which seems to be working. Since that uses the same mechanism, I would
> have expected connect/accept to work as well.
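>
> Roughly, the pattern involved is the standard publish/lookup handshake
> (a simplified sketch only - the service name is arbitrary, error
> checking is omitted, and the idea is to run one copy with "server" as
> its first argument and one copy without):
>
> #include <string.h>
> #include <mpi.h>
>
> int main(int argc, char *argv[])
> {
>     char port[MPI_MAX_PORT_NAME];
>     MPI_Comm inter;
>
>     MPI_Init(&argc, &argv);
>     if (argc > 1 && 0 == strcmp(argv[1], "server")) {
>         /* server side: open a port and publish it under a name */
>         MPI_Open_port(MPI_INFO_NULL, port);
>         MPI_Publish_name("test_service", MPI_INFO_NULL, port);
>         MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
>         MPI_Unpublish_name("test_service", MPI_INFO_NULL, port);
>         MPI_Close_port(port);
>     } else {
>         /* client side: look the name up and connect to the port */
>         MPI_Lookup_name("test_service", MPI_INFO_NULL, port);
>         MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
>     }
>     MPI_Comm_disconnect(&inter);
>     MPI_Finalize();
>     return 0;
> }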
>
> If you are talking about 1.2.x, then the story is totally different.
>
> Ralph
>
>
>
> On 4/3/08 2:29 PM, "Aurélien Bouteiller" <bouteill_at_[hidden]>
> wrote:
>
>> Hi everyone,
>>
>> I'm trying to figure out how complete the implementation of
>> MPI_Comm_connect/MPI_Comm_accept is. I have found two problematic
>> cases.
>>
>> 1) Two different programs are started by two different mpiruns. One
>> calls accept, the other calls connect. I would not expect
>> MPI_Publish_name/MPI_Lookup_name to work, because they do not share
>> the HNP. Still, I would expect to be able to connect by copying (with
>> printf/scanf) the port_name string generated by MPI_Open_port,
>> especially considering that in Open MPI the port_name is a string
>> containing the TCP address and port of rank 0 in the server
>> communicator. However, doing so results in "no route to host" and the
>> connecting application aborts. Is the problem related to an explicit
>> check of the universes on the accepting HNP? Do I expect too much from
>> the MPI standard? Is it because my two applications do not share the
>> same universe? Should we (re)add the ability to use the same universe
>> for several mpiruns?
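>>
>> To make the scenario concrete, here is roughly what I am doing (a
>> simplified sketch of the two separate programs - the file names are
>> only for illustration; the port string printed by the server is pasted
>> by hand onto the client's stdin):
>>
>> /* server.c - prints the port, then waits for a single client */
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> int main(int argc, char *argv[])
>> {
>>     char port[MPI_MAX_PORT_NAME];
>>     MPI_Comm client;
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Open_port(MPI_INFO_NULL, port);
>>     printf("%s\n", port);   /* copy this string to the client */
>>     fflush(stdout);
>>     MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
>>     MPI_Comm_disconnect(&client);
>>     MPI_Close_port(port);
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> /* client.c - reads the pasted port string from stdin, then connects */
>> #include <stdio.h>
>> #include <string.h>
>> #include <mpi.h>
>>
>> int main(int argc, char *argv[])
>> {
>>     char port[MPI_MAX_PORT_NAME];
>>     MPI_Comm server;
>>
>>     MPI_Init(&argc, &argv);
>>     fgets(port, MPI_MAX_PORT_NAME, stdin);
>>     port[strcspn(port, "\n")] = '\0';   /* strip the trailing newline */
>>     MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);
>>     MPI_Comm_disconnect(&server);
>>     MPI_Finalize();
>>     return 0;
>> }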
>>
>> 2) The second issue is when the program sets up a port and then
>> accepts multiple clients on this port. Everything works fine for the
>> first client, and then accept stalls forever while waiting for the
>> second one. My understanding of the standard is that it should work:
>> 5.4.2 states "it must call MPI_Open_port to establish a port [...] it
>> must call MPI_Comm_accept to accept connections from clients". I
>> understand that with one MPI_Open_port I should be able to manage
>> several MPI clients. Am I reading the standard correctly here, and
>> should we fix this?
>>
>> Here is a copy of the non-working code for reference.
>>
>> /*
>>  * Copyright (c) 2004-2007 The Trustees of the University of Tennessee.
>>  *                         All rights reserved.
>>  * $COPYRIGHT$
>>  *
>>  * Additional copyrights may follow
>>  *
>>  * $HEADER$
>>  */
>> #include <stdlib.h>
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> int main(int argc, char *argv[])
>> {
>>     char port[MPI_MAX_PORT_NAME];
>>     int rank;
>>     int np;
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     MPI_Comm_size(MPI_COMM_WORLD, &np);
>>
>>     if(rank)
>>     {
>>         MPI_Comm comm;
>>         /* client: receive the port string from rank 0, then connect */
>>         MPI_Recv(port, MPI_MAX_PORT_NAME, MPI_CHAR, 0, 0,
>>                  MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>         printf("Read port: %s\n", port);
>>         MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &comm);
>>
>>         MPI_Send(&rank, 1, MPI_INT, 0, 1, comm);
>>         MPI_Comm_disconnect(&comm);
>>     }
>>     else
>>     {
>>         /* server: open one port, then accept every other rank in turn */
>>         int nc = np - 1;
>>         MPI_Comm *comm_nodes = (MPI_Comm *) calloc(nc, sizeof(MPI_Comm));
>>         MPI_Request *reqs = (MPI_Request *) calloc(nc, sizeof(MPI_Request));
>>         int *event = (int *) calloc(nc, sizeof(int));
>>         int i;
>>
>>         MPI_Open_port(MPI_INFO_NULL, port);
>>         /* MPI_Publish_name("test_service_el", MPI_INFO_NULL, port); */
>>         printf("Port name: %s\n", port);
>>         for(i = 1; i < np; i++)
>>             MPI_Send(port, MPI_MAX_PORT_NAME, MPI_CHAR, i, 0,
>>                      MPI_COMM_WORLD);
>>
>>         for(i = 0; i < nc; i++)
>>         {
>>             MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF,
>>                             &comm_nodes[i]);
>>             printf("Accept %d\n", i);
>>             MPI_Irecv(&event[i], 1, MPI_INT, 0, 1, comm_nodes[i],
>>                       &reqs[i]);
>>             printf("IRecv %d\n", i);
>>         }
>>         MPI_Close_port(port);
>>         MPI_Waitall(nc, reqs, MPI_STATUSES_IGNORE);
>>         for(i = 0; i < nc; i++)
>>         {
>>             printf("event[%d] = %d\n", i, event[i]);
>>             MPI_Comm_disconnect(&comm_nodes[i]);
>>             printf("Disconnect %d\n", i);
>>         }
>>     }
>>
>>     MPI_Finalize();
>>     return EXIT_SUCCESS;
>> }
>>
>>
>>
>>
>> --
>> * Dr. Aurélien Bouteiller
>> * Sr. Research Associate at Innovative Computing Laboratory
>> * University of Tennessee
>> * 1122 Volunteer Boulevard, suite 350
>> * Knoxville, TN 37996
>> * 865 974 6321
>>
>>
>>
>>
>>
>
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel