You're calling bcast with root=0, so whatever value rank 0 has for srv, everyone will have after the bcast. Plus, I didn't see in your code where *srv was ever set to 0.

In my runs, rank 0 is usually the one that publishes first. Everyone then gets the lookup properly, and then the bcast sends srv=1 to everyone. They all then try to call MPI_Comm_accept.

Your code was incomplete, so I had to extend it; see attached. Here's a sample output with 8 procs:

[7:12] svbu-mpi:~/mpi % mpicc lookup.c -o lookup -g && mpirun lookup
[0] Publish name
[0] service ocean available at 3853516800.0;tcp://172.29.218.140:36685;tcp://10.10.10.140:36685;tcp://10.10.20.140:36685;tcp://10.10.30.140:36685;tcp://172.16.68.1:36685;tcp://172.16.29.1:36685+3853516801.0;tcp://172.29.218.150:34210;tcp://10.10.30.150:34210:300
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
[2] Lookup name
[6] Lookup name
[4] Lookup name
[3] Lookup name
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
[1] Lookup name
[7] Lookup name
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
[5] Lookup name
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
MPI_Lookup_name succeeded
Bcast
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
Bcast complete: srv=1
Server calling MPI_Comm_accept
[hang -- because everyone's in accept, not connect]

On Jan 7, 2011, at 4:17 AM, Bernard Secher - SFME/LGLS wrote:

Jeff,

Only the processes of the program where process 0 succeeded in publishing the name have srv=1 and then call MPI_Comm_accept. The processes of the program where process 0 failed to publish the name have srv=0 and then call MPI_Comm_connect. That worked like this with openmpi 1.4.1.
Is it different with openmpi 1.5.1?

Best
Bernard

Jeff Squyres wrote:

On Jan 5, 2011, at 10:36 AM, Bernard Secher - SFME/LGLS wrote:

    MPI_Comm remoteConnect(int myrank, int *srv, char *port_name, char* service)
    {
      int clt=0;
      MPI_Request request; /* request for non-blocking communication */
      MPI_Comm gcom;
      MPI_Status status;
      char port_name_clt[MPI_MAX_PORT_NAME];

      if( service == NULL ) service = defaultService;

      /* only the rank 0 process can publish the name */
      MPI_Barrier(MPI_COMM_WORLD);

      /* A lookup for an unpublished service generates an error */
      MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
      if( myrank == 0 ){
        /* Try to be a server. If the service is already published, try to be a client */
        MPI_Open_port(MPI_INFO_NULL, port_name);
        printf("[%d] Publish name\n",myrank);
        if ( MPI_Publish_name(service, MPI_INFO_NULL, port_name) == MPI_SUCCESS ) {
          *srv = 1;
          printf("[%d] service %s available at %s\n",myrank,service,port_name);
        }
        else if ( MPI_Lookup_name(service, MPI_INFO_NULL, port_name_clt) == MPI_SUCCESS ){
          MPI_Close_port( port_name );
          clt = 1;
        }
        else
          /* Throw exception */
          printf("[%d] Error\n",myrank);
      }
      else{
        /* Wait for rank 0 to publish the name */
        sleep(1);
        printf("[%d] Lookup name\n",myrank);
        if ( MPI_Lookup_name(service, MPI_INFO_NULL, port_name_clt) == MPI_SUCCESS ){
          clt = 1;
        }
        else
          /* Throw exception */
          ;
      }
      MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_ARE_FATAL);

      MPI_Bcast(srv,1,MPI_INT,0,MPI_COMM_WORLD);

You're broadcasting srv here -- won't everyone now have *srv==1, such that they all call MPI_COMM_ACCEPT, below?

      if ( *srv )
        /* I am the Master */
        MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &gcom );
      else{
        /* Connect to service SERVER, get the inter-communicator server */
        MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
        if ( MPI_Comm_connect(port_name_clt, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &gcom ) == MPI_SUCCESS )
          printf("[%d] I get the connection with %s at %s !\n",myrank, service, port_name_clt);
        MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_ARE_FATAL);
      }

      if(myrank != 0) *srv = 0;

      return gcom;
    }
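
To make the two readings in this thread concrete, here is a minimal plain-C model of the role decision. It makes no MPI calls at all; the helper names bcast_from_root and ranks_in_accept, and the MAX_RANKS limit, are made up for illustration. It assumes that srv starts at 0 on every rank (something the posted function does not do itself) and that publishing fails once another job has already published the name. Treat it as a sketch of the logic, not as a description of how Open MPI behaves.

```c
#include <assert.h>

#define MAX_RANKS 64   /* hypothetical upper bound, only for this model */

/* Models MPI_Bcast(srv, 1, MPI_INT, 0, comm): afterwards every rank
 * holds the value that rank 0 (the root) had. */
static void bcast_from_root(int *srv, int nranks)
{
    for (int r = 1; r < nranks; r++)
        srv[r] = srv[0];
}

/* Models one job (one mpirun). Rank 0 publishes the name if nobody has
 * published it yet, then srv is broadcast from rank 0.
 * Returns how many ranks of this job would call MPI_Comm_accept;
 * the remaining ranks would call MPI_Comm_connect. */
static int ranks_in_accept(int nranks, int *name_published)
{
    int srv[MAX_RANKS] = {0};   /* assumption: srv is 0 everywhere at entry */
    int accepting = 0;

    assert(nranks >= 1 && nranks <= MAX_RANKS);

    if (!*name_published) {     /* assumption: publish succeeds only once */
        *name_published = 1;
        srv[0] = 1;             /* rank 0 becomes the server */
    }

    bcast_from_root(srv, nranks);

    for (int r = 0; r < nranks; r++)
        if (srv[r])
            accepting++;
    return accepting;
}
```

In this model, a first job of 8 ranks ends up with all 8 ranks in accept, which matches the sample output where a single mpirun hangs because nobody calls connect. A second job started afterwards finds the name already published, so rank 0 keeps srv=0 and all of its ranks go to connect, which is the two-program behaviour Bernard describes.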
Bernard Sécher DEN/DM2S/SFME/LGLS mailto : bsecher@cea.fr
CEA Saclay, Bât 454, Pièce 114 Phone : 33 (0)1 69 08 73 78
91191 Gif-sur-Yvette Cedex, France Fax : 33 (0)1 69 08 10 87