Subject: Re: [OMPI users] change between openmpi 1.4.1 and 1.5.1 about MPI2 publish name
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-01-07 10:15:50


You're calling MPI_Bcast with root=0, so whatever value rank 0 has for *srv is the value everyone will have after the bcast. Plus, I didn't see anywhere in your code where *srv was ever set to 0.

In my runs, rank 0 is usually the one that publishes first. Everyone then gets the lookup properly, and then the bcast sends srv=1 to everyone. They all then try to call MPI_Comm_accept.
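Just to illustrate what the bcast does (this is a standalone sketch, not the attached lookup.c): MPI_Bcast overwrites the buffer on every non-root rank with the root's value, so rank 0's srv=1 ends up everywhere:

    /* bcast_demo.c -- minimal sketch of MPI_Bcast root semantics (illustrative only) */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, srv;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        srv = (rank == 0) ? 1 : 0;              /* each rank starts with its own value */
        MPI_Bcast(&srv, 1, MPI_INT, 0, MPI_COMM_WORLD);
        printf("[%d] srv=%d\n", rank, srv);     /* every rank now prints srv=1 */

        MPI_Finalize();
        return 0;
    }

Since nothing else in your code ever assigns 0 to *srv on the non-zero ranks, that broadcast value is the one they all use to pick accept vs. connect.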

Your code was incomplete, so I had to extend it; see attached.

Here's a sample output with 8 procs:

[7:12] svbu-mpi:~/mpi % mpicc lookup.c -o lookup -g && mpirun lookup
[0] Publish name
[0] service ocean available at 3853516800.0;tcp://172.29.218.140:36685;tcp://10.10.10.140:36685;tcp://10.10.20.140:36685;tcp://10.10.30.140:36685;tcp://172.16.68.1:36685;tcp://172.16.29.1:36685+3853516801.0;tcp://172.29.218.150:34210;tcp://10.10.30.150:34210:300
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
[2] Lookup name
[6] Lookup name
[4] Lookup name
[3] Lookup name
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
[1] Lookup name
[7] Lookup name
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
[5] Lookup name
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
MPI_Lookup_name succeeded
Bcast
MPI_Lookup_name succeeded
Bcast
Bcast complete: srv=1
Server calling MPI_Comm_accept
Bcast complete: srv=1
Server calling MPI_Comm_accept
[hang -- because everyone's in accept, not connect]
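For reference, here's a rough sketch of the role split that your remoteConnect() seems to intend -- illustrative only, reusing your variable names, and assuming two separate mpirun jobs, *srv initialized to 0 on every rank, and MPI_Publish_name returning an error under MPI_ERRORS_RETURN when the name is already taken (as you saw with 1.4.1):

    int srv = 0;                             /* must start at 0 on every rank */

    MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
    if (rank == 0) {
        /* Rank 0 alone decides the role: server if the publish succeeds,
           client if the other job already published the name. */
        MPI_Open_port(MPI_INFO_NULL, port_name);
        if (MPI_Publish_name(service, MPI_INFO_NULL, port_name) == MPI_SUCCESS) {
            srv = 1;
        } else {
            MPI_Close_port(port_name);
            MPI_Lookup_name(service, MPI_INFO_NULL, port_name_clt);
        }
    }
    MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_ARE_FATAL);

    /* Propagate rank 0's decision; within a single job this makes every
       rank a server, which is exactly why the 8-proc run above hangs. */
    MPI_Bcast(&srv, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (srv)
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &gcom);
    else
        MPI_Comm_connect(port_name_clt, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &gcom);

Launched as two separate jobs, whichever job's rank 0 wins the publish becomes the acceptor and the other job connects; launched as a single job, as above, every rank becomes an acceptor and you hang.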

On Jan 7, 2011, at 4:17 AM, Bernard Secher - SFME/LGLS wrote:

> Jeff,
>
> Only the processes of the program where process 0 succeeded in publishing the name have srv=1 and then call MPI_Comm_accept.
> The processes of the program where process 0 failed to publish the name have srv=0 and then call MPI_Comm_connect.
>
> That's how it worked with openmpi 1.4.1.
>
> Is it different with openmpi 1.5.1?
>
> Best
> Bernard
>
>
> Jeff Squyres wrote:
>> On Jan 5, 2011, at 10:36 AM, Bernard Secher - SFME/LGLS wrote:
>>
>>
>>
>>> MPI_Comm remoteConnect(int myrank, int *srv, char *port_name, char* service)
>>> {
>>>   int clt=0;
>>>   MPI_Request request; /* request for non-blocking communication */
>>>   MPI_Comm gcom;
>>>   MPI_Status status;
>>>   char port_name_clt[MPI_MAX_PORT_NAME];
>>>
>>>   if( service == NULL ) service = defaultService;
>>>
>>>   /* only the process of rank 0 can publish the name */
>>>   MPI_Barrier(MPI_COMM_WORLD);
>>>
>>>   /* A lookup for an unpublished service generates an error */
>>>   MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>>   if( myrank == 0 ){
>>>     /* Try to be the server. If the service is already published, try to be a client */
>>>     MPI_Open_port(MPI_INFO_NULL, port_name);
>>>     printf("[%d] Publish name\n",myrank);
>>>     if ( MPI_Publish_name(service, MPI_INFO_NULL, port_name) == MPI_SUCCESS ) {
>>>       *srv = 1;
>>>       printf("[%d] service %s available at %s\n",myrank,service,port_name);
>>>     }
>>>     else if ( MPI_Lookup_name(service, MPI_INFO_NULL, port_name_clt) == MPI_SUCCESS ){
>>>       MPI_Close_port( port_name );
>>>       clt = 1;
>>>     }
>>>     else
>>>       /* Throw exception */
>>>       printf("[%d] Error\n",myrank);
>>>   }
>>>   else{
>>>     /* Wait for rank 0 to publish the name */
>>>     sleep(1);
>>>     printf("[%d] Lookup name\n",myrank);
>>>     if ( MPI_Lookup_name(service, MPI_INFO_NULL, port_name_clt) == MPI_SUCCESS ){
>>>       clt = 1;
>>>     }
>>>     else
>>>       /* Throw exception */
>>>       ;
>>>   }
>>>   MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_ARE_FATAL);
>>>
>>>   MPI_Bcast(srv,1,MPI_INT,0,MPI_COMM_WORLD);
>>>
>>
>> You're broadcasting srv here -- won't everyone now have *srv==1, such that they all call MPI_COMM_ACCEPT, below?
>>
>>
>>
>>>   if ( *srv )
>>>     /* I am the Master */
>>>     MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &gcom );
>>>   else{
>>>     /* Connect to the service SERVER, get the server inter-communicator */
>>>     MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>>     if ( MPI_Comm_connect(port_name_clt, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &gcom ) == MPI_SUCCESS )
>>>       printf("[%d] I get the connection with %s at %s !\n",myrank, service, port_name_clt);
>>>     MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_ARE_FATAL);
>>>   }
>>>
>>>   if(myrank != 0) *srv = 0;
>>>
>>>   return gcom;
>>>
>>> }

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


  • application/octet-stream attachment: lookup.c