
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] change between openmpi 1.4.1 and 1.5.1 about MPI2 publish name
From: Bernard Secher - SFME/LGLS (bernard.secher_at_[hidden])
Date: 2011-01-07 04:46:37


Jeff,

The deadlock is not in MPI_Comm_accept and MPI_Comm_connect, but earlier,
in MPI_Publish_name and MPI_Lookup_name.
So the broadcast of srv is not involved in the deadlock.
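
For reference, the role selection in the quoted code can be modeled without any MPI calls. The sketch below is plain C, not Open MPI code: `Registry`, `model_publish`, `model_lookup` and `elect_role` are hypothetical names, and it assumes that publishing an already published service name fails, which is what the quoted code relies on. It illustrates that each program broadcasts srv only inside its own MPI_COMM_WORLD, so only the program whose rank 0 published first ends up with srv=1.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical single-slot stand-in for the name server (one service name). */
typedef struct {
    char service[64];
    char port[64];
    int  in_use;
} Registry;

/* Models MPI_Publish_name, ASSUMING a duplicate service name is rejected.
   Returns 1 on success, 0 if the service is already published. */
static int model_publish(Registry *reg, const char *service, const char *port)
{
    if (reg->in_use && strcmp(reg->service, service) == 0)
        return 0;
    snprintf(reg->service, sizeof reg->service, "%s", service);
    snprintf(reg->port, sizeof reg->port, "%s", port);
    reg->in_use = 1;
    return 1;
}

/* Models MPI_Lookup_name: copies the port and returns 1 if the service exists. */
static int model_lookup(const Registry *reg, const char *service,
                        char *port_out, size_t len)
{
    if (!reg->in_use || strcmp(reg->service, service) != 0)
        return 0;
    snprintf(port_out, len, "%s", reg->port);
    return 1;
}

/* Role selection for ONE program (one MPI_COMM_WORLD of nranks processes).
   Rank 0 tries to publish and falls back to a lookup. The loop stands in for
   MPI_Bcast(srv, 1, MPI_INT, 0, MPI_COMM_WORLD), which reaches only the ranks
   of this program. Returns 1 (server), 0 (client) or -1 (both calls failed). */
static int elect_role(Registry *reg, const char *service, const char *my_port,
                      int *srv, int nranks, char *peer_port, size_t len)
{
    int srv0 = model_publish(reg, service, my_port);
    if (!srv0 && !model_lookup(reg, service, peer_port, len))
        return -1;
    for (int r = 0; r < nranks; r++)
        srv[r] = srv0;
    return srv0;
}
```

Calling `elect_role` on a fresh registry for a first program with port "portA", and then for a second program with port "portB", gives srv=1 to every rank of the first program and srv=0 to every rank of the second, which receives "portA" as the port to connect to.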

Best
Bernard

Bernard Secher - SFME/LGLS wrote:
> Jeff,
>
> Only the processes of the program whose process 0 succeeded in publishing
> the name have srv=1 and then call MPI_Comm_accept.
> The processes of the program whose process 0 failed to publish the name
> have srv=0 and then call MPI_Comm_connect.
>
> It worked like this with Open MPI 1.4.1.
>
> Is it different with Open MPI 1.5.1?
>
> Best
> Bernard
>
>
> Jeff Squyres wrote:
>> On Jan 5, 2011, at 10:36 AM, Bernard Secher - SFME/LGLS wrote:
>>
>>
>>> MPI_Comm remoteConnect(int myrank, int *srv, char *port_name, char* service)
>>> {
>>> int clt=0;
>>> MPI_Request request; /* request for non-blocking communication */
>>> MPI_Comm gcom;
>>> MPI_Status status;
>>> char port_name_clt[MPI_MAX_PORT_NAME];
>>>
>>> if( service == NULL ) service = defaultService;
>>>
>>> /* only the process of rank 0 can publish the name */
>>> MPI_Barrier(MPI_COMM_WORLD);
>>>
>>> /* A lookup for an unpublished service generates an error */
>>> MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>> if( myrank == 0 ){
>>> /* Try to be a server. If the service is already published, try to be a client */
>>> MPI_Open_port(MPI_INFO_NULL, port_name);
>>> printf("[%d] Publish name\n",myrank);
>>> if ( MPI_Publish_name(service, MPI_INFO_NULL, port_name) == MPI_SUCCESS ) {
>>> *srv = 1;
>>> printf("[%d] service %s available at %s\n",myrank,service,port_name);
>>> }
>>> else if ( MPI_Lookup_name(service, MPI_INFO_NULL, port_name_clt) == MPI_SUCCESS ){
>>> MPI_Close_port( port_name );
>>> clt = 1;
>>> }
>>> else
>>> /* Throw exception */
>>> printf("[%d] Error\n",myrank);
>>> }
>>> else{
>>> /* Wait for rank 0 to publish the name */
>>> sleep(1);
>>> printf("[%d] Lookup name\n",myrank);
>>> if ( MPI_Lookup_name(service, MPI_INFO_NULL, port_name_clt) == MPI_SUCCESS ){
>>> clt = 1;
>>> }
>>> else
>>> /* Throw exception */
>>> ;
>>> }
>>> MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_ARE_FATAL);
>>>
>>> MPI_Bcast(srv,1,MPI_INT,0,MPI_COMM_WORLD);
>>>
>>
>> You're broadcasting srv here -- won't everyone now have *srv==1, such that they all call MPI_COMM_ACCEPT, below?
>>
>>
>>> if ( *srv )
>>> /* I am the Master */
>>> MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &gcom );
>>> else{
>>> /* Connect to the server's service and get the inter-communicator */
>>> MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>> if ( MPI_Comm_connect(port_name_clt, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &gcom ) == MPI_SUCCESS )
>>> printf("[%d] I get the connection with %s at %s !\n",myrank, service, port_name_clt);
>>> MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_ARE_FATAL);
>>> }
>>>
>>> if(myrank != 0) *srv = 0;
>>>
>>> return gcom;
>>>
>>> }
>>>
>>>
>>
>>
>>
>