Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] change between openmpi 1.4.1 and 1.5.1 about MPI2 publish name
From: Bernard Secher - SFME/LGLS (bernard.secher_at_[hidden])
Date: 2011-01-07 05:27:01


The accept and connect tests pass with Open MPI 1.4.1.

I think there is a bug in version 1.5.1.

Best
Bernard

Bernard Secher - SFME/LGLS wrote:
> I get the same deadlock with the Open MPI tests (pubsub, accept, and
> connect) with version 1.5.1.
>
> Bernard Secher - SFME/LGLS wrote:
>> Jeff,
>>
>> The deadlock is not in MPI_Comm_accept and MPI_Comm_connect, but
>> earlier, in MPI_Publish_name and MPI_Lookup_name.
>> So the broadcast of srv is not involved in the deadlock.
>>
>> Best
>> Bernard
>>
>> Bernard Secher - SFME/LGLS wrote:
>>> Jeff,
>>>
>>> Only the processes of the program in which process 0 succeeded in
>>> publishing the name have srv=1 and then call MPI_Comm_accept.
>>> The processes of the program in which process 0 failed to publish the
>>> name have srv=0 and then call MPI_Comm_connect.
>>>
>>> It worked like this with Open MPI 1.4.1.
>>>
>>> Is it different with Open MPI 1.5.1?
>>>
>>> Best
>>> Bernard
>>>
>>>
>>> Jeff Squyres wrote:
>>>> On Jan 5, 2011, at 10:36 AM, Bernard Secher - SFME/LGLS wrote:
>>>>
>>>>
>>>>> MPI_Comm remoteConnect(int myrank, int *srv, char *port_name, char* service)
>>>>> {
>>>>>   int clt=0;
>>>>>   MPI_Request request; /* request for non-blocking communication */
>>>>>   MPI_Comm gcom = MPI_COMM_NULL; /* stays null if no connection is made */
>>>>>   MPI_Status status;
>>>>>   char port_name_clt[MPI_MAX_PORT_NAME];
>>>>>
>>>>>   if( service == NULL ) service = defaultService;
>>>>>
>>>>>   /* only the process of rank 0 can publish the name */
>>>>>   MPI_Barrier(MPI_COMM_WORLD);
>>>>>
>>>>>   /* A lookup for an unpublished service generates an error */
>>>>>   MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>>>>   if( myrank == 0 ){
>>>>>     /* Try to be a server. If the service is already published, try to be a client */
>>>>>     MPI_Open_port(MPI_INFO_NULL, port_name);
>>>>>     printf("[%d] Publish name\n",myrank);
>>>>>     if ( MPI_Publish_name(service, MPI_INFO_NULL, port_name) == MPI_SUCCESS ) {
>>>>>       *srv = 1;
>>>>>       printf("[%d] service %s available at %s\n",myrank,service,port_name);
>>>>>     }
>>>>>     else if ( MPI_Lookup_name(service, MPI_INFO_NULL, port_name_clt) == MPI_SUCCESS ){
>>>>>       MPI_Close_port( port_name );
>>>>>       clt = 1;
>>>>>     }
>>>>>     else
>>>>>       /* Throw exception */
>>>>>       printf("[%d] Error\n",myrank);
>>>>>   }
>>>>>   else{
>>>>>     /* Wait for rank 0 to publish the name */
>>>>>     sleep(1);
>>>>>     printf("[%d] Lookup name\n",myrank);
>>>>>     if ( MPI_Lookup_name(service, MPI_INFO_NULL, port_name_clt) == MPI_SUCCESS ){
>>>>>       clt = 1;
>>>>>     }
>>>>>     else
>>>>>       /* Throw exception */
>>>>>       ;
>>>>>   }
>>>>>   MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_ARE_FATAL);
>>>>>
>>>>>   MPI_Bcast(srv,1,MPI_INT,0,MPI_COMM_WORLD);
>>>>>
>>>>
>>>> You're broadcasting srv here -- won't everyone now have *srv==1, such that they all call MPI_COMM_ACCEPT, below?
>>>>
>>>>
>>>>>   if ( *srv )
>>>>>     /* I am the Master */
>>>>>     MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &gcom );
>>>>>   else{
>>>>>     /* Connect to service SERVER and get the server inter-communicator */
>>>>>     MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>>>>     if ( MPI_Comm_connect(port_name_clt, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &gcom ) == MPI_SUCCESS )
>>>>>       printf("[%d] I get the connection with %s at %s !\n",myrank, service, port_name_clt);
>>>>>     MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_ARE_FATAL);
>>>>>   }
>>>>>
>>>>>   if(myrank != 0) *srv = 0;
>>>>>
>>>>>   return gcom;
>>>>>
>>>>> }
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
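
For readers following the thread, the role selection described above (rank 0 publishes and becomes the server, otherwise looks the name up and becomes a client) can be sketched without MPI. This is only an illustration, not the code from the post: the names job_role, decide_role, and role_action are hypothetical, and the two flags stand in for whether MPI_Publish_name and MPI_Lookup_name returned MPI_SUCCESS on rank 0.

```c
#include <stdio.h>

/* Hypothetical names for illustration; none of them appear in the original post. */
enum job_role { ROLE_SERVER, ROLE_CLIENT, ROLE_ERROR };

/*
 * publish_ok: nonzero if MPI_Publish_name returned MPI_SUCCESS on rank 0.
 * lookup_ok:  nonzero if MPI_Lookup_name returned MPI_SUCCESS on rank 0.
 * The lookup result only matters when publishing failed, mirroring the
 * if / else-if chain in remoteConnect.
 */
enum job_role decide_role(int publish_ok, int lookup_ok)
{
    if (publish_ok)
        return ROLE_SERVER;   /* srv = 1: the whole job calls MPI_Comm_accept */
    if (lookup_ok)
        return ROLE_CLIENT;   /* srv = 0: the whole job calls MPI_Comm_connect */
    return ROLE_ERROR;        /* neither publish nor lookup worked */
}

/* Name of the MPI call that a job with the given role would make next. */
const char *role_action(enum job_role role)
{
    switch (role) {
    case ROLE_SERVER: return "MPI_Comm_accept";
    case ROLE_CLIENT: return "MPI_Comm_connect";
    default:          return "none";
    }
}
```

In the posted function the decision is made on rank 0 and then broadcast with MPI_Bcast, so every rank of one job ends up with the same role; two separately launched jobs differ only in whether their own rank 0 managed to publish the name first.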