Open MPI logo

Open MPI User's Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Open MPI User's mailing list

Subject: Re: [OMPI users] change between openmpi 1.4.1 and 1.5.1 about MPI2 publish name
From: Bernard Secher - SFME/LGLS (bernard.secher_at_[hidden])
Date: 2011-01-07 10:41:46


srv = 0 is set in my main program
I call Bcast because all the processes must call MPI_Comm_accept
(collective) or must call MPI_Comm_connect (collective)

Anyway, I get also a dead lock with your lookup program:

That's what I do:

ompi-server -r URIfile

mpirun -np 1 -ompi-server file:URIfile lookup& (it the program which
publish the name)
mpirun -np 1 -ompi-server file:URIfile lookup (it is the program which
lookup the name)

 From these two programs I create a global communicator to exchange
communications between the two others

Best
Bernard

Jeff Squyres a écrit :
> You're calling bcast with root=0, so whatever value rank 0 has for srv, everyone will have after the bcast. Plus, I didn't see in your code where *srv was ever set to 0.
>
> In my runs, rank 0 is usually the one that publishes first. Everyone then gets the lookup properly, and then the bcast sends srv=1 to everyone. They all then try to call MPI_Comm_accept.
>
> Your code was incomplete, so I had to extend it; see attached.
>
> Here's a sample output with 8 procs:
>
> [7:12] svbu-mpi:~/mpi % mpicc lookup.c -o lookup -g && mpirun lookup
> [0] Publish name
> [0] service ocean available at 3853516800.0;tcp://172.29.218.140:36685;tcp://10.10.10.140:36685;tcp://10.10.20.140:36685;tcp://10.10.30.140:36685;tcp://172.16.68.1:36685;tcp://172.16.29.1:36685+3853516801.0;tcp://172.29.218.150:34210;tcp://10.10.30.150:34210:300
> Bcast
> Bcast complete: srv=1
> Server calling MPI_Comm_accept
> [2] Lookup name
> [6] Lookup name
> [4] Lookup name
> [3] Lookup name
> MPI_Lookup_name succeeded
> Bcast
> Bcast complete: srv=1
> Server calling MPI_Comm_accept
> [1] Lookup name
> [7] Lookup name
> MPI_Lookup_name succeeded
> Bcast
> Bcast complete: srv=1
> Server calling MPI_Comm_accept
> MPI_Lookup_name succeeded
> Bcast
> Bcast complete: srv=1
> Server calling MPI_Comm_accept
> [5] Lookup name
> MPI_Lookup_name succeeded
> Bcast
> Bcast complete: srv=1
> Server calling MPI_Comm_accept
> MPI_Lookup_name succeeded
> Bcast
> Bcast complete: srv=1
> Server calling MPI_Comm_accept
> MPI_Lookup_name succeeded
> Bcast
> MPI_Lookup_name succeeded
> Bcast
> Bcast complete: srv=1
> Server calling MPI_Comm_accept
> Bcast complete: srv=1
> Server calling MPI_Comm_accept
> [hang -- because everyone's in accept, not connect]
>
>
>
> On Jan 7, 2011, at 4:17 AM, Bernard Secher - SFME/LGLS wrote:
>
>
>> Jeff,
>>
>> Only the processes of the program where process 0 successed to publish name, have srv=1 and then call MPI_Comm_accept.
>> The processes of the program where process 0 failed to publish name, have srv=0 and then call MPI_Comm_connect.
>>
>> That's worked like this with openmpi 1.4.1.
>>
>> Is it different whith openmpi 1.5.1 ?
>>
>> Best
>> Bernard
>>
>>
>> Jeff Squyres a écrit :
>>
>>> On Jan 5, 2011, at 10:36 AM, Bernard Secher - SFME/LGLS wrote:
>>>
>>>
>>>
>>>
>>>> MPI_Comm remoteConnect(int myrank, int *srv, char *port_name, char* service)
>>>> {
>>>> int clt=0;
>>>> MPI_Request request; /* requete pour communication non bloquante */
>>>> MPI_Comm gcom;
>>>> MPI_Status status;
>>>> char port_name_clt[MPI_MAX_PORT_NAME];
>>>>
>>>> if( service == NULL ) service = defaultService;
>>>>
>>>> /* only process of rank null can publish name */
>>>> MPI_Barrier(MPI_COMM_WORLD);
>>>>
>>>> /* A lookup for an unpublished service generate an error */
>>>> MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>>> if( myrank == 0 ){
>>>> /* Try to be a server. If there service is already published, try to be a cient */
>>>> MPI_Open_port(MPI_INFO_NULL, port_name);
>>>> printf("[%d] Publish name\n",myrank);
>>>> if ( MPI_Publish_name(service, MPI_INFO_NULL, port_name) == MPI_SUCCESS ) {
>>>> *srv = 1;
>>>> printf("[%d] service %s available at %s\n",myrank,service,port_name);
>>>> }
>>>> else if ( MPI_Lookup_name(service, MPI_INFO_NULL, port_name_clt) == MPI_SUCCESS ){
>>>> MPI_Close_port( port_name );
>>>> clt = 1;
>>>> }
>>>> else
>>>> /* Throw exception */
>>>> printf("[%d] Error\n",myrank);
>>>> }
>>>> else{
>>>> /* Waiting rank 0 publish name */
>>>> sleep(1);
>>>> printf("[%d] Lookup name\n",myrank);
>>>> if ( MPI_Lookup_name(service, MPI_INFO_NULL, port_name_clt) == MPI_SUCCESS ){
>>>> clt = 1;
>>>> }
>>>> else
>>>> /* Throw exception */
>>>> ;
>>>> }
>>>> MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_ARE_FATAL);
>>>>
>>>> MPI_Bcast(srv,1,MPI_INT,0,MPI_COMM_WORLD);
>>>>
>>>>
>>>>
>>> You're broadcasting srv here -- won't everyone now have *srv==1, such that they all call MPI_COMM_ACCEPT, below?
>>>
>>>
>>>
>>>
>>>> if ( *srv )
>>>> /* I am the Master */
>>>> MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &gcom );
>>>> else{
>>>> /* Connect to service SERVER, get the inter-communicator server*/
>>>> MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>>> if ( MPI_Comm_connect(port_name_clt, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &gcom ) == MPI_SUCCESS )
>>>> printf("[%d] I get the connection with %s at %s !\n",myrank, service, port_name_clt);
>>>> MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_ARE_FATAL);
>>>> }
>>>>
>>>> if(myrank != 0) *srv = 0;
>>>>
>>>> return gcom;
>>>>
>>>> }
>>>>
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
       _\\|//_
      (' 0 0 ')
____ooO  (_) Ooo______________________________________________________
 Bernard Sécher  DEN/DM2S/SFME/LGLS    mailto : bsecher_at_[hidden]
 CEA Saclay, Bât 454, Pièce 114        Phone  : 33 (0)1 69 08 73 78
 91191 Gif-sur-Yvette Cedex, France    Fax    : 33 (0)1 69 08 10 87
------------Oooo---------------------------------------------------
       oooO (   )
       (   ) ) /
        \ ( (_/
         \_)
Ce message électronique et tous les fichiers attachés qu'il contient
sont confidentiels et destinés exclusivement à l'usage de la personne
à laquelle ils sont adressés. Si vous avez reçu ce message par erreur,
merci d'en avertir immédiatement son émetteur et de ne pas en conserver
de copie.
This e-mail and any files transmitted with it are confidential and
intended solely for the use of the individual to whom they are addressed.
If you have received this e-mail in error please inform the sender
immediately, without keeping any copy thereof.