On Jan 7, 2011, at 10:41 AM, Bernard Secher - SFME/LGLS wrote:
> srv = 0 is set in my main program
> I call Bcast because all the processes must call MPI_Comm_accept (collective) or must call MPI_Comm_connect (collective)
Ah -- I see. I thought this was a test program where some processes were supposed to call connect and others were supposed to call accept.
> Anyway, I get also a dead lock with your lookup program:
>
> That's what I do:
>
> ompi-server -r URIfile
>
> mpirun -np 1 -ompi-server file:URIfile lookup& (this is the program which publishes the name)
> mpirun -np 1 -ompi-server file:URIfile lookup (this is the program which looks up the name)
>
> From these two programs I create a global communicator to exchange communications between the two others
Ah -- this is a key point that I missed in your initial mail: that you're using the ompi-server and multiple different mpiruns. :-)
Ok, I can replicate the hang in publish now. I'll file a bug report.
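
For reference, the pattern under discussion can be sketched roughly as follows. This is only a minimal illustration, not the actual lookup test program from this thread: the service name "example-service" and the "server" command-line argument are assumptions made for the example. Rank 0 decides the role, broadcasts it as srv, and then every process in the job makes the same collective call (MPI_Comm_accept or MPI_Comm_connect).

```c
#include <mpi.h>
#include <string.h>

/* Hypothetical service name; not taken from the original thread. */
#define SERVICE_NAME "example-service"

int main(int argc, char **argv)
{
    int rank;
    int srv = 0;                         /* 1 = publish/accept, 0 = lookup/connect */
    char port[MPI_MAX_PORT_NAME] = "";
    MPI_Comm inter;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Assumption: rank 0 picks the role from a "server" argument. */
    if (rank == 0 && argc > 1 && strcmp(argv[1], "server") == 0)
        srv = 1;

    /* Every rank needs the same value of srv, because accept and
       connect are collective over MPI_COMM_WORLD. */
    MPI_Bcast(&srv, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (srv) {
        if (rank == 0) {
            MPI_Open_port(MPI_INFO_NULL, port);
            MPI_Publish_name(SERVICE_NAME, MPI_INFO_NULL, port);
        }
        /* The port name is significant only at the root (rank 0). */
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
        if (rank == 0) {
            MPI_Unpublish_name(SERVICE_NAME, MPI_INFO_NULL, port);
            MPI_Close_port(port);
        }
    } else {
        /* With the default error handler, a failed lookup aborts the job. */
        if (rank == 0)
            MPI_Lookup_name(SERVICE_NAME, MPI_INFO_NULL, port);
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
    }

    /* inter is an intercommunicator joining the two jobs. */
    MPI_Comm_disconnect(&inter);
    MPI_Finalize();
    return 0;
}
```

Assuming the binary is built as ./sketch, it would be started under two separate mpirun invocations that point at the same ompi-server URI file, as in the commands quoted above: once with the "server" argument and once without.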
--
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/