
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] dead lock in MPI_Finalize
From: George Bosilca (bosilca_at_[hidden])
Date: 2009-01-23 13:59:37


I was somewhat confused when I wrote my last email and I mixed up the
MPI versions (thanks to Dick Treumann for gently pointing me to the
truth). Before MPI 2.1, the MPI Standard was unclear about how
MPI_Finalize should behave in the context of spawned or joined worlds,
which made disconnect+finalize the only safe and portable way to
correctly finalize all connected processes. However, MPI 2.1
clarified this point, and MPI_Finalize is now collective over all
connected processes (for a definition of connected processes, see
section 10.5 of the MPI 2.1 standard, page 318).

However, if you really want to write a portable MPI application, I
suggest using disconnect+finalize, at least until all available MPI
libraries are 2.1 compliant.

Open MPI 1.3 was supposed to be 2.1 compliant, so I guess I'll have
to create a new bug report for this.

   Thanks,
     george.

On Jan 23, 2009, at 10:02 , George Bosilca wrote:

> I don't know what your program is doing, but I can guess what the
> problem is. If you use MPI-2 dynamics to spawn or connect two
> MPI_COMM_WORLDs, you have to disconnect them before calling
> MPI_Finalize. The reason is that MPI_Finalize does the opposite of
> MPI_Init, so it is MPI_COMM_WORLD based. Making sure your different
> worlds are disconnected before calling MPI_Finalize should solve the
> problem.
>
> george.
>
> On Jan 23, 2009, at 06:00 , Bernard Secher - SFME/LGLS wrote:
>
>> No, I didn't run this program with Open MPI 1.2.X, because I was
>> told there were many changes between 1.2.X and 1.3 regarding
>> MPI_Publish_name and MPI_Lookup_name (new ompi-server, ...), and
>> that it was better to use 1.3.
>>
>> Yes, I am sure all processes reach the MPI_Finalize() function,
>> because I write a message just before it (the END_OF macro in my
>> program), and I am sure all processes are blocked in MPI_Finalize(),
>> because I write a message just after it (the MESSAGE macro).
>>
>> Maybe not all MPI_Sends are matched by corresponding MPI_Recvs...
>> That could be a possibility.
>>
>> Thanks
>> Bernard
>>
>>
>>
>> jody wrote:
>>> Hi Bernard
>>>
>>> The structure looks OK as far as I can see.
>>> Did it run OK on Open MPI 1.2.X?
>>> Are you sure all processes reach the MPI_Finalize call?
>>> Usually MPI_Finalize only completes when all processes reach it.
>>> I think you should also make sure that all MPI_Sends are matched by
>>> corresponding MPI_Recvs.
>>>
>>> Jody
>>>
>>> On Fri, Jan 23, 2009 at 11:08 AM, Bernard Secher - SFME/LGLS
>>> <bernard.secher_at_[hidden]> wrote:
>>>
>>>> Thanks Jody for your answer.
>>>>
>>>> I launch 2 instances of my program, each on 2 processes, on the
>>>> same machine.
>>>> I use MPI_Publish_name and MPI_Lookup_name to create a global
>>>> communicator over the 4 processes.
>>>> Then the 4 processes exchange data.
>>>>
>>>> The main program is a CORBA server. I am sending you this program.
>>>>
>>>> Bernard
>>>>
>>>> jody wrote:
>>>>
>>>> For instance:
>>>> - how many processes on how many machines,
>>>> - what kind of computation
>>>> - perhaps minimal code that reproduces this failure
>>>> - configuration settings, etc.
>>>> See: http://www.open-mpi.org/community/help/
>>>>
>>>> Without any information except for "it doesn't work",
>>>> nobody can give you any help whatsoever.
>>>>
>>>> Jody
>>>>
>>>> On Fri, Jan 23, 2009 at 9:33 AM, Bernard Secher - SFME/LGLS
>>>> <bernard.secher_at_[hidden]> wrote:
>>>>
>>>>
>>>> Hello Jeff,
>>>>
>>>> I don't understand what you mean by "A _detailed_ description of
>>>> what is failing".
>>>> The problem is a deadlock in the MPI_Finalize() function. All
>>>> processes are blocked in MPI_Finalize().
>>>>
>>>> Bernard
>>>>
>>>> Jeff Squyres wrote:
>>>>
>>>>
>>>> Per this note on the "getting help" page, we still need the
>>>> following:
>>>>
>>>> "A _detailed_ description of what is failing. The more details that
>>>> you provide, the better. E-mails saying "My application doesn't
>>>> work!" will inevitably be answered with requests for more
>>>> information about exactly what doesn't work; so please include as
>>>> much information detailed in your initial e-mail as possible."
>>>>
>>>> Additionally:
>>>>
>>>> "The best way to get help is to provide a "recipe" for reproducing
>>>> the problem."
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> On Jan 22, 2009, at 8:53 AM, Bernard Secher - SFME/LGLS wrote:
>>>>
>>>>
>>>>
>>>> Hello Tim,
>>>>
>>>> I am sending you the information in the attached files.
>>>>
>>>> Bernard
>>>>
>>>> Tim Mattox wrote:
>>>>
>>>>
>>>> Can you send all the information listed here:
>>>>
>>>> http://www.open-mpi.org/community/help/
>>>>
>>>> On Wed, Jan 21, 2009 at 8:58 AM, Bernard Secher - SFME/LGLS
>>>> <bernard.secher_at_[hidden]> wrote:
>>>>
>>>>
>>>>
>>>> Hello,
>>>>
>>>> I have a case where I get a deadlock in the MPI_Finalize() function
>>>> with Open MPI v1.3.
>>>>
>>>> Can somebody help me, please?
>>>>
>>>> Bernard
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Bernard Sécher  DEN/DM2S/SFME/LGLS  mailto: bsecher_at_[hidden]
>>>> CEA Saclay, Bât 454, Pièce 114  Phone: 33 (0)1 69 08 73 78
>>>> 91191 Gif-sur-Yvette Cedex, France  Fax: 33 (0)1 69 08 10 87
>>>>
>>>>
>>>>
>>>> This e-mail and any files transmitted with it are confidential and
>>>> intended solely for the use of the individual to whom they are
>>>> addressed. If you have received this e-mail in error, please inform
>>>> the sender immediately, without keeping any copy thereof.
>>>>
>>>>
>>>> <config.log.tgz> <ifconfig.log.tgz> <ompi_info.log.tgz>
>>>
>>>
>>
>
>