Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Proper way to throw an error to all nodes?
From: Terry Frankcombe (terry_at_[hidden])
Date: 2008-06-03 22:08:18


Calling MPI_Finalize in a single process won't ever do what you want.
MPI_Finalize is collective, so you need to get all the processes to call
it for the end to be graceful.

What you need to do is have some sort of special message to tell
everyone to die. In my codes I have a rather dynamic master-slave model
with flags being broadcast by the master process to tell the slaves what
to expect next, so it's easy for me to send out an "it's all over,
please kill yourself" message. For a more rigid communication pattern
you could embed the die message in the data: for example, if the first
element of the received data is negative, that's the sign things have
gone south and everyone should stop what they're doing and call
MPI_Finalize. The details depend on the details of your code.
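
As a rough illustration of the "die message in the data" idea (not code
from this thread), here is a minimal MPI-free sketch in C. It assumes a
hypothetical convention where buf[0] normally holds a non-negative item
count and a negative buf[0] means "stop". The names STOP_SENTINEL,
make_stop_message, is_stop_message and handle_message are invented for
the example; in a real code the buffer would arrive through whatever
MPI_Recv or MPI_Bcast call you already have.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical convention: buf[0] >= 0 is an item count,
   buf[0] < 0 means "stop and go call MPI_Finalize". */
#define STOP_SENTINEL (-1.0)

/* Sender side: mark a buffer as the stop message. */
void make_stop_message(double *buf, size_t len)
{
    if (len > 0)
        buf[0] = STOP_SENTINEL;
}

/* Receiver side: 1 if the buffer carries the stop signal, else 0. */
int is_stop_message(const double *buf, size_t len)
{
    return len > 0 && buf[0] < 0.0;
}

/* Receiver side: returns 0 if the caller should leave its work loop,
   1 otherwise. For a normal message, the sum of the buf[0] payload
   items that follow the header is stored in *sum_out. */
int handle_message(const double *buf, size_t len, double *sum_out)
{
    assert(sum_out != NULL);
    if (is_stop_message(buf, len))
        return 0;

    size_t count = len > 0 ? (size_t)buf[0] : 0;
    double sum = 0.0;
    for (size_t i = 1; i <= count && i < len; ++i)
        sum += buf[i];
    *sum_out = sum;
    return 1;
}
```

Each process would call handle_message on every buffer it receives,
drop out of its loop when it returns 0, and then call MPI_Finalize
along with everyone else.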

Presumably you could also reserve a message tag for the die message and
have each process poll for it.

Hope this helps.

On Tue, 2008-06-03 at 19:57 +0900, 8mj6tc902_at_[hidden] wrote:
> So I'm working on this program which has many ways it might possibly die
> at runtime, but one that happens frequently is that the user types a
> wrong (nonexistent) filename at the command prompt. As it is now, the
> node looking for the file notices the file doesn't exist and tries to
> terminate the program. It tries to call MPI_Finalize(), but the other
> nodes are all waiting for a message from the node doing the file
> reading, so MPI_Finalize waits forever until the user realizes the job
> isn't doing anything and terminates it manually.
>
> So, my question is: what's the "correct" graceful way to handle
> situations like this? Is there some MPI function which can basically
> throw an exception to all other nodes telling them to bail out now? Or is
> the correct behaviour just to have the node that spotted the error die
> quietly and wait for the others to notice?
>
> Thanks for any suggestions!

-- 
Dr. Terry Frankcombe
Research School of Chemistry, Australian National University
Ph: (+61) 0417 163 509    Skype: terry.frankcombe