Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Proper way to throw an error to all nodes?
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-06-04 14:51:13

Yes -- MPI_Abort is the simplest way to get them all to die. But
you'll also get error message(s) from OMPI. So you have [at least] 2
options:

1. Exit with MPI error

   if (rank == process_who_does_the_checking && !exists(filename)) {
      MPI_Abort(MPI_COMM_WORLD, 1);   /* error code 1 is arbitrary */
   }

2. Exit with your own error; MPI finalizes cleanly

   file_exists = 1;
   if (rank == process_who_does_the_checking && !exists(filename)) {
      file_exists = 0;
   }
   MPI_Bcast(&file_exists, 1, MPI_INT, process_who_does_the_checking,
             MPI_COMM_WORLD);
   if (!file_exists) {   /* true on every rank after the broadcast */
      MPI_Finalize();
      exit(1);
   }
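
Both snippets call exists(filename), which is not a standard C or MPI function. A minimal sketch of such a helper, assuming a POSIX system where stat() is available (the name and behaviour here are only one possible reading of what was intended):

```c
#include <sys/stat.h>

/* Hypothetical stand-in for the exists() used in the snippets above.
 * Returns 1 if stat() succeeds on `path`, 0 otherwise. Note that a
 * path which is present but not reachable (e.g. a permission problem
 * on a parent directory) is also reported as 0. */
int exists(const char *path)
{
    struct stat sb;
    return stat(path, &sb) == 0;
}
```

Only the checking rank needs to call it; the other ranks learn the result from the MPI_Abort or the broadcast.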

There are oodles of variants on this, of course, but you get the general
idea.
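
One such variant is the in-band flag Terry describes in the quoted message below: a negative first element means "stop". Here is a rough sketch of just the marking and checking logic, assuming the payload is an array of doubles whose first element is never negative in valid data. The names ABORT_SENTINEL, mark_abort and is_abort_message are made up for illustration, and the actual send/receive calls are left out.

```c
#include <stddef.h>

/* Hypothetical sentinel; any negative value would do under the
 * assumption that valid data never starts with a negative number. */
#define ABORT_SENTINEL (-1.0)

/* Sender side: overwrite the first element before sending so that
 * receivers can tell the job has to stop. */
void mark_abort(double *buf, size_t count)
{
    if (count > 0) {
        buf[0] = ABORT_SENTINEL;
    }
}

/* Receiver side: returns 1 if the first element is negative, meaning
 * the receiver should stop working and head for MPI_Finalize. */
int is_abort_message(const double *buf, size_t count)
{
    return count > 0 && buf[0] < 0.0;
}
```

Each receiver would call is_abort_message() on the buffer right after its receive completes, and every rank still has to reach MPI_Finalize for the shutdown to be clean.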

On Jun 3, 2008, at 11:00 PM, David Singleton wrote:

> This is exactly what MPI_Abort is for.
> David
> Terry Frankcombe wrote:
>> Calling MPI_Finalize in a single process won't ever do what you want.
>> You need to get all the processes to call MPI_Finalize for the end to
>> be graceful.
>> What you need to do is have some sort of special message to tell
>> everyone to die. In my codes I have a rather dynamic master-slave
>> model with flags being broadcast by the master process to tell the
>> slaves what to expect next, so it's easy for me to send out an "it's
>> all over, please kill yourself" message. For a more rigid
>> communication pattern you could embed the die message in the data:
>> something like if the first element of the received data is negative,
>> then that's the sign things have gone south and everyone should stop
>> what they're doing and MPI_Finalize. The details depend on the
>> details of your code.
>> Presumably you could also set something up using tags and message
>> polling.
>> Hope this helps.
>> On Tue, 2008-06-03 at 19:57 +0900, 8mj6tc902_at_[hidden] wrote:
>>> So I'm working on this program which has many ways it might possibly
>>> die at runtime, but one of them that happens frequently is the user
>>> types a wrong (nonexistent) filename on the command prompt. As it is
>>> now, the node looking for the file notices the file doesn't exist and
>>> tries to terminate the program. It tries to call MPI_Finalize(), but
>>> the other nodes are all waiting for a message from the node doing the
>>> file reading, so MPI_Finalize waits forever until the user realizes
>>> the job isn't doing anything and terminates it manually.
>>> So, my question is: what's the "correct" graceful way to handle
>>> situations like this? Is there some MPI function which can basically
>>> throw an exception to all other nodes telling them to bail out now?
>>> Or is correct behaviour just to have the node that spotted the error
>>> die quietly and wait for the others to notice?
>>> Thanks for any suggestions!

Jeff Squyres
Cisco Systems