
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Proper way to throw an error to all nodes?
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-06-04 14:51:13


Yes -- MPI_Abort is the simplest way to get them all to die. But
you'll also get error message(s) from OMPI. So you have [at least] 2
options:

1. Exit with MPI error

-----
   if (rank == process_who_does_the_checking && !exists(filename)) {
      fprintf(stderr, "bad!\n");
      MPI_Abort(MPI_COMM_WORLD, 1);   /* second argument is the error code */
   }
-----
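`exists()` in these snippets is pseudocode, not an MPI function. A minimal sketch of it in portable ISO C follows. It assumes that "exists" can mean "can be opened for reading", so a file that is present but unreadable is also treated as bad. The name and behavior are illustrative, not part of any library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper standing in for the exists() pseudocode:
   returns 1 if 'filename' can be opened for reading, 0 otherwise.
   Only the rank doing the checking needs to call it. */
int exists(const char *filename)
{
    FILE *fp;

    assert(filename != NULL);
    fp = fopen(filename, "r");
    if (fp == NULL) {
        return 0;   /* missing, unreadable, or bad path */
    }
    fclose(fp);
    return 1;
}
```

On POSIX systems, `access(filename, R_OK) == 0` from `<unistd.h>` is a common alternative that avoids actually opening the file.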

2. Exit with your own error; MPI finalizes cleanly

-----
   int file_exists = 1;
   if (rank == process_who_does_the_checking && !exists(filename)) {
      fprintf(stderr, "bad!\n");
      file_exists = 0;
   }
   /* Every rank must make this call, not just the checker */
   MPI_Bcast(&file_exists, 1, MPI_INT, process_who_does_the_checking,
             MPI_COMM_WORLD);
   if (!file_exists) {
      MPI_Finalize();
      exit(1);
   }
-----

There are oodles of variants on this, of course, but you get the general
idea.

On Jun 3, 2008, at 11:00 PM, David Singleton wrote:

>
> This is exactly what MPI_Abort is for.
>
> David
>
> Terry Frankcombe wrote:
>> Calling MPI_Finalize in a single process won't ever do what you want.
>> You need to get all the processes to call MPI_Finalize for the end to
>> be graceful.
>>
>> What you need to do is have some sort of special message to tell
>> everyone to die. In my codes I have a rather dynamic master-slave
>> model with flags being broadcast by the master process to tell the
>> slaves what to expect next, so it's easy for me to send out an "it's
>> all over, please kill yourself" message. For a more rigid
>> communication pattern you could embed the die message in the data:
>> something like if the first element of the received data is
>> negative, then that's the sign things have gone south and everyone
>> should stop what they're doing and MPI_Finalize. The details depend
>> on the details of your code.
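The "negative first element" convention can be sketched as a pair of helpers. This assumes the payload is an array of `double` whose first element is never legitimately negative; the names `ABORT_FLAG`, `mark_abort`, and `is_abort` are made up for illustration. The sender overwrites slot 0 of a normally sized buffer rather than sending a shorter message, so every receiver still gets exactly the count it posted for. In the bad-filename case, the reading rank would mark and send the buffer as usual and then call MPI_Finalize; each receiver checks the flag right after its receive completes and, if set, also calls MPI_Finalize.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-band control value: any negative number in slot 0
   means "things have gone south, stop and call MPI_Finalize". */
#define ABORT_FLAG (-1.0)

/* Sender side: mark a normally sized message buffer as "stop now".
   Only slot 0 is touched; the rest of the buffer is left as-is. */
void mark_abort(double *buf, size_t count)
{
    assert(buf != NULL && count > 0);
    buf[0] = ABORT_FLAG;
}

/* Receiver side: call right after the receive completes.
   Returns nonzero if the sender asked everyone to stop. */
int is_abort(const double *buf, size_t count)
{
    assert(buf != NULL && count > 0);
    return buf[0] < 0.0;
}
```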
>>
>> Presumably you could also set something up using tags and message
>> polling.
>>
>> Hope this helps.
>>
>>
>> On Tue, 2008-06-03 at 19:57 +0900, 8mj6tc902_at_[hidden] wrote:
>>> So I'm working on this program which has many ways it might possibly
>>> die at runtime, but one of them that happens frequently is the user
>>> types a wrong (non-existent) filename on the command prompt. As it is
>>> now, the node looking for the file notices the file doesn't exist and
>>> tries to terminate the program. It tries to call MPI_Finalize(), but
>>> the other nodes are all waiting for a message from the node doing the
>>> file reading, so MPI_Finalize waits forever until the user realizes
>>> the job isn't doing anything and terminates it manually.
>>>
>>> So, my question is: what's the "correct" graceful way to handle
>>> situations like this? Is there some MPI function which can basically
>>> throw an exception to all other nodes telling them to bail out now? Or
>>> is the correct behaviour just to have the node that spotted the error
>>> die quietly and wait for the others to notice?
>>>
>>> Thanks for any suggestions!
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems