Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Proper way to throw an error to all nodes?
From: Robert Kubrick (robertkubrick_at_[hidden])
Date: 2008-08-18 20:16:15


A question related to an old thread:
In the case of solution 2), how do you broadcast 'flags' to the slaves if
they're processing asynchronous data? I understand MPI_Bcast is a
collective operation requiring all processes in a communicator to
call it before it completes. If the slaves are processing a number of
data events in a continuous loop, the only solution I see is to send
a special exit message from the master through MPI_Send.

Or is there a non-collective broadcast function I am missing?

On Jun 4, 2008, at 2:51 PM, Jeff Squyres wrote:

> Yes -- MPI_Abort is the simplest way to get them all to die. But
> you'll also get error message(s) from OMPI. So you have [at least] 2
> options:
>
> 1. Exit with MPI error
>
> -----
> if (rank == process_who_does_the_checking && !exists(filename)) {
>     fprintf(stderr, "bad!\n");
>     /* MPI_Abort takes an error code as its second argument */
>     MPI_Abort(MPI_COMM_WORLD, 1);
> }
> -----
>
> 2. Exit with your own error; MPI finalizes cleanly
>
> -----
> file_exists = 1;
> if (rank == process_who_does_the_checking && !exists(filename)) {
>     fprintf(stderr, "bad!\n");
>     file_exists = 0;
> }
> /* Collective: every rank in MPI_COMM_WORLD must make this call */
> MPI_Bcast(&file_exists, 1, MPI_INT, process_who_does_the_checking,
>           MPI_COMM_WORLD);
> if (!file_exists) {
>     MPI_Finalize();
>     exit(1);
> }
> -----
>
> There are oodles of variants on this, of course, but you get the
> general idea.
>
>
>
> On Jun 3, 2008, at 11:00 PM, David Singleton wrote:
>
>>
>> This is exactly what MPI_Abort is for.
>>
>> David
>>
>> Terry Frankcombe wrote:
>>> Calling MPI_Finalize in a single process won't ever do what you want.
>>> You need to get all the processes to call MPI_Finalize for the end to
>>> be graceful.
>>>
>>> What you need to do is have some sort of special message to tell
>>> everyone to die. In my codes I have a rather dynamic master-slave
>>> model with flags being broadcast by the master process to tell the
>>> slaves what to expect next, so it's easy for me to send out an "it's
>>> all over, please kill yourself" message. For a more rigid
>>> communication pattern you could embed the die message in the data:
>>> something like if the first element of the received data is negative,
>>> then that's the sign things have gone south and everyone should stop
>>> what they're doing and MPI_Finalize. The details depend on the
>>> details of your code.
>>>
>>> Presumably you could also set something up using tags and message
>>> polling.
>>>
>>> Hope this helps.
>>>
>>>
>>> On Tue, 2008-06-03 at 19:57 +0900, 8mj6tc902_at_[hidden] wrote:
>>>> So I'm working on this program which has many ways it might possibly
>>>> die at runtime, but one that happens frequently is the user typing a
>>>> wrong (nonexistent) filename at the command prompt. As it is now, the
>>>> node looking for the file notices the file doesn't exist and tries to
>>>> terminate the program. It tries to call MPI_Finalize(), but the other
>>>> nodes are all waiting for a message from the node doing the file
>>>> reading, so MPI_Finalize waits forever until the user realizes the
>>>> job isn't doing anything and terminates it manually.
>>>>
>>>> So, my question is: what's the "correct" graceful way to handle
>>>> situations like this? Is there some MPI function which can basically
>>>> throw an exception to all other nodes telling them to bail out now?
>>>> Or is the correct behaviour just to have the node that spotted the
>>>> error die quietly and wait for the others to notice?
>>>>
>>>> Thanks for any suggestions!
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> Cisco Systems