On Nov 7, 2007, at 7:43 PM, Murat Knecht wrote:
> when MPI_Spawn cannot launch an application for whatever reason, the
> entire job is cancelled with some message like the following.
That is correct; MPI states that the default error handler is
MPI_ERRORS_ARE_FATAL.
> Is there a way to handle this nicely, e.g. by throwing an exception? I
Sure; change the error handler on the communicator that you are using
in the call to MPI_COMM_SPAWN (e.g., set it to MPI_ERRORS_RETURN so
that the call returns an error code instead of aborting).
I don't know if we have checked this particular code path to ensure
that OMPI will be stable after this, but it might work...
> understand, this does not work, when the job is first started with
> mpirun, as there is no application yet to fall back on, but in case of a
> running application, it should be possible to simply inform it that the
> spawning request failed. Then the application could begin to handle the
> error and terminate gracefully. I did enable C++ Exceptions btw, so I
> guess this is not implemented. Is there a technical (e.g. architectural)
> reason behind this, or simply a yet-to-be-added feature?
The MPI layer is written in C; it will not throw exceptions unless you
use the MPI C++ bindings to enable the MPI::ERRORS_THROW_EXCEPTIONS
error handler. Also be sure to configure/build Open MPI with the
--enable-cxx-exceptions flag, which adds the compiler flags needed for
C++ exceptions to propagate through the C code (it's not enabled by
default because it imposes a slight performance penalty).
--
Jeff Squyres
Cisco Systems