Jeff, you were right. I did a series of spawns, each followed by a merge, and forgot to set the exception-throwing error handler on the newly created intracommunicators. Since this property is evidently not inherited (which would be hard anyway, given that multiple communicators are being merged), the default, non-throwing handler was installed.

Thanks!

Murat


Jeff Squyres wrote:
On Nov 7, 2007, at 7:43 PM, Murat Knecht wrote:

  
when MPI_Comm_spawn cannot launch an application for whatever reason, the
entire job is cancelled with some message like the following.
    

That is correct; MPI states that the default error handler is
MPI_ERRORS_ARE_FATAL.

  
Is there a way to handle this nicely, e.g. by throwing an exception? I
    

Sure; change the error handler on the communicator that you are
using in the call to MPI_COMM_SPAWN.

I don't know if we have checked this particular code path to ensure  
that OMPI will be stable after this, but it might work...

  
understand, this does not work, when the job is first started with
mpirun, as there is no application yet to fall back on, but in case of a
running application, it should be possible to simply inform it that the
spawning request failed. Then the application could begin to handle the
error and terminate gracefully. I did enable C++ Exceptions btw, so I
guess this is not implemented. Is there a technical (e.g. architectural)
reason behind this, or simply a yet-to-be-added feature?
    

The MPI layer is written in C; it will not throw exceptions unless you
use the MPI C++ bindings and set the MPI::ERRORS_THROW_EXCEPTIONS
error handler.  Also be sure to configure/build Open MPI with the
--enable-cxx-exceptions flag, which adds the compiler flags needed for
C++ exceptions to propagate through the C code (it's not enabled by
default because it imposes a slight performance penalty).