Open MPI Development Mailing List Archives

From: Brian Barrett (brbarret_at_[hidden])
Date: 2006-04-20 11:12:07

On Apr 19, 2006, at 4:15 PM, Greg Watson wrote:

> We've just run across a rather tricky issue. We're calling
> opal_event_loop() to dispatch orte events to an orted that has been
> launched separately. However if the orted dies for some reason (gets
> a signal or whatever) then opal_event_loop() is calling exit().
> Needless to say, this is not good behavior for us. Any suggestions on how
> to get around this problem?

Is the orted you are connecting to the "seed" daemon? I think the
only time we should be exiting like that is if the orted was the seed
daemon. I'm not sure what we want to do if that's the case -- it
looks like we're calling errmgr.abort() when badness happens. I
wonder if your application can provide its own errmgr component that
provides an abort that doesn't actually abort? Just some off-the-cuff
ideas -- Ralph could probably give a better idea of exactly what
is happening...
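
To make the "abort that doesn't actually abort" idea concrete, here is a rough, self-contained sketch. It is not the real ORTE errmgr interface: every name in it (demo_errmgr_t, demo_abort_exit, demo_abort_record, demo_on_daemon_lost) is hypothetical and only illustrates swapping an exiting handler for one that records the failure and returns.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error-manager interface: a table of function pointers.
 * This is NOT the actual ORTE errmgr module structure. */
typedef struct {
    void (*abort_fn)(int status);
} demo_errmgr_t;

/* State the replacement handler fills in, so the caller can react later. */
int demo_abort_requested = 0;
int demo_abort_status = 0;

/* Default-style handler: terminates the whole process. */
void demo_abort_exit(int status)
{
    exit(status);
}

/* Replacement handler: records the request and returns to the caller. */
void demo_abort_record(int status)
{
    demo_abort_requested = 1;
    demo_abort_status = status;
}

/* The active error manager; starts out with the exiting handler. */
demo_errmgr_t demo_errmgr = { demo_abort_exit };

/* Stand-in for the event loop noticing that the daemon went away. */
void demo_on_daemon_lost(int status)
{
    fprintf(stderr, "daemon lost (status %d)\n", status);
    demo_errmgr.abort_fn(status);
}
```

With something along these lines, the tool would set demo_errmgr.abort_fn = demo_abort_record before entering its event loop, then check demo_abort_requested after each pass and clean up on its own terms. Whether the real errmgr framework lets an application substitute a component this way is something Ralph would have to confirm.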


   Brian Barrett
   Open MPI developer