Victor, you might want to take a look at the Open MPI version available from http://fault-tolerance.org/. It provides additional features to gracefully handle node failures.
On May 30, 2013, at 17:55, Victor Vysotskiy <Victor.Vysotskiy_at_[hidden]> wrote:
> Hi Ralph,
>> -mca orte_abort_non_zero_exit 0
> Thank you for the hint. That is exactly what I need! BTW, does it help if one of the working nodes occasionally dies during the MPMD run?
> With best regards,