On Jun 29, 2010, at 9:35 PM, Íõî£ wrote:
> Thanks for the feedback. More below:
> Are there any MPI implementations that meet the following requirements:
> 1. It doesn't terminate the whole job when a node dies?
> 2. It allows a spare node to replace the dead node and take over its work?
> As far as I know, FT-MPI meets both requirements, but it hasn't been updated since 2004. Open MPI is said to combine several projects, including FT-MPI, but so far it only provides checkpoint/restart as a means of fault tolerance.
I know that the UT people have been working on such things over the past few years, but I don't know the current status.