
Open MPI User's Mailing List Archives


Subject: [OMPI users] Question about Open MPI fault tolerance mechanism
From: Rui (wangraying_at_[hidden])
Date: 2010-06-08 04:37:03


Hi, I have three questions:

1. How does Open MPI detect a faulty node?

2. If one node dies, can the program keep running? What if two or more nodes die?

3. How can a faulty (dead) node be recovered? Is it possible to recover without checkpointing, since checkpointing is time-consuming and degrades performance?

Thanks!

Rui Wang
ICT, P.R. China