I am trying to write a fault-tolerant system with the following
requirements:
1) Recover from any software/hardware crash.
2) Dynamically shrink and grow.
3) Migrate processes among machines.
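As a point of reference for requirement 1, one common approach is
application-level checkpoint/restart. The following is only a minimal
sketch and is not tied to any MPI implementation: the names
save_checkpoint, load_checkpoint, and run, the JSON file format, and the
crash_at parameter are all hypothetical and exist purely for
illustration. The same saved state could in principle be reloaded on a
different machine, which touches on requirement 3.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Atomically write `state` (a JSON-serializable dict) to `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it over
    # the target, so a crash mid-write never leaves a truncated checkpoint.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path, default):
    """Return the last saved state, or `default` if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, crash_at=None):
    """Sum the integers 0..total_steps-1, checkpointing after every step.

    `crash_at` is a hypothetical knob that simulates a failure just before
    the given step is processed.
    """
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated crash")
        state["acc"] += state["step"]
        state["step"] += 1
        save_checkpoint(path, state)
    return state["acc"]
```

Calling run() a second time with the same path after a failure resumes
from the last completed step instead of starting over. In a real MPI
job each rank would need its own checkpoint and the ranks would have to
agree on a consistent step to restart from; that coordination is not
shown here.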
Does anyone have example code? Which MPI implementation is recommended
for meeting these requirements?
I am using three MPI implementations, and each has its own issues:
1) MPICH2 - good multi-threading support, but poor fault tolerance.
2) OpenMPI - does not support multi-threading properly, and I cannot
get it to trap exceptions yet.
3) FT-MPI - old, and does not support multi-threading at all.
We can't resolve problems by using the same kind of thinking we used
when we created them.