Open MPI User's Mailing List Archives

From: Josh Hursey (jjhursey_at_[hidden])
Date: 2007-03-22 10:30:59


LAM/MPI was able to checkpoint/restart an entire MPI job as you
mention. Open MPI is now able to checkpoint/restart as well. In the
past week I added a LAM/MPI-like checkpoint/restart implementation to
the Open MPI trunk. In Open MPI we revisited many of the design
decisions from the LAM/MPI development and improved on them quite a
bit. At the moment there is no documentation on how to use it (egg on
my face actually). I'm working on developing the documentation, and I
will send a note to the users list once it is available.

Cheers,
Josh

On Mar 21, 2007, at 1:18 PM, Thomas Spraggins wrote:

> To migrate processes, you need to be able to checkpoint them. I
> believe that LAM/MPI is the only MPI implementation that allows this,
> although I have never used LAM/MPI.
>
> Good luck.
>
> Tom Spraggins
> tas_at_[hidden]
>
> On Mar 21, 2007, at 1:09 PM, Mohammad Huwaidi wrote:
>
>> Hello folks,
>>
>> I am trying to write some fault-tolerance systems with the
>> following criteria:
>> 1) Recover from any software/hardware crashes.
>> 2) Dynamically shrink and grow.
>> 3) Migrate processes among machines.
>>
>> Does anyone have example code? What MPI platform is recommended
>> to accomplish such requirements?
>>
>> I am using three MPI platforms and each has its own issues:
>> 1) MPICH2 - good multi-threading support, but bad fault-tolerance
>> mechanisms.
>> 2) Open MPI - Does not support multi-threading properly, and I cannot
>> get it to trap exceptions yet.
>> 3) FT-MPI - Old and does not support multi-threading at all.
>>
>> Any suggestions?
>> --
>>
>> Regards,
>> Mohammad Huwaidi
>>
>> We can't resolve problems by using the same kind of thinking we used
>> when we created them.
>> --Albert Einstein
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

----
Josh Hursey
jjhursey_at_[hidden]
http://www.open-mpi.org/