Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: CRS Module for MTCP Checkpointing Package
From: George Bosilca (bosilca_at_[hidden])
Date: 2011-10-10 11:39:31


On Oct 7, 2011, at 21:44 , Alex Brick wrote:

> I'm a little unclear on this comment.

My comment was about the generic way the entire problem was stated. "DMTCP supports checkpointing of Open MPI applications" is not accurate as stated. The correct statement is "DMTCP supports checkpointing of Open MPI applications running on top of TCP". There is nothing bad about this; I totally understand the challenges.

> DMTCP currently supports checkpointing and restoring sockets over TCP,

Open MPI creates its sockets lazily. In other words, a connection between two peers does not exist until the two exchange some data (any MPI messaging will do). Thus, there is no socket to be restored here … yet. What is unclear in the entire DMTCP documentation, and in all the papers I have read about it, is how exactly the connection process is supposed to work after restart. When you restart a process, even on the same node, there is no guarantee that you will get the same port. Do you hook the socket creation function to replace, at runtime, the hostname and port number of the destination process?
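
To make the concern concrete, here is a minimal sketch of a lazily connected peer whose listening port changes across a restart. It is only an illustration: LazyEndpoint, address_book, and listen_on_ephemeral_port are hypothetical names made up for this example, and nothing here reflects how Open MPI's TCP layer or DMTCP is actually implemented.

```python
import socket


class LazyEndpoint:
    """Toy endpoint: the TCP connection is opened on first send, not at startup."""

    def __init__(self, address_book, peer_rank):
        self.address_book = address_book  # hypothetical: rank -> (host, port)
        self.peer_rank = peer_rank
        self.sock = None  # no socket exists yet, so there is nothing to restore

    def send(self, payload):
        if self.sock is None:
            # Lazy connect: uses whatever address the book holds right now.
            self.sock = socket.create_connection(
                self.address_book[self.peer_rank], timeout=5
            )
        self.sock.sendall(payload)

    def close(self):
        if self.sock is not None:
            self.sock.close()
            self.sock = None


def listen_on_ephemeral_port():
    """Bind to port 0 so the OS picks the port, as a restarted process might."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    return server


def demo():
    old_peer = listen_on_ephemeral_port()
    address_book = {1: old_peer.getsockname()}
    endpoint = LazyEndpoint(address_book, peer_rank=1)
    connected_before_send = endpoint.sock is not None

    # "Restart" the peer. The new listener is opened while the old one still
    # holds its port, so the two ports are guaranteed to differ in this demo.
    new_peer = listen_on_ephemeral_port()
    port_changed = new_peer.getsockname() != old_peer.getsockname()
    old_peer.close()

    # Without this update, the first send would target the stale port.
    address_book[1] = new_peer.getsockname()

    endpoint.send(b"hi")
    conn, _ = new_peer.accept()
    conn.settimeout(5)
    received = conn.recv(2)

    conn.close()
    endpoint.close()
    new_peer.close()
    return connected_before_send, port_changed, received
```

The point of the sketch is the single line that rewrites address_book: some layer has to perform that update after restart, whether by hooking socket creation or by refreshing the peer information some other way.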

> ... However, we feel that value is added by also working as an Open MPI module, where Open MPI handles all of the network communication, and our module simply handles checkpointing the individual processes. This enables people to use our user-level checkpointing tools with other networks by using Open MPI.

This is a great addition to Open MPI. But if this is what you plan to add to Open MPI, make an RFC about this instead of an RFC filled with text about DMTCP.
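
The division of labor in the quoted text (Open MPI handles the network, the module only checkpoints individual processes) can be sketched roughly as below. This is a toy under stated assumptions: SingleProcessCheckpointer, InMemoryCheckpointer, and Coordinator are hypothetical names for illustration and are not Open MPI's CRS interface or MTCP's API.

```python
from abc import ABC, abstractmethod


class SingleProcessCheckpointer(ABC):
    """Hypothetical stand-in for a CRS-style component (not Open MPI's API)."""

    @abstractmethod
    def checkpoint(self, rank, state):
        """Return a snapshot of one process."""

    @abstractmethod
    def restart(self, snapshot):
        """Rebuild one process's state from a snapshot."""


class InMemoryCheckpointer(SingleProcessCheckpointer):
    """Toy checkpointer: copies the state dict instead of writing an image."""

    def checkpoint(self, rank, state):
        return {"rank": rank, "state": dict(state)}

    def restart(self, snapshot):
        return dict(snapshot["state"])


class Coordinator:
    """Toy coordination layer: it owns the network, the checkpointer does not."""

    def __init__(self, checkpointer, processes):
        self.checkpointer = checkpointer
        self.processes = processes  # hypothetical: rank -> state dict
        self.log = []

    def checkpoint_all(self):
        # Network handling stays here, outside the single-process checkpointer.
        self.log.append("quiesce network")
        snapshots = {
            rank: self.checkpointer.checkpoint(rank, state)
            for rank, state in self.processes.items()
        }
        self.log.append("resume network")
        return snapshots

    def restart_all(self, snapshots):
        self.processes = {
            rank: self.checkpointer.restart(snapshot)
            for rank, snapshot in snapshots.items()
        }
        self.log.append("re-establish network")
```

Because the checkpointer never touches the network in this arrangement, swapping one single-process checkpointer for another would not change which interconnects are supported. That is the part worth an RFC of its own.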

  george.

>
> What exactly is your question?
>
>
> -- Alex
>
> George Bosilca <bosilca_at_[hidden]> wrote:
>
>> Way too much hand-waving here.
>>
>> When you say certain networks you mean TCP and potentially SM. However, I doubt even TCP can be fully supported. Not without the preconnect option … or a means to update the modes information.
>>
>> george.
>>
>> On Oct 7, 2011, at 14:56 , Josh Hursey wrote:
>>
>>> From what I have seen during development, this RFC integrates the MTCP
>>> single process checkpointer into the C/R infrastructure of Open MPI.
>>> The MTCP component of the DMTCP project can be used in isolation,
>>> which is what they are integrating. So they can use DMTCP to
>>> checkpoint/restart an unmodified Open MPI, but only over certain
>>> networks. By integrating the MTCP checkpointer as a CRS component they
>>> use Open MPI to coordinate across processes, and gain support for a
>>> larger number of networks (e.g., IB, MX).
>>>
>>> Alex, does that sound about right?
>>>
>>> -- Josh
>>>
>>>
>>> On Thu, Oct 6, 2011 at 4:33 PM, George Bosilca <bosilca_at_[hidden]> wrote:
>>>> Alex,
>>>>
>>>> It looks like there is a mismatch between what you propose to achieve and the text in your RFC. You propose to add a new single-process checkpoint-restart mechanism (MTCP) to the ones already provided in Open MPI. However, most of the text in your RFC is about DMTCP, which is another layer on top of MTCP capable of checkpointing and restarting distributed applications.
>>>>
>>>> I would like to understand what this RFC is really about: MTCP or DMTCP?
>>>>
>>>> george.
>>>>
>>>> On Oct 6, 2011, at 02:58 , Alex Brick wrote:
>>>>
>>>>> WHAT: Bring in the mtcp CRS component
>>>>>
>>>>> WHY: Add support for the MTCP checkpoint/restart service
>>>>>
>>>>> WHERE: opal/mca/crs/mtcp
>>>>>
>>>>> TIMEOUT: Tuesday teleconf, 2011-10-18 (about 2 weeks from now)
>>>>>
>>>>> -------------------------------------------
>>>>> What is MTCP?
>>>>>
>>>>> DMTCP (Distributed MultiThreaded CheckPointing, http://dmtcp.sourceforge.net) is a mature open source (LGPL) checkpointing package that has been under development for seven years. It operates entirely in user space, with no kernel modules and no modifications to the target application. In its simplest form, it is used as follows:
>>>>>
>>>>> dmtcp_checkpoint ./a.out
>>>>> dmtcp_command --checkpoint
>>>>> dmtcp_restart ckpt_a.out_*.dmtcp
>>>>>
>>>>> DMTCP is contagious. Any calls to fork(), pthread_create(), or "ssh"
>>>>> are recognized by DMTCP, which keeps the resulting threads and local and
>>>>> remote processes under checkpoint control. At checkpoint time, it also
>>>>> generates a script, dmtcp_restart_script.sh, that can restart a distributed computation. As a sign of its maturity, it can also checkpoint Open MPI "from on top":
>>>>>
>>>>> dmtcp_checkpoint mpirun hello_mpi
>>>>>
>>>>> The MTCP component of DMTCP is the single-process component. It is used
>>>>> both internally by DMTCP as well as directly by users only interested in
>>>>> checkpointing a single process. This second use was the basis for developing an Open MPI module for the Open MPI checkpoint-restart service, similar to BLCR, except that no kernel modules are required.
>>>>>
>>>>> DMTCP is currently a Debian package (Debian testing), and is also planned for Fedora and openSUSE. These packages also provide the MTCP component for Open MPI.
>>>>>
>>>>> -------------------------------------------
>>>>> More details:
>>>>>
>>>>> Open MPI MTCP integration implementation available at:
>>>>>
>>>>> https://bitbucket.org/jsquyres/ompi-dmtcp2
>>>>>
>>>>> The DMTCP parent project website is below:
>>>>>
>>>>> http://dmtcp.sourceforge.net/
>>>>>
>>>>> The Distributed MultiThreaded CheckPointing (DMTCP) Project supports user-level, transparent checkpoint/restart of a variety of sequential and parallel programs. In Open MPI terms, this contribution is an alternative to the BLCR CRS module, meaning that users can use DMTCP to checkpoint their applications instead of BLCR.
>>>>>
>>>>> The MTCP component is currently restricted to supporting communication over sockets and shared memory. In an effort to support a wider range of networks (e.g., InfiniBand, Myrinet), they have created a CRS component to hook into Open MPI's checkpoint/restart infrastructure. The MTCP user-level checkpoint/restart service is the single process checkpoint kernel of the DMTCP project. The MTCP kernel is what is used in the mtcp CRS component.
>>>>>
>>>>> Jeff Squyres and Josh Hursey have been working with the DMTCP authors (based at Northeastern University in Boston, MA, USA) for quite a while and feel that this component is ready to be brought into the Open MPI main line for inclusion in the 1.7.x series (and possibly the 1.5.x series?). The authors have submitted OMPI 3rd party contribution agreements.
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> devel_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Joshua Hursey
>>> Postdoctoral Research Associate
>>> Oak Ridge National Laboratory
>>> http://users.nccs.gov/~jjhursey