Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Question on checkpoint overhead in Open MPI
From: Nguyen Toan (nguyentoan1508_at_[hidden])
Date: 2010-07-22 13:07:10


Dear Josh,
Thank you very much for the reply. I am sorry that my question was unclear;
let me restate it.
I am currently using the staging technique with the following
mca-params.conf settings:
snapc_base_store_in_place=0 # enable remote file transfer to global storage
crs_base_snapshot_dir=/ssd/tmp/ckpt/local
snapc_base_global_snapshot_dir=/ssd/tmp/ckpt/global

What concerns me is the quantity "Others" = checkpoint latency -
checkpoint overhead.
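As a minimal sketch of the definition above, assuming a single checkpoint per
run and taking checkpoint overhead as the runtime difference with and without
checkpointing: the class name, field names, and numbers below are hypothetical
illustrations, not Open MPI output or API.

```python
from dataclasses import dataclass


@dataclass
class CheckpointTiming:
    """Hypothetical container for hand-measured timings, in seconds."""

    runtime_without_ckpt: float  # application runtime with no checkpoint taken
    runtime_with_ckpt: float     # application runtime with one checkpoint taken
    checkpoint_latency: float    # request issued -> global snapshot fully stored

    @property
    def overhead(self) -> float:
        # Checkpoint overhead: extra runtime the application pays.
        return self.runtime_with_ckpt - self.runtime_without_ckpt

    @property
    def others(self) -> float:
        # "Others": the part of the latency that does not stall the application.
        return self.checkpoint_latency - self.overhead


# Made-up example values, chosen only to show the arithmetic.
timing = CheckpointTiming(
    runtime_without_ckpt=100.0,
    runtime_with_ckpt=112.0,
    checkpoint_latency=30.0,
)
print(f"overhead = {timing.overhead} s, others = {timing.others} s")
```

With staging enabled, "Others" in this sketch would cover whatever happens
after the application resumes, which is why I suspect the file transfer.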
According to your answer, the remote file transfer is done asynchronously while
the application continues execution.
From my observation, "Others" increases greatly as the data size and the
number of processes increase. So is the scp time for transferring the files to
stable storage the main component of "Others"?
As you said, the amount of checkpoint overhead is application and system
configuration specific, but in general is there any relationship between
"Others" and the number of processes or the data size?
Thank you.

Best Regards,
Nguyen Toan

On Sat, Jul 17, 2010 at 6:25 AM, Josh Hursey <jjhursey_at_[hidden]> wrote:

> The amount of checkpoint overhead is application and system configuration
> specific. So it is impossible to give you a good answer to how much
> checkpoint overhead to expect for your application and system setup.
>
> BLCR is only used to capture the single process image. The coordination of
> the distributed checkpoint includes:
> - the time to initiate the checkpoint,
> - the time to marshal the network (we currently use an all-to-all
> bookmark exchange, similar to what LAM/MPI used),
> - the time to store the local checkpoints to stable storage,
> - the time to verify that all of the local checkpoints have been stored
> successfully, and
> - the time to return the handle to the end user.
>
> The bulk of the time is spent saving the local checkpoints (a.k.a.
> snapshots) to stable storage. By default Open MPI saves directly to a
> globally mounted storage device. So the application is stalled until the
> checkpoint is complete (checkpoint overhead = checkpoint latency). You can
> also enable checkpoint staging, in which the application saves the checkpoint
> to a local disk. The local daemon then stages the file back to stable
> storage while the application continues execution (checkpoint overhead <<
> checkpoint latency).
>
> If you are concerned with scaling, definitely look at the staging
> technique.
>
> Does that help?
>
> -- Josh
>
> On Jul 7, 2010, at 12:25 PM, Nguyen Toan wrote:
>
> > Hello everyone,
> > I have a question concerning the checkpoint overhead in Open MPI, which
> > is the difference between the runtime of the application with and
> > without checkpointing.
> > I observe that as the data size and the number of processes increase,
> > the runtime of BLCR is very small compared to the overall checkpoint
> > overhead in Open MPI. Is this because of the increase in coordination time
> > for the checkpoint? And what is included in the overall checkpoint overhead
> > besides BLCR's checkpoint overhead and the coordination time?
> > Thank you.
> >
> > Best Regards,
> > Nguyen Toan
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>