Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] High Checkpoint Overhead Ratio
From: Joshua Hursey (jjhursey_at_[hidden])
Date: 2010-08-31 08:25:46


Have you tried testing without using the NFS? So setting the mca-params.conf to something like:
crs_base_snapshot_dir=/tmp/
snapc_base_global_snapshot_dir=/tmp/global
snapc_base_store_in_place=0

This would remove the NFS time from the checkpoint time. However if you are using staging this may or may not reduce the application overhead significantly.
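
If editing mca-params.conf is inconvenient, the same parameters can also be passed on the mpirun command line. This is only a sketch: the process count and binary name are taken from the CG.C run described below (the NPB binary name is an assumption), and the paths are placeholders:

```shell
# Sketch: point all snapshot directories at local disk so NFS is
# out of the checkpoint path entirely (paths are placeholders).
mpirun -np 16 -am ft-enable-cr \
    --mca crs_base_snapshot_dir /tmp/ \
    --mca snapc_base_global_snapshot_dir /tmp/global \
    --mca snapc_base_store_in_place 0 \
    ./cg.C.16
```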

If you want to save to NFS, and it is globally mounted you could try setting the 'snapc_base_global_shared' parameter (deprecated in the trunk) which tells the system to use standard UNIX copy commands (i.e., cp) instead of the rsh varieties.
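
For example (a sketch only, reusing the globally mounted $HOME from your setup; as noted above, the parameter is deprecated in the trunk):

```shell
# mca-params.conf sketch: snapshots go to a globally mounted directory,
# and the shared flag tells SnapC to use 'cp' rather than rsh/scp.
snapc_base_global_snapshot_dir=/home/chenwh
snapc_base_global_shared=1
```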

You might try changing the '--mca filem_rsh_max_incomming' parameter (default 10) to increase or decrease the number of concurrent rcp/scp operations.
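
For instance (a sketch; 20 is just an arbitrary value to experiment with, and the binary name is a placeholder):

```shell
# Sketch: allow up to 20 concurrent rcp/scp transfers during staging
# instead of the default 10.
mpirun -np 16 -am ft-enable-cr \
    --mca filem_rsh_max_incomming 20 \
    ./cg.C.16
```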

Something else to try is to look at the SnapC timing to pinpoint where the system is taking the most time:
  snapc_full_enable_timing=1

Since you are using the C/R thread, it takes up some CPU cycles that may interfere with application performance. You can adjust the aggressiveness of this thread via the 'opal_cr_thread_sleep_wait' parameter. In 1.5.0 it defaults to 0 microseconds, but on the trunk this has been raised to 1000 microseconds. Try setting the parameter:
  opal_cr_thread_sleep_wait=1000
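
Putting the pieces together, one possible $HOME/.openmpi/mca-params.conf combining the suggestions in this message might look like the following (the values are starting points to experiment with, not tuned recommendations):

```shell
# Local-disk snapshots, SnapC timing output, and a less aggressive
# C/R thread; adjust paths and values for your cluster.
crs_base_snapshot_dir=/tmp/
snapc_base_global_snapshot_dir=/tmp/global
snapc_base_store_in_place=0
snapc_full_enable_timing=1
opal_cr_thread_sleep_wait=1000
```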

Depending on how much memory is required by CG.C and available on each node, you may be hitting a memory barrier that BLCR is struggling to overcome. What happens if you reduce the number of processes per node?
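
A quick way to see how close each blade is to its memory limit (assuming Linux nodes) is to check /proc/meminfo while CG.C is running:

```shell
# Rough memory check on one node; run on each blade (e.g., via ssh)
# while the application is up to see how much headroom BLCR has.
grep -E 'MemTotal|MemFree' /proc/meminfo
```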

Those are some things to play around with to see what works best for your system and application. For a full list of parameters available in the C/R infrastructure see the link below:
  http://osl.iu.edu/research/ft/ompi-cr/api.php

-- Josh

On Aug 30, 2010, at 11:08 PM, Chen Wenhao wrote:

> Dear OMPI Users,
>
> I'm now using BLCR-0.8.2 and OpenMPI-1.5rc5. The problem is that it takes a very long time to checkpoint.
>
> BLCR configuration:
> ./configure --prefix=/opt/blcr --enable-static
> Open MPI configuration:
> ./configure --prefix=/opt/ompi --with-ft=cr --with-blcr=/opt/blcr --enable-static --enable-ft-thread --enable-mpi-threads
>
> Our blades use NFS. $HOME and /opt are shared.
>
> In $HOME/.openmpi/mca-params.conf:
> crs_base_snapshot_dir=/tmp/
> snapc_base_global_snapshot_dir=/home/chenwh
> snapc_base_store_in_place=0
>
>
> Now I run CG NPB (NPROCS=16, CLASS=C) on two nodes (blade02, blade04).
> With no checkpoint, 'Time in seconds' is about 100s, which is normal.
> But when I take a single checkpoint, 'Time in seconds' rises to about 300s. The overhead ratio is over 200%! Why is that, and how can I improve it?
>
> blade02:~> ompi-checkpoint --status 27115
> [blade02:27130] [ 0.00 / 0.25] Requested - ...
> [blade02:27130] [ 0.00 / 0.25] Pending - ...
> [blade02:27130] [ 0.21 / 0.46] Running - ...
> [blade02:27130] [221.25 / 221.71] Finished - ompi_global_snapshot_27115.ckpt
> Snapshot Ref.: 0 ompi_global_snapshot_27115.ckpt
>
> As you see, it takes 200+ seconds to checkpoint. By the way, what do the former and latter numbers in [ , ] represent?
>
> Regards
>
> Whchen

------------------------------------
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://www.cs.indiana.edu/~jjhursey