Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Checkpoint problem with BLCR + OpenMPI
From: Joshua Hursey (jjhursey_at_[hidden])
Date: 2010-08-27 08:21:32

On Aug 27, 2010, at 3:52 AM, 陈文浩 wrote:

> Dear OMPI Users,
> I have installed BLCR (0.8.2) and Open MPI (1.4.2) successfully, but I now run into a problem when I take a checkpoint.
> I run CG NPB (NPROCS=16, two nodes: blade02 & blade04, CLASS=C, NFS: $HOME & /opt are shared)
> BLCR configure: ./configure --prefix=/opt/blcr --enable-static
> Open MPI configure: ./configure --prefix=/opt/ompi --with-ft=cr --with-blcr=/opt/blcr --enable-static (I didn't add the 'enable-ft-thread' param because I think it might affect performance. Is that right? MPI threads are enabled by default, so I didn't add the 'enable-mpi-threads' param.) Can anyone tell me whether these two params will make the checkpoint time shorter or longer?
> Our blades use NFS; $HOME and /opt are shared. The checkpoint file is created in the $HOME directory by default. Will that cause the long checkpoint time?
> In $HOME/.openmpi/mca-params.conf:
> crs_base_snapshot_dir=/tmp/
> snapc_base_global_snapshot_dir=$HOME/ompi-cr-file
> snapc_base_store_in_place=0
> Then in mpirun terminal:
> mpirun -machinefile mf -am ft-enable-cr -n 8 ./cg.C.8
> In checkpoint terminal:
> ompi-checkpoint --status 11133
> [blade02:11171] Requested - Global Snapshot Reference: (null)
> [blade02:11171] Pending - Global Snapshot Reference: (null)
> [blade02:11171] Running - Global Snapshot Reference: (null)
> [blade02:11171] File Transfer - Global Snapshot Reference: (null)
> In mpirun terminal:
> --------------------------------------------------------------------------
> WARNING: Could not preload specified file: File already exists.
> Fileset: $HOME/ompi-cr-file/ompi_global_snapshot_11133.ckpt/0
> Host: blade02
> Will continue attempting to launch the process.
> --------------------------------------------------------------------------
> [blade02:11133] 3 more processes have sent help message help-orte-filem-rsh.txt / orte-filem-rsh:get-file-exists
> [blade02:11133] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> How can I disable the 'preload', and how can I solve this problem? Thanks.

The staging option is known to be broken in the v1.4 series, per the ticket below:

If you need/want the staging feature I would suggest trying the current Open MPI trunk, or the v1.5 release series. In v1.4 you can disable the preload by removing the 'snapc_base_store_in_place' option from your mca-params.conf file.
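A minimal sketch of that edit, run here against a scratch copy holding the three lines quoted above rather than against the real file. The use of mktemp and grep is only one way to do it (an assumption, not something Open MPI prescribes); set CONF to $HOME/.openmpi/mca-params.conf to apply it for real. Whether the other two lines should also change is not covered here, so the sketch leaves them alone.

```shell
# Scratch copy of the mca-params.conf contents quoted in the original post.
# The quoted heredoc delimiter keeps $HOME literal, as in the file itself.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
crs_base_snapshot_dir=/tmp/
snapc_base_global_snapshot_dir=$HOME/ompi-cr-file
snapc_base_store_in_place=0
EOF

# Drop the snapc_base_store_in_place line and keep everything else.
grep -v '^snapc_base_store_in_place' "$CONF" > "$CONF.new"
mv "$CONF.new" "$CONF"

# Show the resulting file.
cat "$CONF"
```

After this, rerun the same mpirun and ompi-checkpoint commands as before and check whether the "Could not preload specified file" warning still appears.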

> Btw, when there is no mca-params.conf and the checkpoint file is placed in the $HOME directory by default, I can checkpoint successfully. BUT it takes a very long time to checkpoint. Without a checkpoint, CG runs in about 100s, but with a checkpoint it runs in 300s, a 200% overhead. WHY?

So by default the files are stored in $HOME (overridden with the snapc_base_global_snapshot_dir parameter). How your $HOME directory is mounted and how your NFS is set up will determine how fast or slow this operation is. Checkpointing directly to a shared file system can stress that file system considerably, since all processes are writing at approximately the same time. Staging and other such techniques help reduce this pressure by controlling the load placed on the file system. Additionally, checkpointing directly to the shared file system causes the application to remain suspended until its file is completely written, which may take a considerable amount of time depending on the speed of the file system. Staging considerably reduces the impact of checkpointing on application runtime.
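To get a rough feel for how much slower the shared file system is than local disk, a simple write test can help. The sketch below is only a crude diagnostic and not part of Open MPI or BLCR: the time_write helper is hypothetical, it assumes a dd that supports conv=fsync (GNU dd does), and a single writer will understate the contention seen when all ranks checkpoint at once.

```shell
# time_write DIR [SIZE_MB]: write SIZE_MB of zeros into DIR, flush the
# data to storage, and report the elapsed wall-clock seconds.
# Hypothetical helper; assumes dd supports conv=fsync (GNU dd does).
time_write() {
    dir=$1
    size_mb=${2:-256}
    f="$dir/ckpt_write_test.$$"
    start=$(date +%s)
    dd if=/dev/zero of="$f" bs=1048576 count="$size_mb" conv=fsync 2>/dev/null
    end=$(date +%s)
    rm -f "$f"
    echo "$dir: $size_mb MB in $((end - start)) s"
}

# Local disk on the node (note that /tmp may be memory-backed on some systems).
time_write /tmp 16
```

Calling time_write "$HOME" with the same size on a compute node gives the NFS-side number to compare against. The timer has one-second resolution, so use a size large enough (for example, close to the per-process checkpoint size) for the difference to show.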

I suggest trying the staging option with either the v1.5 (pre-)release or the trunk.

-- Josh

> Regards
> Whchen

Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory