I want to use BLCR and Open MPI
for checkpointing. I can now save a checkpoint and restart my job successfully.
However, I have found that the option "--am ft-enable-cr" adds a large overhead.
For example, I ran my HPL job on 4 hosts (32 processes, InfiniBand network)
without and with "--am ft-enable-cr". The run times were 8m21.180s and
16m37.732s, respectively, so the option roughly doubles the run time. Note
that I did not take any checkpoint during these runs, so the extra cost comes
from "--am ft-enable-cr" alone. Why does "--am ft-enable-cr" add so much
overhead? Is this normal? How can I solve the problem?
I have also tested other MPI applications, and the problem
still exists.
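To put a number on the slowdown, here is a small Python sketch that converts the two timings above into seconds and computes the ratio and the relative overhead. It assumes the timings use the `XmY.YYYs` format printed by bash's `time`; the helper name `to_seconds` is hypothetical and only for illustration.

```python
import re


def to_seconds(t: str) -> float:
    """Convert a bash `time`-style string such as '8m21.180s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    if m is None:
        raise ValueError(f"unrecognized time format: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


baseline = to_seconds("8m21.180s")   # HPL run without --am ft-enable-cr
with_cr = to_seconds("16m37.732s")   # same run with --am ft-enable-cr, no checkpoint taken
ratio = with_cr / baseline
overhead_pct = (with_cr - baseline) / baseline * 100

print(f"baseline={baseline:.3f}s with_cr={with_cr:.3f}s "
      f"ratio={ratio:.2f}x overhead={overhead_pct:.1f}%")
```

On these numbers it reports 501.180 s versus 997.732 s, a ratio of about 1.99x, i.e. roughly 99% overhead from enabling the checkpoint/restart framework alone.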