I want to use BLCR and Open MPI for checkpointing, and I can now save a checkpoint and restart my job successfully. However, I find that the option "--am ft-enable-cr" adds a large overhead. For example, when I run my HPL job on 4 hosts (32 processes, IB network) without and with the option "--am ft-enable-cr", the run times are 8m21.180s and 16m37.732s respectively. Note that I did not take any checkpoint during these runs, so the additional cost comes from "--am ft-enable-cr" alone. Why does the option "--am ft-enable-cr" cause so much overhead? Is this normal? How can I solve the problem?
I also tested other MPI applications, and the problem still exists.
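
To put the two HPL timings above in perspective, here is a small sketch of the arithmetic. The helper name `to_seconds` is only for illustration, and the inputs are the two wall-clock times quoted above; nothing else is measured here.

```python
def to_seconds(t: str) -> float:
    """Convert a `time`-style string such as '8m21.180s' to seconds."""
    minutes, rest = t.split("m")
    return int(minutes) * 60 + float(rest.rstrip("s"))


# Wall-clock times quoted above for the HPL run on 4 hosts (32 processes).
baseline = to_seconds("8m21.180s")   # without "--am ft-enable-cr"
with_cr = to_seconds("16m37.732s")   # with "--am ft-enable-cr", no checkpoint taken

extra = with_cr - baseline           # additional seconds spent
slowdown = with_cr / baseline        # ratio of the two run times

print(f"baseline:  {baseline:.3f} s")
print(f"with CR:   {with_cr:.3f} s")
print(f"extra:     {extra:.3f} s")
print(f"slowdown:  {slowdown:.2f}x")
```

So the run with the option enabled takes roughly twice as long as the run without it, even though no checkpoint was taken.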