The 6th question is as follows:
(6) About the first_continue_pass static variable in the ft_event functions.
The related frameworks are as follows.
Framework : bml
Component : r2
The source file : ompi/mca/bml/r2/bml_r2_ft.c
The function name : mca_bml_r2_ft_event
Framework : crcp
Component : bkmrk
The source file : ompi/mca/crcp/bkmrk/crcp_bkmrk_pml.c
The function name : ompi_crcp_bkmrk_pml_ft_event
Framework : pml
Component : ob1
The source file : ompi/mca/pml/ob1/pml_ob1.c
The function name : mca_pml_ob1_ft_event
Component : csum
The source file : ompi/mca/pml/csum/pml_csum.c
The function name : mca_pml_csum_ft_event
I think the first_continue_pass variable exists to identify whether
mca_pml.pml_ft_event(OPAL_CRS_CONTINUE) is being called for the first time
or for the second time in the INC-continue section when ompi_cr_continue_like_restart is true.
When mca_pml.pml_ft_event(OPAL_CRS_CONTINUE) is called for the first time,
by the ompi_cr_coord_pre_continue function, first_continue_pass is true.
When mca_pml.pml_ft_event(OPAL_CRS_CONTINUE) is called for the second time,
by the ompi_cr_coord_post_continue function, first_continue_pass is false.
However, I think there is a problem when ompi_cr_continue_like_restart isn't true.
In that case, mca_pml.pml_ft_event(OPAL_CRS_CONTINUE) is called just once
in the INC-continue section, so the value of first_continue_pass alternates
from one checkpoint to the next.
On odd-numbered checkpoints (the 1st, 3rd, ...),
the INC-continue section is executed with first_continue_pass set to true.
On even-numbered checkpoints (the 2nd, 4th, ...),
the INC-continue section is executed with first_continue_pass set to false.
This behavior is incorrect.
I think that first_continue_pass should always be true if ompi_cr_continue_like_restart isn't true.
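
To make the alternation concrete, here is a minimal sketch in C. It is not
the actual Open MPI code: ft_state_t, continue_event_toggle and
continue_event_proposed are hypothetical names, and I assume that the flag
starts as false and is flipped on every OPAL_CRS_CONTINUE call, which matches
the behavior described above. The real functions keep the flag in a
function-local static variable; a struct is used here only so the model is
easy to exercise.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-function static flag. */
typedef struct {
    bool first_continue_pass;
} ft_state_t;

/* Assumed current behavior: the flag is flipped on every
 * OPAL_CRS_CONTINUE call, regardless of ompi_cr_continue_like_restart.
 * Returns the value the rest of the CONTINUE branch would see. */
static bool continue_event_toggle(ft_state_t *st)
{
    st->first_continue_pass = !st->first_continue_pass;
    return st->first_continue_pass;
}

/* Suggested behavior: flip the flag only when two calls per checkpoint
 * are expected (continue_like_restart is true); otherwise keep it true. */
static bool continue_event_proposed(ft_state_t *st, bool continue_like_restart)
{
    if (continue_like_restart) {
        st->first_continue_pass = !st->first_continue_pass;
    } else {
        st->first_continue_pass = true;
    }
    return st->first_continue_pass;
}
```

With one call per checkpoint, continue_event_toggle yields true, false,
true, ... across successive checkpoints, while continue_event_proposed with
continue_like_restart set to false yields true every time. With
continue_like_restart set to true, the two calls within one checkpoint still
see true and then false.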