Open MPI Development Mailing List Archives

Subject: [OMPI devel] Some questions about checkpoint/restart (13),(14)
From: Takayuki Seki (seki_at_[hidden])
Date: 2010-05-28 03:54:54

My 13th and 14th questions are as follows:

(13) Some messages are not shown even though the --mca snapc_base_verbose parameter is used.

Framework : snapc
Component : full
The source file : orte/mca/snapc/base/snapc_base_open.c
The function name : orte_snapc_base_open

I think that the following verbose messages are not shown because the
orte_snapc_base_output ID has not yet been initialized at those points.
The orte_snapc_base_output ID is initialized in the opal_output_set_verbosity
function, which is called by the mca_base_components_open function.

    OPAL_OUTPUT_VERBOSE((10, orte_snapc_base_output,
                         "snapc:base: open()"));

    OPAL_OUTPUT_VERBOSE((20, orte_snapc_base_output,
                         "snapc:base: open: base_global_snapshot_dir = %s",

    OPAL_OUTPUT_VERBOSE((20, orte_snapc_base_output,
                         "snapc:base: open: base_store_in_place = %d",

    OPAL_OUTPUT_VERBOSE((20, orte_snapc_base_output,
                         "snapc:base: open: base_only_one_seq = %d",

    OPAL_OUTPUT_VERBOSE((20, orte_snapc_base_output,
                         "snapc:base: open: base_establish_global_snapshot_dir = %d",

    OPAL_OUTPUT_VERBOSE((20, orte_snapc_base_output,
                         "snapc:base: open: base_global_snapshot_ref = %s",

Result of a run.
The messages listed above are not shown.

 mca: base: components_open: Looking for snapc components
 mca: base: components_open: opening snapc components
 mca: base: components_open: found loaded component full
 mca: base: components_open: component full has no register function
 snapc:full: open()
 snapc:full: open: priority = 20
 snapc:full: open: verbosity = 100
 snapc:full: open: skip_filem = False
 mca: base: components_open: component full open function successful
 snapc:select: Using none component
 snapc:full: close()
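
To illustrate the ordering problem I suspect in (13), here is a minimal toy sketch in C. It does not use the Open MPI API: every name in it (toy_output_id, toy_verbose, toy_components_open, and so on) is hypothetical, and it simply assumes that the stream ID starts at an invalid value (-1) and that a verbose call on an invalid ID is silently dropped.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are NOT Open MPI symbols. */
static int toy_output_id = -1;  /* assumed: -1 means "stream not opened yet" */
static int toy_verbosity = 0;   /* verbosity level attached to the stream */
static int toy_emitted   = 0;   /* number of messages actually printed */

/* Toy analogue of a verbose-output call: prints only if the stream ID is
 * valid and the message level does not exceed the configured verbosity. */
static void toy_verbose(int level, int output_id, const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    if (output_id < 0 || level > toy_verbosity) {
        return;                 /* message is silently dropped */
    }
    va_start(ap, fmt);
    vprintf(fmt, ap);
    va_end(ap);
    putchar('\n');
    toy_emitted++;
}

/* Toy analogue of the step that opens the stream and sets its verbosity. */
static void toy_components_open(int verbosity)
{
    toy_output_id = 0;
    toy_verbosity = verbosity;
}

static void toy_reset(void)
{
    toy_output_id = -1;
    toy_verbosity = 0;
    toy_emitted   = 0;
}

/* Ordering I suspect in the base open function: log first, initialize later.
 * Returns the number of messages that were printed. */
static int toy_open_log_first(int verbosity)
{
    toy_reset();
    toy_verbose(10, toy_output_id, "toy:base: open()");
    toy_components_open(verbosity);
    return toy_emitted;
}

/* Alternative ordering: initialize the stream first, then log. */
static int toy_open_log_after(int verbosity)
{
    toy_reset();
    toy_components_open(verbosity);
    toy_verbose(10, toy_output_id, "toy:base: open()");
    return toy_emitted;
}
```

In this toy model, toy_open_log_first() drops its message regardless of the requested verbosity, while toy_open_log_after() prints it whenever the verbosity is at least 10. Whether the real code behaves the same way depends on how the output ID is actually initialized.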

(14) I use the Aggregate MCA parameter -am ft-enable-cr to enable checkpoint/restart
     fault tolerance for an MPI application.

     If two or more MCA parameter files are specified with the -am option,
     fault tolerance may be disabled.

     I understand that this is the specified behavior of Open MPI.
     Is there any way to specify multiple AMCA parameter files?
     For users, it would be convenient to be able to use an MCA parameter file.

     For example:
     mpiexec .... --mca btl self,tcp -am ft-enable-cr -am /home/guest/Test/CR-Debug/local-mca-param.conf .... a.out

     -bash-3.2$ cat local-mca-param.conf

     Result: fault tolerance is disabled.